# Session 6 — Part 2: Analytics & Reporting Layer
Connect curated data to **BI tools**, design **semantic models** and **data marts**, and optimize refresh, cost, and security.

## 🧱 1️⃣ The Analytics Layer in the Stack
```
Curated → Data Mart → BI / Semantic Model → Dashboards / Alerts / APIs
```
The analytics layer is the interface between data engineering and decision‑makers.

## 📊 2️⃣ BI & Reporting Tools Landscape
| Category | Tools | Notes |
|---|---|---|
| Enterprise BI | Power BI, Tableau, Qlik | Rich visuals, governance, RLS |
| Cloud‑native | QuickSight, Looker, Looker Studio (formerly Data Studio) | Tight DW integration |
| Open Source | Superset, Metabase, Redash | Self‑hosted portals |
| Embedded/APIs | Mode, Plotly Dash | Product analytics / custom apps |
| Notebooks | Jupyter, Databricks SQL, Hex | Ad‑hoc + reproducible analysis |

## 🧠 3️⃣ How Data Engineers Enable BI
1. **Design data marts** (star schemas) for subject areas.
2. **Define semantic models**: measures (e.g., `total_sales`) & dimensions (date, region).
3. **Pre‑aggregate** heavy queries (materialized tables/extracts).
4. **Automate refresh** so it is triggered when the upstream pipeline completes.
5. **Secure** with RLS/OLS + audit logs.
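
To make step 2 concrete, here is a minimal sketch of a semantic model expressed in plain Python. The `Measure`, `Dimension`, and `build_query` names are hypothetical and do not belong to any BI tool; real semantic layers (LookML, Power BI models) have their own syntax. The idea is only that measures and dimensions are defined once and queries are generated from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    name: str        # business-facing name, e.g. total_sales
    expression: str  # SQL aggregate evaluated over the fact table


@dataclass(frozen=True)
class Dimension:
    name: str    # business-facing name, e.g. region
    column: str  # physical column used for grouping


def build_query(table, measures, dimensions):
    """Render a GROUP BY query from measure and dimension definitions."""
    dim_sql = [f"{d.column} AS {d.name}" for d in dimensions]
    msr_sql = [f"{m.expression} AS {m.name}" for m in measures]
    sql = f"SELECT {', '.join(dim_sql + msr_sql)} FROM {table}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(d.column for d in dimensions)
    return sql


# Assumed definitions, mirroring the total_sales / region example above.
total_sales = Measure("total_sales", "SUM(amount)")
region = Dimension("region", "region")

query = build_query("curated.fact_sales", [total_sales], [region])
print(query)
```

Because every dashboard reuses the same `Measure` definition, a change to how `total_sales` is calculated happens in one place.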

## 📦 4️⃣ Analytical Model Design
**Star Schema:**
```
FactSales → DimCustomer, DimProduct, DimRegion, DimDate
```
| Table | Metrics/Keys | Linked Dims |
|---|---|---|
| FactSales | sales_amount, qty, customer_id, product_id, region_id, date_id | Customer, Product, Region, Date |
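
The star schema above can be tried out locally. This is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the snake_case table names, the `region_id` and `date_id` keys, and the sample rows are assumptions made for illustration, not the course's actual warehouse schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE dim_region   (region_id   INTEGER PRIMARY KEY, region_name   TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, calendar_date TEXT);

-- The fact table holds metrics plus one foreign key per dimension.
CREATE TABLE fact_sales (
    customer_id  INTEGER REFERENCES dim_customer(customer_id),
    product_id   INTEGER REFERENCES dim_product(product_id),
    region_id    INTEGER REFERENCES dim_region(region_id),
    date_id      INTEGER REFERENCES dim_date(date_id),
    sales_amount REAL,
    qty          INTEGER
);

INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_product  VALUES (1, 'Widget');
INSERT INTO dim_region   VALUES (1, 'EMEA'), (2, 'APAC');
INSERT INTO dim_date     VALUES (20240115, '2024-01-15'), (20240116, '2024-01-16');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 20240115, 100.0, 2),
    (2, 1, 1, 20240116,  50.0, 1),
    (1, 1, 2, 20240115,  75.0, 3);
""")

# A typical BI query: join the fact to one dimension and aggregate.
rows = conn.execute("""
    SELECT r.region_name, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(rows)
```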

## ⚙️ 5️⃣ Example: Transform → Mart → Dashboard
**Retail case:**
1) Curated fact/dim built in Part 1 → 2) Create `sales_by_region_month` summary → 3) Publish to BI

**SQL pre‑aggregation:**
```sql
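-- Monthly sales per region, pre-aggregated so dashboards avoid scanning the fact table.
-- Assumes region is available on curated.fact_sales; otherwise join the region dimension.
CREATE OR REPLACE TABLE mart.sales_by_region_month AS
SELECT date_trunc('month', order_date) AS month,
       region,
       SUM(amount) AS total_sales,
       COUNT(*) AS orders  -- counts fact rows; equals orders only at one row per order
FROM curated.fact_sales
GROUP BY 1, 2;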
```
Connect your BI workspace to this table (or a view over it) and schedule the refresh to run after the table is rebuilt.
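
To run the same aggregation without a warehouse, the sketch below uses Python's built-in `sqlite3`. It is an adaptation, not the warehouse SQL itself: SQLite has no `date_trunc` or `CREATE OR REPLACE TABLE`, so `strftime` and `DROP TABLE IF EXISTS` stand in for them, the schema prefixes are dropped, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("2024-01-05", "EMEA", 100.0),
        ("2024-01-20", "EMEA", 50.0),
        ("2024-01-11", "APAC", 75.0),
        ("2024-02-02", "EMEA", 30.0),
    ],
)


def rebuild_mart(conn):
    """Full rebuild of the monthly summary table (SQLite stand-in for the mart)."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region_month")
    conn.execute("""
        CREATE TABLE sales_by_region_month AS
        SELECT strftime('%Y-%m-01', order_date) AS month,  -- first day of the month
               region,
               SUM(amount) AS total_sales,
               COUNT(*)    AS orders
        FROM fact_sales
        GROUP BY strftime('%Y-%m-01', order_date), region
    """)
    conn.commit()


rebuild_mart(conn)
mart_rows = conn.execute(
    "SELECT month, region, total_sales, orders "
    "FROM sales_by_region_month ORDER BY month, region"
).fetchall()
for row in mart_rows:
    print(row)
```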

## 🚀 6️⃣ Performance & Cost Optimization in BI
| Technique | Why | Example |
|---|---|---|
| Incremental Refresh | Avoids full reloads | Partition by day/month |
| Columnar Storage | Faster + smaller | Parquet, VertiPaq |
| Caching/Extracts | Low‑latency reads | Tableau Extracts, SPICE |
| Model Pruning | Remove unused cols | Lean semantic model |
| DirectQuery vs Import | Balance freshness vs speed | Power BI modes |
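
Incremental refresh can be sketched as "rebuild only the partition that changed". The example below is an illustration under assumptions, using `sqlite3` and a hypothetical `refresh_month` helper; BI tools such as Power BI implement incremental refresh through their own partition policies rather than code like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (order_date TEXT, region TEXT, amount REAL);
CREATE TABLE sales_by_region_month (
    month TEXT, region TEXT, total_sales REAL, orders INTEGER
);
""")


def refresh_month(conn, month):
    """Rebuild one month partition (month formatted as 'YYYY-MM')."""
    with conn:  # delete and insert commit together, or roll back together
        conn.execute("DELETE FROM sales_by_region_month WHERE month = ?", (month,))
        conn.execute(
            """
            INSERT INTO sales_by_region_month
            SELECT strftime('%Y-%m', order_date), region, SUM(amount), COUNT(*)
            FROM fact_sales
            WHERE strftime('%Y-%m', order_date) = ?
            GROUP BY strftime('%Y-%m', order_date), region
            """,
            (month,),
        )


# Initial load of two months.
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [("2024-01-05", "EMEA", 100.0), ("2024-02-02", "EMEA", 30.0)],
)
conn.commit()
refresh_month(conn, "2024-01")
refresh_month(conn, "2024-02")

# A late-arriving February row: only February is rebuilt, January is untouched.
conn.execute("INSERT INTO fact_sales VALUES ('2024-02-10', 'EMEA', 20.0)")
conn.commit()
refresh_month(conn, "2024-02")

result = conn.execute(
    "SELECT month, region, total_sales, orders "
    "FROM sales_by_region_month ORDER BY month"
).fetchall()
print(result)
```

The cost of a refresh then scales with the size of the changed partition rather than with the whole fact table.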


## 🔐 7️⃣ Access Control & Governance in Reporting
- **Row‑Level Security (RLS):** restrict by region/department.
- **Object‑Level Security (OLS):** hide sensitive tables or columns.
- **SSO & Roles:** Microsoft Entra ID (formerly Azure AD) / IAM integration.
- **Auditing:** who viewed/exported what and when.
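
The effect of RLS and OLS can be illustrated with a few lines of plain Python. This is a conceptual sketch only: the `USER_REGIONS` and `HIDDEN_COLUMNS` mappings, the user names, and the `customer_email` column are all hypothetical, and in practice these rules are enforced inside the BI engine or the warehouse, not in application code.

```python
# Hypothetical policy tables: which regions a user may see, and which
# columns are hidden from a given role.
USER_REGIONS = {"alice": {"EMEA"}, "bob": {"EMEA", "APAC"}}
HIDDEN_COLUMNS = {"analyst": {"customer_email"}}


def apply_rls(rows, user):
    """Row-level security: keep only rows in the user's allowed regions."""
    allowed = USER_REGIONS.get(user, set())  # unknown users see nothing
    return [r for r in rows if r["region"] in allowed]


def apply_ols(rows, role):
    """Object-level security: drop columns hidden from this role."""
    hidden = HIDDEN_COLUMNS.get(role, set())
    return [{k: v for k, v in r.items() if k not in hidden} for r in rows]


rows = [
    {"region": "EMEA", "total_sales": 150.0, "customer_email": "a@example.com"},
    {"region": "APAC", "total_sales": 75.0, "customer_email": "b@example.com"},
]

alice_view = apply_ols(apply_rls(rows, "alice"), "analyst")
print(alice_view)
```

Defaulting unknown users to an empty set is a deliberate deny-by-default choice: a missing mapping hides data instead of exposing it.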

## 🌐 8️⃣ Warehouse ↔ BI Connectivity
| Warehouse | BI Integration | Notes |
|---|---|---|
| Synapse / SQL DB | Power BI DirectQuery | AAD passthrough |
| Redshift | QuickSight | Federated queries |
| BigQuery | Looker / Looker Studio (formerly Data Studio) | LookML semantic layer (Looker) |
| Snowflake | Tableau / Power BI | ODBC/JDBC connectors |

## 🖼️ 9️⃣ Visual — Analytics Serving Flow

```python
import matplotlib.pyplot as plt
from matplotlib.patches import FancyBboxPatch

# Colors: figure background, box fill, box edge, text
BG = '#e6f0ff'; FILL = '#e6f0ff'; EDGE = '#2563eb'; TXT = '#111827'
# Layout in axes units (0..1): box width, box height, gap between boxes, arrow offset
W, H, GAP, PAD = 0.2, 0.22, 0.06, 0.01
Y0 = 0.39  # bottom edge of every box
labels = [
    ('Curated', 'Facts & Dims'),
    ('Data Mart', 'Sales/Finance/CS'),
    ('Semantic Model', 'Measures + Dim'),
    ('BI', 'Dashboards/Alerts')
]
fig, ax = plt.subplots(figsize=(11, 3.8))
fig.patch.set_facecolor(BG); ax.set_facecolor(BG); ax.set_axis_off()
ax.set_xlim(0, 1); ax.set_ylim(0, 1)
# Center the row of boxes horizontally
total_w = len(labels)*W + (len(labels)-1)*GAP
x0 = (1-total_w)/2
xs = [x0 + i*(W+GAP) for i in range(len(labels))]

def box(x, t, s):
    """Draw one rounded box with title t and subtitle s."""
    # rounding_size is in data units here, so it must stay well below W and H;
    # pad=0 keeps the box edges at x..x+W so the arrows start outside the border.
    r = FancyBboxPatch((x, Y0), W, H, boxstyle='round,pad=0,rounding_size=0.03',
                       fc=FILL, ec=EDGE, lw=1.6)
    ax.add_patch(r)
    ax.text(x+W/2, Y0+H*0.62, t, ha='center', va='center', fontsize=10, color=TXT, fontweight='bold')
    ax.text(x+W/2, Y0+H*0.36, s, ha='center', va='center', fontsize=9, color=TXT)

for (t, s), x in zip(labels, xs):
    box(x, t, s)
# Arrows between neighboring boxes, at mid-height
y = Y0 + H/2
for i in range(len(xs)-1):
    ax.annotate('', xy=(xs[i+1]-PAD, y), xytext=(xs[i]+W+PAD, y),
                arrowprops=dict(arrowstyle='->', lw=2, color='#4b5563', mutation_scale=12))
plt.tight_layout(); plt.show()
```


## 🧩 🔟 Case Studies
- **Global Retail:** Glue ETL → Redshift → QuickSight; *30‑minute freshness*.
- **FinTech:** ADF → Synapse → Power BI; *real‑time payment risk*.
- **Healthcare:** IoT Hub → Databricks → Delta → Tableau; *patient risk dashboard*.

## 💡 1️⃣1️⃣ Practice
1) Design a star schema for a domain.
2) Build a monthly sales aggregation table and publish to BI.
3) Implement RLS for region‑scoped visibility.
4) Track dashboard refresh metrics in your monitoring layer.
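
As a starting point for exercise 4, the sketch below summarizes refresh runs into three metrics: success rate, average duration, and staleness. The `RefreshRun` record and `summarize` function are hypothetical names, and the timestamps are sample data; in a real setup the runs would come from your BI tool's refresh history or your orchestrator's logs.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RefreshRun:
    dashboard: str
    started_at: datetime
    finished_at: datetime
    succeeded: bool


def summarize(runs, now):
    """Compute success rate, mean duration of successful runs, and staleness."""
    ok = [r for r in runs if r.succeeded]
    durations = [(r.finished_at - r.started_at).total_seconds() for r in ok]
    last_ok = max((r.finished_at for r in ok), default=None)
    return {
        "success_rate": len(ok) / len(runs) if runs else None,
        "avg_duration_s": sum(durations) / len(durations) if durations else None,
        # Seconds since the last successful refresh finished.
        "staleness_s": (now - last_ok).total_seconds() if last_ok else None,
    }


runs = [
    RefreshRun("sales", datetime(2024, 1, 1, 6, 0), datetime(2024, 1, 1, 6, 2), True),
    RefreshRun("sales", datetime(2024, 1, 1, 7, 0), datetime(2024, 1, 1, 7, 1), False),
    RefreshRun("sales", datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 8, 4), True),
]
metrics = summarize(runs, now=datetime(2024, 1, 1, 8, 34))
print(metrics)
```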