# AI/BI Dashboard & Genie Space

**Databricks Solutions Architecture Demo**

This notebook programmatically creates:

1. **AI/BI Dashboard** — A Databricks native dashboard with pre-built visualizations on Gold tables
2. **Genie Space** — A natural-language Q&A interface for business users to self-serve analytics

### Databricks features demonstrated
- **Databricks AI/BI Dashboards** — code-driven dashboard definitions
- **Genie Spaces** — natural-language analytics for non-technical users
- **Databricks SDK** for programmatic workspace asset creation
- **SQL-first approach** — all widgets backed by SQL queries on Gold tables

### Prerequisites
Run notebooks 01–07 first.

## 1 — Configuration

In [None]:
%pip install databricks-sdk --upgrade --quiet
dbutils.library.restartPython()

In [None]:
from databricks.sdk import WorkspaceClient
from pyspark.sql import functions as F

w = WorkspaceClient()

CATALOG = spark.catalog.currentCatalog()
GOLD    = f"{CATALOG}.retail_gold"

current_user = spark.sql("SELECT current_user()").collect()[0][0]
DASHBOARD_PARENT = f"/Users/{current_user}"

print(f"Catalog    : {CATALOG}")
print(f"Gold schema: {GOLD}")
print(f"Dashboard  : {DASHBOARD_PARENT}")

---
## 2 — Define Dashboard Queries

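Each query becomes a dataset in the AI/BI Dashboard. Seven of them back the widgets defined in Section 4; `executive_quarterly` and `yoy_growth` are included as datasets only, so they can be turned into widgets later in the dashboard editor.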

In [None]:
dashboard_queries = {
    # ── KPI Cards ──
    "kpi_total_revenue": f"""
        SELECT ROUND(SUM(net_revenue), 0) AS total_revenue,
               SUM(num_orders) AS total_orders,
               ROUND(SUM(total_profit), 0) AS total_profit,
               ROUND(SUM(total_profit) / SUM(net_revenue) * 100, 1) AS overall_margin_pct
        FROM {GOLD}.gold_daily_sales
    """,

    # ── Revenue Trend (Line Chart) ──
    "revenue_trend_monthly": f"""
        SELECT year_month, region,
               ROUND(SUM(net_revenue), 0) AS net_revenue
        FROM {GOLD}.gold_monthly_sales
        GROUP BY year_month, region
        ORDER BY year_month
    """,

    # ── Revenue by Region (Bar Chart) ──
    "revenue_by_region": f"""
        SELECT region,
               ROUND(SUM(net_revenue), 0) AS net_revenue,
               ROUND(SUM(total_profit), 0) AS total_profit,
               ROUND(SUM(total_profit) / SUM(net_revenue) * 100, 1) AS margin_pct
        FROM {GOLD}.gold_daily_sales
        GROUP BY region
        ORDER BY net_revenue DESC
    """,

    # ── Customer Segment Distribution (Pie) ──
    "customer_segments": f"""
        SELECT rfm_segment,
               COUNT(*) AS customers,
               ROUND(SUM(monetary), 0) AS total_ltv
        FROM {GOLD}.gold_customer_rfm
        GROUP BY rfm_segment
        ORDER BY total_ltv DESC
    """,

    # ── Top Brands (Horizontal Bar) ──
    "top_brands": f"""
        SELECT brand,
               ROUND(SUM(net_revenue), 0) AS net_revenue,
               ROUND(AVG(profit_margin_pct), 1) AS avg_margin_pct
        FROM {GOLD}.gold_product_performance
        GROUP BY brand
        ORDER BY net_revenue DESC
        LIMIT 15
    """,

    # ── Shipping Performance (Table) ──
    "shipping_performance": f"""
        SELECT ship_mode,
               SUM(total_shipments) AS shipments,
               ROUND(SUM(total_shipments * avg_delivery_delay_days) / SUM(total_shipments), 1) AS avg_delay,
               ROUND(SUM(total_shipments * on_time_pct) / SUM(total_shipments), 1) AS on_time_pct,
               ROUND(SUM(net_revenue), 0) AS revenue
        FROM {GOLD}.gold_shipping_analysis
        GROUP BY ship_mode
        ORDER BY revenue DESC
    """,

    # ── Quarterly Executive Trend ──
    "executive_quarterly": f"""
        SELECT year_quarter,
               total_orders,
               active_customers,
               ROUND(gross_order_value, 0) AS gross_order_value,
               ROUND(revenue_per_customer, 0) AS rev_per_customer,
               qoq_revenue_growth_pct
        FROM {GOLD}.gold_executive_summary
        ORDER BY year_quarter
    """,

    # ── Churn Risk Distribution ──
    "churn_risk_dist": f"""
        SELECT risk_tier,
               COUNT(*) AS customers,
               ROUND(AVG(churn_probability), 3) AS avg_churn_prob,
               ROUND(AVG(lifetime_value), 0) AS avg_ltv,
               ROUND(SUM(lifetime_value), 0) AS total_ltv_at_risk
        FROM {GOLD}.gold_churn_scores
        GROUP BY risk_tier
        ORDER BY avg_churn_prob DESC
    """,

    # ── YoY Growth Heatmap ──
    "yoy_growth": f"""
        SELECT year_month, region,
               ROUND(AVG(yoy_growth_pct), 1) AS yoy_growth_pct
        FROM {GOLD}.gold_monthly_sales
        WHERE yoy_growth_pct IS NOT NULL
        GROUP BY year_month, region
        ORDER BY year_month, region
    """,
}

print(f"Defined {len(dashboard_queries)} dashboard queries.")
for name in dashboard_queries:
    print(f"  • {name}")
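The `on_time_pct` column in `shipping_performance` is a shipment-weighted average, `SUM(total_shipments * on_time_pct) / SUM(total_shipments)`, rather than a plain `AVG(on_time_pct)`. With made-up numbers (not taken from the Gold tables), the sketch below shows how far the two can diverge when one row carries most of the volume; the same reasoning applies to other per-row averages such as `avg_delivery_delay_days`.

```python
from typing import List, Tuple

# Hypothetical rows for one ship_mode across two regions: (total_shipments, on_time_pct).
rows: List[Tuple[int, float]] = [(1000, 90.0), (10, 50.0)]


def weighted_pct(rows: List[Tuple[int, float]]) -> float:
    """Mirror of ROUND(SUM(total_shipments * on_time_pct) / SUM(total_shipments), 1).

    Python's round() uses half-to-even on exact ties, unlike SQL ROUND; that does
    not matter for the numbers used here.
    """
    total = sum(n for n, _ in rows)
    return round(sum(n * p for n, p in rows) / total, 1)


def naive_pct(rows: List[Tuple[int, float]]) -> float:
    """Mirror of ROUND(AVG(on_time_pct), 1): each row counts equally, whatever its volume."""
    return round(sum(p for _, p in rows) / len(rows), 1)


print(weighted_pct(rows))  # 89.6
print(naive_pct(rows))     # 70.0
```

Here the low-volume row drags the plain average down to 70.0%, while the weighted figure (89.6%) reflects what most shipments actually experienced.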

## 3 — Validate All Queries Execute Successfully

In [None]:
print(f"{'Query':<30} {'Rows':>8}  {'Cols':>5}  Status")
print("=" * 60)

all_ok = True
for name, sql in dashboard_queries.items():
    try:
        df = spark.sql(sql)
        cnt = df.count()
        print(f"{name:<30} {cnt:>8}  {len(df.columns):>5}  ✓")
    except Exception as e:
        print(f"{name:<30} {'':>8}  {'':>5}  ✗ {str(e)[:60]}")
        all_ok = False

print(f"\n{'All queries passed!' if all_ok else 'Some queries failed — check Gold layer tables.'}")
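The validation loop can also be factored into a function that collects failures instead of only printing them, which makes it easy to stop a scheduled run when a Gold table is missing. The sketch below is a hypothetical helper, `validate_queries`; it takes the SQL executor as a parameter so it can be exercised without Spark. In the notebook the executor could be `lambda sql: spark.sql(sql).count()`.

```python
from typing import Callable, Dict


def validate_queries(queries: Dict[str, str], run: Callable[[str], int]) -> Dict[str, str]:
    """Run every query through `run` and return {name: error message} for the ones that fail."""
    failures: Dict[str, str] = {}
    for name, sql in queries.items():
        try:
            run(sql)
        except Exception as exc:  # keep going so every failing query is reported
            failures[name] = str(exc)
    return failures


# Stand-in executor that rejects one query, so the example runs without a cluster.
def fake_run(sql: str) -> int:
    if "missing_table" in sql:
        raise RuntimeError("TABLE_OR_VIEW_NOT_FOUND")
    return 1


demo = {"ok_query": "SELECT 1", "bad_query": "SELECT * FROM missing_table"}
failures = validate_queries(demo, fake_run)
print(failures)  # {'bad_query': 'TABLE_OR_VIEW_NOT_FOUND'}
```

In a job, following this with `if failures: raise RuntimeError(failures)` would mark the run as failed rather than continuing to the dashboard step.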

## 4 — Create AI/BI Dashboard via Lakeview API

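Databricks AI/BI Dashboards (named Lakeview in the API) can be created programmatically with the SDK's `w.lakeview.create`, which receives the whole dashboard (datasets, pages, and widget layouts) as one serialized JSON string. The definition below is deliberately compact: dashboards exported from the UI typically also give each widget a `position` on the page and list the dataset `fields` it selects, so widgets created without them may need adjusting in the editor.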
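As a sketch of a fuller widget shape, the helper below builds one layout entry that also carries the `fields` a widget selects and a `position` on the page. The key names `fields`, `expression`, `disaggregated`, and `position` are assumptions based on the JSON of dashboards exported from the UI, and `lakeview_widget` is a hypothetical helper, not part of the SDK; compare against an export from your own workspace before relying on it.

```python
from typing import Dict, List


def lakeview_widget(
    name: str,
    dataset: str,
    widget_type: str,
    encodings: Dict,
    fields: List[str],
    position: Dict[str, int],
    version: int = 3,
) -> Dict:
    """Build one layout entry for a Lakeview page (assumed export schema).

    Each name in `fields` becomes a {"name", "expression"} pair with the column
    quoted in backticks. `position` is passed through unchanged.
    """
    return {
        "widget": {
            "name": name,
            "queries": [
                {
                    "name": "main_query",
                    "query": {
                        "datasetName": dataset,
                        "fields": [{"name": f, "expression": f"`{f}`"} for f in fields],
                        # Assumed flag: select rows as-is rather than aggregating in the widget.
                        "disaggregated": True,
                    },
                }
            ],
            "spec": {"version": version, "widgetType": widget_type, "encodings": encodings},
        },
        "position": position,
    }


# The revenue-by-region bar chart from this notebook, placed at the top-left of a page.
entry = lakeview_widget(
    name="rev_by_region",
    dataset="revenue_by_region",
    widget_type="bar",
    encodings={
        "x": {"fieldName": "region", "displayName": "Region"},
        "y": {"fieldName": "net_revenue", "displayName": "Net Revenue"},
    },
    fields=["region", "net_revenue"],
    position={"x": 0, "y": 0, "width": 3, "height": 6},
)
```

The returned dict has the same outer shape as the compact entries in a page's `layout` list, so it could replace one of them directly.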

In [None]:
import json

# Build the Lakeview dashboard definition
# Each dataset maps to a query; each page contains widget layouts.

datasets = [
    {
        "name": name,                                    # widgets refer to this via datasetName
        "displayName": name.replace("_", " ").title(),  # label shown in the dashboard editor
        "query": sql.strip(),
    }
    for name, sql in dashboard_queries.items()
]

# Define the dashboard JSON (Lakeview format)
dashboard_def = {
    "pages": [
        {
            "name": "overview",
            "displayName": "Executive Overview",
            "layout": [
                {"widget": {"name": "kpi_cards", "queries": [{"name": "kpi_total_revenue", "query": {"datasetName": "kpi_total_revenue"}}], "spec": {"version": 3, "widgetType": "counter", "encodings": {"value": {"fieldName": "total_revenue", "displayName": "Total Revenue"}}}}},
                {"widget": {"name": "rev_trend", "queries": [{"name": "revenue_trend_monthly", "query": {"datasetName": "revenue_trend_monthly"}}], "spec": {"version": 3, "widgetType": "line", "encodings": {"x": {"fieldName": "year_month", "displayName": "Month"}, "y": {"fieldName": "net_revenue", "displayName": "Net Revenue"}, "color": {"fieldName": "region", "displayName": "Region"}}}}},
                {"widget": {"name": "rev_by_region", "queries": [{"name": "revenue_by_region", "query": {"datasetName": "revenue_by_region"}}], "spec": {"version": 3, "widgetType": "bar", "encodings": {"x": {"fieldName": "region", "displayName": "Region"}, "y": {"fieldName": "net_revenue", "displayName": "Net Revenue"}}}}},
                {"widget": {"name": "cust_segments", "queries": [{"name": "customer_segments", "query": {"datasetName": "customer_segments"}}], "spec": {"version": 3, "widgetType": "pie", "encodings": {"slice": {"fieldName": "rfm_segment", "displayName": "Segment"}, "value": {"fieldName": "customers", "displayName": "Customers"}}}}},
            ]
        },
        {
            "name": "products_suppliers",
            "displayName": "Products & Suppliers",
            "layout": [
                {"widget": {"name": "top_brands_chart", "queries": [{"name": "top_brands", "query": {"datasetName": "top_brands"}}], "spec": {"version": 3, "widgetType": "bar", "encodings": {"x": {"fieldName": "net_revenue", "displayName": "Revenue"}, "y": {"fieldName": "brand", "displayName": "Brand"}}}}},
                {"widget": {"name": "shipping_table", "queries": [{"name": "shipping_performance", "query": {"datasetName": "shipping_performance"}}], "spec": {"version": 3, "widgetType": "table"}}},
                {"widget": {"name": "churn_dist", "queries": [{"name": "churn_risk_dist", "query": {"datasetName": "churn_risk_dist"}}], "spec": {"version": 3, "widgetType": "bar", "encodings": {"x": {"fieldName": "risk_tier", "displayName": "Risk Tier"}, "y": {"fieldName": "total_ltv_at_risk", "displayName": "LTV at Risk"}}}}},
            ]
        },
    ],
    "datasets": datasets,
}

print(f"Dashboard definition: {len(dashboard_def['pages'])} pages, {len(datasets)} datasets")
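Before calling the API, it can help to check that every widget points at a declared dataset, since a typo in `datasetName` is easier to catch here than in the dashboard editor. The sketch below is a hypothetical helper that walks the same `pages` → `layout` → `widget` → `queries` nesting used in `dashboard_def`. Run against this notebook's `dashboard_def`, it should report no undefined names and flag `executive_quarterly` and `yoy_growth` as unused, because no widget references them yet.

```python
from typing import Dict, Set, Tuple


def check_dataset_references(dashboard_def: Dict) -> Tuple[Set[str], Set[str]]:
    """Return (undefined, unused) dataset names for a Lakeview-style definition.

    undefined: referenced by a widget's datasetName but not declared in "datasets".
    unused:    declared in "datasets" but not referenced by any widget.
    """
    defined = {ds["name"] for ds in dashboard_def.get("datasets", [])}
    referenced = {
        q["query"]["datasetName"]
        for page in dashboard_def.get("pages", [])
        for item in page.get("layout", [])
        for q in item["widget"].get("queries", [])
    }
    return referenced - defined, defined - referenced


# A tiny stand-in definition (not the notebook's dashboard_def).
demo_def = {
    "datasets": [{"name": "sales"}, {"name": "returns"}],
    "pages": [{"layout": [
        {"widget": {"queries": [{"query": {"datasetName": "sales"}}]}},
        {"widget": {"queries": [{"query": {"datasetName": "shipments"}}]}},
    ]}],
}
undefined, unused = check_dataset_references(demo_def)
print(undefined, unused)  # {'shipments'} {'returns'}
```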

In [None]:
# Create the dashboard using the Lakeview API.
# Recent databricks-sdk releases expect a Dashboard object; older releases took
# display_name / parent_path / serialized_dashboard as direct keyword arguments.
from databricks.sdk.service.dashboards import Dashboard

DASHBOARD_NAME = "Retail Analytics — Executive Dashboard"

try:
    dashboard = w.lakeview.create(
        dashboard=Dashboard(
            display_name=DASHBOARD_NAME,
            parent_path=DASHBOARD_PARENT,
            serialized_dashboard=json.dumps(dashboard_def),
        )
    )
    dashboard_url = f"{w.config.host}/sql/dashboardsv3/{dashboard.dashboard_id}"
    print("✓ Dashboard created!")
    print(f"  Name: {DASHBOARD_NAME}")
    print(f"  ID:   {dashboard.dashboard_id}")
    print(f"  URL:  {dashboard_url}")
except Exception as e:
    print(f"⚠ Dashboard creation failed: {str(e)[:300]}")
    print("\nAlternative: build the dashboard manually in the UI:")
    print("  1. Go to SQL → Dashboards → Create Dashboard")
    print("  2. Add each query from the dashboard_queries dict above as a dataset")
    print("  3. Choose appropriate chart types for each widget")
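Re-running the creation cell may produce another copy of the dashboard or fail with a name conflict, depending on how the workspace handles a duplicate name in the same folder. One way to make the step idempotent is to look for an existing dashboard by display name first and update it instead. The lookup below is a hypothetical pure function, `find_dashboard_id`; feeding it `w.lakeview.list()` assumes the listed `Dashboard` objects expose `display_name` and `dashboard_id`.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_dashboard_id(dashboards: Iterable, display_name: str) -> Optional[str]:
    """Return the dashboard_id of the first dashboard whose display_name matches, else None.

    Matches on name only, so a dashboard with the same name in another folder
    would also match.
    """
    for d in dashboards:
        if getattr(d, "display_name", None) == display_name:
            return d.dashboard_id
    return None


# Stand-in objects instead of a live workspace listing.
listing = [
    SimpleNamespace(display_name="Ops Dashboard", dashboard_id="01ab"),
    SimpleNamespace(display_name="Sales Dashboard", dashboard_id="02cd"),
]
print(find_dashboard_id(listing, "Sales Dashboard"))    # 02cd
print(find_dashboard_id(listing, "Missing Dashboard"))  # None
```

If an id is found, the notebook could update that dashboard (for example via `w.lakeview.update`, whose exact arguments vary across SDK versions) instead of calling `create`.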

---
## 5 — Create Genie Space

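**Genie** lets business users ask questions in plain English and get answers from the Gold layer.
The cells below configure a Genie Space with curated instructions, sample questions, and access to eight Gold tables. Creating the space through the SDK (`w.genie.create_space`) requires an SDK release that includes that method and a `serialized_space` payload in the schema the API expects; if the call fails, the cell prints the steps for creating the same space in the UI.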

In [None]:
genie_tables = [
    f"{GOLD}.gold_daily_sales",
    f"{GOLD}.gold_monthly_sales",
    f"{GOLD}.gold_customer_rfm",
    f"{GOLD}.gold_product_performance",
    f"{GOLD}.gold_supplier_scorecard",
    f"{GOLD}.gold_shipping_analysis",
    f"{GOLD}.gold_executive_summary",
    f"{GOLD}.gold_churn_scores",
]

genie_instructions = """
You are a retail analytics assistant. The data comes from a TPC-H retail dataset with the
following Gold-layer tables:

- gold_daily_sales: Daily revenue by region and market segment
- gold_monthly_sales: Monthly revenue with MoM and YoY growth
- gold_customer_rfm: Customer segmentation (Champions, Loyal, At Risk, Lost, etc.)
- gold_product_performance: Product revenue, margin, and return rates by brand/type
- gold_supplier_scorecard: Supplier reliability, on-time delivery, return rates
- gold_shipping_analysis: Shipping mode performance across regions
- gold_executive_summary: Quarterly KPI roll-up for executives
- gold_churn_scores: ML-predicted churn risk per customer

Key business terms:
- "Revenue" means net_revenue (after discounts, before tax)
- "Margin" means profit_margin_pct
- "Churn risk" uses the risk_tier column: Critical, High, Medium, Low
- Regions are: AMERICA, EUROPE, ASIA, AFRICA, MIDDLE EAST
- Market segments: AUTOMOBILE, BUILDING, FURNITURE, HOUSEHOLD, MACHINERY
- Time range: 1992 to 1998

When answering questions:
- Format currency with $ and commas
- Format percentages with %
- Use the most specific table available
"""

sample_questions = [
    "What was total revenue by region in 1997?",
    "Which market segment has the highest profit margin?",
    "How many customers are Champions vs At Risk?",
    "What brands have the highest return rate?",
    "Show me the quarterly revenue trend",
    "Which shipping mode is most reliable?",
    "How many customers are at critical churn risk?",
    "Compare YoY revenue growth across regions",
]

print(f"Genie config: {len(genie_tables)} tables, {len(sample_questions)} sample questions")
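The general instructions describe each Gold table in prose, so if a table is later added to `genie_tables` without a matching description, the two lists drift apart silently. A quick consistency check is sketched below as a hypothetical helper; for the values in this notebook it should return an empty list, since all eight Gold tables are described in `genie_instructions`.

```python
from typing import List


def tables_missing_from_instructions(tables: List[str], instructions: str) -> List[str]:
    """Return the fully qualified tables whose short name never appears in the instructions.

    The short name is the last dot-separated part, e.g. "gold_daily_sales" for
    "main.retail_gold.gold_daily_sales". This is a plain substring check.
    """
    return [t for t in tables if t.split(".")[-1] not in instructions]


# Stand-in values; the catalog name "main" is only an example.
demo_tables = ["main.retail_gold.gold_daily_sales", "main.retail_gold.gold_returns"]
demo_text = "- gold_daily_sales: Daily revenue by region"
print(tables_missing_from_instructions(demo_tables, demo_text))
# ['main.retail_gold.gold_returns']
```

In the notebook this would be called as `tables_missing_from_instructions(genie_tables, genie_instructions)`.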

In [None]:
# ── Find a SQL warehouse ─────────────────────────────────────────────────────
import json as _json

warehouse_id = None
try:
    for wh in w.warehouses.list():
        state_val = wh.state.value if wh.state else ""
        if state_val in ("RUNNING", "STARTING", "STOPPED"):
            warehouse_id = wh.id
            if state_val == "RUNNING":
                break
    if warehouse_id:
        print(f"✓ Using SQL warehouse: {warehouse_id}")
    else:
        print("⚠ No SQL warehouse found.")
except Exception as e:
    print(f"⚠ Could not list warehouses: {e}")

# ── Create Genie Space via SDK with serialized_space ─────────────────────────
genie_created = False
if warehouse_id:
    serialized = _json.dumps({
        "table_identifiers": genie_tables,
        "curated_questions": sample_questions,
        "instructions": genie_instructions.strip(),
    })
    try:
        genie_space = w.genie.create_space(
            warehouse_id=warehouse_id,
            serialized_space=serialized,
            title="Retail Analytics Genie",
            description="Ask questions about retail sales, customers, products, and suppliers in plain English.",
            parent_path=DASHBOARD_PARENT,
        )
        space_id = genie_space.space_id
        db_host = spark.conf.get("spark.databricks.workspaceUrl")
        genie_url = f"https://{db_host}/genie/rooms/{space_id}"
        print(f"✓ Genie Space created!")
        print(f"  Name: Retail Analytics Genie")
        print(f"  ID:   {space_id}")
        print(f"  URL:  {genie_url}")
        print(f"\n  Try asking:")
        for q in sample_questions[:5]:
            print(f"    → {q}")
        genie_created = True
    except Exception as e:
        print(f"⚠ SDK Genie Space creation failed: {e}")

if not genie_created:
    print(f"\n{'─' * 60}")
    print(f"MANUAL GENIE SPACE SETUP (takes ~2 minutes)")
    print(f"{'─' * 60}")
    print(f"  1. Open: AI/BI → Genie Spaces → '+ New Genie Space'")
    print(f"  2. Title: Retail Analytics Genie")
    print(f"  3. Attach SQL warehouse: {warehouse_id or '(any Pro/Serverless warehouse)'}")
    print(f"  4. Under 'Tables', click '+ Add tables' and add:")
    for t in genie_tables:
        print(f"       {t}")
    print(f"  5. Under 'General instructions', paste the following:")
    print(f"{'─' * 60}")
    print(genie_instructions.strip())
    print(f"{'─' * 60}")
    print(f"  6. Under 'Sample questions', add:")
    for q in sample_questions:
        print(f"       • {q}")
    print(f"\n  Then click 'Save'. Your Genie Space is ready!")
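The warehouse loop at the top of the cell above keeps overwriting `warehouse_id` until it sees a RUNNING warehouse, so when none is running it ends up with whichever eligible warehouse was listed last: a STOPPED warehouse listed after a STARTING one wins, even though the STARTING one will be ready sooner. A minimal sketch of an explicit preference order (RUNNING, then STARTING, then STOPPED, keeping listing order within a tier) follows; `pick_warehouse` and `STATE_PRIORITY` are hypothetical names.

```python
from typing import Iterable, Optional, Tuple

# Lower value = preferred. States not listed here (e.g. STOPPING, DELETED) are skipped.
STATE_PRIORITY = {"RUNNING": 0, "STARTING": 1, "STOPPED": 2}


def pick_warehouse(warehouses: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the id of the most preferred warehouse from (id, state) pairs, or None."""
    candidates = [(wid, STATE_PRIORITY[state]) for wid, state in warehouses if state in STATE_PRIORITY]
    if not candidates:
        return None
    # min() returns the first minimal element, so listing order breaks ties within a tier.
    return min(candidates, key=lambda c: c[1])[0]


print(pick_warehouse([("a", "STOPPED"), ("b", "STARTING"), ("c", "STOPPED")]))  # b
print(pick_warehouse([("x", "DELETED")]))  # None
```

With the SDK, the pairs could be built as `((wh.id, wh.state.value) for wh in w.warehouses.list() if wh.state)`, mirroring how the original loop reads `wh.state.value`.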

## 6 — Preview Dashboard Queries

In [None]:
# Preview: Revenue Trend
print("Revenue Trend by Region (Monthly)")
display(spark.sql(dashboard_queries["revenue_trend_monthly"]))

In [None]:
# Preview: Customer Segments
print("Customer Segment Distribution")
display(spark.sql(dashboard_queries["customer_segments"]))

In [None]:
# Preview: Churn Risk
print("Churn Risk Distribution")
display(spark.sql(dashboard_queries["churn_risk_dist"]))

In [None]:
# Preview: Executive Quarterly
print("Executive Quarterly Summary")
display(spark.sql(dashboard_queries["executive_quarterly"]))

---
### AI/BI Dashboard & Genie Complete

**What was created:**
- AI/BI Dashboard with 7 widgets across 2 pages (Executive Overview + Products & Suppliers), backed by 9 datasets
- Genie Space with 8 Gold tables and curated business instructions

**Databricks features demonstrated:**
- Lakeview Dashboard API for programmatic creation
- Genie Spaces for self-serve natural language analytics
- SQL-backed widgets (no ETL to a BI tool needed)
- All data governed via Unity Catalog

**Next → `09_databricks_app.ipynb`** for deploying an interactive Databricks App.