# Databricks App — Interactive Retail Analytics

**Databricks Solutions Architecture Demo**

This notebook creates and deploys a **Databricks App** — a Gradio-based web application that provides:

1. **Customer 360 Lookup** — Search any customer for profile, RFM segment, and churn risk
2. **Revenue Explorer** — Interactive revenue analysis by region and time period
3. **Product Analytics** — Brand/product performance with filtering
4. **Executive KPIs** — Quarterly executive summary of orders, customers, and revenue growth

### Databricks features demonstrated
- **Databricks Apps** — deploy web apps natively on Databricks
- **Gradio** for rapid UI development
- **SQL Connector** to query Gold tables from the app
- **Unity Catalog** permissions governing the tables the app reads

### Prerequisites
Run notebooks 01–08 first.

## 1 — Configuration

In [None]:
import os

CATALOG = spark.catalog.currentCatalog()
GOLD    = f"{CATALOG}.retail_gold"
SILVER  = f"{CATALOG}.retail_silver"

APP_NAME = "retail-analytics-app"
APP_DIR  = "/Workspace/Users/{}/apps/retail_analytics".format(
    spark.sql("SELECT current_user()").collect()[0][0]
)

print(f"Catalog : {CATALOG}")
print(f"Gold    : {GOLD}")
print(f"App Dir : {APP_DIR}")

## 2 — Create App Directory

In [None]:
import os
os.makedirs(APP_DIR, exist_ok=True)
print(f"✓ App directory: {APP_DIR}")

## 3 — Write the Gradio App Code

The app uses the `databricks.sql` connector (from the `databricks-sql-connector` package) to query the Gold tables and renders the results with Gradio.
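
Because the Revenue Explorer filters come from user input, they are a natural fit for bound parameters rather than string interpolation. The sketch below keeps the SQL text and the parameter values separate. `build_revenue_query` is a hypothetical helper, the catalog name in the usage lines is made up, and the `:name` marker style assumes a connector version with native parameter support (3.x or later).

```python
def build_revenue_query(table, region="ALL", start_month=None, end_month=None):
    """Return (sql, params) for the monthly revenue query.

    Hypothetical helper: user input only ever lands in `params`,
    never in the SQL text itself.
    """
    clauses, params = [], {}
    if region != "ALL":
        clauses.append("region = :region")
        params["region"] = region
    if start_month:
        clauses.append("year_month >= :start_month")
        params["start_month"] = start_month
    if end_month:
        clauses.append("year_month <= :end_month")
        params["end_month"] = end_month
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = f"""
        SELECT year_month, region,
               ROUND(SUM(net_revenue), 0) AS net_revenue
        FROM {table}
        {where}
        GROUP BY year_month, region
        ORDER BY year_month, region
    """
    return sql, params


sql, params = build_revenue_query(
    "main.retail_gold.gold_monthly_sales",  # assumed catalog name
    region="EUROPE",
    start_month="1996-01",
)
print(params)  # {'region': 'EUROPE', 'start_month': '1996-01'}
```

With a live cursor, the pair would typically be passed as `cursor.execute(sql, parameters=params)`. Treat that keyword as an assumption and check it against the connector version pinned in `requirements.txt`.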

In [None]:
app_code = '''
import os
import logging

import gradio as gr
import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# ── Config ─────────────────────────────────────────────────────────────────
CATALOG = os.getenv("CATALOG", "CATALOG_PLACEHOLDER")
GOLD = f"{CATALOG}.retail_gold"
SILVER = f"{CATALOG}.retail_silver"

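def get_connection():
    from databricks import sql as dbsql

    host = os.getenv("DATABRICKS_HOST", "").replace("https://", "").rstrip("/")
    http_path = os.getenv("DATABRICKS_SQL_WAREHOUSE_HTTP_PATH", "")
    token = os.getenv("DATABRICKS_TOKEN", "")
    if not (host and http_path and token):
        # Fail with a readable message instead of an opaque connector error
        raise RuntimeError(
            "Set DATABRICKS_HOST, DATABRICKS_SQL_WAREHOUSE_HTTP_PATH and DATABRICKS_TOKEN"
        )
    return dbsql.connect(server_hostname=host, http_path=http_path, access_token=token)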

def run_query(sql):
    try:
        with get_connection() as conn:
            with conn.cursor() as cursor:
                cursor.execute(sql)
                columns = [desc[0] for desc in cursor.description]
                return pd.DataFrame(cursor.fetchall(), columns=columns)
    except Exception as e:
        logger.error(f"Query failed: {e}")
        return pd.DataFrame({"error": [str(e)]})

# ── Tab 1: Customer 360 ──
def customer_lookup(customer_id):
    try:
        cid = int(customer_id)
    except ValueError:
        return "Please enter a valid customer ID (integer).", None

    profile = run_query(f"""
        SELECT c.customer_key, c.customer_name, c.market_segment,
               c.nation_name, c.region_name, c.balance_tier,
               r.rfm_segment, r.rfm_score, r.r_score, r.f_score, r.m_score,
               ROUND(r.monetary, 2) as lifetime_value,
               r.frequency as total_orders,
               r.recency_days,
               ROUND(r.avg_order_value, 2) as avg_order_value
        FROM {SILVER}.dim_customer c
        LEFT JOIN {GOLD}.gold_customer_rfm r ON c.customer_key = r.customer_key
        WHERE c.customer_key = {cid}
    """)

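    # run_query reports failures as a one-column "error" frame
    if "error" in profile.columns:
        return f"Query failed: {profile['error'].iloc[0]}", None
    if profile.empty:
        return f"Customer {cid} not found.", None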

    row = profile.iloc[0]

    def money(value):
        # RFM columns can be NULL when the LEFT JOIN finds no RFM row
        return "n/a" if pd.isna(value) else f"${float(value):,.2f}"

    summary = f"""
## Customer #{row["customer_key"]} — {row["customer_name"]}

| Attribute | Value |
|---|---|
| **Segment** | {row["market_segment"]} |
| **Region** | {row["region_name"]} ({row["nation_name"]}) |
| **Balance Tier** | {row["balance_tier"]} |
| **RFM Segment** | {row["rfm_segment"]} |
| **RFM Score** | {row["rfm_score"]} (R:{row["r_score"]} F:{row["f_score"]} M:{row["m_score"]}) |
| **Lifetime Value** | {money(row["lifetime_value"])} |
| **Total Orders** | {row["total_orders"]} |
| **Avg Order Value** | {money(row["avg_order_value"])} |
| **Days Since Last Order** | {row["recency_days"]} |
    """

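    # Add the churn score when the scores table is available and has a row for this customer.
    # run_query already catches query errors, so no try/except is needed here.
    churn = run_query(f"""
        SELECT churn_probability, risk_tier
        FROM {GOLD}.gold_churn_scores
        WHERE customer_key = {cid}
    """)
    if not churn.empty and "churn_probability" in churn.columns:
        cr = churn.iloc[0]
        if pd.notna(cr["churn_probability"]):
            # rstrip() keeps the extra row attached to the Markdown table above;
            # the doubled backslash survives the outer string as a newline escape
            summary = summary.rstrip() + (
                f"\\n| **Churn Probability** | {float(cr['churn_probability']):.1%}"
                f" ({cr['risk_tier']}) |"
            )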

    return summary, profile

# ── Tab 2: Revenue Explorer ──
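def revenue_explorer(region, start_month, end_month):
    from datetime import datetime

    def normalize_month(value):
        # Parse and re-format so only a clean yyyy-MM value reaches the SQL text
        return datetime.strptime(value.strip(), "%Y-%m").strftime("%Y-%m")

    where = "WHERE 1=1"
    try:
        if region != "ALL":
            if region not in ("AMERICA", "EUROPE", "ASIA", "AFRICA", "MIDDLE EAST"):
                raise ValueError(region)
            where += f" AND region = \'{region}\'"
        if start_month:
            where += f" AND year_month >= \'{normalize_month(start_month)}\'"
        if end_month:
            where += f" AND year_month <= \'{normalize_month(end_month)}\'"
    except ValueError:
        return "Choose a listed region and enter months as yyyy-MM, e.g. 1996-01.", None

    df = run_query(f"""
        SELECT year_month, region,
               ROUND(SUM(net_revenue), 0) as net_revenue,
               SUM(num_orders) as orders,
               ROUND(AVG(profit_margin_pct), 1) as margin_pct
        FROM {GOLD}.gold_monthly_sales
        {where}
        GROUP BY year_month, region
        ORDER BY year_month, region
    """)

    # run_query reports failures as a one-column "error" frame
    if "error" in df.columns:
        return f"Query failed: {df['error'].iloc[0]}", None

    total = df["net_revenue"].sum()
    summary = f"**Total Revenue**: ${total:,.0f} | **Rows**: {len(df)}"
    return summary, df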

# ── Tab 3: Product Analytics ──
def product_analytics(sort_by, top_n):
    # sort_by is interpolated into ORDER BY, so accept only the known column aliases
    if sort_by not in ("net_revenue", "margin_pct", "return_rate_pct", "orders"):
        sort_by = "net_revenue"
    df = run_query(f"""
        SELECT brand, price_band,
               ROUND(SUM(net_revenue), 0) as net_revenue,
               ROUND(AVG(profit_margin_pct), 1) as margin_pct,
               ROUND(AVG(return_rate_pct), 1) as return_rate_pct,
               SUM(num_orders) as orders
        FROM {GOLD}.gold_product_performance
        GROUP BY brand, price_band
        ORDER BY {sort_by} DESC
        LIMIT {int(top_n)}
    """)
    return df

# ── Tab 4: Executive KPIs ──
def executive_kpis():
    df = run_query(f"""
        SELECT year_quarter, total_orders, active_customers,
               ROUND(gross_order_value, 0) as gross_order_value,
               ROUND(avg_order_value, 0) as avg_order_value,
               ROUND(revenue_per_customer, 0) as rev_per_customer,
               qoq_revenue_growth_pct
        FROM {GOLD}.gold_executive_summary
        ORDER BY year_quarter
    """)
    return df

# ── Build Gradio App ──
with gr.Blocks(
    title="Retail Analytics",
    theme=gr.themes.Soft(),
) as app:
    gr.Markdown("# Retail Analytics Dashboard")
    gr.Markdown("Powered by **Databricks Lakehouse** — Gold layer tables with Delta Lake")

    with gr.Tab("Customer 360"):
        gr.Markdown("### Look up any customer by ID")
        with gr.Row():
            cust_input = gr.Textbox(label="Customer ID", placeholder="e.g. 42", scale=1)
            cust_btn = gr.Button("Look Up", variant="primary", scale=1)
        cust_md = gr.Markdown()
        cust_table = gr.Dataframe(label="Raw Profile")
        cust_btn.click(customer_lookup, inputs=cust_input, outputs=[cust_md, cust_table])

    with gr.Tab("Revenue Explorer"):
        gr.Markdown("### Monthly revenue by region")
        with gr.Row():
            region_dd = gr.Dropdown(
                choices=["ALL", "AMERICA", "EUROPE", "ASIA", "AFRICA", "MIDDLE EAST"],
                value="ALL", label="Region"
            )
            start_m = gr.Textbox(label="Start Month (yyyy-MM)", value="1995-01")
            end_m = gr.Textbox(label="End Month (yyyy-MM)", value="1997-12")
            rev_btn = gr.Button("Query", variant="primary")
        rev_summary = gr.Markdown()
        rev_table = gr.Dataframe(label="Revenue Data")
        rev_btn.click(revenue_explorer, inputs=[region_dd, start_m, end_m], outputs=[rev_summary, rev_table])

    with gr.Tab("Product Analytics"):
        gr.Markdown("### Product performance by brand")
        with gr.Row():
            sort_dd = gr.Dropdown(
                choices=["net_revenue", "margin_pct", "return_rate_pct", "orders"],
                value="net_revenue", label="Sort By"
            )
            topn_slider = gr.Slider(minimum=5, maximum=50, value=20, step=5, label="Top N")
            prod_btn = gr.Button("Query", variant="primary")
        prod_table = gr.Dataframe(label="Product Performance")
        prod_btn.click(product_analytics, inputs=[sort_dd, topn_slider], outputs=prod_table)

    with gr.Tab("Executive KPIs"):
        gr.Markdown("### Quarterly executive summary")
        exec_btn = gr.Button("Load KPIs", variant="primary")
        exec_table = gr.Dataframe(label="Quarterly KPIs")
        exec_btn.click(executive_kpis, outputs=exec_table)

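if __name__ == "__main__":
    logger.info(f"Starting Retail Analytics App (catalog={CATALOG})")
    # Prefer DATABRICKS_APP_PORT when the platform provides it (assumed); else PORT, then 8000
    port = int(os.getenv("DATABRICKS_APP_PORT") or os.getenv("PORT", "8000"))
    app.launch(server_name="0.0.0.0", server_port=port)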
'''

# Replace placeholder with actual catalog
app_code = app_code.replace("CATALOG_PLACEHOLDER", CATALOG)

app_file = f"{APP_DIR}/app.py"

# Write using standard Python I/O (serverless compatible)
with open(app_file, "w") as f:
    f.write(app_code)
print(f"✓ App code written to: {app_file}")
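
Embedding a whole program in a string literal makes escape handling easy to get wrong: a `\n` or `\"` in the template is processed by the notebook before the file is written. A cheap safeguard is to parse the generated text before deploying it. The sketch below uses the standard-library `ast` module; `syntax_error_in` is a hypothetical helper name.

```python
import ast


def syntax_error_in(source, filename="app.py"):
    """Return None if `source` parses as Python, else a short error description."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


# The "\n" below becomes a real line break inside the quoted string,
# which is the kind of damage an outer template string can cause.
bad_source = 'print("a\nb")'
good_source = 'print("a\\nb")'

print(syntax_error_in(bad_source) is not None)  # True
print(syntax_error_in(good_source) is None)     # True
```

In this notebook the check could be run as `syntax_error_in(app_code)` just before the file is written, raising or printing a warning when it returns a message.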

## 4 — Write App Configuration

In [None]:
config_content = f"""command:
- python
- app.py
env:
- name: CATALOG
  value: {CATALOG}
"""

config_file = f"{APP_DIR}/app.yaml"

with open(config_file, "w") as f:
    f.write(config_content)
print(f"✓ App config: {config_file}")
print(config_content)

In [None]:
# Requirements file
requirements = """gradio>=4.0
databricks-sql-connector>=3.0
pandas
"""

req_file = f"{APP_DIR}/requirements.txt"

with open(req_file, "w") as f:
    f.write(requirements)
print(f"✓ Requirements: {req_file}")
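
The app reads its warehouse location from environment variables, so the configuration has to supply one. A possible approach, sketched below, is to reference the SQL warehouse resource attached to the app through `valueFrom` in `app.yaml`. The variable name `DATABRICKS_WAREHOUSE_ID` and the helper `render_app_yaml` are assumptions for illustration, and the resource key must match the name used when the app is created.

```python
def render_app_yaml(catalog, warehouse_resource="sql-warehouse"):
    """Render app.yaml text with the catalog and a warehouse ID reference."""
    lines = [
        "command:",
        "- python",
        "- app.py",
        "env:",
        "- name: CATALOG",
        f"  value: {catalog}",
        # Assumed variable name; valueFrom points at the app resource key
        "- name: DATABRICKS_WAREHOUSE_ID",
        f"  valueFrom: {warehouse_resource}",
    ]
    return "\n".join(lines) + "\n"


print(render_app_yaml("main"))  # "main" is a placeholder catalog name
```

The app would then need to turn the ID into an HTTP path of the form `/sql/1.0/warehouses/<id>` before connecting, since `valueFrom` on a warehouse resource is expected to yield the ID rather than the full path.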

## 5 — Deploy the Databricks App

In [None]:
%pip install databricks-sdk --upgrade --quiet

In [None]:
import requests, time, json

# Re-init variables after pip restart
CATALOG  = spark.catalog.currentCatalog()
GOLD     = f"{CATALOG}.retail_gold"
SILVER   = f"{CATALOG}.retail_silver"
APP_NAME = "retail-analytics-app"
APP_DIR  = "/Workspace/Users/{}/apps/retail_analytics".format(
    spark.sql("SELECT current_user()").collect()[0][0]
)

db_token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
db_host  = spark.conf.get("spark.databricks.workspaceUrl")
headers  = {"Authorization": f"Bearer {db_token}", "Content-Type": "application/json"}
base     = f"https://{db_host}/api/2.0/apps"

# ── Step A: Create the app via REST API ───────────────────────────────────────
# SQL warehouse the app is allowed to use; replace with a warehouse ID from your workspace
WAREHOUSE_ID = "92a4af71c38c0649"

print("Creating app...")
resp = requests.post(base, headers=headers, json={
    "name": APP_NAME,
    "description": "Interactive retail analytics dashboard powered by Databricks Lakehouse",
    "resources": [
        {"name": "sql-warehouse", "sql_warehouse": {"id": WAREHOUSE_ID, "permission": "CAN_USE"}},
    ],
})
if resp.status_code == 200:
    app_data = resp.json()
    print(f"✓ App created: {APP_NAME}")
    print(f"  App URL: {app_data.get('url', 'pending...')}")
elif resp.status_code == 409:
    print(f"○ App '{APP_NAME}' already exists — reusing it")
else:
    print(f"⚠ App creation ({resp.status_code}): {resp.text[:300]}")

# ── Step B: Deploy source code via REST API ───────────────────────────────────
print("\nDeploying app...")
resp = requests.post(f"{base}/{APP_NAME}/deployments", headers=headers, json={
    "source_code_path": APP_DIR,
    "mode": "SNAPSHOT",
})
if resp.status_code == 200:
    dep = resp.json()
    dep_id = dep.get("deployment_id", "")
    print(f"✓ Deployment started: {dep_id}")

    # Poll until deployment completes (up to 5 min)
    for i in range(30):
        time.sleep(10)
        status_resp = requests.get(f"{base}/{APP_NAME}/deployments/{dep_id}", headers=headers)
        if status_resp.status_code == 200:
            state = status_resp.json().get("status", {}).get("state", "")
            # Report elapsed time after the sleep that just finished
            print(f"  [{(i + 1) * 10}s] Status: {state}")
            if state == "SUCCEEDED":
                # Get the app URL
                app_resp = requests.get(f"{base}/{APP_NAME}", headers=headers)
                if app_resp.status_code == 200:
                    app_url = app_resp.json().get("url", "")
                    # Add the scheme only when the API returned a bare hostname
                    if app_url and not app_url.startswith("http"):
                        app_url = f"https://{app_url}"
                    print(f"\n✓ App deployed successfully!")
                    print(f"  URL: {app_url}")
                break
            elif state == "FAILED":
                msg = status_resp.json().get("status", {}).get("message", "")
                print(f"\n✗ Deployment failed: {msg}")
                break
    else:
        print("  Deployment still in progress — check Compute → Apps in the UI")
else:
    print(f"⚠ Deployment ({resp.status_code}): {resp.text[:300]}")
    print(f"\nManual deployment:")
    print(f"  1. Go to Compute → Apps → click '{APP_NAME}'")
    print(f"  2. Create a new deployment with source: {APP_DIR}")
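
The polling loop above mixes waiting, status parsing, and printing. If the same pattern is needed elsewhere (for example, waiting for the app compute to become active), it can be factored into a small helper. This is a minimal sketch: `poll_until` and its parameters are hypothetical names, and the `sleep` argument exists only so the helper can be exercised without real delays.

```python
import time


def poll_until(fetch_state, done_states, interval_s=10.0, timeout_s=300.0, sleep=time.sleep):
    """Call fetch_state() until it returns a value in done_states.

    Returns the terminal state, or None once timeout_s seconds of waiting have elapsed.
    """
    waited = 0.0
    while True:
        state = fetch_state()
        if state in done_states:
            return state
        if waited >= timeout_s:
            return None
        sleep(interval_s)
        waited += interval_s


# Usage with a canned sequence of states instead of REST calls
states = iter(["PENDING", "IN_PROGRESS", "SUCCEEDED"])
result = poll_until(lambda: next(states), {"SUCCEEDED", "FAILED"}, sleep=lambda s: None)
print(result)  # SUCCEEDED
```

In the deployment cell, `fetch_state` would wrap the `requests.get(...)` call and return the `status.state` field of the response.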

## 6 — Verify App Files

In [None]:
import os

print(f"App directory contents ({APP_DIR}):")
for name in os.listdir(APP_DIR):
    full = os.path.join(APP_DIR, name)
    size = os.path.getsize(full) if os.path.isfile(full) else 0
    print(f"  {name:<25} {size:>8} bytes")

## 7 — Quick Smoke Test (Query Gold from notebook)

In [None]:
# Verify the same queries the app will run
print("Smoke test — queries that the app executes:\n")

print("1. Customer lookup (ID=42):")
display(spark.sql(f"""
    SELECT c.customer_key, c.customer_name, c.market_segment, c.region_name,
           r.rfm_segment, r.rfm_score, ROUND(r.monetary, 2) as ltv
    FROM {SILVER}.dim_customer c
    LEFT JOIN {GOLD}.gold_customer_rfm r ON c.customer_key = r.customer_key
    WHERE c.customer_key = 42
"""))

print("\n2. Revenue summary (AMERICA, 1996):")
display(spark.sql(f"""
    SELECT year_month, region, ROUND(SUM(net_revenue), 0) as net_revenue
    FROM {GOLD}.gold_monthly_sales
    WHERE region = 'AMERICA' AND year_month >= '1996-01' AND year_month <= '1996-12'
    GROUP BY year_month, region
    ORDER BY year_month
"""))

print("\n3. Top 5 brands:")
display(spark.sql(f"""
    SELECT brand, ROUND(SUM(net_revenue), 0) as revenue
    FROM {GOLD}.gold_product_performance
    GROUP BY brand
    ORDER BY revenue DESC
    LIMIT 5
"""))

---
### Databricks App Complete

**What was built:**
- A 4-tab Gradio web application deployed natively on Databricks
- Customer 360, Revenue Explorer, Product Analytics, Executive KPIs
- Queries Gold layer tables via `databricks-sql-connector`

**Databricks features demonstrated:**
- **Databricks Apps** — deploy and host web apps with zero external infra
- **SQL Connector** — low-latency queries from app to lakehouse
- **Gradio** — rapid prototyping of interactive data apps
- **Unity Catalog** — all data access governed by UC permissions

---

## Full Demo Architecture Summary

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        DATA GENERATION                                 │
│  tpch_databricks_notebook.ipynb → TPC-H data (1GB → 100GB)            │
└───────────────────────────────┬─────────────────────────────────────────┘
                                │
┌───────────────────────────────▼─────────────────────────────────────────┐
│                     MEDALLION ARCHITECTURE                              │
│  01_bronze  →  02_silver  →  03_gold                                   │
│  (raw+audit)   (enriched)    (aggregated KPIs)                         │
└───────────────────────────────┬─────────────────────────────────────────┘
                                │
          ┌─────────────────────┼──────────────────────┐
          │                     │                      │
┌─────────▼──────────┐ ┌───────▼────────┐ ┌───────────▼──────────┐
│  05_LAKEBASE       │ │  06_AI/ML      │ │  04_SQL ANALYTICS    │
│  Online Tables     │ │  Churn Model   │ │  12 Gold queries     │
│  Feature Store     │ │  Forecasting   │ │                      │
│  Vector Search     │ │  MLflow + UC   │ │                      │
└─────────┬──────────┘ └───────┬────────┘ └──────────────────────┘
          │                     │
┌─────────▼─────────────────────▼────────────────────────────────┐
│  07_AI AGENT          08_AI/BI DASHBOARD     09_DATABRICKS APP │
│  Mosaic AI Agent      Lakeview Dashboard     Gradio Web App    │
│  6 UC SQL Tools       Genie Space            Customer 360      │
│  LLM + Tool Calling   Self-serve Q&A         Revenue Explorer  │
└────────────────────────────────────────────────────────────────┘
```