### SQL warehouses

- In Databricks, SQL Warehouses are specialized compute resources designed for SQL analysts and BI tools.
- While you’ve been using "All-Purpose Clusters" for your notebooks, SQL Warehouses are the high-performance engines used for the "Gold" layer of your project.
- Think of a SQL Warehouse as a professional kitchen: it is pre-stocked and optimized specifically for one task (running SQL queries) with zero distractions.
**Key Features: Why Use a Warehouse?**
- Photon Engine: Every SQL Warehouse is powered by Photon, a vectorized query engine written in C++. It speeds up SELECT statements, joins, and aggregations compared with standard Spark execution. Speedups of 3x–5x are possible, but actual gains depend on the workload.
- Serverless Options: All-purpose clusters and classic warehouses can take several minutes to start. Serverless SQL Warehouses typically start within seconds. Databricks manages the underlying compute, so you don't configure VMs.
- T-Shirt Sizing: You don't pick CPU or RAM. You pick a size from 2X-Small up to 4X-Large (2X-Small, X-Small, Small, Medium, and so on).
- Auto-Stop: Warehouses shut down automatically after a configurable idle period to save money. A serverless warehouse can be set to stop after only a few minutes of inactivity. Through the API, the timeout can be as short as 1 minute.
**SQL Warehouses vs. All-Purpose Clusters**

| Feature | All-Purpose Cluster (Notebooks) | SQL Warehouse (Analyst View) |
| ----- | ----- | ----- |
| Primary Goal | Data engineering and data science | BI, reporting, and ad-hoc SQL |
| Languages | Python, Scala, R, SQL | SQL only |
| Startup Time | Typically several minutes | Seconds (serverless); minutes (classic/pro) |
| Scaling | Autoscaling worker nodes (Spark) | Intelligent Workload Management (IWM) on serverless; extra clusters under load |
| Best For | Bronze/Silver Python ETL | Gold dashboards (Power BI/Tableau) |

### Complex analytical queries

**1. Window Functions (Ranking & Trends)**
- Window functions allow you to perform calculations across a set of rows related to the current row without collapsing them into a single group.
- Scenario: Find the top 3 selling brands within each category.
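```sql
-- Top 3 brands by revenue within each category.
-- RANK() gives tied brands the same rank, so a category can return more than 3 rows.
SELECT * FROM (
  SELECT 
    main_category, 
    brand, 
    SUM(total_revenue) AS revenue,
    RANK() OVER (PARTITION BY main_category ORDER BY SUM(total_revenue) DESC) AS category_rank
  FROM ecommerce_prod.gold.brand_metrics
  GROUP BY main_category, brand
) AS ranked
WHERE category_rank <= 3;
```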
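You can watch RANK() restart in each partition without collapsing rows by running the query on a tiny local table. This is a minimal sketch that uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) as a stand-in for the warehouse. The brand_metrics rows are made-up sample data, not project data.

```python
import sqlite3

# Hypothetical sample rows: (main_category, brand, total_revenue)
SAMPLE_ROWS = [
    ("electronics", "acme", 500.0),
    ("electronics", "acme", 250.0),
    ("electronics", "bolt", 600.0),
    ("electronics", "core", 300.0),
    ("electronics", "dyna", 100.0),
    ("apparel", "eon", 80.0),
    ("apparel", "fizz", 120.0),
]

TOP_BRANDS_SQL = """
SELECT * FROM (
  SELECT
    main_category,
    brand,
    SUM(total_revenue) AS revenue,
    RANK() OVER (PARTITION BY main_category ORDER BY SUM(total_revenue) DESC) AS category_rank
  FROM brand_metrics
  GROUP BY main_category, brand
) AS ranked
WHERE category_rank <= 3
ORDER BY main_category, category_rank
"""


def top_brands_per_category(rows):
    """Load rows into an in-memory table and return the top 3 brands per category."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE brand_metrics (main_category TEXT, brand TEXT, total_revenue REAL)"
        )
        conn.executemany("INSERT INTO brand_metrics VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_BRANDS_SQL).fetchall()
    finally:
        conn.close()


for row in top_brands_per_category(SAMPLE_ROWS):
    print(row)
```

The ranking restarts at 1 for apparel. The outer WHERE filters out dyna, which ranks 4th in electronics. The two acme rows are summed before ranking because the window runs after GROUP BY.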

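**2. Rolling Averages (Trend Smoothing)**
- A window frame (ROWS BETWEEN ... AND CURRENT ROW) limits the calculation to a sliding range of rows, so each day's value is averaged with the days before it.
- The same window-function toolkit underpins cohort analysis. For example, you can check whether customers who joined in January are more valuable than those who joined in February.
- Scenario: Calculate the "Rolling 7-Day Average" of sales to smooth out weekend spikes.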
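```sql
-- Assumes exactly one row per day: ROWS counts rows, not calendar days,
-- so missing dates would stretch the window beyond 7 days.
SELECT 
  event_date,
  total_daily_sales,
  AVG(total_daily_sales) OVER (
    ORDER BY event_date 
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7day_avg
FROM ecommerce_prod.gold.daily_sales_summary
ORDER BY event_date;
```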
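You can check the window frame's behavior by running it on a tiny table. This sketch again uses sqlite3 as a local stand-in, with hypothetical daily totals. The first six rows average fewer than seven values because the frame cannot reach back further.

```python
import sqlite3

ROLLING_SQL = """
SELECT
  event_date,
  total_daily_sales,
  AVG(total_daily_sales) OVER (
    ORDER BY event_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7day_avg
FROM daily_sales_summary
ORDER BY event_date
"""


def rolling_averages(daily_rows):
    """Return (event_date, total_daily_sales, rolling_7day_avg) for each day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE daily_sales_summary (event_date TEXT, total_daily_sales REAL)"
        )
        conn.executemany("INSERT INTO daily_sales_summary VALUES (?, ?)", daily_rows)
        return conn.execute(ROLLING_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical sales for eight consecutive days: 10, 20, ..., 80
days = [(f"2024-01-0{d}", 10.0 * d) for d in range(1, 9)]
for row in rolling_averages(days):
    print(row)
```

Day 7 is the first row whose average covers a full seven days: (10 + ... + 70) / 7 = 40. On day 8 the window drops day 1: (20 + ... + 80) / 7 = 50.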

**3. Common Table Expressions (CTEs)**
- CTEs (the WITH clause) are essential for readability. Instead of nesting five subqueries, you create "temporary result sets" that lead to a final answer.
- Scenario: Identify "Whale" customers (those whose total spend is more than 10x the average).
```sql
WITH AvgSpend AS (
  SELECT AVG(total_spend) as global_avg FROM ecommerce_prod.silver.user_stats
),
Whales AS (
  SELECT user_id, total_spend 
  FROM ecommerce_prod.silver.user_stats 
  WHERE total_spend > (SELECT global_avg * 10 FROM AvgSpend)
)
SELECT * FROM Whales ORDER BY total_spend DESC;
```
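Running the CTE on hypothetical user totals shows how each named step feeds the next. This minimal sketch uses sqlite3 in place of the warehouse. The user_stats rows are invented.

```python
import sqlite3

WHALES_SQL = """
WITH AvgSpend AS (
  SELECT AVG(total_spend) AS global_avg FROM user_stats
),
Whales AS (
  SELECT user_id, total_spend
  FROM user_stats
  WHERE total_spend > (SELECT global_avg * 10 FROM AvgSpend)
)
SELECT * FROM Whales ORDER BY total_spend DESC
"""


def find_whales(user_rows):
    """Return (user_id, total_spend) for users spending more than 10x the average."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE user_stats (user_id TEXT, total_spend REAL)")
        conn.executemany("INSERT INTO user_stats VALUES (?, ?)", user_rows)
        return conn.execute(WHALES_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical users: 20 small spenders, one big spender, one whale.
users = [(f"u{i}", 10.0) for i in range(20)] + [("big", 900.0), ("whale", 2000.0)]
print(find_whales(users))
```

Here the average is 3100 / 22 ≈ 140.9, so the threshold is about 1,409. "big" (900) is excluded even though it is far above average, while "whale" qualifies. The whale's own spend also raises the average it is compared against, which matters most with small user bases.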

**Why the SQL Warehouse handles this better**
- When you run these queries on a SQL Warehouse instead of an all-purpose cluster, three features help:
- Result Caching: If you rerun the same "Top 3 Brands" query and the underlying tables haven't changed, the warehouse can return the cached result almost instantly.
- Photon Acceleration: Photon speeds up the join and aggregation operations at the heart of these queries.
- Concurrency: If 10 analysts run these queries at once, the warehouse can add clusters (up to the maximum you configure) so queries don't sit in a queue.
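A toy model makes result caching easier to reason about. The sketch below is not Databricks' implementation. It assumes a cache keyed by the query text plus a version number of the underlying table, so a rerun hits the cache until the data changes. ToyResultCache, table_version, and fake_engine are hypothetical names for illustration.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]


class ToyResultCache:
    """Toy result cache: one entry per (query text, table version) pair."""

    def __init__(self, run_query: Callable[[str], Rows]):
        self._run_query = run_query          # the "engine" that actually executes SQL
        self._entries: Dict[Tuple[str, int], Rows] = {}
        self.executions = 0                  # how often the engine really ran

    def query(self, sql: str, table_version: int) -> Rows:
        key = (sql.strip(), table_version)
        if key not in self._entries:         # cache miss: run the query and remember it
            self.executions += 1
            self._entries[key] = self._run_query(sql)
        return self._entries[key]            # cache hit: no engine work


def fake_engine(sql: str) -> Rows:
    # Hypothetical engine that always returns the same result row.
    return [("electronics", "acme", 750.0, 1)]


cache = ToyResultCache(fake_engine)
top3 = "SELECT main_category, brand FROM brand_metrics"
cache.query(top3, table_version=1)   # miss: executes
cache.query(top3, table_version=1)   # hit: served from cache
cache.query(top3, table_version=2)   # table changed: executes again
print(cache.executions)
```

Keying on the table version is what keeps a cache like this honest: a stale result can never be served after the data changes.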

### Dashboard creation
- In Databricks, we use AI/BI Dashboards. They are fast, built-in, and designed specifically to work with your SQL Warehouse.

**The 3-Step Build Process**
- **1. Define Your Data (The Data Tab)**
- Before you draw a chart, you need to tell the dashboard what data to look at. 
- You can point it directly at a table or write a SQL query that defines the dataset.
- Best Practice: Don't use SELECT *. Only bring in the columns you need for the chart (e.g., event_date, brand, total_revenue). This keeps the dashboard snappy even with millions of rows.

- **2. Create Visualizations (The Canvas Tab)**
- Click Add a Visualization and choose your type.
- Time Series (Line Chart): Show daily revenue trends.
- Bar Chart: Compare top 10 brands by sales.
- Counter: A big, bold number showing "Total Sales This Month."
- Funnel: Show the drop-off from view → cart → purchase.

- **3. Add Interactivity (Filters)**
- This is the most important part for your users. Add a Date Range filter and a Category dropdown at the top. When the Marketing Manager selects "Electronics," every chart connected to that filter updates to show only Electronics data.
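The Step 1 advice to avoid SELECT * becomes concrete when you compare what each dataset query returns. This sqlite3 sketch uses a hypothetical wide events table. The narrow version also pre-aggregates by day and brand, which goes a step beyond column pruning but is a common way to shrink dashboard datasets.

```python
import sqlite3

# Hypothetical wide table: nine columns, most of them unused by the revenue chart.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE cleaned_events (
        event_time TEXT, event_type TEXT, product_id INTEGER, category_id INTEGER,
        main_category TEXT, brand TEXT, price REAL, user_id INTEGER, user_session TEXT)"""
)
conn.executemany(
    "INSERT INTO cleaned_events VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("2024-01-01 09:00:00", "purchase", 1, 10, "electronics", "acme", 100.0, 7, "s1"),
        ("2024-01-01 10:00:00", "purchase", 2, 10, "electronics", "acme", 50.0, 8, "s2"),
        ("2024-01-01 11:00:00", "view", 3, 20, "apparel", "eon", 20.0, 9, "s3"),
        ("2024-01-02 09:30:00", "purchase", 3, 20, "apparel", "eon", 20.0, 9, "s4"),
    ],
)

WIDE_SQL = "SELECT * FROM cleaned_events"

# Only the columns the revenue chart needs, pre-aggregated per day and brand.
NARROW_SQL = """
SELECT date(event_time) AS event_date, brand, SUM(price) AS total_revenue
FROM cleaned_events
WHERE event_type = 'purchase'
GROUP BY date(event_time), brand
ORDER BY event_date, brand
"""


def result_shape(sql):
    """Return (row count, column count) for a dataset query."""
    cursor = conn.execute(sql)
    rows = cursor.fetchall()
    return len(rows), len(cursor.description)


print("SELECT *:", result_shape(WIDE_SQL))
print("narrow:  ", result_shape(NARROW_SQL))
print(conn.execute(NARROW_SQL).fetchall())
```

On four rows the difference is small. On millions of events, the narrow query still returns roughly one row per day and brand with three columns.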

- **Sharing & Governance:** 
- Dashboards respect Unity Catalog permissions. When you click Publish, you choose how data access is evaluated:
- Individual data permissions ("Don't embed credentials", also described as run as viewer): each viewer sees only the data they have SELECT access to in Unity Catalog.
- Shared data permissions ("Embed credentials", also described as run as owner): viewers see the data through the publisher's permissions. This is useful if you want to show a summary to someone who doesn't have access to the raw tables.

### Visualizations & filters

- In Databricks AI/BI Dashboards, visualizations and filters turn your complex SQL queries into a "data app" that anyone can use. 

**1. Choosing the Right Visualizations**
- Your dashboard should answer three levels of questions: Status, Trend, and Detail.
- **KPI Big Number (The Status):** Use a "Counter" widget for Total Revenue and Conversion Rate. It gives the user an immediate "health check."
- **Area or Line Chart (The Trend):** Map total_revenue against event_date. This helps you spot "Cyber Monday" peaks or seasonal dips.
- **Heatmap or Treemap (The Detail):** Perfect for showing which Brands are dominating within a specific Category.

**2. Using Filters for Interactivity**
- Filters are the most powerful part of the UI. A filter can act directly on a dataset field or pass a value into a query parameter. Instead of writing a new query for every department, you add a filter at the top.
- Date Range Filter: Allows users to toggle between "Last 7 Days," "Last Month," or "Year to Date."
- Multi-Select Dropdown: Let users filter by main_category (e.g., Electronics vs. Apparel).
- Text Search: Useful for finding specific Product IDs or User IDs.

**3. Making it "Fast" (The Engineering Secret)**
- When a user changes a filter, you don't want them waiting 30 seconds. One fix is to put a named query parameter directly in the dataset's SQL, so the filter runs inside the warehouse and only matching rows are aggregated:
```sql
-- Dataset query: :category_filter is a named parameter you can connect to a dashboard filter
SELECT 
    event_date, 
    SUM(total_revenue) AS revenue
FROM ecommerce_prod.gold.brand_metrics
WHERE main_category = :category_filter  -- the value chosen in the filter is bound here
GROUP BY event_date
ORDER BY event_date;
```

### Task 1: Create a SQL Warehouse
- This is your dedicated compute for BI.
- In the sidebar, click SQL Warehouses (in older workspace UIs, first switch the persona to SQL).
- Click Create SQL Warehouse.
- Name it Ecom_Analytics_WH.
- Set Cluster Size to 2X-Small (choose Serverless if available for near-instant startup).
- Set Auto-stop to 10 mins to save costs.

### Task 2: Write analytical queries

#### A. Daily Revenue Trend

```sql
%sql
SELECT 
  date_trunc('day', event_time) AS sales_date,
  SUM(price) AS daily_revenue
FROM ecommerce_prod.silver.cleaned_events
WHERE event_type = 'purchase'
GROUP BY 1
ORDER BY 1;
```
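Grouping by a truncated date has one side effect worth knowing before charting: a day with no purchases produces no row at all. This sqlite3 sketch shows the missing day. The events are hypothetical, and SQLite's date() stands in for Databricks' date_trunc('day', ...).

```python
import sqlite3

DAILY_REVENUE_SQL = """
SELECT date(event_time) AS sales_date, SUM(price) AS daily_revenue
FROM cleaned_events
WHERE event_type = 'purchase'
GROUP BY date(event_time)
ORDER BY sales_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cleaned_events (event_time TEXT, event_type TEXT, price REAL)")
conn.executemany(
    "INSERT INTO cleaned_events VALUES (?, ?, ?)",
    [
        ("2024-01-01 09:15:00", "purchase", 100.0),
        ("2024-01-01 17:40:00", "purchase", 50.0),
        ("2024-01-02 12:00:00", "view", 999.0),     # not a purchase, so ignored
        ("2024-01-03 08:05:00", "purchase", 20.0),
    ],
)
daily = conn.execute(DAILY_REVENUE_SQL).fetchall()
print(daily)
```

2024-01-02 is absent rather than 0. A line chart will connect Jan 1 directly to Jan 3, and a row-based window such as ROWS BETWEEN 6 PRECEDING treats those two rows as adjacent days. One common fix is to join against a calendar table when gaps matter.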

#### B. Funnel Chart

```sql
%sql
SELECT 
  event_type,
  COUNT(DISTINCT user_session) AS unique_sessions
FROM ecommerce_prod.silver.cleaned_events
WHERE event_type IN ('view', 'cart', 'purchase')
GROUP BY 1
ORDER BY 
  CASE 
    WHEN event_type = 'view' THEN 1 
    WHEN event_type = 'cart' THEN 2 
    WHEN event_type = 'purchase' THEN 3 
  END;
```
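Funnel counts are often easier to read as step-to-step conversion rates. This minimal sketch extends the same query with LAG() over the step order and runs it on hypothetical sessions through sqlite3. pct_of_previous_step is an illustrative column name, not something the dashboard requires.

```python
import sqlite3

FUNNEL_SQL = """
WITH funnel AS (
  SELECT
    event_type,
    COUNT(DISTINCT user_session) AS unique_sessions,
    CASE event_type WHEN 'view' THEN 1 WHEN 'cart' THEN 2 WHEN 'purchase' THEN 3 END AS step
  FROM cleaned_events
  WHERE event_type IN ('view', 'cart', 'purchase')
  GROUP BY event_type
)
SELECT
  event_type,
  unique_sessions,
  -- NULL for the first step, which has no previous step to compare against
  ROUND(100.0 * unique_sessions / LAG(unique_sessions) OVER (ORDER BY step), 1)
    AS pct_of_previous_step
FROM funnel
ORDER BY step
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cleaned_events (event_type TEXT, user_session TEXT)")
conn.executemany(
    "INSERT INTO cleaned_events VALUES (?, ?)",
    [
        ("view", "s1"), ("view", "s1"), ("view", "s2"), ("view", "s3"), ("view", "s4"),
        ("cart", "s1"), ("cart", "s2"),
        ("purchase", "s1"),
        ("remove_from_cart", "s2"),  # outside the funnel, filtered out
    ],
)
funnel = conn.execute(FUNNEL_SQL).fetchall()
for row in funnel:
    print(row)
```

COUNT(DISTINCT user_session) counts s1 once at the view step even though it has two view events.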

#### C. Bar Chart

```sql
%sql
SELECT 
  main_category, 
  COUNT(*) as total_items_sold,
  SUM(price) as total_category_revenue
FROM ecommerce_prod.silver.cleaned_events
WHERE event_type = 'purchase' AND main_category IS NOT NULL
GROUP BY 1
ORDER BY total_category_revenue DESC
LIMIT 10;
```
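The main_category IS NOT NULL condition matters because GROUP BY collects all NULL categories into a single group, which can outrank real categories on the bar chart. This sqlite3 sketch runs the query with and without the condition on hypothetical purchases.

```python
import sqlite3

BASE_SQL = """
SELECT
  main_category,
  COUNT(*) AS total_items_sold,
  SUM(price) AS total_category_revenue
FROM cleaned_events
WHERE event_type = 'purchase' {extra_condition}
GROUP BY main_category
ORDER BY total_category_revenue DESC
LIMIT 10
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cleaned_events (event_type TEXT, main_category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO cleaned_events VALUES (?, ?, ?)",
    [
        ("purchase", "electronics", 100.0),
        ("purchase", "electronics", 200.0),
        ("purchase", "apparel", 50.0),
        ("purchase", None, 500.0),          # uncategorized purchase
        ("view", "electronics", 999.0),     # not a purchase
    ],
)


def top_categories(drop_nulls):
    # The condition is a fixed literal chosen here, never user input, so formatting it in is safe.
    extra = "AND main_category IS NOT NULL" if drop_nulls else ""
    return conn.execute(BASE_SQL.format(extra_condition=extra)).fetchall()


print(top_categories(drop_nulls=True))
print(top_categories(drop_nulls=False))
```

Without the condition, the uncategorized group (None, 1, 500.0) becomes the tallest bar.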