# Snowflake Semantic View Autopilot 
# Virtual Hands-On Lab

In this hands-on lab, you'll learn how to use **Semantic View Autopilot** to automatically generate semantic views from existing BI artifacts like Tableau workbooks. You'll also explore how to query semantic views using **Standard SQL** and generate Tableau Data Source (.tds) files for seamless BI integration.

## What You'll Build:
- **Marketing Data Foundation**: Dimension tables (product, region, campaign, channel) and a fact table for campaign performance
- **Automated Semantic View**: Use Autopilot to generate a semantic view from a Tableau workbook
- **Standard SQL Queries**: Query semantic views using familiar ANSI-style SQL
- **Tableau Integration**: Generate .tds files that connect Tableau directly to semantic views

## Key Technologies:
- Semantic View Autopilot
- Standard SQL for Semantic Views
- Snowflake Stored Procedures (Python)
- Streamlit for interactive downloads

## Step 1: Load The Base Campaign Data

This section creates the database and schema and loads the sample marketing data used throughout this lab. The data model follows a star schema, a pattern commonly used in analytics:

#### Dimension Tables
| Table | Description |
|-------|-------------|
| `product_dim` | Product information including name, category, and vertical |
| `region_dim` | Geographic regions for campaign targeting |
| `campaign_dim` | Marketing campaigns with names and objectives |
| `channel_dim` | Marketing channels (e.g., Facebook, Google, Email) |

#### Fact Table
| Table | Description |
|-------|-------------|
| `marketing_campaign_fact` | Campaign performance metrics including spend, leads generated, and impressions |
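To make the star-schema pattern concrete, here is a minimal local sketch that uses Python's built-in `sqlite3` module instead of Snowflake. The table and column names mirror the lab's `channel_dim` and `marketing_campaign_fact`, but the rows are made-up sample values for illustration only. Each fact row carries a foreign key, and analysis joins it to a dimension before aggregating.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (illustration only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One dimension table and a trimmed-down fact table with hypothetical rows.
cur.execute("CREATE TABLE channel_dim (channel_key INTEGER PRIMARY KEY, channel_name TEXT NOT NULL)")
cur.execute("""
    CREATE TABLE marketing_campaign_fact (
        campaign_fact_id INTEGER PRIMARY KEY,
        channel_key      INTEGER NOT NULL,
        spend            REAL NOT NULL,
        leads_generated  INTEGER NOT NULL
    )
""")
cur.executemany("INSERT INTO channel_dim VALUES (?, ?)",
                [(1, "Facebook"), (2, "Google"), (3, "Email")])
cur.executemany("INSERT INTO marketing_campaign_fact VALUES (?, ?, ?, ?)",
                [(1, 1, 100.0, 10), (2, 1, 50.0, 5), (3, 2, 80.0, 4), (4, 3, 20.0, 0)])

# Typical star-schema query: join fact -> dimension on the key, then aggregate.
rows = cur.execute("""
    SELECT cd.channel_name,
           SUM(mcf.spend)           AS total_spend,
           SUM(mcf.leads_generated) AS total_leads
    FROM marketing_campaign_fact mcf
    JOIN channel_dim cd ON mcf.channel_key = cd.channel_key
    GROUP BY cd.channel_name
    ORDER BY cd.channel_name
""").fetchall()

for channel_name, total_spend, total_leads in rows:
    print(channel_name, total_spend, total_leads)
conn.close()
```

The same join-then-aggregate shape appears in almost every seed query in Step 2, just against more dimensions.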

> **Note**: This script takes approximately 1 minute to run. It creates the database `SVA_VHOL_DB`, loads data from a GitHub repository, and sets up the necessary infrastructure.

In [None]:
-- =============================================================================
-- This script borrows heavily from the Snowflake Intelligence end-to-end demo:
-- https://github.com/NickAkincilar/Snowflake_AI_DEMO
-- Expected runtime: ~1 minute
-- =============================================================================

-- -----------------------------------------------------------------------------
-- SETUP: Role, Database, and Schema
-- -----------------------------------------------------------------------------
USE ROLE ACCOUNTADMIN;

CREATE OR REPLACE DATABASE SVA_VHOL_DB;
USE DATABASE SVA_VHOL_DB;

CREATE SCHEMA IF NOT EXISTS SVA_VHOL_SCHEMA;
USE SCHEMA SVA_VHOL_SCHEMA;

-- Optional: Grant public access
GRANT USAGE ON DATABASE SVA_VHOL_DB TO ROLE PUBLIC;
GRANT USAGE ON SCHEMA SVA_VHOL_DB.SVA_VHOL_SCHEMA TO ROLE PUBLIC;

-- -----------------------------------------------------------------------------
-- FILE FORMAT: CSV Configuration
-- -----------------------------------------------------------------------------
CREATE OR REPLACE FILE FORMAT CSV_FORMAT
    TYPE                         = 'CSV'
    FIELD_DELIMITER              = ','
    RECORD_DELIMITER             = '\n'
    SKIP_HEADER                  = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    TRIM_SPACE                   = TRUE
    ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
    ESCAPE                       = 'NONE'
    ESCAPE_UNENCLOSED_FIELD      = '\134'
    DATE_FORMAT                  = 'YYYY-MM-DD'
    TIMESTAMP_FORMAT             = 'YYYY-MM-DD HH24:MI:SS'
    NULL_IF                      = ('NULL', 'null', '', 'N/A', 'n/a');

-- -----------------------------------------------------------------------------
-- GIT INTEGRATION: Connect to GitHub Repository
-- -----------------------------------------------------------------------------
CREATE OR REPLACE API INTEGRATION git_api_integration
    API_PROVIDER         = git_https_api
    API_ALLOWED_PREFIXES = ('https://github.com/NickAkincilar/')
    ENABLED              = TRUE;

CREATE OR REPLACE GIT REPOSITORY SVA_VHOL_REPO
    API_INTEGRATION = git_api_integration
    ORIGIN          = 'https://github.com/NickAkincilar/Snowflake_AI_DEMO.git';

-- -----------------------------------------------------------------------------
-- STAGE: Internal Storage for Data Files
-- -----------------------------------------------------------------------------
CREATE OR REPLACE STAGE INTERNAL_DATA_STAGE
    FILE_FORMAT = CSV_FORMAT
    COMMENT     = 'Internal stage for copied demo data files'
    DIRECTORY   = (ENABLE = TRUE)
    ENCRYPTION  = (TYPE = 'SNOWFLAKE_SSE');

ALTER GIT REPOSITORY SVA_VHOL_REPO FETCH;

-- -----------------------------------------------------------------------------
-- COPY DATA FROM GIT TO INTERNAL STAGE
-- -----------------------------------------------------------------------------
COPY FILES
    INTO @INTERNAL_DATA_STAGE/demo_data/
    FROM @SVA_VHOL_REPO/branches/main/demo_data/;

COPY FILES
    INTO @INTERNAL_DATA_STAGE/unstructured_docs/
    FROM @SVA_VHOL_REPO/branches/main/unstructured_docs/;

LS @INTERNAL_DATA_STAGE;

ALTER STAGE INTERNAL_DATA_STAGE REFRESH;

-- -----------------------------------------------------------------------------
-- DIMENSION TABLES
-- -----------------------------------------------------------------------------

-- Product Dimension
CREATE OR REPLACE TABLE product_dim (
    product_key   INT PRIMARY KEY,
    product_name  VARCHAR(200) NOT NULL,
    category_key  INT NOT NULL,
    category_name VARCHAR(100),
    vertical      VARCHAR(50)
);

-- Region Dimension
CREATE OR REPLACE TABLE region_dim (
    region_key  INT PRIMARY KEY,
    region_name VARCHAR(100) NOT NULL
);

-- Campaign Dimension
CREATE OR REPLACE TABLE campaign_dim (
    campaign_key  INT PRIMARY KEY,
    campaign_name VARCHAR(300) NOT NULL,
    objective     VARCHAR(100)
);

-- Channel Dimension
CREATE OR REPLACE TABLE channel_dim (
    channel_key  INT PRIMARY KEY,
    channel_name VARCHAR(100) NOT NULL
);

-- -----------------------------------------------------------------------------
-- FACT TABLE
-- -----------------------------------------------------------------------------
CREATE OR REPLACE TABLE marketing_campaign_fact (
    campaign_fact_id INT PRIMARY KEY,
    date             DATE NOT NULL,
    campaign_key     INT NOT NULL,
    product_key      INT NOT NULL,
    channel_key      INT NOT NULL,
    region_key       INT NOT NULL,
    spend            DECIMAL(10,2) NOT NULL,
    leads_generated  INT NOT NULL,
    impressions      INT NOT NULL
);

-- -----------------------------------------------------------------------------
-- LOAD DIMENSION DATA
-- -----------------------------------------------------------------------------
COPY INTO product_dim
    FROM @INTERNAL_DATA_STAGE/demo_data/product_dim.csv
    FILE_FORMAT = CSV_FORMAT
    ON_ERROR    = 'CONTINUE';

COPY INTO region_dim
    FROM @INTERNAL_DATA_STAGE/demo_data/region_dim.csv
    FILE_FORMAT = CSV_FORMAT
    ON_ERROR    = 'CONTINUE';

COPY INTO campaign_dim
    FROM @INTERNAL_DATA_STAGE/demo_data/campaign_dim.csv
    FILE_FORMAT = CSV_FORMAT
    ON_ERROR    = 'CONTINUE';

COPY INTO channel_dim
    FROM @INTERNAL_DATA_STAGE/demo_data/channel_dim.csv
    FILE_FORMAT = CSV_FORMAT
    ON_ERROR    = 'CONTINUE';

-- -----------------------------------------------------------------------------
-- LOAD FACT DATA
-- -----------------------------------------------------------------------------
COPY INTO marketing_campaign_fact
    FROM @INTERNAL_DATA_STAGE/demo_data/marketing_campaign_fact.csv
    FILE_FORMAT = CSV_FORMAT
    ON_ERROR    = 'CONTINUE';

-- -----------------------------------------------------------------------------
-- VERIFICATION: Check Data Loads
-- -----------------------------------------------------------------------------
SHOW GIT REPOSITORIES;

SELECT 'DIMENSION TABLES' AS category, '' AS table_name, NULL AS row_count
UNION ALL SELECT '', 'product_dim',  COUNT(*) FROM product_dim
UNION ALL SELECT '', 'region_dim',   COUNT(*) FROM region_dim
UNION ALL SELECT '', 'campaign_dim', COUNT(*) FROM campaign_dim
UNION ALL SELECT '', 'channel_dim',  COUNT(*) FROM channel_dim
UNION ALL SELECT 'FACT TABLES', '', NULL
UNION ALL SELECT '', 'marketing_campaign_fact', COUNT(*) FROM marketing_campaign_fact;
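The loads above combine `CSV_FORMAT` with `ON_ERROR = 'CONTINUE'`, and together they decide which rows actually land in each table. The sketch below is an approximate plain-Python emulation of that behavior, not Snowflake's loader: it skips the header, honours double-quote enclosures, trims whitespace, maps the `NULL_IF` tokens to `None`, pads or truncates rows with the wrong field count, and, as `ON_ERROR = 'CONTINUE'` would, skips a row that violates a `NOT NULL` column instead of aborting. The sample CSV and the `load_rows` helper are hypothetical.

```python
import csv
import io

# Tokens treated as SQL NULL, mirroring NULL_IF in CSV_FORMAT.
NULL_IF = {"NULL", "null", "", "N/A", "n/a"}

def load_rows(csv_text, columns, not_null):
    """Approximate COPY INTO ... ON_ERROR = 'CONTINUE' for a small CSV string.

    Hypothetical helper for illustration. Returns (loaded_rows, error_count).
    """
    reader = csv.reader(io.StringIO(csv_text), quotechar='"')  # FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    next(reader, None)                                          # SKIP_HEADER = 1
    loaded, errors = [], 0
    for raw in reader:
        values = [field.strip() for field in raw]               # TRIM_SPACE = TRUE
        values = [None if v in NULL_IF else v for v in values]  # NULL_IF tokens become NULL
        # ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE: pad short rows with NULL, drop extra fields.
        values = (values + [None] * len(columns))[:len(columns)]
        row = dict(zip(columns, values))
        if any(row[col] is None for col in not_null):
            errors += 1                                         # ON_ERROR = 'CONTINUE': skip the row
            continue
        loaded.append(row)
    return loaded, errors

sample = 'channel_key,channel_name\n1,"Facebook"\n2, Google \n3,N/A\n'
rows, errors = load_rows(sample, ["channel_key", "channel_name"],
                         not_null=["channel_key", "channel_name"])
print(len(rows), "loaded,", errors, "skipped")
```

This is why the verification query is worth running: with `ON_ERROR = 'CONTINUE'`, a bad row lowers the row count instead of failing the `COPY INTO` statement.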
    

## Step 2: Seed Queries for Semantic View Autopilot (Optional)

The Semantic View Autopilot analyzes your **query history** to understand how your data is used in practice. By running these seed queries before creating your semantic view, you provide valuable context that helps Autopilot:

- **Suggest model improvements**: Identify commonly used JOINs, aggregations, and filters
- **Generate verified queries**: Pre-populate the semantic view with known-good query patterns
- **Infer business logic**: Understand calculated metrics like cost-per-lead (CPL) and conversion rates
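To get a feel for the signal the seed queries carry, the sketch below counts which dimension tables are joined, and on which keys, across a few query strings. It is a deliberately naive, regex-based toy written for this lab: it is not how Autopilot analyzes query history, and the hypothetical `count_join_keys` helper only recognizes the simple `JOIN <table> <alias> ON <a>.<col> = <b>.<col>` shape used in the queries below.

```python
import re
from collections import Counter

# Matches "JOIN <table> <alias> ON <alias>.<col> = <alias>.<col>" (simple shapes only).
JOIN_PATTERN = re.compile(
    r"JOIN\s+(\w+)\s+\w+\s+ON\s+\w+\.(\w+)\s*=\s*\w+\.(\w+)",
    re.IGNORECASE,
)

def count_join_keys(queries):
    """Count (joined_table, join_key) pairs across query texts. Hypothetical helper."""
    counts = Counter()
    for sql in queries:
        for table, left_col, right_col in JOIN_PATTERN.findall(sql):
            # The seed queries join on identically named keys, so one name is enough.
            key = left_col if left_col.lower() == right_col.lower() else f"{left_col}={right_col}"
            counts[(table.upper(), key.lower())] += 1
    return counts

seed_queries = [
    "SELECT cd.channel_name, SUM(mcf.spend) FROM MARKETING_CAMPAIGN_FACT mcf "
    "JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key GROUP BY 1",
    "SELECT c.objective, SUM(mcf.spend) FROM MARKETING_CAMPAIGN_FACT mcf "
    "JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key GROUP BY 1",
    "SELECT c.campaign_name, cd.channel_name FROM MARKETING_CAMPAIGN_FACT mcf "
    "JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key "
    "JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key GROUP BY 1, 2",
]

for (table, key), n in count_join_keys(seed_queries).most_common():
    print(f"{table} via {key}: {n}")
```

Repeated join paths like these are the kind of pattern the bullets above refer to when they mention commonly used JOINs.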

### What These Queries Cover:
1. **Overall marketing performance** by month
2. **Channel efficiency analysis** (cost per lead by channel)
3. **Campaign performance ranking** (top campaigns by leads)
4. **Regional analysis** and budget allocation
5. **Product conversion rates** and category analysis
6. **Trend analysis** (week-over-week growth, anomaly detection)
7. **Efficiency frontier** (best performing segments)
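The week-over-week growth in Query 14 (in the cell below) divides each week's change by the previous week's value, using `LAG` to fetch the prior row and `NULLIF` to avoid dividing by zero. The plain-Python equivalent below uses made-up weekly totals. The first week has no predecessor, so its growth is `None` (SQL `NULL`), and a week that follows a zero week also yields `None`.

```python
from typing import List, Optional

def wow_pct(values: List[float]) -> List[Optional[float]]:
    """(current - previous) / NULLIF(previous, 0), like LAG() OVER (ORDER BY week)."""
    result: List[Optional[float]] = []
    previous: Optional[float] = None              # LAG() returns NULL for the first row
    for current in values:
        if previous is None or previous == 0:     # NULL prior row, or NULLIF(previous, 0)
            result.append(None)
        else:
            result.append((current - previous) / previous)
        previous = current
    return result

weekly_spend = [1000.0, 1200.0, 900.0, 0.0, 500.0]   # hypothetical totals, ordered by week
print(wow_pct(weekly_spend))
```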

> **Tip**: If you want to see Autopilot's suggestions in action, run this cell before creating your semantic view in the next step. Otherwise, skip this cell.

In [None]:
-- =============================================================================
-- MARKETING SEMANTIC VIEW: Analytic Query Playbook
-- These queries seed the Autopilot with common business patterns
-- =============================================================================

-- -----------------------------------------------------------------------------
-- Query 1: Overall Marketing Performance by Month
-- -----------------------------------------------------------------------------
SELECT
    DATE_TRUNC('month', mcf.date)                                   AS month,
    SUM(mcf.spend)                                                  AS total_spend,
    SUM(mcf.impressions)                                            AS total_impressions,
    SUM(mcf.leads_generated)                                        AS total_leads,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)            AS cost_per_lead,
    SUM(mcf.leads_generated) / NULLIF(SUM(mcf.impressions), 0)      AS lead_conversion_rate
FROM MARKETING_CAMPAIGN_FACT mcf
GROUP BY 1
ORDER BY 1;

-- -----------------------------------------------------------------------------
-- Query 2: Channel Efficiency Analysis
-- -----------------------------------------------------------------------------
SELECT
    cd.channel_name,
    SUM(mcf.spend)                                        AS total_spend,
    SUM(mcf.leads_generated)                              AS total_leads,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)  AS cost_per_lead
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key
GROUP BY 1
HAVING SUM(mcf.leads_generated) > 0
ORDER BY cost_per_lead ASC, total_leads DESC;

-- -----------------------------------------------------------------------------
-- Query 3: Channel Trend (Month-over-Month)
-- -----------------------------------------------------------------------------
SELECT
    DATE_TRUNC('month', mcf.date)                                   AS month,
    cd.channel_name,
    SUM(mcf.spend)                                                  AS total_spend,
    SUM(mcf.impressions)                                            AS total_impressions,
    SUM(mcf.leads_generated)                                        AS total_leads,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)            AS cpl,
    SUM(mcf.leads_generated) / NULLIF(SUM(mcf.impressions), 0)      AS conv_rate
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key
GROUP BY 1, 2
ORDER BY 1, 2;

-- -----------------------------------------------------------------------------
-- Query 4: Top Campaigns by Leads
-- -----------------------------------------------------------------------------
SELECT
    c.campaign_name,
    c.objective,
    SUM(mcf.leads_generated)                                        AS total_leads,
    SUM(mcf.spend)                                                  AS total_spend,
    SUM(mcf.impressions)                                            AS total_impressions,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)            AS cpl,
    SUM(mcf.leads_generated) / NULLIF(SUM(mcf.impressions), 0)      AS conv_rate
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key
GROUP BY 1, 2
ORDER BY total_leads DESC
LIMIT 20;

-- -----------------------------------------------------------------------------
-- Query 5: Objectives Performance
-- -----------------------------------------------------------------------------
SELECT
    c.objective,
    SUM(mcf.spend)                                                  AS total_spend,
    SUM(mcf.impressions)                                            AS total_impressions,
    SUM(mcf.leads_generated)                                        AS total_leads,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)            AS cpl,
    SUM(mcf.leads_generated) / NULLIF(SUM(mcf.impressions), 0)      AS conv_rate
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key
GROUP BY 1
ORDER BY cpl ASC;

-- -----------------------------------------------------------------------------
-- Query 6: Performance by Region
-- -----------------------------------------------------------------------------
WITH totals AS (
    SELECT 
        SUM(spend)           AS all_spend, 
        SUM(leads_generated) AS all_leads 
    FROM MARKETING_CAMPAIGN_FACT
)
SELECT
    rd.region_name,
    SUM(mcf.spend)                                        AS spend,
    SUM(mcf.leads_generated)                              AS leads,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)  AS cpl,
    SUM(mcf.spend) / NULLIF(t.all_spend, 0)               AS spend_share,
    SUM(mcf.leads_generated) / NULLIF(t.all_leads, 0)     AS leads_share
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN REGION_DIM rd ON mcf.region_key = rd.region_key
    CROSS JOIN totals t
GROUP BY 1, t.all_spend, t.all_leads
ORDER BY leads DESC;

-- -----------------------------------------------------------------------------
-- Query 7: Products with Best Conversion
-- -----------------------------------------------------------------------------
SELECT
    pd.product_name,
    pd.category_name,
    pd.vertical,
    SUM(mcf.impressions)                                            AS impressions,
    SUM(mcf.leads_generated)                                        AS leads,
    SUM(mcf.leads_generated) / NULLIF(SUM(mcf.impressions), 0)      AS conv_rate
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN PRODUCT_DIM pd ON mcf.product_key = pd.product_key
GROUP BY 1, 2, 3
HAVING SUM(mcf.impressions) > 0
ORDER BY conv_rate DESC
LIMIT 20;

-- -----------------------------------------------------------------------------
-- Query 8: Category x Channel CPL Heatmap
-- -----------------------------------------------------------------------------
SELECT
    pd.category_name,
    cd.channel_name,
    SUM(mcf.spend)                                        AS total_spend,
    SUM(mcf.leads_generated)                              AS total_leads,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)  AS cpl
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN PRODUCT_DIM pd ON mcf.product_key = pd.product_key
    JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key
GROUP BY 1, 2
ORDER BY pd.category_name, cpl ASC;

-- -----------------------------------------------------------------------------
-- Query 9: Vertical by Region Leads
-- -----------------------------------------------------------------------------
SELECT
    pd.vertical,
    rd.region_name,
    SUM(mcf.leads_generated)                              AS total_leads,
    SUM(mcf.spend)                                        AS total_spend,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)  AS cpl
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN PRODUCT_DIM pd ON mcf.product_key = pd.product_key
    JOIN REGION_DIM rd ON mcf.region_key = rd.region_key
GROUP BY 1, 2
ORDER BY total_leads DESC;

-- -----------------------------------------------------------------------------
-- Query 10: Worst CPL Segments
-- -----------------------------------------------------------------------------
SELECT
    c.campaign_name,
    cd.channel_name,
    SUM(mcf.leads_generated)                              AS leads,
    SUM(mcf.spend)                                        AS spend,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)  AS cpl
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key
    JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key
GROUP BY 1, 2
HAVING SUM(mcf.leads_generated) >= 50
ORDER BY cpl DESC
LIMIT 10;

-- -----------------------------------------------------------------------------
-- Query 11: Budget Allocation by Objective and Channel
-- -----------------------------------------------------------------------------
WITH obj_totals AS (
    SELECT
        c.objective,
        SUM(mcf.spend) AS objective_spend
    FROM MARKETING_CAMPAIGN_FACT mcf
        JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key
    GROUP BY 1
)
SELECT
    c.objective,
    cd.channel_name,
    SUM(mcf.spend)                                          AS spend,
    SUM(mcf.spend) / NULLIF(ot.objective_spend, 0)          AS spend_share_within_objective
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key
    JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key
    JOIN obj_totals ot ON c.objective = ot.objective
GROUP BY 1, 2, ot.objective_spend
ORDER BY c.objective, spend_share_within_objective DESC;

-- -----------------------------------------------------------------------------
-- Query 12: Facebook vs Non-Facebook Efficiency Over Time
-- -----------------------------------------------------------------------------
SELECT
    DATE_TRUNC('month', mcf.date) AS month,
    SUM(CASE WHEN cd.channel_name = 'Facebook'  THEN mcf.spend ELSE 0 END)           AS facebook_spend,
    SUM(CASE WHEN cd.channel_name <> 'Facebook' THEN mcf.spend ELSE 0 END)           AS non_facebook_spend,
    SUM(CASE WHEN cd.channel_name = 'Facebook'  THEN mcf.leads_generated ELSE 0 END) AS facebook_leads,
    SUM(CASE WHEN cd.channel_name <> 'Facebook' THEN mcf.leads_generated ELSE 0 END) AS non_facebook_leads,
    SUM(CASE WHEN cd.channel_name = 'Facebook'  THEN mcf.spend ELSE 0 END)
        / NULLIF(SUM(CASE WHEN cd.channel_name = 'Facebook' THEN mcf.leads_generated ELSE 0 END), 0) 
                                                                                     AS facebook_cpl,
    SUM(CASE WHEN cd.channel_name <> 'Facebook' THEN mcf.spend ELSE 0 END)
        / NULLIF(SUM(CASE WHEN cd.channel_name <> 'Facebook' THEN mcf.leads_generated ELSE 0 END), 0) 
                                                                                     AS non_facebook_cpl
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key
GROUP BY 1
ORDER BY 1;

-- -----------------------------------------------------------------------------
-- Query 13: Campaign Flighting (Duration and Totals)
-- -----------------------------------------------------------------------------
SELECT
    c.campaign_name,
    c.objective,
    MIN(mcf.date)                                                   AS first_active_date,
    MAX(mcf.date)                                                   AS last_active_date,
    DATEDIFF('day', MIN(mcf.date), MAX(mcf.date)) + 1               AS active_days,
    SUM(mcf.spend)                                                  AS total_spend,
    SUM(mcf.leads_generated)                                        AS total_leads,
    SUM(mcf.impressions)                                            AS total_impressions,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)            AS cpl,
    SUM(mcf.leads_generated) / NULLIF(SUM(mcf.impressions), 0)      AS conv_rate
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key
GROUP BY 1, 2
ORDER BY last_active_date DESC, total_spend DESC;

-- -----------------------------------------------------------------------------
-- Query 14: Week-over-Week Growth
-- -----------------------------------------------------------------------------
WITH weekly AS (
    SELECT
        DATE_TRUNC('week', date) AS week,
        SUM(spend)               AS spend,
        SUM(leads_generated)     AS leads
    FROM MARKETING_CAMPAIGN_FACT
    GROUP BY 1
)
SELECT
    week,
    spend,
    leads,
    (spend - LAG(spend) OVER (ORDER BY week)) / NULLIF(LAG(spend) OVER (ORDER BY week), 0) AS spend_wow_pct,
    (leads - LAG(leads) OVER (ORDER BY week)) / NULLIF(LAG(leads) OVER (ORDER BY week), 0) AS leads_wow_pct
FROM weekly
ORDER BY week;

-- -----------------------------------------------------------------------------
-- Query 15: Best "Scaled" Segments (Lowest CPL with High Lead Volume, Region x Channel)
-- -----------------------------------------------------------------------------
SELECT
    rd.region_name,
    cd.channel_name,
    SUM(mcf.leads_generated)                              AS leads,
    SUM(mcf.spend)                                        AS spend,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)  AS cpl
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN REGION_DIM rd ON mcf.region_key = rd.region_key
    JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key
GROUP BY 1, 2
HAVING SUM(mcf.leads_generated) >= 100
ORDER BY cpl ASC, leads DESC
LIMIT 20;

-- -----------------------------------------------------------------------------
-- Query 16: Objective Mix by Region (What Are We Running Where?)
-- -----------------------------------------------------------------------------
SELECT
    rd.region_name,
    c.objective,
    SUM(mcf.spend)                                                  AS spend,
    SUM(mcf.leads_generated)                                        AS leads,
    SUM(mcf.impressions)                                            AS impressions,
    SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)            AS cpl,
    SUM(mcf.leads_generated) / NULLIF(SUM(mcf.impressions), 0)      AS conv_rate
FROM MARKETING_CAMPAIGN_FACT mcf
    JOIN REGION_DIM rd ON mcf.region_key = rd.region_key
    JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key
GROUP BY 1, 2
ORDER BY rd.region_name, spend DESC;

-- -----------------------------------------------------------------------------
-- Query 17: Pareto View (Cumulative Leads vs. Cumulative Spend by Campaign)
-- -----------------------------------------------------------------------------
WITH by_campaign AS (
    SELECT
        c.campaign_name,
        SUM(mcf.leads_generated) AS leads,
        SUM(mcf.spend)           AS spend
    FROM MARKETING_CAMPAIGN_FACT mcf
        JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key
    GROUP BY 1
),
ranked AS (
    SELECT
        *,
        SUM(leads) OVER ()                    AS total_leads,
        SUM(spend) OVER ()                    AS total_spend,
        SUM(leads) OVER (ORDER BY leads DESC) AS cum_leads,
        SUM(spend) OVER (ORDER BY leads DESC) AS cum_spend
    FROM by_campaign
)
SELECT
    campaign_name,
    leads,
    spend,
    cum_leads / NULLIF(total_leads, 0) AS cumulative_leads_share,
    cum_spend / NULLIF(total_spend, 0) AS cumulative_spend_share
FROM ranked
ORDER BY leads DESC;

-- -----------------------------------------------------------------------------
-- Query 18: Anomaly Detection (High CPL Days via Z-Score)
-- -----------------------------------------------------------------------------
WITH daily AS (
    SELECT
        date,
        SUM(spend)                                        AS spend,
        SUM(leads_generated)                              AS leads,
        SUM(spend) / NULLIF(SUM(leads_generated), 0)      AS cpl
    FROM MARKETING_CAMPAIGN_FACT
    GROUP BY 1
),
stats AS (
    SELECT
        AVG(cpl)         AS avg_cpl,
        STDDEV_SAMP(cpl) AS std_cpl
    FROM daily
    WHERE cpl IS NOT NULL
),
scored AS (
    SELECT
        d.date,
        d.spend,
        d.leads,
        d.cpl,
        (d.cpl - s.avg_cpl) / NULLIF(s.std_cpl, 0) AS cpl_zscore
    FROM daily d
        CROSS JOIN stats s
    WHERE d.cpl IS NOT NULL
)
SELECT *
FROM scored
WHERE ABS(cpl_zscore) >= 2
ORDER BY ABS(cpl_zscore) DESC, date DESC;

-- -----------------------------------------------------------------------------
-- Query 19: Diminishing Returns Curve (CPL by Spend Bucket)
-- -----------------------------------------------------------------------------
WITH daily AS (
    SELECT
        date,
        SUM(spend)           AS spend,
        SUM(leads_generated) AS leads
    FROM MARKETING_CAMPAIGN_FACT
    GROUP BY 1
),
bucketed AS (
    SELECT
        *,
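        -- WIDTH_BUCKET puts a value equal to the upper bound in bucket 11; LEAST folds the max-spend day into bucket 10
        LEAST(WIDTH_BUCKET(spend, 0, (SELECT MAX(spend) FROM daily), 10), 10) AS spend_bucket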
    FROM daily
)
SELECT
    spend_bucket,
    MIN(spend)                         AS min_spend_in_bucket,
    MAX(spend)                         AS max_spend_in_bucket,
    SUM(spend)                         AS total_spend,
    SUM(leads)                         AS total_leads,
    SUM(spend) / NULLIF(SUM(leads), 0) AS cpl
FROM bucketed
GROUP BY 1
ORDER BY 1;

-- -----------------------------------------------------------------------------
-- Query 20: Efficiency Frontier (Top 5 Campaigns per Channel)
-- -----------------------------------------------------------------------------
WITH campaign_channel AS (
    SELECT
        cd.channel_name,
        c.campaign_name,
        SUM(mcf.spend)                                        AS spend,
        SUM(mcf.leads_generated)                              AS leads,
        SUM(mcf.spend) / NULLIF(SUM(mcf.leads_generated), 0)  AS cpl
    FROM MARKETING_CAMPAIGN_FACT mcf
        JOIN CHANNEL_DIM cd ON mcf.channel_key = cd.channel_key
        JOIN CAMPAIGN_DIM c ON mcf.campaign_key = c.campaign_key
    GROUP BY 1, 2
    HAVING SUM(mcf.leads_generated) >= 50
),
ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY channel_name
            ORDER BY cpl ASC NULLS LAST, leads DESC
        ) AS rn
    FROM campaign_channel
)
SELECT
    channel_name,
    campaign_name,
    spend,
    leads,
    cpl
FROM ranked
WHERE rn <= 5
ORDER BY channel_name, rn;
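Query 18 flags days whose daily cost per lead lies at least two sample standard deviations from the mean (|z| >= 2). The Python equivalent below makes those steps explicit with the standard-library `statistics` module, where `statistics.stdev` is the sample standard deviation and so matches `STDDEV_SAMP`. The daily figures are invented for illustration.

```python
import statistics

# Hypothetical daily totals: (date, spend, leads). Not lab data.
daily = [
    ("2024-01-01", 100.0, 10), ("2024-01-02", 110.0, 10), ("2024-01-03", 90.0, 10),
    ("2024-01-04", 100.0, 10), ("2024-01-05", 110.0, 10), ("2024-01-06", 90.0, 10),
    ("2024-01-07", 100.0, 10), ("2024-01-08", 400.0, 10), ("2024-01-09", 50.0, 0),
]

# Daily CPL = spend / NULLIF(leads, 0); zero-lead days have no CPL and are dropped,
# mirroring the "WHERE cpl IS NOT NULL" filters in Query 18.
cpl_by_day = {day: spend / leads for day, spend, leads in daily if leads != 0}
cpl_values = list(cpl_by_day.values())

mean_cpl = statistics.mean(cpl_values)
std_cpl = statistics.stdev(cpl_values)   # sample standard deviation, like STDDEV_SAMP

# NULLIF(std_cpl, 0): no z-scores at all if every day has the same CPL.
scored = {day: (cpl - mean_cpl) / std_cpl for day, cpl in cpl_by_day.items()} if std_cpl else {}

# Keep |z| >= 2, ordered by |z| descending as in the SQL.
anomalies = sorted(
    ((day, z) for day, z in scored.items() if abs(z) >= 2),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for day, z in anomalies:
    print(day, round(z, 2))
```

With only a handful of days, one outlier also inflates the standard deviation it is measured against, so the 2-sigma threshold is easier to trip on longer histories like the lab's fact table.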

## Step 3: Create a Semantic View Using Autopilot

Now comes the exciting part: using the **Semantic View Wizard** to automatically generate a semantic view from an existing Tableau workbook. This demonstrates how Autopilot can accelerate semantic layer creation by importing business logic from your existing BI artifacts.

### Instructions:

1. **Navigate to the Semantic View Wizard**
   - Go to **AI & ML > Analyst** from the left side menu in Snowflake
   - Select `SVA_VHOL_DB.SVA_VHOL_SCHEMA` from the dropdown
   - Click **"Create New Semantic View"**
   
   > **Note**: Use the `ACCOUNTADMIN` role or a role with ownership rights on `SVA_VHOL_DB.SVA_VHOL_SCHEMA`

2. **Import from Tableau Workbook**
   - Name your semantic view: `SVA_MARKETING_SV`
   - Click **Next**
   - Select **"Tableau Files"** as your context
   - Choose stage: `SVA_VHOL_SCHEMA.INTERNAL_DATA_STAGE`
   - Navigate to: `/unstructured_docs/BI_dashboards/`
   - Select: `CampaignMetrics.twb`
   - Click **"Create and Save"**

3. **Test Your Semantic View**
   - Go to the **"Playground"** tab in the right panel
   - Ask: *"Show me the top 10 most expensive products based on cost per lead"*

🎉 **Congratulations!** You just went from "zero-to-semantic-view" in under a minute!
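Tableau workbooks (`.twb`) are XML documents, and calculated fields typically appear as `<column>` elements containing a `<calculation>` element with a `formula` attribute. That embedded formula logic is the kind of business logic the wizard imports from BI artifacts. As a rough illustration, the sketch below extracts calculated fields from a simplified, hypothetical workbook fragment with `xml.etree.ElementTree`. It does not reproduce `CampaignMetrics.twb`, real workbooks carry far more structure and attributes, and it is not how Autopilot parses workbooks.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical .twb fragment (not the lab's CampaignMetrics.twb).
TWB_SNIPPET = """
<workbook>
  <datasources>
    <datasource name='marketing'>
      <column caption='Cost Per Lead' datatype='real' name='[Calculation_1]' role='measure' type='quantitative'>
        <calculation class='tableau' formula='SUM([Spend]) / SUM([Leads Generated])' />
      </column>
      <column caption='Channel Name' datatype='string' name='[Channel Name]' role='dimension' type='nominal' />
    </datasource>
  </datasources>
</workbook>
"""

def calculated_fields(twb_xml):
    """Return {caption: formula} for every <column> that carries a <calculation>."""
    root = ET.fromstring(twb_xml.strip())
    fields = {}
    for column in root.iter("column"):
        calc = column.find("calculation")
        if calc is not None and calc.get("formula"):
            caption = column.get("caption") or column.get("name")
            fields[caption] = calc.get("formula")
    return fields

print(calculated_fields(TWB_SNIPPET))
```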


## Step 4: Query Semantic Views with Standard SQL

**Semantic View Standard SQL** lets you query semantic views with familiar ANSI-style SQL. Because the queries are ordinary SQL, BI tools such as Tableau can connect directly to semantic views without a specialized connector.

### What You'll Learn:
- Write and run a **Standard SQL query** against the `SVA_MARKETING_SV` semantic view
- Understand the syntax differences (using `AGG()` for metric aggregation)
- Generate a **Tableau Data Source (.tds) file** for seamless BI integration

### Key Syntax Rules for Standard SQL:

| Element | Standard SQL Syntax |
|---------|---------------------|
| Dimensions/Facts | Select directly: `product_name` |
| Metrics | Must use `AGG()`: `AGG(cost_per_lead)` |
| Grouping | Required when selecting metrics: `GROUP BY ALL` |
| Table Reference | Use semantic view name directly: `FROM SVA_MARKETING_SV` |

> **Why Standard SQL?** It enables existing BI tools to query semantic views without modification, making adoption seamless for organizations with established Tableau or Power BI workflows.
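The rules in the table are mechanical enough for a tool to apply when it generates queries. As an illustration, here is a small hypothetical helper, `build_semantic_view_query`, that selects dimensions and facts directly, wraps each metric in `AGG()`, and adds `GROUP BY ALL` whenever a metric is selected. It only assembles a string; it does not check the names against the semantic view.

```python
from typing import List

def build_semantic_view_query(view: str, dimensions: List[str], metrics: List[str]) -> str:
    """Assemble a Standard SQL query against a semantic view (hypothetical helper).

    Dimensions/facts are selected directly, metrics are wrapped in AGG(),
    and GROUP BY ALL is added whenever at least one metric is selected.
    """
    if not dimensions and not metrics:
        raise ValueError("select at least one dimension or metric")
    select_items = list(dimensions) + [f"AGG({metric})" for metric in metrics]
    lines = ["SELECT", "    " + ",\n    ".join(select_items), f"FROM {view}"]
    if metrics:
        lines.append("GROUP BY ALL")
    return "\n".join(lines)

print(build_semantic_view_query("SVA_MARKETING_SV", ["product_name"], ["cost_per_lead"]))
```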


In [None]:
-- -----------------------------------------------------------------------------
-- SETUP: Set Context
-- -----------------------------------------------------------------------------
USE ROLE ACCOUNTADMIN;
USE DATABASE SVA_VHOL_DB;
USE SCHEMA SVA_VHOL_SCHEMA;

-- -----------------------------------------------------------------------------
-- STANDARD SQL QUERY: Query the Semantic View
-- -----------------------------------------------------------------------------
SELECT
    product_name,                           -- Dimensions/facts: select directly
    AGG(cost_per_lead) AS product_cpl       -- Metrics: must be wrapped in AGG()
FROM SVA_MARKETING_SV                       -- Reference the semantic view directly
GROUP BY ALL                                -- Required when selecting metrics
ORDER BY product_cpl DESC
LIMIT 10;
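Because `cost_per_lead` is a metric, `AGG()` evaluates its definition at the grain of the `GROUP BY` instead of adding up precomputed row values. Assuming the generated metric is defined the way the seed queries compute CPL, as total spend divided by total leads (check the definition Autopilot generated in your semantic view), the arithmetic below shows why that matters: the ratio of sums generally differs from the average of per-row ratios. The rows are made up.

```python
# Hypothetical fact rows for a single product: (spend, leads_generated).
rows = [(100.0, 10), (300.0, 10), (50.0, 25)]

# Ratio of sums: SUM(spend) / NULLIF(SUM(leads), 0), how the seed queries define CPL.
total_spend = sum(spend for spend, _ in rows)
total_leads = sum(leads for _, leads in rows)
cpl_ratio_of_sums = total_spend / total_leads if total_leads else None

# Average of per-row ratios: what you get by precomputing CPL per row and averaging.
row_cpls = [spend / leads for spend, leads in rows if leads]
cpl_mean_of_ratios = sum(row_cpls) / len(row_cpls)

print(cpl_ratio_of_sums, cpl_mean_of_ratios)
```

Only the first number is the product's actual cost per lead, which is why a ratio metric should be computed by the semantic layer rather than re-aggregated by the client.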

## Step 5: Create a TDS Generator Stored Procedure

To make it easy for Tableau users to connect to semantic views, we'll create a **stored procedure** that automatically generates Tableau Data Source (.tds) files.

### What It Does:
1. **Reads semantic view metadata** using `DESCRIBE SEMANTIC VIEW`
2. **Parses dimensions, facts, and metrics** from the metadata
3. **Generates valid TDS XML** with proper field mappings, data types, and folder organization
4. **Returns downloadable content** ready for use in Tableau

### How It Works:
- **Dimensions** → Tableau dimensions (organized in folders by source table)
- **Facts** (numeric) → Tableau measures with Sum aggregation
- **Metrics** → Tableau measures with pre-defined aggregation (uses Minimum to preserve calculated values)

> **Note**: This is a Python stored procedure that runs entirely within Snowflake. It uses only the `snowflake-snowpark-python` and `pandas` packages provided by Snowflake, so no external dependencies are required.
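The mapping rules above can be sketched with the standard-library `xml.etree.ElementTree`. This is a simplified illustration, not the procedure's actual output: the `SVField` class, the `tableau_datatype` and `build_column` helpers, and the sample fields are hypothetical; the Snowflake-to-Tableau type mapping is a rough assumption; recording the default aggregation in an `aggregation` attribute is an assumption for illustration; and a real `.tds` also needs connection details and other elements that this sketch omits.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class SVField:
    """Minimal stand-in for a semantic view field (hypothetical)."""
    name: str
    kind: str       # DIMENSION, FACT, or METRIC
    data_type: str  # Snowflake type string, e.g. 'VARCHAR(100)' or 'NUMBER(10,2)'

def tableau_datatype(sf_type: str) -> str:
    """Rough Snowflake-to-Tableau datatype mapping (an assumption, not exhaustive)."""
    t = sf_type.upper()
    if t.startswith(("NUMBER", "DECIMAL", "NUMERIC")):
        return "integer" if t.endswith(",0)") else "real"   # NUMBER(p,0) holds whole numbers
    if t.startswith(("FLOAT", "DOUBLE", "REAL")):
        return "real"
    if t.startswith("TIMESTAMP"):
        return "datetime"
    if t == "DATE":
        return "date"
    if t == "BOOLEAN":
        return "boolean"
    return "string"

def build_column(field: SVField) -> ET.Element:
    """Map one semantic view field to a simplified Tableau <column> element."""
    datatype = tableau_datatype(field.data_type)
    attrs = {
        "caption": field.name.replace("_", " ").title(),
        "datatype": datatype,
        "name": f"[{field.name}]",
    }
    if field.kind == "METRIC":
        # Mirrors the stated rule: metrics use Minimum to preserve calculated values.
        attrs.update(role="measure", type="quantitative", aggregation="Min")
    elif field.kind == "FACT" and datatype in ("integer", "real"):
        attrs.update(role="measure", type="quantitative", aggregation="Sum")
    else:
        # Dimensions (and, in this sketch, non-numeric facts) become discrete dimensions.
        attrs.update(role="dimension", type="nominal")
    return ET.Element("column", attrs)

sample_fields = [
    SVField("channel_name", "DIMENSION", "VARCHAR(100)"),
    SVField("spend", "FACT", "NUMBER(10,2)"),
    SVField("cost_per_lead", "METRIC", "NUMBER(38,6)"),
]

datasource = ET.Element("datasource")
for field in sample_fields:
    datasource.append(build_column(field))
print(ET.tostring(datasource, encoding="unicode"))
```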

In [None]:
-- =============================================================================
-- TDS GENERATOR: Stored Procedure to Create Tableau Data Source Files
-- Run this entire script in the Snowsight web interface
-- =============================================================================

-- -----------------------------------------------------------------------------
-- SETUP: Set Context
-- -----------------------------------------------------------------------------
USE ROLE ACCOUNTADMIN;
USE DATABASE SVA_VHOL_DB;
USE SCHEMA SVA_VHOL_SCHEMA;

-- -----------------------------------------------------------------------------
-- CREATE PROCEDURE: Generate TDS from Semantic View
-- -----------------------------------------------------------------------------
CREATE OR REPLACE PROCEDURE generate_tds_from_semantic_view(semantic_view_name STRING)
    RETURNS STRING
    LANGUAGE PYTHON
    RUNTIME_VERSION = '3.9'
    PACKAGES = ('snowflake-snowpark-python', 'pandas')
    HANDLER = 'generate_tds_procedure'
AS $$
import pandas as pd
from typing import Dict, List
import xml.etree.ElementTree as ET
import xml.dom.minidom
from dataclasses import dataclass
import uuid

@dataclass
class SemanticField:
    """Represents a field from semantic view metadata"""
    name: str
    parent_table: str
    data_type: str
    object_kind: str  # DIMENSION, FACT, METRIC
    access_modifier: str  # PUBLIC, PRIVATE
    expression: str = ""

@dataclass
class TableInfo:
    """Represents table information from semantic view"""
    name: str
    database: str
    schema: str
    base_table: str

@dataclass
class Relationship:
    """Represents a relationship between tables"""
    name: str
    ref_table: str    # Referenced table
    table: str        # Referencing table
    ref_key: str      # Referenced key
    foreign_key: str  # Foreign key

class UDFSemanticViewParser:
    """Lightweight parser for DESCRIBE SEMANTIC VIEW output (used by the stored procedure)"""
    
    def __init__(self, metadata_df: pd.DataFrame):
        self.metadata_df = metadata_df
        self.tables: Dict[str, TableInfo] = {}
        self.fields: List[SemanticField] = []
        self.relationships: List[Relationship] = []
        
    def parse(self):
        """Parse the metadata and return organized data"""
        self._parse_tables()
        self._parse_fields()
        self._parse_relationships()
        return self.tables, self.fields, self.relationships
    
    def _parse_tables(self):
        """Extract table information"""
        # Handle quoted column names from Snowflake
        cols = self.metadata_df.columns
        object_kind_col = None
        object_name_col = None
        
        for col in cols:
            if col.strip('"').upper() == 'OBJECT_KIND':
                object_kind_col = col
            elif col.strip('"').upper() == 'OBJECT_NAME':
                object_name_col = col
        
        if not object_kind_col or not object_name_col:
            return  # Skip if we can't find the columns
        
        table_rows = self.metadata_df[self.metadata_df[object_kind_col] == 'TABLE']
        
        # Find property columns
        property_col = None
        property_value_col = None
        
        for col in cols:
            if col.strip('"').upper() == 'PROPERTY':
                property_col = col
            elif col.strip('"').upper() == 'PROPERTY_VALUE':
                property_value_col = col
        
        if not property_col or not property_value_col:
            return  # Skip if we can't find the columns
        
        for table_name in table_rows[object_name_col].unique():
            table_data = table_rows[table_rows[object_name_col] == table_name]
            
            table_info = TableInfo(name=table_name, database="", schema="", base_table="")
            
            for _, row in table_data.iterrows():
                if row[property_col] == 'BASE_TABLE_DATABASE_NAME':
                    table_info.database = row[property_value_col]
                elif row[property_col] == 'BASE_TABLE_SCHEMA_NAME':
                    table_info.schema = row[property_value_col]
                elif row[property_col] == 'BASE_TABLE_NAME':
                    table_info.base_table = row[property_value_col]
            
            self.tables[table_name] = table_info
    
    def _parse_fields(self):
        """Extract field information for dimensions, facts, and metrics"""
        # Handle quoted column names from Snowflake
        cols = self.metadata_df.columns
        object_kind_col = None
        object_name_col = None
        parent_entity_col = None  # parent_entity is a column, not a property!
        property_col = None
        property_value_col = None
        
        for col in cols:
            col_upper = col.strip('"').upper()
            if col_upper == 'OBJECT_KIND':
                object_kind_col = col
            elif col_upper == 'OBJECT_NAME':
                object_name_col = col
            elif col_upper == 'PARENT_ENTITY':
                parent_entity_col = col
            elif col_upper == 'PROPERTY':
                property_col = col
            elif col_upper == 'PROPERTY_VALUE':
                property_value_col = col
        
        if not all([object_kind_col, object_name_col, property_col, property_value_col]):
            return  # Skip if we can't find all required columns
        
        field_kinds = ['DIMENSION', 'FACT', 'METRIC']
        
        for kind in field_kinds:
            field_rows = self.metadata_df[self.metadata_df[object_kind_col] == kind]
            
            for field_name in field_rows[object_name_col].unique():
                field_data = field_rows[field_rows[object_name_col] == field_name]
                
                # Get parent_table from the parent_entity column (it's a column, not a property!)
                parent_table = ""
                if parent_entity_col and len(field_data) > 0:
                    parent_table = field_data.iloc[0][parent_entity_col]
                    if pd.isna(parent_table):
                        parent_table = ""
                
                field = SemanticField(
                    name=field_name,
                    parent_table=str(parent_table) if parent_table else "",
                    data_type="",
                    object_kind=kind,
                    access_modifier="PUBLIC",
                    expression=""
                )
                
                for _, row in field_data.iterrows():
                    if row[property_col] == 'DATA_TYPE':
                        field.data_type = row[property_value_col]
                    elif row[property_col] == 'ACCESS_MODIFIER':
                        field.access_modifier = row[property_value_col]
                    elif row[property_col] == 'EXPRESSION':
                        field.expression = row[property_value_col]
                
                self.fields.append(field)
    
    def _parse_relationships(self):
        """Extract relationship information from RELATIONSHIP objects"""
        # Handle quoted column names from Snowflake
        cols = self.metadata_df.columns
        object_kind_col = None
        object_name_col = None
        property_col = None
        property_value_col = None
        
        for col in cols:
            col_upper = col.strip('"').upper()
            if col_upper == 'OBJECT_KIND':
                object_kind_col = col
            elif col_upper == 'OBJECT_NAME':
                object_name_col = col
            elif col_upper == 'PROPERTY':
                property_col = col
            elif col_upper == 'PROPERTY_VALUE':
                property_value_col = col
        
        if not all([object_kind_col, object_name_col, property_col, property_value_col]):
            return  # Skip if we can't find all required columns
        
        relationship_rows = self.metadata_df[self.metadata_df[object_kind_col] == 'RELATIONSHIP']
        
        for relationship_name in relationship_rows[object_name_col].unique():
            rel_data = relationship_rows[relationship_rows[object_name_col] == relationship_name]
            
            relationship = Relationship(
                name=relationship_name,
                ref_table="",
                table="", 
                ref_key="",
                foreign_key=""
            )
            
            for _, row in rel_data.iterrows():
                if row[property_col] == 'REF_TABLE':
                    relationship.ref_table = row[property_value_col]
                elif row[property_col] == 'TABLE':
                    relationship.table = row[property_value_col]
                elif row[property_col] == 'REF_KEY':
                    relationship.ref_key = row[property_value_col]
                elif row[property_col] == 'FOREIGN_KEY':
                    relationship.foreign_key = row[property_value_col]
            
            # Only add if we have the essential information
            if relationship.ref_table and relationship.table and relationship.ref_key and relationship.foreign_key:
                self.relationships.append(relationship)
    
    def get_public_fields(self) -> List[SemanticField]:
        """Get only public fields"""
        return [f for f in self.fields if f.access_modifier == 'PUBLIC']

@dataclass
class TDSField:
    """Represents a field in the TDS file"""
    name: str
    caption: str
    data_type: str
    local_type: str
    role: str  # 'dimension' or 'measure'
    aggregation: str = 'Sum'
    semantic_role: str = None
    contains_null: bool = True
    precision: int = None
    scale: int = None
    width: int = None
    table_name: str = ""  # Parent table name for folder organization

class UDFTDSGenerator:
    """Builds Tableau Data Source (.tds) XML for a single semantic view"""
    
    def __init__(self, semantic_view_name: str):
        self.semantic_view_name = semantic_view_name
        # Extract database and schema from fully qualified name
        parts = semantic_view_name.split('.')
        if len(parts) >= 3:
            self.database = parts[0]
            self.schema = parts[1]
            self.view_name = parts[2]
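        else:
            # Placeholder fallback when the name is not fully qualified (DATABASE.SCHEMA.VIEW);
            # pass a fully qualified name so the TDS points at the correct database and schema
            self.database = "TABLEAU"
            self.schema = "SEMANTICVIEWS"
            self.view_name = semantic_view_name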
        
        # Generate IDs like Tableau does
        self.connection_id = f"snowflake.{str(uuid.uuid4()).replace('-', '')[:26]}"
        self.object_id = f"SV ({semantic_view_name})_{str(uuid.uuid4()).replace('-', '').upper()[:32]}"
    
    def generate_tds(self, tables: Dict[str, TableInfo], fields: List[SemanticField]) -> str:
        """Generate complete TDS XML matching working format"""
        
        # Process fields according to requirements
        processed_fields = self._process_fields(fields)
        
        # Create XML structure
        root = ET.Element('datasource', {
            'formatted-name': f'federated.{str(uuid.uuid4()).replace("-", "")[:26]}',
            'inline': 'true',
            'source-platform': 'mac',
            'version': '18.1',
            'xmlns:user': 'http://www.tableausoftware.com/xml/user'
        })
        
        # Add document format change manifest
        self._add_format_manifest(root)
        
        # Add connection
        connection = self._create_connection(root)
        
        # Add metadata records
        self._add_metadata_records(connection, processed_fields)
        
        # Add column definitions
        self._add_column_definitions(root, processed_fields)
        
        # Add folder organization
        self._add_folders(root, processed_fields, tables, fields)
        
        # Add layout and other elements
        self._add_layout_and_misc(root)
        
        # Convert to formatted XML string
        return self._prettify_xml(root)
    
    def _process_fields(self, fields: List[SemanticField]) -> List[TDSField]:
        """Process semantic fields into TDS fields according to requirements"""
        tds_fields = []
        
        for field in fields:
            # Skip private fields
            if field.access_modifier == 'PRIVATE':
                continue
            
            tds_field = TDSField(
                name=f"[{field.name}]",
                caption=self._format_field_caption(field.name),
                data_type=field.data_type,
                local_type=self._get_local_type(field.data_type),
                role=self._determine_role(field),
                aggregation=self._determine_aggregation(field),
                semantic_role=self._get_semantic_role(field.name),
                contains_null=True,
                precision=self._extract_precision(field.data_type),
                scale=self._extract_scale(field.data_type),
                width=self._extract_width(field.data_type),
                table_name=field.parent_table  # Track parent table for folder organization
            )
            
            tds_fields.append(tds_field)
        
        return tds_fields
    
    def _format_field_caption(self, field_name: str) -> str:
        """Format field name into proper caption (CamelCase with spaces)"""
        # Handle special prefixes like F_
        if field_name.startswith('F_'):
            field_name = field_name[2:]  # Remove F_ prefix
        
        # Split by underscores and capitalize each word
        words = field_name.lower().split('_')
        formatted_words = []
        
        for word in words:
            if word.upper() in ['LTV', 'ID', 'SK', 'URL', 'API', 'SQL', 'TAX']:
                formatted_words.append(word.upper())
            elif word == '30day':
                formatted_words.append('30Day')
            else:
                formatted_words.append(word.capitalize())
        
        return ' '.join(formatted_words)
    
    def _add_format_manifest(self, root: ET.Element):
        """Add document format change manifest"""
        manifest = ET.SubElement(root, 'document-format-change-manifest')
        ET.SubElement(manifest, '_.fcp.ObjectModelEncapsulateLegacy.true...ObjectModelEncapsulateLegacy')
        ET.SubElement(manifest, '_.fcp.ObjectModelTableType.true...ObjectModelTableType')
        ET.SubElement(manifest, '_.fcp.SchemaViewerObjectModel.true...SchemaViewerObjectModel')
    
    def _create_connection(self, root: ET.Element) -> ET.Element:
        """Create connection element"""
        connection = ET.SubElement(root, 'connection', {'class': 'federated'})
        
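        # Named connections. NOTE: the server, role ('service'), username, and warehouse
        # values below are hardcoded placeholders; change them to match your Snowflake
        # account, or edit the connection in Tableau after opening the file.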
        named_connections = ET.SubElement(connection, 'named-connections')
        named_conn = ET.SubElement(named_connections, 'named-connection', {
            'caption': 'pm.snowflakecomputing.com',
            'name': self.connection_id
        })
        
        # Connection details
        conn_details = ET.SubElement(named_conn, 'connection', {
            'authentication': 'Username Password',
            'class': 'snowflake',
            'dbname': self.database,  # Use semantic view database
            'odbc-connect-string-extras': '',
            'one-time-sql': '',
            'schema': self.schema,  # Use semantic view schema
            'server': 'pm.snowflakecomputing.com',
            'service': 'SYSADMIN',
            'username': 'current_user',
            'warehouse': 'COMPUTE_WH'
        })
        
        # Connection customization
        customization = ET.SubElement(conn_details, 'connection-customization', {
            'class': 'snowflake',
            'enabled': 'true',
            'version': '18.1'
        })
        ET.SubElement(customization, 'vendor', {'name': 'snowflake'})
        ET.SubElement(customization, 'driver', {'name': 'snowflake'})
        
        customizations = ET.SubElement(customization, 'customizations')
        custom_opts = [
            ('CAP_ODBC_METADATA_SUPPRESS_EXECUTED_QUERY', 'yes'),
            ('CAP_ODBC_METADATA_SUPPRESS_PREPARED_QUERY', 'yes'),
            ('CAP_ODBC_METADATA_SUPPRESS_SELECT_STAR', 'yes'),
            ('CAP_ODBC_METADATA_SUPPRESS_SQLCOLUMNS_API', 'no'),
            ('CAP_ESCAPE_UNDERSCORE_IN_NAMES', 'no'),
            ('CAP_DISABLE_ESCAPE_UNDERSCORE_IN_CATALOG', 'yes')
        ]
        
        for name, value in custom_opts:
            ET.SubElement(customizations, 'customization', {'name': name, 'value': value})
        
        # Relation
        ET.SubElement(connection, '_.fcp.ObjectModelEncapsulateLegacy.false...relation', {
            'connection': self.connection_id,
            'name': 'SV',
            'table': f'[{self.database}].[{self.schema}].[{self.view_name}]',
            'type': 'table'
        })
        
        ET.SubElement(connection, '_.fcp.ObjectModelEncapsulateLegacy.true...relation', {
            'connection': self.connection_id,
            'name': 'SV',
            'table': f'[{self.database}].[{self.schema}].[{self.view_name}]',
            'type': 'table'
        })
        
        return connection
    
    def _add_metadata_records(self, connection: ET.Element, fields: List[TDSField]):
        """Add metadata records for each field"""
        metadata_records = ET.SubElement(connection, 'metadata-records')
        
        ordinal = 1
        for field in fields:
            record = ET.SubElement(metadata_records, 'metadata-record', {'class': 'column'})
            
            # Clean field name for remote-name
            clean_name = field.name.strip('[]')
            ET.SubElement(record, 'remote-name').text = clean_name
            ET.SubElement(record, 'remote-type').text = self._get_remote_type(field.data_type)
            ET.SubElement(record, 'local-name').text = field.name
            ET.SubElement(record, 'parent-name').text = '[SV]'
            ET.SubElement(record, 'remote-alias').text = clean_name
            ET.SubElement(record, 'ordinal').text = str(ordinal)
            ET.SubElement(record, 'local-type').text = field.local_type
            ET.SubElement(record, 'aggregation').text = field.aggregation
            
            if field.precision is not None:
                ET.SubElement(record, 'precision').text = str(field.precision)
            if field.scale is not None:
                ET.SubElement(record, 'scale').text = str(field.scale)
            if field.width is not None:
                ET.SubElement(record, 'width').text = str(field.width)
            
            ET.SubElement(record, 'contains-null').text = str(field.contains_null).lower()
            
            if field.local_type == 'string':
                ET.SubElement(record, 'collation', {'flag': '0', 'name': 'binary'})
            
            # Attributes
            attributes = ET.SubElement(record, 'attributes')
            debug_remote = self._get_debug_remote_type(field.data_type)
            debug_wire = self._get_debug_wire_type(field.data_type)
            
            ET.SubElement(attributes, 'attribute', {
                'datatype': 'string',
                'name': 'DebugRemoteType'
            }).text = f'"{debug_remote}"'
            
            ET.SubElement(attributes, 'attribute', {
                'datatype': 'string',
                'name': 'DebugWireType'
            }).text = f'"{debug_wire}"'
            
            if field.local_type == 'string':
                ET.SubElement(attributes, 'attribute', {
                    'datatype': 'string',
                    'name': 'TypeIsVarchar'
                }).text = '"true"'
            
            ET.SubElement(record, '_.fcp.ObjectModelEncapsulateLegacy.true...object-id').text = f'[{self.object_id}]'
            
            ordinal += 1
    
    def _add_column_definitions(self, root: ET.Element, fields: List[TDSField]):
        """Add column definitions"""
        # Add aliases
        ET.SubElement(root, 'aliases', {'enabled': 'yes'})
        
        # Add column definitions
        # Note: We don't add 'table' attribute here because in single semantic view mode,
        # the underlying tables (CAMPAIGN_PERFORMANCE_METRICS, etc.) are not defined as
        # separate relations in the TDS. Folders are used for organization instead.
        for field in fields:
            attrs = {
                'caption': field.caption,
                'datatype': field.local_type,
                'name': field.name,
                'role': field.role,
                'type': 'quantitative' if field.role == 'measure' else ('ordinal' if field.local_type == 'date' else 'nominal')
            }
            
            if field.role == 'measure' and field.aggregation != 'Sum':
                attrs['aggregation'] = field.aggregation
            
            if field.semantic_role:
                attrs['semantic-role'] = field.semantic_role
            
            ET.SubElement(root, 'column', attrs)
    
    def _add_layout_and_misc(self, root: ET.Element):
        """Add layout and miscellaneous elements"""
        # Layout
        ET.SubElement(root, 'layout', {
            '_.fcp.SchemaViewerObjectModel.false...dim-percentage': '0.5',
            '_.fcp.SchemaViewerObjectModel.false...measure-percentage': '0.4',
            'dim-ordering': 'alphabetic',
            'measure-ordering': 'alphabetic',
            'show-structure': 'false'
        })
        
        # Object graph
        object_graph = ET.SubElement(root, '_.fcp.ObjectModelEncapsulateLegacy.true...object-graph')
        objects = ET.SubElement(object_graph, 'objects')
        
        obj = ET.SubElement(objects, 'object', {
            'caption': 'SV',
            'id': self.object_id
        })
        
        properties = ET.SubElement(obj, 'properties', {'context': ''})
        ET.SubElement(properties, 'relation', {
            'connection': self.connection_id,
            'name': 'SV',
            'table': f'[{self.database}].[{self.schema}].[{self.view_name}]',
            'type': 'table'
        })
    
    def _prettify_xml(self, root: ET.Element) -> str:
        """Convert XML to prettified string"""
        rough_string = ET.tostring(root, encoding='unicode')
        reparsed = xml.dom.minidom.parseString(rough_string)
        
        # Add XML declaration
        xml_str = reparsed.toprettyxml(indent='  ', encoding=None)
        
        # Clean up extra newlines
        lines = [line for line in xml_str.split('\n') if line.strip()]
        
        # Add build comment
        lines.insert(1, '')
        lines.insert(2, '<!-- build 20233.25.0610.1449                               -->')
        
        return '\n'.join(lines)
    
    def _determine_role(self, field: SemanticField) -> str:
        """Determine if field should be dimension or measure based on requirements"""
        # All METRICS are measures
        if field.object_kind == 'METRIC':
            return 'measure'
        
        # All numeric FACTS are measures
        if field.object_kind == 'FACT' and self._is_numeric_type(field.data_type):
            return 'measure'
        
        # Non-numeric FACTS are dimensions (grouped in folders with dimensions)
        if field.object_kind == 'FACT' and not self._is_numeric_type(field.data_type):
            return 'dimension'
        
        # ALL DIMENSIONS are dimensions, regardless of data type
        if field.object_kind == 'DIMENSION':
            return 'dimension'
        
        return 'dimension'  # Default
    
    def _determine_aggregation(self, field: SemanticField) -> str:
        """Determine aggregation based on requirements"""
        role = self._determine_role(field)
        
        if role == 'dimension':
            if self._is_numeric_type(field.data_type):
                return 'Sum'
            elif 'DATE' in field.data_type.upper():
                return 'Year'
            else:
                return 'Count'
        
        # For measures
        if field.object_kind == 'METRIC':
            return 'Min'  # All metrics use Minimum aggregation
        else:
            return 'Sum'  # Facts default to Sum
    
    def _is_numeric_type(self, data_type: str) -> bool:
        """Check if data type is numeric"""
        numeric_indicators = ['NUMBER', 'DECIMAL', 'INTEGER', 'FLOAT', 'REAL', 'DOUBLE']
        return any(indicator in data_type.upper() for indicator in numeric_indicators)
    
    def _get_local_type(self, data_type: str) -> str:
        """Convert Snowflake data type to Tableau local type"""
        data_type_upper = data_type.upper()
        
        if 'VARCHAR' in data_type_upper or 'STRING' in data_type_upper or 'TEXT' in data_type_upper:
            return 'string'
        elif 'DATE' in data_type_upper:
            return 'date'
        elif 'NUMBER' in data_type_upper or 'DECIMAL' in data_type_upper:
            # Check if it has decimal places
            if ',' in data_type and not data_type.endswith(',0)'):
                return 'real'
            else:
                return 'integer'
        elif 'FLOAT' in data_type_upper or 'REAL' in data_type_upper or 'DOUBLE' in data_type_upper:
            return 'real'
        else:
            return 'string'  # Default
    
    def _get_semantic_role(self, field_name: str) -> str:
        """Get semantic role for geographic fields"""
        if 'COUNTRY' in field_name.upper():
            return '[Country].[Name]'
        return None
    
    def _extract_precision(self, data_type: str) -> int:
        """Extract precision from data type like NUMBER(38,0)"""
        if '(' in data_type and ')' in data_type:
            parts = data_type.split('(')[1].split(')')[0].split(',')
            if parts:
                try:
                    return int(parts[0])
                except ValueError:
                    pass
        return None
    
    def _extract_scale(self, data_type: str) -> int:
        """Extract scale from data type like NUMBER(38,0)"""
        if '(' in data_type and ')' in data_type and ',' in data_type:
            parts = data_type.split('(')[1].split(')')[0].split(',')
            if len(parts) > 1:
                try:
                    return int(parts[1])
                except ValueError:
                    pass
        return None
    
    def _extract_width(self, data_type: str) -> int:
        """Extract width from data type like VARCHAR(50)"""
        if 'VARCHAR' in data_type.upper() and '(' in data_type:
            parts = data_type.split('(')[1].split(')')[0]
            try:
                return int(parts)
            except ValueError:
                pass
        return None
    
    def _get_remote_type(self, data_type: str) -> str:
        """Map data type to remote type code"""
        data_type_upper = data_type.upper()
        
        if 'VARCHAR' in data_type_upper or 'STRING' in data_type_upper:
            return '129'  # SQL_VARCHAR
        elif 'DATE' in data_type_upper:
            return '7'    # SQL_TYPE_DATE
        elif 'NUMBER' in data_type_upper or 'DECIMAL' in data_type_upper:
            return '131'  # SQL_DECIMAL
        else:
            return '129'  # Default to VARCHAR
    
    def _get_debug_remote_type(self, data_type: str) -> str:
        """Get debug remote type string"""
        data_type_upper = data_type.upper()
        
        if 'VARCHAR' in data_type_upper:
            return 'SQL_VARCHAR'
        elif 'DATE' in data_type_upper:
            return 'SQL_TYPE_DATE'
        elif 'NUMBER' in data_type_upper or 'DECIMAL' in data_type_upper:
            return 'SQL_DECIMAL'
        else:
            return 'SQL_VARCHAR'
    
    def _get_debug_wire_type(self, data_type: str) -> str:
        """Get debug wire type string"""
        data_type_upper = data_type.upper()
        
        if 'VARCHAR' in data_type_upper:
            return 'SQL_C_CHAR'
        elif 'DATE' in data_type_upper:
            return 'SQL_C_TYPE_DATE'
        elif 'NUMBER' in data_type_upper or 'DECIMAL' in data_type_upper:
            return 'SQL_C_NUMERIC'
        else:
            return 'SQL_C_CHAR'
    
    def _add_folders(self, root: ET.Element, fields: List[TDSField], tables: Dict[str, TableInfo], semantic_fields: List[SemanticField]):
        """Add folder organization"""
        field_groups = {}
        
        # Group fields by table - ONLY DIMENSIONS get folders
        dimension_count = 0
        for field in fields:
            if field.role == 'dimension':  # Only group dimensions in folders
                dimension_count += 1
                
                # Use table_name from TDSField, fall back to pattern matching if not set
                table_name = field.table_name
                if not table_name:
                    field_name = field.name.strip('[]')
                    table_name = self._determine_table_from_field_name(field_name)
                
                # Don't default to generic "Dimensions" - use pattern-based names
                if not table_name or table_name == 'Other':
                    table_name = 'Other'  # Keep as Other instead of Dimensions
                
                if table_name not in field_groups:
                    field_groups[table_name] = {'dimensions': []}
                
                field_groups[table_name]['dimensions'].append(field)
        
        # Only create folders if we have dimensions
        if dimension_count > 0:
            # Legacy folder format - ONLY dimensions
            for table_name, groups in field_groups.items():
                if groups['dimensions']:
                    folder = ET.SubElement(root, '_.fcp.SchemaViewerObjectModel.false...folder', {
                        'name': table_name.title(),
                        'role': 'dimensions'
                    })
                    for field in groups['dimensions']:
                        ET.SubElement(folder, 'folder-item', {
                            'name': field.name,
                            'type': 'field'
                        })
            
            # New folder format - ONLY dimensions
            folders_common = ET.SubElement(root, '_.fcp.SchemaViewerObjectModel.true...folders-common')
            
            for table_name, groups in field_groups.items():
                if groups['dimensions']:
                    folder = ET.SubElement(folders_common, 'folder', {
                        'name': table_name.title()
                    })
                    
                    # Add only dimensions to folders
                    for field in sorted(groups['dimensions'], key=lambda f: f.name):
                        ET.SubElement(folder, 'folder-item', {
                            'name': field.name,
                            'type': 'field'
                        })
    
    def _determine_table_from_field_name(self, field_name: str) -> str:
        """Guess a folder name from field-name patterns (fallback for fields with no parent table; patterns come from a TPC-DS schema, not this lab's marketing model)"""
        field_upper = field_name.upper()
        
        # Customer dimension fields (from customer table)
        if field_upper in ['AGE', 'AGE_BUCKET', 'BIRTHYEAR', 'COUNTRY', 'VALUE_BUCKET'] or 'C_CUSTOMER' in field_upper:
            return 'Customer'
        
        # Date dimension fields (from date_dim table)
        elif field_upper in ['DATE', 'MONTH', 'WEEK', 'YEAR'] or 'D_DATE' in field_upper:
            return 'Date'
        
        # Customer demographics fields (from customer_demographics table)
        elif field_upper in ['CREDIT_RATING', 'MARITAL_STATUS'] or 'CD_DEMO' in field_upper:
            return 'Demographics'
        
        # Item dimension fields (from item table)
        elif field_upper in ['BRAND', 'CATEGORY', 'CLASS'] or 'I_ITEM' in field_upper:
            return 'Item'
        
        # Store dimension fields (from store table)
        elif field_upper in ['MARKET', 'SQUAREFOOTAGE', 'STATE', 'STORECOUNTRY'] or 'S_STORE' in field_upper:
            return 'Store'
        
        # Store sales foreign key fields (from store_sales table)
        elif field_upper.startswith('SS_') and field_upper.endswith('_SK'):
            return 'Store Sales'
        
        # Default grouping for unmatched fields
        else:
            return 'Other'

def generate_tds_procedure(session, semantic_view_name: str) -> str:
    """
    Stored procedure to generate TDS from semantic view
    
    Args:
        session: Snowflake session object (automatically provided)
        semantic_view_name: Fully qualified semantic view name
    
    Returns:
        str: TDS XML content, or an XML comment describing the error
    """
    try:
        # Execute DESCRIBE SEMANTIC VIEW
        describe_sql = f"DESCRIBE SEMANTIC VIEW {semantic_view_name}"
        metadata_df = session.sql(describe_sql).to_pandas()
        
        if metadata_df.empty:
            return f"<!-- Error: No metadata found for semantic view {semantic_view_name} -->"
        
        # Confirm the DESCRIBE output has columns at all
        available_columns = list(metadata_df.columns)
        if not available_columns:
            return "<!-- Error: DataFrame has no columns -->"
        
        # Check that the required columns exist (Snowflake may return quoted, lowercase names)
        normalized_columns = {col.strip('"').upper() for col in available_columns}
        for req_col in ['OBJECT_KIND', 'OBJECT_NAME', 'PROPERTY', 'PROPERTY_VALUE']:
            if req_col not in normalized_columns:
                return f"<!-- Error: Required column '{req_col}' not found. Available columns: {', '.join(available_columns)} -->"
        
        # Parse metadata
        parser = UDFSemanticViewParser(metadata_df)
        tables, fields, relationships = parser.parse()
        
        # Get only public fields
        public_fields = parser.get_public_fields()
        
        if not public_fields:
            return f"<!-- Error: No public fields found in semantic view {semantic_view_name} -->"
        
        # Generate TDS
        generator = UDFTDSGenerator(semantic_view_name)
        tds_content = generator.generate_tds(tables, public_fields)
        
        return tds_content
        
    except Exception as e:
        return f"<!-- Error generating TDS: {str(e)} -->"
$$;
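The column check in `generate_tds_procedure` accepts `DESCRIBE SEMANTIC VIEW` column names regardless of case or surrounding double quotes. Below is a minimal standalone sketch of the same matching rule that you can test outside Snowflake. `resolve_columns` is a hypothetical helper, not part of the lab, and the sample column names are illustrative rather than captured output. Unlike the procedure, which stops at the first missing column, this sketch reports all missing columns at once.

```python
from typing import Dict, List, Tuple

# The four columns the stored procedure requires from DESCRIBE SEMANTIC VIEW
REQUIRED_COLUMNS = ["OBJECT_KIND", "OBJECT_NAME", "PROPERTY", "PROPERTY_VALUE"]


def resolve_columns(
    available: List[str], required: List[str] = REQUIRED_COLUMNS
) -> Tuple[Dict[str, str], List[str]]:
    """Map each required column to the actual DataFrame column name.

    Matching strips surrounding double quotes and ignores case, like the
    loop in generate_tds_procedure. Returns (mapping, missing).
    """
    # Normalized name -> first actual column carrying that name
    normalized: Dict[str, str] = {}
    for col in available:
        normalized.setdefault(col.strip('"').upper(), col)

    mapping = {req: normalized[req] for req in required if req in normalized}
    missing = [req for req in required if req not in normalized]
    return mapping, missing


# Illustrative column names with mixed quoting and case
mapping, missing = resolve_columns(['"object_kind"', "OBJECT_NAME", '"property"', "Property_Value"])
print(mapping["OBJECT_KIND"], missing)
```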


## Step 6: Generate and Download the TDS File

This Streamlit app provides an interactive interface to generate and download TDS files for any semantic view in your account.

### How to Use:
1. **Run the cell below** to launch the Streamlit interface
2. **Enter the fully qualified semantic view name** (default: `SVA_VHOL_DB.SVA_VHOL_SCHEMA.SVA_MARKETING_SV`)
3. **Click "Generate TDS"** to create the file
4. **Download the .tds file** and open it in Tableau Desktop

### Features:
- **Single TDS Mode**: Generate one TDS file at a time
- **Batch TDS Mode**: Generate multiple TDS files and download as a ZIP
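Batch mode collects every generated file into a single archive before offering one download. The sketch below shows the same in-memory ZIP technique using only the standard library. `build_tds_zip` is a hypothetical helper name, and the file contents are placeholders rather than real TDS XML.

```python
import io
import zipfile
from typing import Dict


def build_tds_zip(tds_files: Dict[str, str]) -> bytes:
    """Pack {filename: xml_text} into a ZIP archive held entirely in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for filename, content in tds_files.items():
            # writestr encodes str content as UTF-8 before compressing it
            zf.writestr(filename, content)
    # Leaving the with-block closes the archive and writes the central directory
    return buffer.getvalue()


archive = build_tds_zip({
    "VIEW_A_Semantic_View.tds": "<datasource/>",  # placeholder content
    "VIEW_B_Semantic_View.tds": "<datasource/>",
})
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(zf.namelist())
```

`BytesIO.getvalue()` returns the whole buffer regardless of the current stream position, so no `seek(0)` is needed before reading it. The archive must be closed first, however, so that the ZIP central directory is written.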

### What Happens in Tableau:
When you open the .tds file in Tableau:
- Dimensions appear organized in folders by source table
- Metrics are ready to use with proper aggregations
- You can immediately start building visualizations using the semantic view
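The folder grouping described above can be expressed as XML. This sketch assumes Tableau's folder markup, with `<folder name=...>` elements holding `<folder-item name='[Field]' type='field'/>` children, based on how Tableau workbook XML commonly stores folders. It is not taken from `UDFTDSGenerator`, and the table and column names in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Iterable, Tuple


def build_folders(fields: Iterable[Tuple[str, str]]) -> ET.Element:
    """Group (field_name, source_table) pairs into one <folder> per table."""
    by_table = defaultdict(list)
    for field_name, table in fields:
        by_table[table].append(field_name)

    # Stand-in for the .tds <datasource> element (assumed parent of the
    # folders); connection, column, and other required children are omitted
    root = ET.Element("datasource")
    for table in sorted(by_table):
        folder = ET.SubElement(root, "folder", name=table)
        for field_name in by_table[table]:
            # Fields are referenced by bracketed name, e.g. [OBJECTIVE]
            ET.SubElement(folder, "folder-item", name=f"[{field_name}]", type="field")
    return root


# Hypothetical field/table pairs
root = build_folders([
    ("CAMPAIGN_NAME", "CAMPAIGN_DIM"),
    ("CHANNEL_NAME", "CHANNEL_DIM"),
    ("OBJECTIVE", "CAMPAIGN_DIM"),
])
print(ET.tostring(root, encoding="unicode"))
```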

> **Tip**: Make sure Tableau Desktop is configured with your Snowflake credentials before opening the .tds file.

In [None]:
# Clean TDS Generator for Snowflake Notebook
# Simple, minimal interface for developers

import streamlit as st
import zipfile
import io
from snowflake.snowpark.context import get_active_session

def generate_and_download_tds():
    """Clean interface for TDS generation and download"""
    
    st.title("TDS Generator")
    st.write("Generate Tableau Data Source files from Snowflake Semantic Views")
    
    # Input section
    semantic_view = st.text_input(
        "Semantic View Name",
        value="SVA_VHOL_DB.SVA_VHOL_SCHEMA.SVA_MARKETING_SV",
        help="Enter the fully qualified semantic view name"
    )
    
    if st.button("Generate TDS", type="primary"):
        if not semantic_view:
            st.error("Please enter a semantic view name")
            return
            
        try:
            # Get Snowflake session
            session = get_active_session()
            
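            # Call the stored procedure; session.call passes the view name as an
            # argument instead of interpolating it into the SQL text
            with st.spinner("Generating TDS file..."):
                tds_content = session.call("generate_tds_from_semantic_view", semantic_view)
            
            # Check for errors (the procedure signals failure with an XML comment
            # starting with "<!-- Error")
            if not tds_content or tds_content.startswith("<!-- Error"):
                st.error(f"Generation failed: {tds_content}")
                return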
            
            # Success - show download
            st.success("TDS file generated successfully")
            
            # Create filename
            view_name = semantic_view.split('.')[-1] if '.' in semantic_view else semantic_view
            filename = f"{view_name}_Semantic_View.tds"
            
            # Download button
            st.download_button(
                label="Download TDS",
                data=tds_content,
                file_name=filename,
                mime="application/xml"
            )
            
            # Show file info (size of the UTF-8 encoded download, in bytes)
            st.info(f"File: {filename} ({len(tds_content.encode('utf-8')):,} bytes)")
            
        except Exception as e:
            st.error(f"Error: {str(e)}")

def batch_generate_tds():
    """Generate multiple TDS files at once"""
    
    st.title("Batch TDS Generator")
    
    # Text area for multiple semantic views
    semantic_views = st.text_area(
        "Semantic View Names (one per line)",
        placeholder="DATABASE.SCHEMA.VIEW1\nDATABASE.SCHEMA.VIEW2\nDATABASE.SCHEMA.VIEW3",
        height=150
    )
    
    if st.button("Generate All TDS Files", type="primary"):
        if not semantic_views.strip():
            st.error("Please enter at least one semantic view name")
            return
            
        view_list = [v.strip() for v in semantic_views.split('\n') if v.strip()]
        
        try:
            session = get_active_session()
            zip_buffer = io.BytesIO()
            
            with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
                progress_bar = st.progress(0)
                succeeded = 0
                
                for i, semantic_view in enumerate(view_list):
                    st.write(f"Processing: {semantic_view}")
                    
                    # Generate TDS; one failing view should not abort the whole batch
                    try:
                        tds_content = session.call("generate_tds_from_semantic_view", semantic_view)
                    except Exception as e:
                        tds_content = f"<!-- Error: {e} -->"
                    
                    if tds_content and not tds_content.startswith("<!-- Error"):
                        # Add to zip
                        view_name = semantic_view.split('.')[-1]
                        filename = f"{view_name}_Semantic_View.tds"
                        zip_file.writestr(filename, tds_content)
                        succeeded += 1
                        st.success(f"✓ {filename}")
                    else:
                        st.error(f"✗ Failed: {semantic_view}")
                    
                    progress_bar.progress((i + 1) / len(view_list))
            
            # Offer the ZIP only if at least one file was generated
            # (getvalue() returns the whole buffer, so no seek(0) is needed)
            if succeeded:
                st.download_button(
                    label="Download All TDS Files (ZIP)",
                    data=zip_buffer.getvalue(),
                    file_name="semantic_view_tds_files.zip",
                    mime="application/zip"
                )
            else:
                st.warning("No TDS files were generated.")
            
        except Exception as e:
            st.error(f"Error: {str(e)}")

# Main interface
def main():
    """Main application"""
    
    # Sidebar for mode selection
    mode = st.sidebar.radio(
        "Mode",
        ["Single TDS", "Batch TDS"]
    )
    
    if mode == "Single TDS":
        generate_and_download_tds()
    else:
        batch_generate_tds()

if __name__ == "__main__":
    main()
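The Streamlit cell combines UI calls with two small pieces of logic. The first derives the download filename. The second detects the `<!-- Error` prefix that the stored procedure uses to signal failure. Below is a minimal sketch of factoring those pieces into plain functions so they can be tested without a Snowflake session or Streamlit. `tds_filename` and `tds_error` are hypothetical names, not part of the lab. Like the original cell, `tds_filename` does not handle quoted identifiers that contain dots.

```python
from typing import Optional

# Prefix the stored procedure uses for its error messages
ERROR_PREFIX = "<!-- Error"


def tds_filename(semantic_view: str) -> str:
    """Derive the download filename from a (possibly fully qualified) view name."""
    # rsplit keeps the last dot-separated part; an unqualified name is kept whole
    view_name = semantic_view.strip().rsplit(".", 1)[-1]
    return f"{view_name}_Semantic_View.tds"


def tds_error(result: Optional[str]) -> Optional[str]:
    """Return an error message if the procedure result is not usable TDS, else None."""
    if result is None:
        # Guard against the procedure returning NULL
        return "Procedure returned no result"
    if result.startswith(ERROR_PREFIX):
        return result
    return None


print(tds_filename("SVA_VHOL_DB.SVA_VHOL_SCHEMA.SVA_MARKETING_SV"))
```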


## Conclusion and Resources

### What You Learned
In this hands-on lab, you accomplished the following:
- ✅ Created a marketing analytics data foundation with dimension and fact tables
- ✅ Used **Semantic View Autopilot** to generate a semantic view from a Tableau workbook
- ✅ Queried the semantic view using **Standard SQL** syntax
- ✅ Created a stored procedure to generate **Tableau Data Source (.tds) files**
- ✅ Built an interactive Streamlit app for TDS generation and download

### Next Steps
- Explore adding more verified queries to improve Autopilot suggestions
- Try creating semantic views from other BI artifacts (Power BI, Looker)
- Build dashboards in Tableau using your generated .tds files
- Integrate semantic views with Cortex Analyst for natural language querying

### Documentation
- [Semantic Views SQL Documentation](https://docs.snowflake.com/en/user-guide/views-semantic/sql)
- [Semantic View Autopilot](https://docs.snowflake.com/en/user-guide/views-semantic/autopilot)
- [Standard SQL for Semantic Views](https://docs.snowflake.com/en/user-guide/views-semantic/standard-sql)

### Related Tutorials
- [Getting Started with Snowflake Semantic Views](https://quickstarts.snowflake.com/guide/getting-started-with-snowflake-semantic-views)
- [Build Business-Ready Queries with Semantic Views](https://quickstarts.snowflake.com/guide/build-business-ready-queries-with-snowflake-semantic-views)
- [Snowflake Semantic View and Agentic Analytics](https://github.com/Snowflake-Labs/snowflake-demo-notebooks/tree/main/Snowflake_Semantic_View_and_Agentic_Analytics)

### GitHub Repository
- [Snowflake AI Demo (Data Source)](https://github.com/NickAkincilar/Snowflake_AI_DEMO)