# Build Business-Ready Queries with Snowflake Semantic Views

## Setup

### About the TPC-DS Dataset

**TPC-DS** (Transaction Processing Performance Council - Decision Support) is an industry-standard benchmark for decision support systems. Its dataset models a retailer that sells through several channels and also records returns:

- **Store sales**: Traditional brick-and-mortar retail transactions
- **Web sales**: E-commerce transactions
- **Catalog sales**: Mail-order catalog purchases
- **Store returns**: Product returns and exchanges

The dataset includes:
- **Dimension tables**: Store, Item, Customer, Date, Warehouse, Ship Mode, and more
- **Fact tables**: Store Sales, Web Sales, Catalog Sales, Inventory, and Returns
- **Scale factor**: This notebook uses the `TPCDS_SF10TCL` schema (the 10 TB scale factor) in Snowflake's `SNOWFLAKE_SAMPLE_DATA` database
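
Before modeling anything, it can help to look at the raw schema. The statements below are a small sketch, assuming your role can read the shared `SNOWFLAKE_SAMPLE_DATA` database; the column list for `store` is just a convenient selection.

```sql
-- List the tables in the sample schema used throughout this notebook.
SHOW TABLES IN SCHEMA SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL;

-- Peek at a few rows of a small dimension table.
SELECT s_store_sk, s_store_name, s_city, s_state
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.store
LIMIT 10;
```

Starting with a dimension table such as `store` keeps the preview cheap; the fact tables at this scale factor are very large.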

### What We're Building

In this notebook, we'll create a **Semantic View** that acts as a business-friendly abstraction layer over the complex TPC-DS schema. This semantic layer allows us to:

1. Define relationships between tables once
2. Create reusable metrics (aggregations, calculations)
3. Define business-friendly dimensions
4. Write simpler, more intuitive queries
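
As a rough sketch of how these four pieces fit together, here is a deliberately tiny semantic view over two tables. The name `tpcds_minimal_view` is hypothetical and used only for illustration, and the sketch assumes you can create objects in `SNOWFLAKE_LEARNING_DB.PUBLIC`.

```sql
USE DATABASE SNOWFLAKE_LEARNING_DB;
USE SCHEMA PUBLIC;

-- Hypothetical minimal view: one relationship, one dimension, one metric.
CREATE OR REPLACE SEMANTIC VIEW tpcds_minimal_view
  TABLES (
    store AS SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.store
      PRIMARY KEY (s_store_sk),
    store_sales AS SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.store_sales
      PRIMARY KEY (ss_item_sk, ss_ticket_number)
  )
  RELATIONSHIPS (
    -- 1. The join between the tables is declared once.
    sales_to_store AS store_sales (ss_store_sk) REFERENCES store
  )
  DIMENSIONS (
    -- 3. A business-friendly attribute to group or filter by.
    store.s_state AS s_state COMMENT = 'Store state'
  )
  METRICS (
    -- 2. A reusable aggregation.
    store_sales.total_quantity_sold AS SUM(ss_quantity)
      COMMENT = 'Total quantity sold through stores'
  );

-- 4. The query names a dimension and a metric; no JOIN or GROUP BY is written.
SELECT * FROM SEMANTIC_VIEW (
    tpcds_minimal_view
    DIMENSIONS store.s_state
    METRICS store_sales.total_quantity_sold
)
ORDER BY s_state;

-- Remove the illustrative view when done.
DROP SEMANTIC VIEW IF EXISTS tpcds_minimal_view;
```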

### Traditional SQL vs Semantic SQL

We'll compare two approaches to querying the data:

- **Traditional SQL**: Requires explicit JOINs, table references, and aggregation logic
- **Semantic SQL**: Uses the semantic view to abstract complexity, making queries shorter and more business-focused

The queries are organized by complexity:
- **Low Question / Low Schema**: Simple filters and selections on single or few tables
- **High Question / Low Schema**: Aggregations and metrics on simple table relationships
- **Low Question / High Schema**: Simple filters across many joined tables
- **High Question / High Schema**: Complex aggregations across many joined tables

Let's proceed to creating the Semantic View for the TPC-DS dataset.

### Create the Semantic View

In [None]:
USE DATABASE SNOWFLAKE_LEARNING_DB;
USE SCHEMA PUBLIC;

CREATE OR REPLACE SEMANTIC VIEW tpcds_nlq_view
  TABLES (
       store as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.store PRIMARY KEY (s_store_sk),
       store_sales as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.store_sales PRIMARY KEY (ss_item_sk, ss_ticket_number),
       web_sales as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.web_sales PRIMARY KEY (ws_item_sk, ws_order_number),
       catalog_sales as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.catalog_sales PRIMARY KEY (cs_item_sk, cs_order_number),
       store_returns as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.store_returns PRIMARY KEY (sr_item_sk, sr_ticket_number),
       item as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.item PRIMARY KEY (i_item_sk),
       returned_item as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.item PRIMARY KEY (i_item_sk) comment ='Dimension for returned items',
       customer as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.customer PRIMARY KEY (c_customer_sk),
       customer_address as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.customer_address PRIMARY KEY (ca_address_sk),
       current_customer_demographics AS SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.customer_demographics PRIMARY KEY (cd_demo_sk) comment ='Dimension for Current customer demographics',
       customer_demographics AS SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.customer_demographics PRIMARY KEY (cd_demo_sk) comment ='Dimension for Customer demographics at the time of sale',
       date_dim as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.date_dim PRIMARY KEY (d_date_sk),
       hd as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.household_demographics PRIMARY KEY (hd_demo_sk),
       income_band as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.income_band PRIMARY KEY (ib_income_band_sk),
       web_site as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.web_site PRIMARY KEY (web_site_sk),
       inventory as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.inventory PRIMARY KEY (inv_date_sk, inv_item_sk, inv_warehouse_sk),
       ship_mode as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.ship_mode PRIMARY KEY (sm_ship_mode_sk),
       warehouse as SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.warehouse PRIMARY KEY (w_warehouse_sk)
  )
  RELATIONSHIPS (
       sales_to_store as store_sales (ss_store_sk) REFERENCES store,
       sales_to_customer as store_sales (ss_customer_sk) REFERENCES customer,
       sales_to_date as store_sales (ss_sold_date_sk) REFERENCES date_dim,
       sales_to_customer_demo as store_sales (ss_cdemo_sk) REFERENCES customer_demographics,
       sales_to_item as store_sales (ss_item_sk) REFERENCES item,
       web_sales_to_bill_customer as web_sales (ws_bill_customer_sk) REFERENCES customer,
       web_sales_to_sold_date as web_sales (ws_sold_date_sk) REFERENCES date_dim ,
       web_sales_to_bill_customer_demo as web_sales (ws_bill_cdemo_sk) REFERENCES customer_demographics,
       web_sales_to_item as web_sales (ws_item_sk) REFERENCES item (i_item_sk),
       web_sales_to_web_site as web_sales (ws_web_site_sk) REFERENCES web_site,
       catalog_sales_to_bill_customer as catalog_sales (cs_bill_customer_sk) REFERENCES customer,
       catalog_sales_to_sold_date as catalog_sales (cs_sold_date_sk) REFERENCES date_dim,
       catalog_sales_to_bill_customer_demo as catalog_sales (cs_bill_cdemo_sk) REFERENCES customer_demographics,
       catalog_sales_to_item as catalog_sales (cs_item_sk) REFERENCES item,
       sales_returns_to_item as store_returns (sr_item_sk) REFERENCES returned_item (i_item_sk),
       sales_returns_to_sales as store_returns (sr_ticket_number, sr_item_sk, sr_customer_sk) references store_sales (ss_ticket_number, ss_item_sk, ss_customer_sk),
       customer_to_customer_address as customer (c_current_addr_sk) REFERENCES customer_address (ca_address_sk),
       customer_to_household_demo as customer (c_current_hdemo_sk) REFERENCES hd,
       customer_to_customer_demo as customer (c_current_cdemo_sk) REFERENCES current_customer_demographics (cd_demo_sk),
       household_demo_to_income_band as hd (hd_income_band_sk) REFERENCES income_band,
       inventory_to_item as inventory (inv_item_sk) REFERENCES item,
       inventory_to_date as inventory (inv_date_sk) REFERENCES date_dim,
       catalog_sales_to_ship_mode as catalog_sales (cs_ship_mode_sk) REFERENCES ship_mode,
       web_sales_to_ship_mode as web_sales (ws_ship_mode_sk) REFERENCES ship_mode
  )
  FACTS (
         -- Q12
         store_sales.f_ss_item_sk as ss_item_sk
         comment='Item surrogate key for each sale'
          -- -- Q22, Q32
         ,store_sales.f_net_profit_tier AS CASE
            WHEN store_sales.ss_net_profit > 25000 THEN 'More than 25000'
            WHEN store_sales.ss_net_profit BETWEEN 3000 AND 25000 THEN '3000-25000'
            WHEN store_sales.ss_net_profit BETWEEN 2000 AND 3000 THEN '2000-3000'
            WHEN store_sales.ss_net_profit BETWEEN 300 AND 2000 THEN '300-2000'
            WHEN store_sales.ss_net_profit BETWEEN 250 AND 300 THEN '250-300'
            WHEN store_sales.ss_net_profit BETWEEN 200 AND 250 THEN '200-250'
            WHEN store_sales.ss_net_profit BETWEEN 150 AND 200 THEN '150-200'
            WHEN store_sales.ss_net_profit BETWEEN 100 AND 150 THEN '100-150'
            WHEN store_sales.ss_net_profit BETWEEN 50 AND 100 THEN ' 50-100'
            WHEN store_sales.ss_net_profit BETWEEN 0 AND 50 THEN '  0- 50'
            ELSE ' 50 or Less'
          END
          comment='Tier labels for net profit from store sales'
         ,date_dim.f_year AS date_dim.d_year
          comment='Year of Date'
         ,store_returns.f_ss_has_sales as iff(store_sales.f_ss_item_sk is not null, true, false)
          comment='Boolean indicating whether valid store sales item was returned'
  )
  DIMENSIONS (
         store.s_store_sk AS store.s_store_sk
         comment='Store surrogate key'
        ,store.s_city AS store.s_city
         comment='City where the store is located'
        ,date_dim.d_year AS date_dim.d_year
         comment='Year of Date'
        ,date_dim.d_date AS date_dim.d_date
         comment='Date of the day'
        -- Q2, Q25
        ,customer.c_first_name AS  customer.c_first_name
         comment='First name of the customer'
        -- Q2
        ,customer.c_last_name AS customer.c_last_name
         comment='Last name of the customer'
        ,current_customer_demographics.cd_dep_count AS current_customer_demographics.cd_dep_count
         comment='Current number of dependents for the customer'
         ,customer_demographics.cd_dep_count AS customer_demographics.cd_dep_count
         comment='Number of dependents for the customer at the time of sale'
        --  Q3, Q14, Q24, Q40
        ,store.s_store_name AS store.s_store_name
         comment='Store name'
        ,store_sales.ss_sale_year AS date_dim.d_year
         comment='Year of store sale'
       -- Q3, Q33
        ,store.s_manager AS store.s_manager
         comment='Store manager'
       -- Q3, -- Q36
        ,store.s_floor_space AS store.s_floor_space
         comment='Total floor space in square feet'
       -- -- Q14
        ,store.s_store_id AS store.s_store_id
         comment='Unique identifier for each store'
        --  -- Q4, -- Q39
        ,item.i_brand AS item.i_brand
         comment='Item brand'
        -- -- Q4
        ,item.i_product_name AS item.i_product_name
         comment='Product names'
        -- -- Q5
        ,item.i_manufact as item.i_manufact
         comment='Item manufacturer name'
       -- Q11
        ,customer_address.ca_state AS ca_state
         comment='State where the customer address is located'
       -- Q12
        , item.i_item_id as item.i_item_id
          comment='Unique item identifiers'
       -- Q12, -- Q39
        , item.i_item_sk AS i_item_sk
          comment='Item surrogate key'
       -- -- Q20
       , customer.c_state as customer_address.ca_state
          comment='Customer state abbreviation'
       -- Q21, -- Q25
        ,hd.hd_vehicle_count AS  hd.hd_vehicle_count
         comment='Number of vehicles owned by the household'
        ,customer.c_customer_id AS customer.c_customer_id
         comment='Customer identifier'
        ,income_band.ib_income_band_sk AS income_band.ib_income_band_sk
         comment='Income Band Identifier'
       -- Q22, Q23
        , customer_demographics.gender as customer_demographics.cd_gender
          comment = 'Gender of the customer at the time of sale'
        , current_customer_demographics.gender as current_customer_demographics.cd_gender
          comment = 'Gender of the customer'
        ,store_sales.ss_net_profit_tier as f_net_profit_tier
         comment='Tier labels for net profit from store sales'
       -- Q23
        ,store_sales.ss_customer_sk as store_sales.ss_customer_sk
         comment='Customer ID'
        -- -- Q24
        ,store_sales.ss_store_sk as store_sales.ss_store_sk
         comment='Surrogate key of the store where the sale happened'
        ,income_band.ib_lower_bound as  income_band.ib_lower_bound
         comment='Lower bound of income bands'
        ,income_band.ib_upper_bound as  income_band.ib_upper_bound
         comment='Upper bound of income bands'
        -- -- Q25
        , hd.hd_buy_potential as hd.hd_buy_potential
          comment='Household buying potential'
        -- -- Q28
        ,customer_address.ca_city as ca_city
         comment='City where the customer address is located'
        ,customer.c_city as customer_address.ca_city
         comment='City where the customer is located'
        ,item.i_item_size as i_size
         comment='Item size'
        ,store_sales.ss_item_id as item.i_item_sk
         comment='Identifier of item that was sold through store'
        ,store_sales.ss_product_name as item.i_product_name
         comment='Product name of item that was sold through store'
        ,store_sales.ss_item_size as item.i_item_size
         comment='Size of item that was sold through store'
        ,web_sales.ws_item_id as item.i_item_sk
         comment='Identifier of item that was sold through web'
        ,web_sales.ws_product_name as item.i_product_name
         comment='Product name of item that was sold through web'
        ,web_sales.ws_item_size as item.i_item_size
         comment='Size of item that was sold through web'
        ,catalog_sales.cs_item_id as item.i_item_sk
         comment='Identifier of item that was sold through catalog'
        ,catalog_sales.cs_product_name as item.i_product_name
         comment='Product name of item that was sold through catalog'
        ,catalog_sales.cs_item_size as item.i_item_size
         comment='Size of item that was sold through catalog'
        -- Q31
        , customer.c_customer_sk as c_customer_sk
          comment='Customer unique identifier'
        -- -- Q34
        , web_site.web_site_sk AS web_site.web_site_sk
          comment='Unique identifier for each web site'
        , web_site.web_name AS web_site.web_name
          comment='Web site name'
        -- -- Q35, -- Q39
        , item.i_brand_id as i_brand_id
          comment='Brand ID for items'
        -- -- Q19, -- Q40
        , item.i_category as item.i_category
          comment='Product categories'
        -- -- Q39
        , item.i_color as i_color
          comment='Color options'
        -- -- Q36
        , store.s_hours as s_hours
          comment='Store hours'
        , store.s_state as s_state
          comment='Store state'
        -- -- Q38
        , customer_address.ca_zip as ca_zip
          comment='Customer zip code'
        , customer.ca_zip as customer_address.ca_zip
          comment = 'Customer zip code'
        , customer.ca_state as customer_address.ca_state
          comment='State where the customer address is located'
         -- -- Q6
        , ship_mode.sm_type as sm_type
          comment='Shipping mode type'
        , ship_mode.sm_carrier as sm_carrier
          comment='Shipping mode carrier'
        -- -- Q7, -- Q10
        , warehouse.w_warehouse_name as w_warehouse_name
          comment='Warehouse name'
        , warehouse.w_city as w_city
          comment='Warehouse city'
        , warehouse.w_warehouse_sq_ft as w_warehouse_sq_ft
          comment='Warehouse square footage'
        -- -- Q8
         , item.i_current_price as i_current_price
          comment='Current price of the item'
         -- -- Q9
        , store.s_number_employees as s_number_employees
          comment='Number of employees in the store'
        -- -- Q26
        , customer.c_birth_country as c_birth_country
          comment='Country where the customer was born'
       -- -- Q27
        , catalog_sales.cs_ship_mode_sk as catalog_sales.cs_ship_mode_sk
          comment='Unique Identifier for Shipping mode  for catalog sales'
        , web_sales.ws_ship_mode_sk as web_sales.ws_ship_mode_sk
          comment='Unique Identifier for Shipping mode for web sales'
        , hd.hd_income_band_sk as hd.hd_income_band_sk
          comment='Unique Identifier for Household income band '
        -- -- Q30
        , catalog_sales.cs_item_sk as catalog_sales.cs_item_sk
          comment='Unique Identifier for catalog sales item'
        , store_sales.ss_item_sk as store_sales.ss_item_sk
          comment='Unique Identifier for store sales item'
        , web_sales.ws_item_sk as web_sales.ws_item_sk
          comment='Unique Identifier for web sales item'
      )
  METRICS (
        -- Q11
        customer.customer_count AS COUNT(DISTINCT c_customer_sk)
        comment='Count of distinct customer identifiers'
        -- -- Q19
        ,item.product_count AS (COUNT(DISTINCT i_item_sk))
         comment ='Count of distinct products'
       -- Q12
        ,store_returns.ss_store_returns AS count_if(f_ss_has_sales)
         comment='Count of records that have a valid store sales item returned'
       -- Q13
       , web_sales.total_sales as sum(cast(ws_ext_sales_price*ws_quantity AS decimal(38, 2)))
         comment='Sum of the revenue (sales price multiplied by quantity) from web sales'
      -- -- Q14
       , store_sales.start_date AS MIN(date_dim.d_date)
         comment='Min date (start date) of the store sales'
       , store_sales.end_date AS MAX(date_dim.d_date)
         comment='Max date (end date) of the store sales'
         -- -- Q15
       , web_sales.w_net_profit AS SUM(ws_net_profit)
         comment='Sum of net profit through web sales'
       , catalog_sales.c_net_profit AS SUM(cs_net_profit)
         comment='Sum of net profit through catalog sales'
       , store_sales.s_net_profit AS SUM(ss_net_profit)
         comment='Sum of net profit through store sales'
        -- Q31, -- Q38
       , catalog_sales.total_sales AS sum(CAST(cs_sales_price * cs_quantity AS decimal(38,2)))
         comment='Sum of revenue (sales price multiplied by quantity) from catalog sales'
        -- Q32
       , store_sales.ss_customer_count as COUNT(ss_customer_sk)
         comment='Count of store sales rows that have a customer (not a distinct customer count)'
        -- -- Q14, Q32, -- Q40
       , store_sales.ss_average_sale_quantity as CASE WHEN COUNT(ss_quantity) = 0 THEN NULL ELSE CAST((SUM(ss_quantity) / COUNT(ss_quantity)) AS DOUBLE) END
         comment='Average store sale quantity calculated as sum of sold quantity divided by the number of non-null quantity values'
       -- Q33
       , store_sales.total_sales AS SUM(ss_sales_price* ss_quantity)
         comment='Sum of the revenue (sales price multiplied by quantity) from store sales'
      -- -- Q34
       , web_sales.total_quantity_sold AS COALESCE(SUM(ws_quantity), 0)
         comment='Sum of number of items sold through web sales'
       , web_sales.total_shipping_cost AS SUM(ws_ext_ship_cost)
         comment='Sum of the shipping cost'
       -- -- Q35
       , store_sales.ss_average_store_net_profit as case
           when (SUM(ss_quantity) = 0) then null
           else CAST(CAST(SUM(ss_net_profit) AS DECIMAL(17,2)) / SUM(ss_quantity) AS DECIMAL(37,22))
         end
         comment='Average net profit per unit sold through store sales'
        -- -- Q39
       , inventory.total_inventory_on_hand as SUM(inv_quantity_on_hand)
         comment='Total inventory on hand for a given item'
        -- -- Q36
        , store_sales.total_quantity_sold AS COALESCE(SUM(ss_quantity), 0) comment ='Total quantity sold for a given item in a store'
        , store_returns.total_quantity_returned AS COALESCE(SUM(sr_return_quantity),0) comment='Total quantity returned for a given item in a store'
        -- -- Q38
       , catalog_sales.start_date AS MIN(date_dim.d_date) comment='Min date for catalog sales'
       , catalog_sales.end_date AS MAX(date_dim.d_date) comment='Max date for catalog sales'
        -- -- Q16
       , catalog_sales.unique_catalog_customers AS COUNT(DISTINCT cs_bill_customer_sk) comment = 'Unique customers who made a purchase through catalog sales'
        -- -- Q18
       , catalog_sales.total_quantity_sold AS COALESCE(SUM(cs_quantity), 0)
         comment='Sum of number of items sold through catalog sales'
);
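
Once the statement succeeds, it is worth confirming what the view exposes. The commands below are a sketch, assuming the view was created in the current database and schema; the exact output columns are not shown here.

```sql
-- Show the tables, relationships, facts, dimensions, and metrics in the view.
DESCRIBE SEMANTIC VIEW tpcds_nlq_view;

-- List only the dimensions or only the metrics, with their comments.
SHOW SEMANTIC DIMENSIONS IN tpcds_nlq_view;
SHOW SEMANTIC METRICS IN tpcds_nlq_view;
```

The names returned here (for example `store.s_state` or `store_sales.total_sales`) are the ones used in the `SEMANTIC_VIEW(...)` queries that follow.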

## Traditional SQL vs Semantic SQL

Now that we've created our semantic view, let's compare how traditional SQL and semantic SQL handle queries of varying complexity. 

We'll use the TPC-DS benchmark to demonstrate queries ranging from simple filters to complex multi-table aggregations. Each example will show:
1. **Traditional SQL**: The standard approach with explicit JOINs and aggregations
2. **Semantic SQL**: The simplified approach using our semantic view

The queries are organized by complexity levels to highlight how semantic SQL provides the most value as query complexity increases.

## Low question complexity / Low schema complexity

These queries demonstrate simple filtering and selection operations on single tables or simple joins. They represent straightforward business questions that can be answered with basic SQL operations like WHERE clauses and simple aggregations. The semantic view simplifies these queries by abstracting table references and making the intent clearer.

#### Query 1
What are all of the unique store numbers in the state of Tennessee?

In [None]:
SELECT
    DISTINCT s_store_sk
FROM
    snowflake_sample_data.tpcds_sf10tcl.store
WHERE
    s_state = 'TN'
    AND s_store_sk IS NOT NULL
ORDER BY s_store_sk;

In [None]:
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    DIMENSIONS store.s_store_sk
    WHERE s_state='TN'
)
ORDER BY s_store_sk;

#### Query 2
What are the first and last names of all customers that have more than 5 dependents?

In [None]:
SELECT
    customer.c_first_name,
    customer.c_last_name
FROM
    snowflake_sample_data.tpcds_sf10tcl.customer
JOIN
    snowflake_sample_data.tpcds_sf10tcl.customer_demographics AS current_customer_demographics
    ON customer.c_current_cdemo_sk = current_customer_demographics.cd_demo_sk
WHERE
    current_customer_demographics.cd_dep_count > 5
ORDER BY
    c_first_name ASC, c_last_name ASC
LIMIT 100;

In [None]:
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    FACTS customer.c_first_name first_name, customer.c_last_name last_name
    WHERE current_customer_demographics.cd_dep_count > 5
)
ORDER BY first_name ASC, last_name ASC
LIMIT 100;

#### Query 3
What is the name, manager and floor space of each store in the city of Midway?

In [None]:
SELECT
    s_store_name,
    s_manager,
    s_floor_space
FROM
    snowflake_sample_data.tpcds_sf10tcl.store
WHERE
    s_city = 'Midway'
ORDER BY s_store_name;

In [None]:
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    FACTS s_store_name, s_manager, s_floor_space
    WHERE s_city = 'Midway'
)
ORDER BY s_store_name;

## High question complexity / Low schema complexity

These queries introduce aggregations, grouping, and metrics while maintaining relatively simple table relationships. They represent business questions requiring calculations like counts, sums, and averages across datasets. The semantic view's pre-defined metrics significantly reduce query complexity compared to traditional SQL.
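
As an extra illustration that is not one of the numbered benchmark questions, the sketch below requests two of the metrics defined above, grouped by year. It assumes `tpcds_nlq_view` exists as created earlier in this notebook.

```sql
-- Store revenue and units sold per year; the join to date_dim and the
-- GROUP BY are derived from the semantic view definition.
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    DIMENSIONS date_dim.d_year
    METRICS store_sales.total_sales, store_sales.total_quantity_sold
)
ORDER BY d_year;
```

Adding or removing a metric changes one line of the query, whereas the traditional form would also need its `SELECT` expressions rewritten.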

#### Query 11
What is the total count of customers for each customer home state?

In [None]:
SELECT
    ca_state,
    COUNT(DISTINCT c_customer_sk) AS customer_count
FROM
    snowflake_sample_data.tpcds_sf10tcl.customer
JOIN
    snowflake_sample_data.tpcds_sf10tcl.customer_address
    ON customer.c_current_addr_sk = customer_address.ca_address_sk
GROUP BY
    ca_state
ORDER BY
    customer_count DESC, ca_state
LIMIT 100;


In [None]:
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    DIMENSIONS customer_address.ca_state AS ca_state
    METRICS customer.customer_count
)
ORDER BY customer_count DESC, ca_state
LIMIT 100;

#### Query 13
What was the overall web sales for the year 2002?

In [None]:
SELECT
    SUM(CAST(ws_ext_sales_price * ws_quantity AS DECIMAL(38, 2))) AS total_sales
FROM
    snowflake_sample_data.tpcds_sf10tcl.web_sales
JOIN
    snowflake_sample_data.tpcds_sf10tcl.date_dim
    ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
WHERE
    date_dim.d_year = 2002;

In [None]:
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    METRICS web_sales.total_sales
    WHERE date_dim.d_year = 2002
)
LIMIT 100;


#### Query 19
What is the count of products in each product category?

In [None]:
SELECT
    i_category AS product_category,
    COUNT(DISTINCT i_item_sk) AS product_count
FROM
    snowflake_sample_data.tpcds_sf10tcl.item
WHERE
    i_category IS NOT NULL
    AND i_item_sk IS NOT NULL
GROUP BY
    i_category
ORDER BY
    i_category
LIMIT 5000;

In [None]:
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    DIMENSIONS item.i_category
    METRICS item.product_count
)
ORDER BY i_category
LIMIT 100;

## Low question complexity / High schema complexity

These queries involve simple filters but require joining multiple tables across complex relationships. While the question itself is straightforward, the underlying data model is intricate with many foreign key relationships. The semantic view dramatically simplifies these queries by hiding the complex join logic.
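
To make the hidden join logic concrete, here is a sketch (again not one of the numbered benchmark questions) that counts customers per income band. Written by hand, it would join `customer` to `household_demographics` and then to `income_band`; here the `customer_to_household_demo` and `household_demo_to_income_band` relationships supply that path. It assumes `tpcds_nlq_view` exists as created earlier.

```sql
-- Customers per income band, traversing customer -> hd -> income_band.
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    DIMENSIONS income_band.ib_lower_bound, income_band.ib_upper_bound
    METRICS customer.customer_count
)
ORDER BY ib_lower_bound;
```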

#### Query 21
What is the customer id and vehicle count for every customer in income band 9?

In [None]:
SELECT DISTINCT
    customer.c_customer_id,
    hd.hd_vehicle_count
FROM
    snowflake_sample_data.tpcds_sf10tcl.customer
JOIN
    snowflake_sample_data.tpcds_sf10tcl.household_demographics hd
    ON customer.c_current_hdemo_sk = hd.hd_demo_sk
JOIN
    snowflake_sample_data.tpcds_sf10tcl.income_band
    ON hd.hd_income_band_sk = income_band.ib_income_band_sk
WHERE
    income_band.ib_income_band_sk = 9
    AND customer.c_customer_id IS NOT NULL
ORDER BY
    customer.c_customer_id
LIMIT 100;


In [None]:
SELECT DISTINCT c_customer_id, hd_vehicle_count FROM (
    SELECT * FROM SEMANTIC_VIEW (
        tpcds_nlq_view
        FACTS customer.c_customer_id, hd.hd_vehicle_count
        WHERE ib_income_band_sk = 9 AND c_customer_id IS NOT NULL
    )
)
ORDER BY c_customer_id
LIMIT 100;

#### Query 22
What is the net profit tier for each store name and gender in the 2002 sales year?

In [None]:
SELECT DISTINCT
    store.s_store_name,
    customer_demographics.cd_gender,
    CASE
        WHEN store_sales.ss_net_profit > 25000 THEN 'More than 25000'
        WHEN store_sales.ss_net_profit BETWEEN 3000 AND 25000 THEN '3000-25000'
        WHEN store_sales.ss_net_profit BETWEEN 2000 AND 3000 THEN '2000-3000'
        WHEN store_sales.ss_net_profit BETWEEN 300 AND 2000 THEN '300-2000'
        WHEN store_sales.ss_net_profit BETWEEN 250 AND 300 THEN '250-300'
        WHEN store_sales.ss_net_profit BETWEEN 200 AND 250 THEN '200-250'
        WHEN store_sales.ss_net_profit BETWEEN 150 AND 200 THEN '150-200'
        WHEN store_sales.ss_net_profit BETWEEN 100 AND 150 THEN '100-150'
        WHEN store_sales.ss_net_profit BETWEEN 50 AND 100 THEN ' 50-100'
        WHEN store_sales.ss_net_profit BETWEEN 0 AND 50 THEN '  0- 50'
        ELSE ' 50 or Less'
    END AS net_profit_tier
FROM
    snowflake_sample_data.tpcds_sf10tcl.store_sales
JOIN
    snowflake_sample_data.tpcds_sf10tcl.store
    ON store_sales.ss_store_sk = store.s_store_sk
JOIN
    snowflake_sample_data.tpcds_sf10tcl.customer_demographics
    ON store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk
JOIN
    snowflake_sample_data.tpcds_sf10tcl.date_dim
    ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
WHERE
    date_dim.d_year = 2002
ORDER BY
    s_store_name, cd_gender, net_profit_tier
LIMIT 100;


In [None]:
SELECT DISTINCT s_store_name, gender, f_net_profit_tier FROM (
    SELECT * FROM SEMANTIC_VIEW (
        tpcds_nlq_view
        FACTS store_sales.f_net_profit_tier, 
              store.s_store_name, 
              customer_demographics.gender,
              date_dim.d_year
    )
)
WHERE d_year = 2002
      AND gender IS NOT NULL
ORDER BY s_store_name, gender, f_net_profit_tier
LIMIT 100;

#### Query 23
What was the first name and gender of each customer that shopped in the store named 'ese' in 2001?

In [None]:
SELECT DISTINCT
    customer.c_first_name,
    customer_demographics.cd_gender
FROM
    snowflake_sample_data.tpcds_sf10tcl.store_sales
JOIN
    snowflake_sample_data.tpcds_sf10tcl.customer
    ON store_sales.ss_customer_sk = customer.c_customer_sk
JOIN
    snowflake_sample_data.tpcds_sf10tcl.customer_demographics
    ON store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk
JOIN
    snowflake_sample_data.tpcds_sf10tcl.store
    ON store_sales.ss_store_sk = store.s_store_sk
JOIN
    snowflake_sample_data.tpcds_sf10tcl.date_dim
    ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
WHERE
    store.s_store_name = 'ese'
    AND date_dim.d_year = 2001
ORDER BY
    c_first_name, cd_gender
LIMIT 5000;

In [None]:
SELECT DISTINCT c_first_name, gender FROM (
    SELECT * FROM SEMANTIC_VIEW (
        tpcds_nlq_view
        FACTS store_sales.ss_customer_sk,
              customer.c_first_name,
              customer_demographics.gender,
              date_dim.d_year,
              store.s_store_name
    )
)
WHERE s_store_name = 'ese'
    AND d_year = 2001
    AND gender IS NOT NULL
ORDER BY c_first_name, gender
LIMIT 5000;

## High question complexity / High schema complexity

These queries represent the most challenging scenarios, combining complex aggregations with intricate multi-table joins. They answer sophisticated business questions that require both non-trivial calculations and deep schema knowledge. The semantic view provides the greatest value here because it hides the join logic and supplies metrics that are defined once and reused.
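
The sketch below, which is not one of the numbered benchmark questions, combines a filter on one dimension table with grouping on two others and two pre-defined metrics. It assumes `tpcds_nlq_view` exists as created earlier; all names come from that definition.

```sql
-- Net profit and units sold by product category and year for Tennessee stores.
-- The joins from store_sales to store, item, and date_dim are implied by the
-- relationships in the semantic view.
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    DIMENSIONS item.i_category, date_dim.d_year
    METRICS store_sales.s_net_profit, store_sales.total_quantity_sold
    WHERE store.s_state = 'TN'
)
ORDER BY i_category, d_year;
```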

#### Query 32
For each store state in the year 2002, what was the count of store customers and the average sales quantity?

In [None]:
SELECT
    store.s_state,
    COUNT(store_sales.ss_customer_sk) AS store_customer_count,
    CASE 
        WHEN COUNT(store_sales.ss_quantity) = 0 THEN NULL 
        ELSE CAST((SUM(store_sales.ss_quantity) / COUNT(store_sales.ss_quantity)) AS DOUBLE) 
    END AS average_store_sales_quantity
FROM
    snowflake_sample_data.tpcds_sf10tcl.store_sales
JOIN
    snowflake_sample_data.tpcds_sf10tcl.store
    ON store_sales.ss_store_sk = store.s_store_sk
JOIN
    snowflake_sample_data.tpcds_sf10tcl.date_dim
    ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
WHERE
    date_dim.d_year = 2002
    AND store.s_state IS NOT NULL
GROUP BY
    store.s_state
ORDER BY
    store.s_state
LIMIT 5000;

In [None]:
SELECT * FROM SEMANTIC_VIEW (
    tpcds_nlq_view
    DIMENSIONS store.s_state
    METRICS
        store_sales.ss_customer_count,
        store_sales.ss_average_sale_quantity
    WHERE date_dim.d_year = 2002
) AS R(store_state, store_customer_count, average_store_sales_quantity)
WHERE store_state IS NOT NULL
ORDER BY store_state
LIMIT 5000;

#### Query 33
What were the store sales in 2002 for each store manager in the state of Tennessee?

In [None]:
SELECT
    store.s_manager,
    SUM(store_sales.ss_sales_price * store_sales.ss_quantity) AS total_sales
FROM
    snowflake_sample_data.tpcds_sf10tcl.store_sales
JOIN
    snowflake_sample_data.tpcds_sf10tcl.store
    ON store_sales.ss_store_sk = store.s_store_sk
JOIN
    snowflake_sample_data.tpcds_sf10tcl.date_dim
    ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
WHERE
    store.s_state = 'TN'
    AND date_dim.d_year = 2002
    AND store.s_manager IS NOT NULL
GROUP BY
    store.s_manager
ORDER BY
    total_sales DESC NULLS LAST
LIMIT 5000;

In [None]:
SELECT s_manager, total_sales FROM (
    SELECT * FROM SEMANTIC_VIEW (
        tpcds_nlq_view
        DIMENSIONS store.s_manager
        METRICS store_sales.total_sales
        WHERE store.s_state = 'TN' AND d_year = 2002 
              AND store.s_manager IS NOT NULL
    )
)
ORDER BY total_sales DESC NULLS LAST
LIMIT 5000;

#### Query 34
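For each website, what is the quantity sold and total shipping cost for customers with a shipping address in the state of New Jersey?

Note: the semantic view defines no relationship on `ws_ship_addr_sk`, so the semantic query reaches `customer_address` through the billing customer's current address (`web_sales` → `customer` → `customer_address`), while the traditional query joins on the shipping address. The two results may therefore differ.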

In [None]:
SELECT
  ws.ws_web_site_sk AS "web_site_sk",
  web.web_name AS "web_name",
  SUM(ws.ws_quantity) AS "total_quantity_sold",
  SUM(ws.ws_ext_ship_cost) AS "total_shipping_cost"
FROM
  snowflake_sample_data.tpcds_sf10tcl.web_sales AS ws
  JOIN snowflake_sample_data.tpcds_sf10tcl.web_site AS web ON ws.ws_web_site_sk = web.web_site_sk
  JOIN snowflake_sample_data.tpcds_sf10tcl.customer_address AS ca ON ws.ws_ship_addr_sk = ca.ca_address_sk
WHERE
  ca.ca_state='NJ'
  AND web.web_name IS NOT NULL
  AND ws.ws_quantity IS NOT NULL
  AND ws.ws_ext_ship_cost IS NOT NULL
GROUP BY
  ws.ws_web_site_sk,
  web.web_name
ORDER BY
  ws.ws_web_site_sk;

In [None]:
SELECT * FROM SEMANTIC_VIEW(
    tpcds_nlq_view
    DIMENSIONS web_site.web_site_sk, web_site.web_name
    METRICS web_sales.total_quantity_sold, web_sales.total_shipping_cost
    WHERE customer_address.ca_state='NJ'
)
WHERE web_name IS NOT NULL
    AND total_quantity_sold IS NOT NULL
    AND total_shipping_cost IS NOT NULL
ORDER BY web_site_sk;

## Resources

For more information about Snowflake Semantic Views and related features, refer to the following documentation:

### Snowflake Documentation
- **[Snowflake Cortex Analyst](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst)** - Use natural language to query semantic models with AI
- **[Cortex Analyst Semantic Model Specification](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/semantic-model-spec)** - Comprehensive guide on creating semantic models
- **[Best practices for semantic views](https://docs.snowflake.com/en/user-guide/views-semantic/best-practices-dev)** - Best practices for working with semantic views
- **[Cortex Analyst Getting Started](https://quickstarts.snowflake.com/guide/getting_started_with_cortex_analyst/)** - Step-by-step tutorial for building semantic models

### TPC-DS Benchmark Resources
- **[TPC-DS Benchmark Specification](https://www.tpc.org/tpcds/)** - Official TPC-DS specification and documentation
- **[TPCDS NLQ Benchmark](https://github.com/NLQBenchmarks/TPCDS_Benchmark)** - Open benchmark for evaluating Text-to-SQL solutions with 40 questions
- **[Snowflake Sample Data](https://docs.snowflake.com/en/user-guide/sample-data-tpcds)** - Information about the TPC-DS sample data available in Snowflake
