# Lab: Vertex AI–Assisted BigQuery Analytics — Example Prompts
**Goal:** Practice moving from simple SQL to complex analytics in BigQuery using *only* carefully engineered prompts with Vertex AI (Gemini).  
**Important:** This notebook contains **prompts only** (no starter code). Paste the prompts into **Vertex AI Studio**, **Vertex AI in Colab Enterprise**, or your chosen chat interface, and then run the generated SQL directly in **BigQuery**. If you decide to automate later, you can ask Vertex AI to convert the winning SQL into a Colab pipeline.

## How to use this prompts-only notebook
1. Open **Vertex AI Studio** (or the Gemini chat panel in Colab Enterprise).  
2. Copy a prompt from this notebook and paste it into the model. Do **not** paste any code from here; let the model generate it.  
3. Run the generated SQL in **BigQuery** (Console → BigQuery Studio).  
4. Iterate: refine the prompt when results aren’t what you expect.  
5. Document: capture your final SQL, plus a one-sentence takeaway, in your notes/README.

## Dataset assumptions
Use one of these sources (adjust table paths accordingly):
- **Global Superstore (Kaggle)** loaded into BigQuery (e.g., `[YOUR_PROJECT].superstore_data.sales`)  
- **TheLook eCommerce** public dataset: `bigquery-public-data.thelook_ecommerce`  
If you are using *Global Superstore*, make sure column names match your schema (e.g., `Order_Date`, `Region`, `Category`, `Sub_Category`, `Sales`, `Profit`, `Discount`, `State`, `Customer_ID`, `Ship_Mode`).

---
## Prompting guardrails (quick checklist)
- **Be explicit**: table path, column names, filters, output columns, sort order, and limits.  
- **Ask for runnable SQL**: “Return a BigQuery SQL block only.”  
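- **Control cost**: ask for `LIMIT` during exploration and remove it for the final run. Note that `LIMIT` caps the rows returned but does not reduce the bytes BigQuery scans, so also select only the columns you need.  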
- **Validate**: request a brief explanation of why each clause is present and how you can sanity-check results.
---
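
The checklist above can be turned into a small template so that no field is forgotten. This is a minimal sketch, not part of the lab materials: `build_sql_prompt` and its parameter names are hypothetical, and the table path is a placeholder.

```python
def build_sql_prompt(task, table, output_columns, filters=None, sort=None, limit=None):
    """Assemble an explicit, checklist-style prompt for a SQL-generating model."""
    lines = [
        "BigQuery SQL only. Return a single runnable SQL block.",
        f"Task: {task}",
        f"Table: `{table}`",
        f"Output columns: {', '.join(output_columns)}",
    ]
    # Optional fields are added only when supplied, so the prompt stays short.
    if filters:
        lines.append(f"Filter: {' AND '.join(filters)}")
    if sort:
        lines.append(f"Sort: {sort}")
    if limit is not None:
        lines.append(f"Add: LIMIT {limit} during exploration.")
    return "\n".join(lines)


prompt = build_sql_prompt(
    task="List all unique sub_category values sold in the 'West' region.",
    table="your-project.your_dataset.superstore_clean",  # placeholder path
    output_columns=["sub_category"],
    filters=["region = 'West'"],
    sort="sub_category ascending",
    limit=100,
)
print(prompt)
```

Keeping the fields as separate arguments makes it easy to see which guardrail was skipped when a generated query comes back wrong.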

## Install Dependencies

In [None]:
# Install the Google Cloud BigQuery client library
!pip install google-cloud-bigquery==3.17.0 pandas==2.1.4

# Authenticate your Colab environment
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


## Copy Schema to a dataframe

In [None]:
from google.cloud import bigquery
import pandas as pd

# Replace with your Google Cloud Project ID
project_id = 'mgmt-467-47888-471119' # This is derived from your provided table name
dataset_id = 'lab1_foundation'
table_id = 'superstore'

# Construct a BigQuery client object.
client = bigquery.Client(project=project_id)

# Get the table object
# (a fully qualified table ID string avoids the deprecated client.dataset() helper)
table = client.get_table(f"{project_id}.{dataset_id}.{table_id}")

# Extract schema information
schema_list = []
for field in table.schema:
    schema_list.append({
        'name': field.name,
        'field_type': field.field_type,
        'mode': field.mode,
        'description': field.description
    })

# Convert to Pandas DataFrame
schema_df = pd.DataFrame(schema_list)

# Confirm the schema DataFrame was built; call display(schema_df) to inspect it.
print("Schema DataFrame created:")


Schema DataFrame created:


## Clean Column Names

In [None]:
# --- 1. Clean the Column Names ---
# Create a 'clean_name' column with standard naming conventions:
# lowercase, with spaces and hyphens replaced by underscores.
schema_df['clean_name'] = schema_df['name'].str.lower().str.replace(' ', '_').str.replace('-', '_')


# --- 2. Generate the Aliases for the SELECT Clause ---
column_expressions = []
for index, row in schema_df.iterrows():
    original_name = row['name']
    clean_name = row['clean_name']

    # If the original name contains a space or special character, it needs to be
    # enclosed in backticks (`) in the SQL statement.
    if ' ' in original_name or '-' in original_name:
        expression = f'`{original_name}` AS {clean_name}'
    else:
        # If the name is already clean, we still alias it for consistency.
        expression = f'{original_name} AS {clean_name}'
    column_expressions.append(expression)

# Join all the individual column expressions into a single, formatted string.
select_clause = ",\n  ".join(column_expressions)


# --- 3. Construct the Final CREATE VIEW Statement ---
new_view_id = 'superstore_clean' # You can change this if you like

create_view_sql = f"""
CREATE OR REPLACE VIEW `{project_id}.{dataset_id}.{new_view_id}` AS
SELECT
  {select_clause}
FROM
  `{project_id}.{dataset_id}.{table_id}`;
"""

# --- 4. Print the Final SQL ---
print("--- Copy the SQL below and run it in your BigQuery Console ---")
print(create_view_sql)

--- Copy the SQL below and run it in your BigQuery Console ---

CREATE OR REPLACE VIEW `mgmt-467-47888-471119.lab1_foundation.superstore_clean` AS
SELECT
  `Row ID` AS row_id,
  `Order ID` AS order_id,
  `Order Date` AS order_date,
  `Ship Date` AS ship_date,
  `Ship Mode` AS ship_mode,
  `Customer ID` AS customer_id,
  `Customer Name` AS customer_name,
  Segment AS segment,
  Country AS country,
  City AS city,
  State AS state,
  `Postal Code` AS postal_code,
  Region AS region,
  `Product ID` AS product_id,
  Category AS category,
  `Sub-Category` AS sub_category,
  `Product Name` AS product_name,
  Sales AS sales,
  Quantity AS quantity,
  Discount AS discount,
  Profit AS profit
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore`;
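
The cleaning cell above handles spaces and hyphens, which is all this schema needs. If a source table had other punctuation in its column names, a regex-based sanitizer might be more robust. The sketch below is an assumption-laden illustration: `clean_column_name` and `needs_backticks` are hypothetical helpers, `Unit Price ($)` is a made-up column name, and reserved words are not handled.

```python
import re


def clean_column_name(name: str) -> str:
    """Lowercase a column name and collapse non-alphanumeric runs into one underscore."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    # Identifiers cannot start with a digit, so prefix those with an underscore.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def needs_backticks(name: str) -> bool:
    """True when the original name is not a plain letters/digits/underscore identifier."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is None


# "Unit Price ($)" is a hypothetical name used only to show the punctuation case.
for original in ["Sub-Category", "Postal Code", "Sales", "Unit Price ($)"]:
    source = f"`{original}`" if needs_backticks(original) else original
    print(f"{source} AS {clean_column_name(original)}")
```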



## Generate View with standard column naming convention

In [None]:
# Execute the CREATE VIEW SQL query
try:
    query_job = client.query(create_view_sql)  # API request
    query_job.result()  # Waits for the query to finish
    print(f"View '{new_view_id}' created/replaced successfully in dataset '{dataset_id}'.")
except Exception as e:
    print(f"An error occurred while creating the view: {e}")

# Now, let's print 10 rows from the newly created view to verify
print(f"\n--- First 10 rows from the new view '{new_view_id}' ---")
try:
    # Construct a reference to the new view
    view_table_ref = client.dataset(dataset_id).table(new_view_id)

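    # Fetch the first 10 rows.
    # Note: list_rows reads stored table data and does not support views, so
    # iterating the result fails with the 400 error shown in the output below.
    # Query the view instead (see the next cell).
    rows = client.list_rows(view_table_ref, max_results=10)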

    # Print header
    print(" | ".join([field.name for field in rows.schema]))
    print("-" * 80) # Separator

    # Print rows
    for row in rows:
        print(" | ".join([str(item) for item in row.values()]))

except Exception as e:
    print(f"An error occurred while fetching rows from the view: {e}")



View 'superstore_clean' created/replaced successfully in dataset 'lab1_foundation'.

--- First 10 rows from the new view 'superstore_clean' ---
row_id | order_id | order_date | ship_date | ship_mode | customer_id | customer_name | segment | country | city | state | postal_code | region | product_id | category | sub_category | product_name | sales | quantity | discount | profit
--------------------------------------------------------------------------------
An error occurred while fetching rows from the view: 400 GET https://bigquery.googleapis.com/bigquery/v2/projects/mgmt-467-47888-471119/datasets/lab1_foundation/tables/superstore_clean/data?maxResults=10&formatOptions.useInt64Timestamp=True&prettyPrint=false: Cannot list a table of type VIEW.


In [None]:
# This assumes your 'client' object from the previous cell is still active
# and correctly authenticated.

print("✅ Step 1: Defining the query string...")

query_string = """
SELECT
  order_id,
  customer_name,
  product_name,
  sales,
  profit
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
LIMIT 10;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result. Please double-check that your 'superstore_clean' view exists and the original table has data.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 10 rows.

--- Displaying Results ---


Unnamed: 0,order_id,customer_name,product_name,sales,profit
0,CA-2015-154900,Sung Shariari,Avery 518,3.15,1.512
1,CA-2015-154900,Sung Shariari,Adams Telephone Message Book W/Dividers/Space ...,22.72,10.224
2,US-2016-152415,Patrick O'Donnell,"C-Line Magnetic Cubicle Keepers, Clear Polypro...",14.82,6.2244
3,US-2016-152415,Patrick O'Donnell,"Howard Miller 14-1/2"" Diameter Chrome Round Wa...",191.82,61.3824
4,CA-2016-153269,Pamela Stobb,"Personal Folder Holder, Ebony",11.21,3.363
5,CA-2016-153269,Pamela Stobb,"Situations Contoured Folding Chairs, 4/Set",354.9,88.725
6,CA-2016-153269,Pamela Stobb,Xerox 193,17.94,8.7906
7,CA-2016-153269,Pamela Stobb,GBC Binding covers,51.8,23.31
8,CA-2015-158792,Brian Dahlen,Staples,22.2,10.434
9,CA-2016-141082,Fred McMath,Avery 517,3.69,1.7343
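
The `LIMIT 10` in the query above caps the rows returned; it does not by itself show how much data the query scans. A dry run asks BigQuery for an estimate without executing anything. This is a sketch that assumes the authenticated `client` from the earlier cells; `dry_run_bytes` and `human_bytes` are hypothetical helper names.

```python
def human_bytes(n: int) -> str:
    """Format a byte count with binary units (KiB, MiB, ...)."""
    size = float(n)
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def dry_run_bytes(client, sql: str) -> int:
    """Return BigQuery's estimate of bytes scanned, without running the query."""
    # Imported here so the helper can be defined even where the library is absent.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # a dry-run job completes immediately
    return job.total_bytes_processed


# Usage in the notebook (needs the authenticated `client` from earlier cells):
# print(human_bytes(dry_run_bytes(client, query_string)))
```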


## Part A — SQL Warm‑Up (SELECT, WHERE, ORDER BY, LIMIT, DISTINCT)
**Aim:** Build confidence with precise, unambiguous prompts that yield clean, runnable SQL.

### A1. Unique values (DISTINCT)
**Prompt (paste in Vertex AI):**
```
Act as a senior BigQuery analyst. Produce a **single runnable BigQuery SQL** (no commentary) for:
- Task: List all unique `Sub_Category` values sold in the 'West' region.
- Table: `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
- Filter: `Region = 'West'`
- Output: a single column named `Sub_Category`
- Sort: alphabetically A→Z
- Add: `LIMIT 100` to control cost during exploration.
```
**Reflection:** Did the result match your expectations? If not, what ambiguity in your prompt might have caused the mismatch?


**My Response:** As requested in the prompt, the result lists the unique `Sub_Category` values in the 'West' region, sorted alphabetically and limited to 100 rows. It matches what the prompt specified, and I did not find any ambiguity in the prompt that would cause a mismatch.

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
SELECT
    DISTINCT sub_category
FROM
    `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
WHERE
    region = 'West'
ORDER BY
    sub_category ASC
LIMIT 100
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result. Please double-check that your 'superstore_clean' view exists and the original table has data.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 17 rows.

--- Displaying Results ---


Unnamed: 0,sub_category
0,Accessories
1,Appliances
2,Art
3,Binders
4,Bookcases
5,Chairs
6,Copiers
7,Envelopes
8,Fasteners
9,Furnishings


### A2. Top‑N by metric (ORDER BY … DESC)
**Prompt:**
```
BigQuery SQL only.
Task: Return the top 10 customers by total profit.
Table: `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
Columns used: `Customer_ID`, `Profit`
Output columns: `Customer_ID`, `total_profit`
Logic: SUM Profit per customer, order by `total_profit` DESC
Add `LIMIT 10`.
```
**Tip:** If your schema uses different identifiers (e.g., `Customer Name`), restate column names explicitly.

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
SELECT
  customer_id,
  SUM(profit) AS total_profit
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
GROUP BY
  customer_id
ORDER BY
  total_profit DESC
LIMIT 10;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result. Please double-check that your 'superstore_clean' view exists and the original table has data.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 10 rows.

--- Displaying Results ---


Unnamed: 0,customer_id,total_profit
0,TC-20980,8981.3239
1,RB-19360,6976.0959
2,SC-20095,5757.4119
3,HL-15040,5622.4292
4,AB-10105,5444.8055
5,TA-21385,4703.7883
6,CM-12385,3899.8904
7,KD-16495,3038.6254
8,AR-10540,2884.6208
9,DR-12940,2869.076


### A3. Basic filtering (WHERE) + sanity checks
**Prompt:**
```
BigQuery SQL only.
Task: Count orders shipped with each `Ship_Mode`, but only for orders in the 'Technology' category.
Table: `[YOUR_PROJECT].superstore_data.sales`
Output: `Ship_Mode`, `order_count`
Logic: COUNT(*) grouped by `Ship_Mode`
Sort by `order_count` DESC
```
**Validation ask:** “Also list two quick sanity checks to verify the numbers.”

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
SELECT
  ship_mode,
  COUNT(*) AS order_count
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
WHERE
  category = 'Technology'
GROUP BY
  ship_mode
ORDER BY
  order_count DESC;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result. Please double-check that your 'superstore_clean' view exists and the original table has data.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 4 rows.

--- Displaying Results ---


Unnamed: 0,ship_mode,order_count
0,Standard Class,1082
1,Second Class,366
2,First Class,301
3,Same Day,98


**Sanity Checks for Verification**

Here are two quick sanity checks to verify the numbers from the query above:

*Total Count Verification*: The sum of `order_count` across all ship modes should equal the total number of orders in the 'Technology' category. You can verify this by running: ``SELECT COUNT(*) FROM `mgmt-467-47888-471119.lab1_foundation.superstore_clean` WHERE category = 'Technology';``

*Ship Mode Completeness*: The query should return a count for every `Ship_Mode` that exists for Technology products. You can see all possible ship modes in the dataset by running: ``SELECT DISTINCT ship_mode FROM `mgmt-467-47888-471119.lab1_foundation.superstore_clean`;``
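
The first check can also be scripted once both results are in hand. The sketch below uses the counts copied from the A3 output; `grouped_total_matches` is a hypothetical helper, and the expected total would come from the separate `COUNT(*)` query, which is not shown in this notebook.

```python
# Counts copied from the A3 output above.
order_counts = {
    "Standard Class": 1082,
    "Second Class": 366,
    "First Class": 301,
    "Same Day": 98,
}


def grouped_total_matches(group_counts: dict, expected_total: int) -> bool:
    """True when the per-group counts add up to the independently computed total."""
    return sum(group_counts.values()) == expected_total


grouped_sum = sum(order_counts.values())
print(grouped_sum)  # 1847; compare this with the COUNT(*) result for 'Technology'
```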

## Part B — Grouped Analytics (GROUP BY, HAVING)
**Aim:** Turn raw facts into grouped metrics and filtered aggregations.

### B1. KPI aggregation with WHERE + GROUP BY
**Prompt:**
```
BigQuery SQL only.
Task: Compute monthly revenue for the last 12 full months.
Table: `[YOUR_PROJECT].superstore_data.sales`
Assume: `Order_Date` is a DATE or TIMESTAMP column named exactly `Order_Date`.
Output: `year_month` (YYYY-MM format), `monthly_revenue`
Logic: Truncate date to month, SUM `Sales`, filter to last 12 full months.
Sort by `year_month` ascending.
Include a `LIMIT` safeguard for exploration.
```

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
SELECT * FROM (
    SELECT
      FORMAT_DATE('%Y-%m', DATE_TRUNC(order_date, MONTH)) AS year_month,
      SUM(sales) AS monthly_revenue
    FROM
      `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
    GROUP BY
      year_month
    ORDER BY
      year_month DESC
    LIMIT 12
)
ORDER BY year_month ASC;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result. Please double-check that your 'superstore_clean' view exists and has data.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 12 rows.

--- Displaying Results ---


Unnamed: 0,year_month,monthly_revenue
0,2017-01,43971.374
1,2017-02,20301.1334
2,2017-03,58872.3528
3,2017-04,36521.5361
4,2017-05,44261.1102
5,2017-06,52981.7257
6,2017-07,45264.416
7,2017-08,63120.888
8,2017-09,87866.652
9,2017-10,77776.9232
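
The query above keeps the 12 most recent months that appear in the data, which can include a partial final month. If "last 12 full months" should instead be anchored on a reference date, the bounds can be computed first and passed into the filter. This is a sketch under assumptions: `last_full_months_window` is a hypothetical helper, `order_date` is assumed to be a DATE column, and for a historical dataset the reference date would more sensibly be `MAX(order_date)` than today.

```python
from datetime import date


def last_full_months_window(reference: date, months: int = 12):
    """Return (start, end) covering the last `months` full months; `end` is exclusive."""
    end = reference.replace(day=1)  # first day of the current, partial month
    index = end.year * 12 + (end.month - 1) - months  # months counted from year 0
    start = date(index // 12, index % 12 + 1, 1)
    return start, end


start, end = last_full_months_window(date(2018, 1, 15))
where_clause = f"order_date >= DATE '{start}' AND order_date < DATE '{end}'"
print(where_clause)
# order_date >= DATE '2017-01-01' AND order_date < DATE '2018-01-01'
```

A half-open interval (`>=` start, `<` end) avoids having to know how many days the final month has.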


### B2. Post‑aggregation filter (HAVING)
**Prompt:**
```
BigQuery SQL only.
Task: Find sub-categories whose total profit over the entire dataset is negative.
Table: `[YOUR_PROJECT].superstore_data.sales`
Output: `Sub_Category`, `total_profit`
Logic: SUM `Profit` GROUP BY `Sub_Category`, HAVING SUM(Profit) < 0
Sort by `total_profit` ASC (most negative first).
```
**Why HAVING?** Ask the model to include a 1-sentence explanation of why HAVING is used instead of WHERE here.

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
SELECT
  sub_category,
  SUM(profit) AS total_profit
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
GROUP BY
  sub_category
HAVING
  SUM(profit) < 0
ORDER BY
  total_profit ASC;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result. This means no sub-categories had negative profit.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 3 rows.

--- Displaying Results ---


Unnamed: 0,sub_category,total_profit
0,Tables,-17725.4811
1,Bookcases,-3472.556
2,Supplies,-1189.0995


HAVING is used here because it filters groups after the SUM(Profit) aggregation has been calculated, whereas WHERE filters individual rows before any aggregation occurs.
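
The difference is easy to see on a toy table. The sketch below uses Python's built-in `sqlite3` as a stand-in for BigQuery (the `GROUP BY`/`HAVING` semantics shown are standard SQL); the table and its five rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sub_category TEXT, profit INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Tables", -50), ("Tables", 20), ("Chairs", 30), ("Chairs", -10), ("Supplies", -5)],
)

# HAVING filters whole groups after SUM has been computed.
having_rows = conn.execute("""
    SELECT sub_category, SUM(profit) AS total_profit
    FROM sales
    GROUP BY sub_category
    HAVING SUM(profit) < 0
    ORDER BY total_profit ASC
""").fetchall()

# WHERE filters individual rows before grouping, which answers a different question.
where_rows = conn.execute("""
    SELECT sub_category, SUM(profit) AS total_profit
    FROM sales
    WHERE profit < 0
    GROUP BY sub_category
    ORDER BY total_profit ASC
""").fetchall()

print(having_rows)  # [('Tables', -30), ('Supplies', -5)]
print(where_rows)   # [('Tables', -50), ('Chairs', -10), ('Supplies', -5)]
```

With `WHERE`, Chairs appears even though its overall profit is positive, because only its loss-making row survived the filter.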

## Part C — Joins (dimension enrichment)
**Aim:** Use joins to enhance facts with attributes.

### C1. Join facts to a small dimension
*(If you have a customer or product dimension in your schema, use it. Otherwise, request a synthetic example.)*  
**Prompt:**
```
BigQuery SQL only.
Task: Join the sales table to a product dimension to report `Product_ID`, `Product_Name`, and total sales.
Tables: `[YOUR_PROJECT].superstore_data.sales` as s, `[YOUR_PROJECT].superstore_data.products` as p
Join key: `s.Product_ID = p.Product_ID`
Output: `Product_ID`, `Product_Name`, `total_sales`
Sort by `total_sales` DESC
```
**If you lack a dimension table:** Ask the model how to simulate one temporarily via a CTE.

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
WITH product_dimension AS (
  -- This CTE simulates a dimension table by getting unique product information.
  SELECT DISTINCT
    product_id,
    product_name
  FROM
    `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
)
SELECT
  s.product_id,
  p.product_name,
  SUM(s.sales) AS total_sales
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean` AS s
JOIN
  product_dimension AS p ON s.product_id = p.product_id
GROUP BY
  s.product_id,
  p.product_name
ORDER BY
  total_sales DESC
LIMIT 20; -- Added a limit to show top 20 products
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 20 rows.

--- Displaying Results ---


Unnamed: 0,product_id,product_name,total_sales
0,TEC-CO-10004722,Canon imageCLASS 2200 Advanced Copier,61599.824
1,OFF-BI-10003527,Fellowes PB500 Electric Punch Plastic Comb Bin...,27453.384
2,TEC-MA-10002412,Cisco TelePresence System EX90 Videoconferenci...,22638.48
3,FUR-CH-10002024,HON 5400 Series Task Chairs for Big and Tall,21870.576
4,OFF-BI-10001359,GBC DocuBind TL300 Electric Binding System,19823.479
5,OFF-BI-10000545,GBC Ibimaster 500 Manual ProClick Binding System,19024.5
6,TEC-CO-10001449,Hewlett Packard LaserJet 3310 Copier,18839.686
7,TEC-MA-10001127,HP Designjet T520 Inkjet Large Format Printer ...,18374.895
8,OFF-BI-10004995,GBC DocuBind P400 Electric Binding System,17965.068
9,OFF-SU-10000151,High Speed Automatic Electric Letter Opener,17030.312


Since our dataset doesn't have a separate products dimension table, we can simulate one using a Common Table Expression (CTE). The CTE, named product_dimension, creates a temporary, unique list of products (product_id and product_name). We then join this temporary dimension table back to our main sales data to calculate the total sales for each product, fulfilling the prompt's requirement for a join operation.
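
One thing worth checking with a `SELECT DISTINCT` dimension: if any `product_id` maps to more than one `product_name`, the join fans out and each sale is counted once per name. Whether that happens in your copy of the data is an assumption to verify. The sketch below reproduces the effect on a hypothetical three-row table, using Python's built-in `sqlite3` as a stand-in for BigQuery, and shows one possible fix (one name per ID via `MIN`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id TEXT, product_name TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    # Hypothetical rows: product P1 appears under two different names.
    [("P1", "Desk A", 100), ("P1", "Desk A v2", 50), ("P2", "Pen", 10)],
)


def total_via_dimension(dimension_sql: str) -> int:
    """Join sales to a CTE-based dimension and add up the per-product totals."""
    rows = conn.execute(f"""
        WITH product_dimension AS ({dimension_sql})
        SELECT s.product_id, p.product_name, SUM(s.sales) AS total_sales
        FROM sales AS s
        JOIN product_dimension AS p ON s.product_id = p.product_id
        GROUP BY s.product_id, p.product_name
    """).fetchall()
    return sum(total for _, _, total in rows)


true_total = conn.execute("SELECT SUM(sales) FROM sales").fetchone()[0]
distinct_total = total_via_dimension("SELECT DISTINCT product_id, product_name FROM sales")
one_name_total = total_via_dimension(
    "SELECT product_id, MIN(product_name) AS product_name FROM sales GROUP BY product_id"
)
print(true_total, distinct_total, one_name_total)  # 160 310 160
```

Comparing the joined grand total against a plain `SUM(sales)` is a quick sanity check for any join-based report.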

## Part D — Common Table Expressions (CTEs)
**Aim:** Make complex logic readable and testable in steps.

### D1. Multi‑step ranking with CTEs
**Prompt:**
```
BigQuery SQL only.
Goal: Within each `Region`, rank states by total sales and return top 3 per region.
Table: `[YOUR_PROJECT].superstore_data.sales`
CTE 1 (`state_sales`): SUM(Sales) by `Region`, `State`
CTE 2 (`ranked_state_sales`): Add `RANK() OVER (PARTITION BY Region ORDER BY total_sales DESC)` as `sales_rank`
Final SELECT: rows where `sales_rank <= 3`
Output columns: `Region`, `State`, `total_sales`, `sales_rank`
Sort: by `Region`, then `sales_rank`
```
**Ask for**: a one-paragraph explanation of each step, then **provide only the final runnable SQL**.

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
WITH state_sales AS (
  -- CTE 1: Sum sales by state and region
  SELECT
    region,
    state,
    SUM(sales) AS total_sales
  FROM
    `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
  GROUP BY
    region,
    state
),
ranked_state_sales AS (
  -- CTE 2: Rank states within each region by sales
  SELECT
    region,
    state,
    total_sales,
    RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS sales_rank
  FROM
    state_sales
)
-- Final SELECT: Get the top 3 states per region
SELECT
  region,
  state,
  total_sales,
  sales_rank
FROM
  ranked_state_sales
WHERE
  sales_rank <= 3
ORDER BY
  region,
  sales_rank;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 12 rows.

--- Displaying Results ---


Unnamed: 0,region,state,total_sales,sales_rank
0,Central,Texas,170188.0458,1
1,Central,Illinois,80166.101,2
2,Central,Michigan,76269.614,3
3,East,New York,310876.271,1
4,East,Pennsylvania,116511.914,2
5,East,Ohio,78258.136,3
6,South,Florida,89473.708,1
7,South,Virginia,70636.72,2
8,South,North Carolina,55603.164,3
9,West,California,457687.6315,1


This query first defines a CTE named `state_sales` (CTE 1) that aggregates the total sales for each state within every region. A second CTE, `ranked_state_sales` (CTE 2), reads from that result and assigns a `sales_rank` to each state by partitioning the data by region and ordering by `total_sales` in descending order. Finally, the main query selects the desired columns from the ranked data and keeps only rows where `sales_rank` is less than or equal to 3, giving the top three states for each region.

### D2. Time‑boxed “most improved” analysis
**Prompt:**
```
BigQuery SQL only.
Goal: Identify the top 5 sub-categories with the largest YoY revenue increase from 2023 to 2024.
Table: `[YOUR_PROJECT].superstore_data.sales`
CTE `yr_sales`: SUM(Sales) by `Sub_Category` and `year` extracted from `Order_Date`
Final: pivot or self-join to compute delta (2024 minus 2023) as `yoy_delta`
Output: `Sub_Category`, `sales_2023`, `sales_2024`, `yoy_delta`
Order by `yoy_delta` DESC
Limit 5
```
**Validation:** Ask the model for two quick failure modes (e.g., missing years) and how to handle them.

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
WITH yr_sales AS (
  -- First, calculate the total sales for each sub-category for the years 2013 and 2014.
  SELECT
    sub_category,
    EXTRACT(YEAR FROM order_date) AS sales_year,
    SUM(sales) AS total_sales
  FROM
    `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
  WHERE
    EXTRACT(YEAR FROM order_date) IN (2013, 2014)
  GROUP BY
    sub_category,
    sales_year
),
pivoted_sales AS (
  -- Pivot the data to have separate columns for 2013 and 2014 sales.
  SELECT
    sub_category,
    SUM(IF(sales_year = 2013, total_sales, 0)) AS sales_2013,
    SUM(IF(sales_year = 2014, total_sales, 0)) AS sales_2014
  FROM
    yr_sales
  GROUP BY
    sub_category
)
-- Finally, calculate the year-over-year change and select the top 5.
SELECT
  sub_category,
  sales_2013,
  sales_2014,
  (sales_2014 - sales_2013) AS yoy_delta
FROM
  pivoted_sales
ORDER BY
  yoy_delta DESC
LIMIT 5;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result. Check if data exists for the specified years (2013, 2014).")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 5 rows.

--- Displaying Results ---


Unnamed: 0,sub_category,sales_2013,sales_2014,yoy_delta
0,Phones,0.0,77390.806,77390.806
1,Chairs,0.0,77241.576,77241.576
2,Machines,0.0,62023.373,62023.373
3,Storage,0.0,50329.042,50329.042
4,Tables,0.0,46088.3655,46088.3655


**Potential Failure Modes and Solutions**

1. ***Missing Years for a Sub-Category***: A sub-category might have sales in one year but not the other (e.g., a new product line). A simple inner join would exclude it, hiding its growth. Solution: The provided query handles this by using a SUM(IF(...)) pivot combined with a GROUP BY on the sub-category, which treats a missing year's sales as zero, so all relevant sub-categories are included in the yoy_delta calculation. Note that when the earlier year has no sales, yoy_delta simply equals the later year's sales, as in the output above, where sales_2013 is 0.0 for every row shown.

2. ***No Data for Specified Period***: The query might run successfully but return zero rows if there is no data at all for the specified years (e.g., 2013 and 2014). This could lead to the incorrect assumption that there was no growth. Solution: Before running the main analysis, you can run a quick check to confirm the date range of your data: ``SELECT MIN(order_date), MAX(order_date) FROM `mgmt-467-47888-471119.lab1_foundation.superstore_clean`;``. This helps you select relevant years for your analysis.
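
Both failure modes can be caught after the fact by inspecting the pivoted result. In the D2 output above, `sales_2013` is 0.0 in every row shown, which is the pattern you would expect if the baseline year lies outside the data's date range (an inference from five rows, not a confirmed fact). The sketch below refuses to compute deltas in that situation; `yoy_deltas` is a hypothetical helper and the second data set is made up.

```python
def yoy_deltas(pivoted, prev_key, curr_key):
    """Compute curr - prev per sub-category, refusing to run on an empty baseline year."""
    if all(row[prev_key] == 0 for row in pivoted):
        raise ValueError(
            f"No sales at all in {prev_key}; the baseline year may be outside the data range."
        )
    return {row["sub_category"]: row[curr_key] - row[prev_key] for row in pivoted}


# First two rows of the D2 output above.
output_rows = [
    {"sub_category": "Phones", "sales_2013": 0.0, "sales_2014": 77390.806},
    {"sub_category": "Chairs", "sales_2013": 0.0, "sales_2014": 77241.576},
]
try:
    yoy_deltas(output_rows, "sales_2013", "sales_2014")
except ValueError as err:
    print(f"Check the year range first: {err}")

# Hypothetical rows where both years have data.
both_years = [
    {"sub_category": "Phones", "sales_2016": 100, "sales_2017": 125},
    {"sub_category": "Chairs", "sales_2016": 60, "sales_2017": 55},
]
print(yoy_deltas(both_years, "sales_2016", "sales_2017"))  # {'Phones': 25, 'Chairs': -5}
```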

## Part E — Window Functions (ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD, moving averages)
**Aim:** Compare rows across partitions and time; compute trends and ranks without collapsing rows.

### E1. Top product per region (ROW_NUMBER)
**Prompt:**
```
BigQuery SQL only.
Task: For each `Region`, return only the single highest-revenue `Sub_Category`.
Table: `[YOUR_PROJECT].superstore_data.sales`
CTE `subcat_sales`: SUM(Sales) by `Region`, `Sub_Category`
Add `ROW_NUMBER() OVER (PARTITION BY Region ORDER BY total_sales DESC)` as rn
Final: filter `rn = 1`
Output: `Region`, `Sub_Category`, `total_sales`
Sort by `Region`
```
**Why `ROW_NUMBER` instead of `RANK`?** Ask the model to add a 2-sentence contrast.

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
WITH subcat_sales AS (
  -- First, aggregate sales by region and sub-category
  SELECT
    region,
    sub_category,
    SUM(sales) AS total_sales
  FROM
    `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
  GROUP BY
    region,
    sub_category
),
ranked_sales AS (
  -- Then, rank sub-categories within each region by their total sales
  SELECT
    region,
    sub_category,
    total_sales,
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_sales DESC) AS rn
  FROM
    subcat_sales
)
-- Finally, select only the top-ranked sub-category for each region
SELECT
  region,
  sub_category,
  total_sales
FROM
  ranked_sales
WHERE
  rn = 1
ORDER BY
  region;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 4 rows.

--- Displaying Results ---


| | region | sub_category | total_sales |
|---|---|---|---|
| 0 | Central | Chairs | 85230.646 |
| 1 | East | Phones | 100614.982 |
| 2 | South | Phones | 58304.438 |
| 3 | West | Chairs | 101781.328 |


`ROW_NUMBER` is used here because it assigns a distinct number to every row within a partition, so exactly one `Sub_Category` is returned per region even when `total_sales` ties; which tied row receives `rn = 1` is arbitrary unless you add a tie-breaker to the `ORDER BY` (e.g., `sub_category`). `RANK`, by contrast, gives tied rows the same rank, so filtering on rank 1 could return several sub-categories for a single region.
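
To see the tie behaviour concretely, here is a minimal local sketch. It uses Python's built-in `sqlite3` module as a stand-in for BigQuery (an assumption made only so the example runs without a project; it needs SQLite 3.25 or newer for window functions), and the five rows are hypothetical toy data with a deliberate tie in `East`.

```python
import sqlite3

# Hypothetical toy data: 'East' has a two-way tie for first place.
rows = [
    ("East", "Phones", 100.0),
    ("East", "Chairs", 100.0),
    ("East", "Paper", 40.0),
    ("West", "Chairs", 90.0),
    ("West", "Phones", 70.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subcat_sales (region TEXT, sub_category TEXT, total_sales REAL)"
)
conn.executemany("INSERT INTO subcat_sales VALUES (?, ?, ?)", rows)

# sub_category is added as a tie-breaker so ROW_NUMBER is deterministic.
ranked = conn.execute("""
    SELECT
      region,
      sub_category,
      ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_sales DESC, sub_category) AS rn,
      RANK()       OVER (PARTITION BY region ORDER BY total_sales DESC) AS rnk,
      DENSE_RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS dense_rnk
    FROM subcat_sales
    ORDER BY region, rn
""").fetchall()

# Filtering on rn = 1 keeps one row per region; filtering on rnk = 1 keeps all ties.
top_by_row_number = [(r[0], r[1]) for r in ranked if r[2] == 1]
top_by_rank = [(r[0], r[1]) for r in ranked if r[3] == 1]

for row in ranked:
    print(row)
```

With this data, `top_by_row_number` holds one row per region while `top_by_rank` holds both tied `East` rows. The third `East` row also shows how `RANK` skips a position after a tie and `DENSE_RANK` does not.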

### E2. YoY growth with LAG
**Prompt:**
```
BigQuery SQL only.
Task: Compute year-over-year revenue growth for 'Phones' sub-category.
Table: `[YOUR_PROJECT].superstore_data.sales`
Steps:
- Filter to `Sub_Category = 'Phones'`
- Aggregate yearly revenue using EXTRACT(YEAR FROM Order_Date)
- Add `LAG(yearly_revenue) OVER (ORDER BY year)` as `prev_revenue`
- Compute `yoy_pct = 100.0 * (yearly_revenue - prev_revenue) / prev_revenue`
Output: `year`, `yearly_revenue`, `prev_revenue`, `yoy_pct`
Sort by `year` ASC
```
**Ask for**: a guard against divide-by-zero or NULL previous year.

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
WITH sales_by_year AS (
  -- First, filter to 'Phones' and calculate total sales per year
  SELECT
    EXTRACT(YEAR FROM order_date) AS sales_year,
    SUM(sales) AS total_revenue
  FROM
    `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
  WHERE
    sub_category = 'Phones'
  GROUP BY
    sales_year
),
revenue_with_lag AS (
  -- Use LAG to get the previous year's revenue
  SELECT
    sales_year,
    total_revenue,
    LAG(total_revenue, 1) OVER (ORDER BY sales_year ASC) AS prev_revenue
  FROM
    sales_by_year
)
-- Calculate YoY growth, using SAFE_DIVIDE to prevent errors
SELECT
  sales_year AS year,
  total_revenue AS yearly_revenue,
  prev_revenue,
  SAFE_DIVIDE(100.0 * (total_revenue - prev_revenue), prev_revenue) AS yoy_pct
FROM
  revenue_with_lag
ORDER BY
  year ASC;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 4 rows.

--- Displaying Results ---


| | year | yearly_revenue | prev_revenue | yoy_pct |
|---|---|---|---|---|
| 0 | 2014 | 77390.806 | | |
| 1 | 2015 | 68313.702 | 77390.806 | -11.728918 |
| 2 | 2016 | 78962.03 | 68313.702 | 15.587397 |
| 3 | 2017 | 105340.516 | 78962.03 | 33.406545 |
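
As a quick sanity check on the E2 output, the `yoy_pct` column can be recomputed by hand. The sketch below uses the four `yearly_revenue` values from the result above; `safe_divide` is a hypothetical helper that imitates how `SAFE_DIVIDE` returns NULL for a zero or NULL denominator.

```python
def safe_divide(numerator, denominator):
    """Imitate BigQuery SAFE_DIVIDE: None for a NULL input or a zero denominator."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# yearly_revenue values copied from the E2 query output
yearly_revenue = {2014: 77390.806, 2015: 68313.702, 2016: 78962.03, 2017: 105340.516}

yoy = {}
prev = None  # plays the role of LAG(...): no previous row for the first year
for year in sorted(yearly_revenue):
    revenue = yearly_revenue[year]
    scaled_diff = None if prev is None else 100.0 * (revenue - prev)
    yoy[year] = safe_divide(scaled_diff, prev)
    prev = revenue

for year, pct in yoy.items():
    print(year, None if pct is None else round(pct, 4))
```

The first year has no previous row, so its growth stays empty rather than raising an error, which is the guard the prompt asks for.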


### E3. 3‑month moving average (MA)
**Prompt:**
```
BigQuery SQL only.
Task: For the 'Corporate' segment, compute a 3-month moving average of monthly revenue.
Table: `[YOUR_PROJECT].superstore_data.sales`
Steps:
- Derive `month` via DATE_TRUNC(Order_Date, MONTH)
- SUM(Sales) per `month`
- Add `AVG(monthly_revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` as `ma_3`
Output: `month`, `monthly_revenue`, `ma_3`
Sort by `month` ASC
```
**Tip:** Ask the model to include a 1‑line cost control note (e.g., restrict date range while iterating).

In [None]:
print("✅ Step 1: Defining the query string...")

query_string = """
WITH monthly_sales AS (
  -- Filter for 'Corporate' segment and aggregate sales by month.
  SELECT
    DATE_TRUNC(order_date, MONTH) AS month,
    SUM(sales) AS monthly_revenue
  FROM
    `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
  WHERE
    segment = 'Corporate'
  GROUP BY
    month
)
-- Calculate the 3-month moving average.
SELECT
  month,
  monthly_revenue,
  AVG(monthly_revenue) OVER (
    ORDER BY month
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS ma_3
FROM
  monthly_sales
ORDER BY
  month ASC;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 48 rows.

--- Displaying Results ---


| | month | monthly_revenue | ma_3 |
|---|---|---|---|
| 0 | 2014-01-01 | 1701.528 | 1701.528 |
| 1 | 2014-02-01 | 1183.668 | 1442.598 |
| 2 | 2014-03-01 | 11106.799 | 4663.998333 |
| 3 | 2014-04-01 | 14131.729 | 8807.398667 |
| 4 | 2014-05-01 | 9142.0 | 11460.176 |
| 5 | 2014-06-01 | 3970.914 | 9081.547667 |
| 6 | 2014-07-01 | 10032.988 | 7715.300667 |
| 7 | 2014-08-01 | 7451.774 | 7151.892 |
| 8 | 2014-09-01 | 15507.745 | 10997.502333 |
| 9 | 2014-10-01 | 12637.678 | 11865.732333 |


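**Cost Control Note**: To reduce cost while iterating, add a date condition to the CTE's existing `WHERE` clause (e.g., `AND order_date >= '2017-01-01'`). This lowers the bytes scanned only if the table is partitioned or clustered on `order_date`; on an unpartitioned table BigQuery still reads every referenced column in full.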
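
The `ma_3` column in the E3 output can be reproduced locally to confirm how the window frame behaves. This is a minimal sketch using the first five `monthly_revenue` values from the result above; `trailing_average` is a hypothetical helper. Note that `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` counts rows, not calendar months, so it is a true 3-month average only when every month has a row (the 48 rows returned here suggest that holds for this data).

```python
from collections import deque


def trailing_average(values, window=3):
    """Average of the current value and up to `window - 1` preceding values."""
    recent = deque(maxlen=window)  # drops the oldest value automatically
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# First five monthly_revenue values copied from the E3 query output
monthly_revenue = [1701.528, 1183.668, 11106.799, 14131.729, 9142.0]
ma_3 = trailing_average(monthly_revenue)

for revenue, average in zip(monthly_revenue, ma_3):
    print(revenue, round(average, 6))
```

The first two averages cover only one and two rows respectively, which is why `ma_3` equals `monthly_revenue` in the first output row.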

## Part F — Debugging & Optimization Prompts
**Aim:** Use the model as a rubber duck for error handling and performance.

### F1. Explain the error, propose a fix
**Prompt:**
```
I ran this BigQuery SQL and got an error:

**SQL:**
SELECT
  sub_category,
  SUM(sales) AS total_sales
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
WHERE
  total_sales > 200000
GROUP BY
  sub_category;

**ERROR:**
`Unrecognized name: total_sales at [6:3]`

Act as a BigQuery trouble‑shooter.
1) Identify the root cause.
2) Propose the smallest possible fix.
3) Suggest a quick sanity check query to verify the fix.
Return only the corrected SQL and a 2‑sentence rationale.
```

**Corrected SQL and Rationale:**

```sql
SELECT
  sub_category,
  SUM(sales) AS total_sales
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
GROUP BY
  sub_category
HAVING
  total_sales > 200000;
```

The root cause of the error is that the `WHERE` clause filters rows *before* the `SELECT` list and the aggregation (`SUM`) are evaluated, so it can reference neither the alias `total_sales` nor the aggregate behind it. The correct approach is a `HAVING` clause, which filters groups *after* aggregation has occurred.
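
The difference in evaluation order can be illustrated without BigQuery. The sketch below is plain Python over four hypothetical line items (names and numbers are invented for illustration); it contrasts filtering rows before aggregating with filtering group totals after aggregating.

```python
from collections import defaultdict

# Hypothetical line items: (sub_category, sales)
line_items = [("Phones", 150.0), ("Phones", 120.0), ("Chairs", 260.0), ("Paper", 30.0)]
THRESHOLD = 200.0


def total_by_group(items):
    """GROUP BY sub_category with SUM(sales)."""
    totals = defaultdict(float)
    for sub_category, sales in items:
        totals[sub_category] += sales
    return dict(totals)


# WHERE-style: filter individual rows first, then aggregate what is left.
where_result = total_by_group([item for item in line_items if item[1] > THRESHOLD])

# HAVING-style: aggregate first, then filter the group totals.
having_result = {
    name: total
    for name, total in total_by_group(line_items).items()
    if total > THRESHOLD
}

print("row filter  :", where_result)
print("group filter:", having_result)
```

Here `Phones` passes only the group-level filter, because no single line item exceeds the threshold but the group total does. A comparable sanity check in BigQuery is to run the corrected query without the `HAVING` clause and confirm that the rows it removes all have totals at or below the threshold.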

### F2. Reduce cost / improve speed
**Prompt:**
```
Act as a BigQuery cost optimizer.
Given this query (below), list 3 ways to reduce scanned bytes and improve performance without changing the business logic.

**SQL:**
-- Goal: Find the total sales for each product in the 'Technology' category for the full year of 2017.
SELECT
  s.product_id,
  p.product_name,
  SUM(s.sales) AS total_sales
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean` AS s
LEFT JOIN
  -- This self-join to get product name is redundant and inefficient
  (SELECT DISTINCT product_id, product_name FROM `mgmt-467-47888-471119.lab1_foundation.superstore_clean`) AS p
  ON s.product_id = p.product_id
WHERE
  EXTRACT(YEAR FROM s.order_date) = 2017 -- Filtering on a function output prevents partition pruning
  AND s.category = 'Technology'
GROUP BY
  s.product_id,
  p.product_name
ORDER BY
  total_sales DESC;

Prioritize: partition filters, column pruning, pre-aggregations, and temporary results via CTEs.
```

**Optimizer's Recommendations:**

Here are three ways to make the above query more performant and cost-effective:

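1.  **Avoid Redundant Joins and Prune Columns:** The query joins the table to itself simply to retrieve the `product_name`, which is already available in the primary table (`s`). Removing the unnecessary `LEFT JOIN` and selecting columns directly from the source eliminates a second scan of the table and a costly data shuffle. This assumes each `product_id` maps to a single `product_name`; if it does not, the join duplicates rows and the rewrite would change the totals.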

2.  **Filter on Partition Columns Directly:** Wrapping `order_date` in `EXTRACT(YEAR FROM ...)` can prevent BigQuery from pruning partitions, so every partition is scanned before the filter applies. If the table is partitioned by `order_date`, a direct range filter such as `WHERE s.order_date >= '2017-01-01' AND s.order_date < '2018-01-01'` lets BigQuery skip unneeded partitions and can sharply reduce the amount of data scanned. The half-open range also stays correct if `order_date` is a `TIMESTAMP` rather than a `DATE`.

3.  **Consider Pre-Aggregation:** If this type of analysis is performed frequently, you can pre-aggregate the data into a summary table or a BigQuery Materialized View. For example, a table containing monthly sales per product would make this query much faster and cheaper, as it would be querying a much smaller, pre-calculated dataset.
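
A minimal sketch of how the first two recommendations might be applied together is shown below. It is not the model's output: `year_range` is a hypothetical helper, and the rewrite assumes that `order_date` is a `DATE` column and that each `product_id` has exactly one `product_name`.

```python
from datetime import date


def year_range(year):
    """Return a half-open [start, end) date range covering one calendar year."""
    return date(year, 1, 1), date(year + 1, 1, 1)


start_date, end_date = year_range(2017)

# Assumption: one product_name per product_id, so the self-join can be dropped.
# Assumption: order_date is a DATE column, so a direct range filter applies.
optimized_query = f"""
SELECT
  product_id,
  product_name,
  SUM(sales) AS total_sales
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
WHERE
  order_date >= DATE '{start_date.isoformat()}'
  AND order_date < DATE '{end_date.isoformat()}'
  AND category = 'Technology'
GROUP BY
  product_id,
  product_name
ORDER BY
  total_sales DESC;
"""

print(optimized_query)
```

Before adopting a rewrite like this, compare its row count and grand total against the original query for the same year; any difference points to a violated assumption rather than a pure performance change.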

## Part G — Validation & Counter‑examples (DIVE: Validate)
**Aim:** Avoid “first‑answer fallacy” by testing alternatives.

### G1. Ask for counter‑queries
**Prompt:**
```
I concluded that 'Tables' is a high‑sales but negative‑profit sub-category due to high discounts.
Create two alternative BigQuery SQL queries that could falsify or nuance this finding:
- One that slices by region and time
- One that controls for order priority or ship mode
Return BigQuery SQL only, then a one-paragraph note on how to compare outcomes.
```

In [None]:
print("✅ Query 1: Slicing by Region and Time...")
query_string_1 = """
-- This query checks if the unprofitability of 'Tables' is consistent across all regions and time.
SELECT
  region,
  FORMAT_DATE('%Y-%m', order_date) AS year_month,
  SUM(sales) AS total_sales,
  SUM(profit) AS total_profit,
  AVG(discount) AS avg_discount
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
WHERE
  sub_category = 'Tables'
GROUP BY
  region, year_month
ORDER BY
  region, year_month;
"""

try:
    query_job_1 = client.query(query_string_1)
    results_df_1 = query_job_1.to_dataframe()
    print("--- Query 1 Results: Profitability of 'Tables' by Region and Month ---")
    display(results_df_1)
except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Query 1: Slicing by Region and Time...
--- Query 1 Results: Profitability of 'Tables' by Region and Month ---


| | region | year_month | total_sales | total_profit | avg_discount |
|---|---|---|---|---|---|
| 0 | Central | 2014-03 | 2452.070 | -89.1204 | 0.200000 |
| 1 | Central | 2014-04 | 1145.690 | -227.6210 | 0.300000 |
| 2 | Central | 2014-05 | 355.455 | -184.8366 | 0.500000 |
| 3 | Central | 2014-06 | 368.853 | -228.3255 | 0.400000 |
| 4 | Central | 2014-08 | 489.230 | 41.9340 | 0.300000 |
| ... | ... | ... | ... | ... | ... |
| 135 | West | 2017-08 | 2334.782 | 286.5952 | 0.150000 |
| 136 | West | 2017-09 | 3621.616 | 46.9125 | 0.160000 |
| 137 | West | 2017-10 | 1279.049 | -109.2713 | 0.250000 |
| 138 | West | 2017-11 | 7368.180 | 691.8420 | 0.114286 |


In [None]:
print("\n✅ Query 2: Controlling for Ship Mode...")
query_string_2 = """
-- This query investigates if high costs associated with certain shipping modes contribute to negative profits.
SELECT
  ship_mode,
  SUM(sales) AS total_sales,
  SUM(profit) AS total_profit,
  AVG(discount) AS avg_discount,
  COUNT(*) AS number_of_orders  -- row count; equals orders only if each order is a single row
FROM
  `mgmt-467-47888-471119.lab1_foundation.superstore_clean`
WHERE
  sub_category = 'Tables'
GROUP BY
  ship_mode
ORDER BY
  total_profit ASC;
"""

try:
    query_job_2 = client.query(query_string_2)
    results_df_2 = query_job_2.to_dataframe()
    print("--- Query 2 Results: Profitability of 'Tables' by Ship Mode ---")
    display(results_df_2)
except Exception as e:
    print(f"\n❌ An error occurred: {e}")


✅ Query 2: Controlling for Ship Mode...
--- Query 2 Results: Profitability of 'Tables' by Ship Mode ---


| | ship_mode | total_sales | total_profit | avg_discount | number_of_orders |
|---|---|---|---|---|---|
| 0 | Standard Class | 124826.6615 | -11910.0122 | 0.270526 | 190 |
| 1 | Second Class | 43693.7475 | -3320.6799 | 0.248361 | 61 |
| 2 | First Class | 28800.776 | -1365.3665 | 0.240426 | 47 |
| 3 | Same Day | 9644.347 | -1129.4225 | 0.261905 | 21 |


**How to Compare Outcomes:**

To nuance the original finding, compare the results of these two queries against the initial conclusion. The first query shows whether the unprofitability of 'Tables' is a blanket issue or is concentrated in specific regions or time periods, which would point to regional pricing or discount strategies. The second query tests whether logistics are a factor; because volume differs widely by ship mode, compare profit margin (`total_profit / total_sales`) rather than raw totals. If margin is markedly lower for faster ship modes despite similar discounts, freight costs, not just discounts, may be eroding margins for this bulky sub-category. Together, these analyses help determine whether the problem is driven by discounts alone or by a combination of factors such as regional strategy and shipping costs.
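
One way to make the ship-mode comparison concrete is to normalize profit by sales. The sketch below computes a profit margin per ship mode from the Query 2 output above; the figures are copied from that result and nothing else is assumed.

```python
# (total_sales, total_profit) per ship mode, copied from the Query 2 output
by_ship_mode = {
    "Standard Class": (124826.6615, -11910.0122),
    "Second Class": (43693.7475, -3320.6799),
    "First Class": (28800.776, -1365.3665),
    "Same Day": (9644.347, -1129.4225),
}

# Profit margin as a percentage of sales, so modes of different size are comparable.
margin_pct = {
    mode: 100.0 * profit / sales for mode, (sales, profit) in by_ship_mode.items()
}

# Net result across all ship modes for the 'Tables' sub-category.
net_profit = sum(profit for _, profit in by_ship_mode.values())

for mode, pct in sorted(margin_pct.items(), key=lambda kv: kv[1]):
    print(f"{mode}: {pct:.1f}%")
print(f"Net profit: {net_profit:.2f}")
```

Every mode shows a negative margin, with Same Day the weakest and First Class the strongest, so speed of shipping alone does not line up neatly with the losses. That pattern is more consistent with discounts being the common driver than with freight cost, though the small Same Day sample (21 rows) calls for caution.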

## Part H — Synthesis (DIVE: Extend)
**Aim:** Turn analysis into business‑ready insights.

### H1. Executive‑style summary
**Prompt:**
```
Act as a business strategist.
Based on the following metrics/figures (briefly summarize your results here), write a 4-sentence executive summary:
- 1 sentence: what changed and by how much
- 1 sentence: why it likely changed (drivers)
- 1 sentence: recommended action (who/what/when)
- 1 sentence: metric to monitor next
```

**Executive Summary:**

Our 'Tables' sub-category, despite high sales volume, has generated a net loss of over $17,700, making it a significant drain on profitability. This loss is primarily driven by an aggressive and widespread discount strategy, with average discounts of roughly 24% to 27% in every shipping mode. We recommend that sales leadership immediately implement a strict cap on discounts for all 'Tables' products, starting with the Central region, to be enforced by the beginning of the next fiscal quarter. The key metric to monitor weekly following this change will be the `total_profit` for the 'Tables' sub-category to ensure a swift return to profitability.

### H2. Convert final SQL into an automated job (optional)
**Prompt (use only after your SQL is final):**
```
Convert my final BigQuery SQL into a Python script that can run as a scheduled job from Colab or Cloud Functions.
Requirements:
- Use the `google-cloud-bigquery` Python client
- Parameterize date range
- Write results to a destination table `[YOUR_PROJECT].analytics.outputs_kpi`
- Add basic error handling & logging
Return one complete runnable script.
```
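
For orientation, a response to this prompt might look roughly like the sketch below. It is one possible shape under stated assumptions, not a definitive script: `FINAL_SQL` is a stand-in for your own final query, `validate_range` and `run_job` are hypothetical names, and `[YOUR_PROJECT]` must be replaced with a real project ID before anything will run.

```python
import logging
from datetime import date

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("kpi_job")

DESTINATION_TABLE = "[YOUR_PROJECT].analytics.outputs_kpi"  # placeholder from the prompt

# Stand-in for your final SQL; @start_date and @end_date are named query parameters.
FINAL_SQL = """
SELECT sub_category, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM `[YOUR_PROJECT].superstore_data.sales`
WHERE order_date >= @start_date AND order_date < @end_date
GROUP BY sub_category
"""


def validate_range(start, end):
    """Reject empty or reversed date ranges before submitting a query."""
    if start >= end:
        raise ValueError(f"start_date {start} must be before end_date {end}")
    return start, end


def run_job(start, end):
    """Run FINAL_SQL for [start, end) and overwrite the destination table."""
    # Imported here so the helpers above work without the library installed.
    from google.cloud import bigquery

    start, end = validate_range(start, end)
    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        destination=DESTINATION_TABLE,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        query_parameters=[
            bigquery.ScalarQueryParameter("start_date", "DATE", start),
            bigquery.ScalarQueryParameter("end_date", "DATE", end),
        ],
    )
    try:
        job = client.query(FINAL_SQL, job_config=job_config)
        job.result()  # block until the query finishes
        log.info("Wrote results to %s", DESTINATION_TABLE)
    except Exception:
        log.exception("KPI job failed")
        raise


# Example call (requires credentials and a real project ID):
# run_job(date(2017, 1, 1), date(2018, 1, 1))
```

Passing the dates as query parameters rather than formatting them into the SQL string keeps the query text fixed between runs. `WRITE_TRUNCATE` replaces the destination table on each run; switch to `WRITE_APPEND` if you want to keep a history.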

---
## Submission checklist
- [ ] Kept prompts precise and reproducible  
- [ ] Captured at least **one** CTE query and **one** window function query  
- [ ] Documented **two** validation attempts (counter‑queries or alternate slice)  
- [ ] Wrote a 4‑sentence executive summary based on results  
- [ ] (Optional) Converted final query into a scheduled job
---