
# 🤖 MGMT 467 - Unit 2 Lab 2: Prompt Studio for AI-Assisted SQL + ML

**Date:** 2025-10-16  
**Objective:** Build and refine a complete ML pipeline for churn prediction in BigQuery, with **Gemini-style prompts** guiding the SQL generation.

You'll learn to:
- Frame SQL goals as clear prompts
- Generate, test, and debug queries with an AI assistant
- Reflect on each modeling step and your prompt design



## Task 0: Connect to BigQuery

**🎯 Goal:** Verify BigQuery access from Colab.  
**📌 Requirements:** Use `%%bigquery` to return the current date and the session user.

---

### 🧠 Prompt Template
> Write a SQL query that returns CURRENT_DATE() and SESSION_USER(). I will run it with %%bigquery in Colab.

---

### üë©‚Äçüè´ Example Prompt  
> Write a SQL query using BigQuery syntax that returns today‚Äôs date and the current session user.

---

### ✅ Expected SQL Output
```sql
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user;
```

---

### 🔍 Checkpoint
The query should return a single row with today's date and your session user.


In [1]:
# prompt: can you write a SQL query using BigQuery syntax that returns todays date and the current session user

%%bigquery
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user;


|   | today | user |
|---|---|---|
| 0 | 2025-10-26 | anishka.purdue@gmail.com |


In [2]:
# prompt: I am getting a 403 access denied error. Would providing my project id help solve the issue

from google.colab import auth
auth.authenticate_user()

# Replace with your actual project ID
project_id = 'mgmt-467-25259'
!gcloud config set project {project_id}

# After setting the project ID, you can try running your %%bigquery command again.
# If the issue persists, it might be due to IAM permissions on the project or dataset.


Updated property [core/project].
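
The same connection check can also be run from Python with an explicit project, which avoids relying on the `gcloud` default. The following is a minimal sketch, not the lab's required method: the helper name `check_connection` is hypothetical, and it assumes a client object that behaves like `google.cloud.bigquery.Client` (a `query()` method whose job exposes `result()`).

```python
# Hypothetical helper: runs the Task 0 query through any BigQuery-like client.
CHECK_SQL = "SELECT CURRENT_DATE() AS today, SESSION_USER() AS user"


def check_connection(client):
    """Run the Task 0 query and confirm it returns exactly one row."""
    rows = list(client.query(CHECK_SQL).result())
    if len(rows) != 1:
        raise RuntimeError(f"expected 1 row, got {len(rows)}")
    row = rows[0]
    return {"today": row["today"], "user": row["user"]}


# Usage in Colab, after auth.authenticate_user() (left as comments because it
# needs credentials and the google-cloud-bigquery package):
#   from google.cloud import bigquery
#   client = bigquery.Client(project="mgmt-467-25259")
#   print(check_connection(client))
```

If this call still fails with a 403, the project ID is probably not the cause, and IAM permissions on the project or dataset are the next thing to check.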


In [3]:
# prompt: can you list the datasets in my projects

!bq ls


     datasetId     
 ----------------- 
  netflix          
  superstore_data  


In [4]:
# prompt: how can i access the netflix dataset in my project

!bq ls netflix


         tableId           Type    Labels   Time Partitioning   Clustered Fields  
 ------------------------ ------- -------- ------------------- ------------------ 
  activity_filled          TABLE                                                  
  activity_monthly         TABLE                                                  
  activity_roll3           TABLE                                                  
  calendar_months          TABLE                                                  
  churn_predictions_lite   TABLE                                                  
  feat_churn_lite          TABLE                                                  
  labels_next_month        TABLE                                                  
  month_bounds             TABLE                                                  
  movies                   TABLE                                                  
  recommendation_logs      TABLE                                                  
  re

In [5]:
# prompt: can you print the head of the users table in netflix

%%bigquery
SELECT * FROM netflix.users LIMIT 5


|   | user_id | email | first_name | last_name | age | gender | country | state_province | city | subscription_plan | subscription_start_date | is_active | monthly_spend | primary_device | household_size | created_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | user_00015 | barnesbrandy@example.net | Sarah | Santiago | 45.0 | Female | Canada | Alberta | West Randy | Basic | 2023-12-16 | False | 5.14 | Mobile | 3.0 | 2022-08-13 06:39:47.240847+00:00 |
| 1 | user_00015 | barnesbrandy@example.net | Sarah | Santiago | 45.0 | Female | Canada | Alberta | West Randy | Basic | 2023-12-16 | False | 5.14 | Mobile | 3.0 | 2022-08-13 06:39:47.240847+00:00 |
| 2 | user_00021 | emurphy@example.com | Darlene | Frazier | 38.0 | Female | Canada | Alberta | North Natalieview | Basic | 2024-04-29 | True | 17.24 | Laptop | 3.0 | 2022-09-13 23:45:07.830930+00:00 |
| 3 | user_00021 | emurphy@example.com | Darlene | Frazier | 38.0 | Female | Canada | Alberta | North Natalieview | Basic | 2024-04-29 | True | 17.24 | Laptop | 3.0 | 2022-09-13 23:45:07.830930+00:00 |
| 4 | user_00041 | michelle64@example.net | Chelsea | Meza | 29.0 | Female | Canada | Alberta | West Donna | Standard | 2023-04-21 | True | 25.73 | Laptop | 2.0 | 2023-01-21 14:04:27.262786+00:00 |



## Task 1: Prepare ML Table

**🎯 Goal:** Create a clean features table for modeling churn.  
**📌 Requirements:** Use `cleaned_features` as the source, select the relevant columns, and keep only rows where `churn_label IS NOT NULL`.

---

### 🧠 Prompt Template
> Write a query that creates a new table with columns: [region, plan_tier, age_band, ...] and churn_label from [source_table]. Filter to rows where churn_label IS NOT NULL.
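
Filling the bracketed slots by hand is easy to get wrong when a prompt is reused across tables. As a rough sketch, the template can be treated as a string with named placeholders. The function `fill_prompt` and the placeholder name `[features]` (standing in for the `[region, plan_tier, age_band, ...]` slot) are assumptions made for illustration, not part of the lab.

```python
def fill_prompt(template: str, values: dict) -> str:
    """Replace each [placeholder] in the template with its value."""
    prompt = template
    for placeholder, value in values.items():
        token = f"[{placeholder}]"
        if token not in prompt:
            # Fail loudly instead of sending a half-filled prompt to the assistant.
            raise KeyError(f"placeholder {token} not found in template")
        prompt = prompt.replace(token, value)
    return prompt


# The lab's Task 1 template, with the column slot renamed to [features].
TEMPLATE = (
    "Write a query that creates a new table with columns: [features] "
    "and churn_label from [source_table]. "
    "Filter to rows where churn_label IS NOT NULL."
)

prompt = fill_prompt(
    TEMPLATE,
    {
        "features": "region, plan_tier, age_band, avg_rating, total_minutes",
        "source_table": "your_dataset.cleaned_features",
    },
)
print(prompt)
```

Naming the exact columns and the fully qualified source table in the prompt leaves the assistant less room to guess at the schema.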

---

### üë©‚Äçüè´ Example Prompt  
> Create a BigQuery table named churn_features from cleaned_features with selected features and where churn_label IS NOT NULL.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE TABLE `your_dataset.churn_features` AS
SELECT region, plan_tier, age_band, avg_rating, total_minutes, churn_label
FROM `your_dataset.cleaned_features`
WHERE churn_label IS NOT NULL;
```

---

### 🔍 Checkpoint
The table should appear in BigQuery and contain only non-null labels.


In [6]:
# prompt: Create  a BigQuery table named churn_features from cleaned_features with selected  features and where churn_label is NOT NULL

%%bigquery
CREATE OR REPLACE TABLE `netflix.churn_features` AS
SELECT region, plan_tier, age_band, avg_rating, total_minutes, churn_label
FROM `netflix.cleaned_features`
WHERE churn_label IS NOT NULL;


Executing query with job ID: afe15036-75de-4b2f-b054-ebbdb78e6fc8
Query executing: 0.31s


ERROR:
 404 Not found: Table mgmt-467-25259:netflix.cleaned_features was not found in location US; reason: notFound, message: Not found: Table mgmt-467-25259:netflix.cleaned_features was not found in location US

Location: US
Job ID: afe15036-75de-4b2f-b054-ebbdb78e6fc8
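
The 404 means the source table named in the query does not exist in the `netflix` dataset. One way to catch this before running a `CREATE TABLE` is to compare the tables a query needs against the dataset listing. The sketch below uses the table names visible in the `bq ls netflix` output earlier in this notebook. That listing is cut off, so the set may be incomplete, but the visible rows are in alphabetical order and `cleaned_features` would sort before `feat_churn_lite`, so the truncation does not affect this check. The helper name `missing_tables` is hypothetical.

```python
# Table names copied from the `bq ls netflix` output above. The listing is
# truncated after `recommendation_logs`, so later tables are not included.
LISTED_TABLES = {
    "activity_filled",
    "activity_monthly",
    "activity_roll3",
    "calendar_months",
    "churn_predictions_lite",
    "feat_churn_lite",
    "labels_next_month",
    "month_bounds",
    "movies",
    "recommendation_logs",
}


def missing_tables(required, available):
    """Return the required table names that are absent from the listing."""
    return sorted(set(required) - set(available))


missing = missing_tables(["cleaned_features"], LISTED_TABLES)
print(missing)  # ['cleaned_features']
```

Whether another listed table (for example `feat_churn_lite`) is the intended source is a guess that should be checked against its schema before substituting it into the query.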




## Task 2: Train Logistic Regression Model

**🎯 Goal:** Train a basic BQML logistic regression model.  
**📌 Requirements:** Use the `churn_features` table and predict `churn_label` from the feature columns.

---

### 🧠 Prompt Template
> Write a CREATE MODEL SQL statement for logistic regression using churn_label as label and [features] as inputs.

---

### üë©‚Äçüè´ Example Prompt  
> Train a logistic regression model to predict churn_label using region, plan_tier, total_minutes, avg_rating.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE MODEL `your_dataset.churn_model`
OPTIONS(model_type='logistic_reg', input_label_cols=['churn_label']) AS
SELECT region, plan_tier, total_minutes, avg_rating, churn_label
FROM `your_dataset.churn_features`;
```

---

### 🔍 Checkpoint
The model appears in BigQuery under Models and training completes.
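
BigQuery ML looks for a label column named `label` unless `input_label_cols` is set, so a generated statement is worth checking for that option. The sketch below renders the `CREATE MODEL` statement from a feature list so the label is always declared explicitly. The function name `create_model_sql` is hypothetical, and the identifiers are interpolated directly into the string, which is only reasonable for names you control.

```python
def create_model_sql(model, source, features, label):
    """Render a BQML logistic regression CREATE MODEL statement."""
    columns = ", ".join([*features, label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        # Declare the label explicitly; otherwise BQML expects a column named `label`.
        f"OPTIONS(model_type='logistic_reg', input_label_cols=['{label}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source}`;"
    )


sql = create_model_sql(
    model="your_dataset.churn_model",
    source="your_dataset.churn_features",
    features=["region", "plan_tier", "total_minutes", "avg_rating"],
    label="churn_label",
)
print(sql)
```

Comparing this rendered statement with the assistant's output is a quick way to spot a missing label option or a dropped feature column.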



## Task 3: Evaluate Model

**🎯 Goal:** Evaluate the logistic regression model.  
**📌 Requirements:** Use `ML.EVALUATE`.

---

### 🧠 Prompt Template
> Write a query to evaluate my logistic regression model using ML.EVALUATE.

---

### üë©‚Äçüè´ Example Prompt  
> Evaluate the churn_model using ML.EVALUATE to get accuracy, precision, recall.

---

### ✅ Expected SQL Output
```sql
SELECT * FROM ML.EVALUATE(MODEL `your_dataset.churn_model`);
```

---

### 🔍 Checkpoint
View the performance metrics: accuracy, log_loss, precision, recall.
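
To interpret the numbers, it helps to see how three of these metrics follow from confusion-matrix counts. The sketch below uses made-up counts (not results from this lab) and the standard definitions. `log_loss` is omitted because it depends on the predicted probabilities rather than on the counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts for 100 users: 40 actually churned, 40 were predicted to churn.
metrics = classification_metrics(tp=30, fp=10, tn=50, fn=10)
print(metrics)  # {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75}
```

For churn, recall (the share of actual churners the model catches) is often the metric to watch, since accuracy can look good when most users do not churn.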



## Task 4: Predict Churn

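**🎯 Goal:** Use `ML.PREDICT` to generate churn predictions.  
**📌 Requirements:** Apply the model to the same input table (`churn_features`). Returning `user_id` requires that column in the input query; the Task 1 table as written does not select it.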

---

### 🧠 Prompt Template
> Generate SQL to use ML.PREDICT on churn_model and return predictions by user_id.

---

### üë©‚Äçüè´ Example Prompt  
> Predict churn using churn_model. Include user_id, predicted_churn_label, and prediction probability.

---

### ✅ Expected SQL Output
```sql
SELECT user_id, predicted_churn_label, predicted_churn_label_probs
FROM ML.PREDICT(MODEL `your_dataset.churn_model`,
      (SELECT * FROM `your_dataset.churn_features`));
```

---

### 🔍 Checkpoint
Inspect the top churn-risk users and validate the probabilities.
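
`predicted_churn_label_probs` holds one entry per class, so ranking users by risk means pulling out the probability for the churn class first. The sketch below assumes each row arrives in Python as a dict whose probs field is a list of `{"label": ..., "prob": ...}` entries and that churn is encoded as `1`. Both are assumptions to verify against your own query results, and the sample rows and user IDs are invented.

```python
def churn_probability(probs, positive_label=1):
    """Return the probability attached to the positive (churn) label."""
    for entry in probs:
        if entry["label"] == positive_label:
            prob = entry["prob"]
            if not 0.0 <= prob <= 1.0:
                raise ValueError(f"probability out of range: {prob}")
            return prob
    raise ValueError(f"label {positive_label!r} not found in {probs!r}")


def top_churn_risk(rows, n=3, positive_label=1):
    """Return the n (user_id, churn probability) pairs with the highest risk."""
    scored = [
        (row["user_id"],
         churn_probability(row["predicted_churn_label_probs"], positive_label))
        for row in rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Invented rows shaped like ML.PREDICT output; not real lab data.
sample_rows = [
    {"user_id": "u1", "predicted_churn_label": 0,
     "predicted_churn_label_probs": [{"label": 1, "prob": 0.20}, {"label": 0, "prob": 0.80}]},
    {"user_id": "u2", "predicted_churn_label": 1,
     "predicted_churn_label_probs": [{"label": 1, "prob": 0.90}, {"label": 0, "prob": 0.10}]},
    {"user_id": "u3", "predicted_churn_label": 1,
     "predicted_churn_label_probs": [{"label": 1, "prob": 0.55}, {"label": 0, "prob": 0.45}]},
]

print(top_churn_risk(sample_rows, n=2))  # [('u2', 0.9), ('u3', 0.55)]
```

If `churn_label` is stored as a boolean or string in your table, pass the matching value as `positive_label`.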
