<a href="https://colab.research.google.com/github/MaxMatteucci/mgmt467-analytics-portfolio/blob/main/Unit2_Lab2_PromptStudio_Tasks0to4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# 🤖 MGMT 467 - Unit 2 Lab 2: Prompt Studio for AI-Assisted SQL + ML

**Date:** 2025-10-16  
**Objective:** Build and refine a complete ML pipeline for churn prediction in BigQuery, with **Gemini-style prompts** guiding the SQL generation.

You'll learn to:
- Frame SQL goals as clear prompts
- Generate, test, and debug queries with an AI assistant
- Reflect on each modeling step and your prompt design



## Task 0: Connect to BigQuery

**🎯 Goal:** Verify BigQuery access from Colab.  
**📌 Requirements:** Use the `%%bigquery` cell magic to return the current date and the session user.

---

### 🧠 Prompt Template  
> Write a SQL query that returns CURRENT_DATE() and SESSION_USER(). I will run it with %%bigquery in Colab.

---

### 👩‍🏫 Example Prompt  
> Write a SQL query using BigQuery syntax that returns today’s date and the current session user.

---

### ✅ Expected SQL Output
```sql
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user;
```

---

### 🔍 Checkpoint  
The query should return a single row containing today's date and your session user (the account you authenticated with).


In [1]:
%%bigquery --project database-project-467
SELECT
  CURRENT_DATE() AS today,
  SESSION_USER() AS user;


| | today | user |
|---|---|---|
| 0 | 2025-10-25 | 32ma2chi@gmail.com |



## Task 1: Prepare ML Table

**🎯 Goal:** Create a clean features table for modeling churn.  
**📌 Requirements:** Use `cleaned_features` as the source, select the relevant columns, and keep only rows where `churn_label IS NOT NULL`.

---

### 🧠 Prompt Template  
> Write a query that creates a new table with columns: [region, plan_tier, age_band, ...] and churn_label from [source_table]. Filter to rows where churn_label IS NOT NULL.
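
One way to keep this template reusable is to fill its bracketed slots programmatically. The sketch below is a minimal illustration in Python (the notebook's kernel language); `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names introduced here, not part of the lab, and the column list simply mirrors the one in the expected SQL.

```python
# Hypothetical helper: fills the Task 1 prompt template with concrete values.
PROMPT_TEMPLATE = (
    "Write a query that creates a new table with columns: {columns} "
    "and {label} from {source_table}. "
    "Filter to rows where {label} IS NOT NULL."
)


def build_prompt(columns, label, source_table):
    """Return the prompt text with the feature list, label, and source filled in."""
    return PROMPT_TEMPLATE.format(
        columns=", ".join(columns),
        label=label,
        source_table=source_table,
    )


prompt = build_prompt(
    columns=["region", "plan_tier", "age_band", "avg_rating", "total_minutes"],
    label="churn_label",
    source_table="cleaned_features",
)
print(prompt)
```

Keeping the label name in a single argument means the column list and the `IS NOT NULL` filter cannot drift apart when the prompt is edited.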

---

### 👩‍🏫 Example Prompt  
> Create a BigQuery table named churn_features from cleaned_features with selected features and where churn_label IS NOT NULL.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE TABLE `your_dataset.churn_features` AS
SELECT region, plan_tier, age_band, avg_rating, total_minutes, churn_label
FROM `your_dataset.cleaned_features`
WHERE churn_label IS NOT NULL;
```

---

### 🔍 Checkpoint  
The table should appear in BigQuery and contain only rows with a non-null label.
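
The checkpoint can be turned into a query rather than a visual inspection. The following is a sketch only: `null_label_check_sql` is a hypothetical helper that builds the SQL text, which you would then paste into a `%%bigquery` cell or pass to a BigQuery client. It assumes the label column is named `churn_label`, as in the table created in this task.

```python
def null_label_check_sql(table, label_col="churn_label"):
    """Build a query that counts all rows and the rows whose label is NULL.

    `table` is a fully qualified BigQuery table name; it is wrapped in
    backticks because project IDs often contain hyphens.
    """
    return (
        "SELECT\n"
        "  COUNT(*) AS total_rows,\n"
        f"  COUNTIF({label_col} IS NULL) AS null_labels\n"
        f"FROM `{table}`"
    )


check_sql = null_label_check_sql("database-project-467.netflix.churn_features")
print(check_sql)
```

If the table was built with the `IS NOT NULL` filter, `null_labels` should come back as 0 and `total_rows` should be greater than 0.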


In [2]:
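%%bigquery --project database-project-467
-- ✅ Task 1: Prepare ML Table (using feat_churn_lite)
-- The cell magic must be the first line of the cell, so the comment sits below it.
-- Source columns are aliased to the names the lab template expects. Note that
-- avg_watch_duration is a duration rather than a rating, and age is a band only
-- if feat_churn_lite already buckets it.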
CREATE OR REPLACE TABLE `database-project-467.netflix.churn_features` AS
SELECT
  country AS region,
  subscription_plan AS plan_tier,
  age AS age_band,
  avg_watch_duration AS avg_rating,
  r3_min AS total_minutes,
  r3_sess AS num_sessions,
  churn_next_month AS churn_label
FROM `database-project-467.netflix.feat_churn_lite`
WHERE churn_next_month IS NOT NULL;




## Task 2: Train Logistic Regression Model

**🎯 Goal:** Train a basic BQML logistic regression model.  
**📌 Requirements:** Use churn_features table, predict churn_label from features.

---

### 🧠 Prompt Template  
> Write a CREATE MODEL SQL for logistic regression using churn_label as label and [features] as inputs.

---

### 👩‍🏫 Example Prompt  
> Train a logistic regression model to predict churn_label using region, plan_tier, total_minutes, avg_rating.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE MODEL `your_dataset.churn_model`
OPTIONS(model_type='logistic_reg', input_label_cols=['churn_label']) AS
SELECT region, plan_tier, total_minutes, avg_rating, churn_label
FROM `your_dataset.churn_features`;
```

---

### 🔍 Checkpoint  
Model appears in BigQuery under Models. Training completes.


In [3]:
%%bigquery --project database-project-467
-- ✅ Task 2: Train Logistic Regression Model
CREATE OR REPLACE MODEL `database-project-467.netflix.churn_model`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['churn_label']
) AS
SELECT
  region,
  plan_tier,
  total_minutes,
  avg_rating,
  num_sessions,
  age_band,
  churn_label
FROM `database-project-467.netflix.churn_features`;




## Task 3: Evaluate Model

**🎯 Goal:** Evaluate the logistic regression model.  
**📌 Requirements:** Use ML.EVALUATE.

---

### 🧠 Prompt Template  
> Write a query to evaluate my logistic regression model using ML.EVALUATE.

---

### 👩‍🏫 Example Prompt  
> Evaluate the churn_model using ML.EVALUATE to get accuracy, precision, recall.

---

### ✅ Expected SQL Output
```sql
SELECT * FROM ML.EVALUATE(MODEL `your_dataset.churn_model`);
```

---

### 🔍 Checkpoint  
View performance metrics: accuracy, log_loss, precision, recall.


In [4]:
%%bigquery --project database-project-467
-- ✅ Task 3: Evaluate Logistic Regression Model
SELECT *
FROM ML.EVALUATE(MODEL `database-project-467.netflix.churn_model`);


| | precision | recall | accuracy | f1_score | log_loss | roc_auc |
|---|---|---|---|---|---|---|
| 0 | 0.661717 | 1.0 | 0.661717 | 0.796426 | 0.639896 | 0.501293 |
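
These metrics are worth a sanity check before moving on. A recall of 1.0 with precision equal to accuracy is the pattern a classifier produces when it labels every row as churn. The sketch below compares the reported numbers with such an always-positive baseline. It assumes the share of true churners in the evaluation data equals the reported accuracy; `always_positive_metrics` is a hypothetical helper written for this check.

```python
# Values copied from the ML.EVALUATE output above.
reported = {
    "precision": 0.661717,
    "recall": 1.0,
    "accuracy": 0.661717,
    "f1_score": 0.796426,
}


def always_positive_metrics(positive_share):
    """Metrics for a classifier that predicts churn (label 1) for every row.

    positive_share is the fraction of evaluation rows whose true label is 1.
    Confusion-matrix cells are expressed as fractions of all rows.
    """
    tp, fp, fn, tn = positive_share, 1.0 - positive_share, 0.0, 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1_score": f1,
    }


# Assumption: the positive share equals the reported accuracy, which is what
# an always-positive classifier would score.
baseline = always_positive_metrics(positive_share=reported["accuracy"])

gaps = {name: abs(value - baseline[name]) for name, value in reported.items()}
for name, gap in gaps.items():
    print(f"{name}: reported={reported[name]:.6f} "
          f"baseline={baseline[name]:.6f} gap={gap:.1e}")
```

If every gap is close to zero, the model is behaving like a majority-class baseline. That reading is consistent with the reported `roc_auc` of 0.501293, which is close to the 0.5 expected from a random ranking, and suggests revisiting the features before relying on the predictions.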



## Task 4: Predict Churn

**🎯 Goal:** Use ML.PREDICT to generate churn predictions.  
**📌 Requirements:** Apply the model to the same input table it was trained on.

---

### 🧠 Prompt Template  
> Generate SQL to use ML.PREDICT on churn_model and return predictions by user_id.

---

### 👩‍🏫 Example Prompt  
> Predict churn using churn_model. Include user_id, predicted_churn_label, and prediction probability.

---

### ✅ Expected SQL Output
```sql
SELECT user_id, predicted_churn_label, predicted_churn_label_probs
FROM ML.PREDICT(MODEL `your_dataset.churn_model`,
      (SELECT * FROM `your_dataset.churn_features`));
```

---

### 🔍 Checkpoint  
Inspect top churn risk users. Validate probabilities.


In [7]:
%%bigquery --project database-project-467
-- ✅ Task 4: Predict Churn (no user_id in dataset)
SELECT
  predicted_churn_label,
  predicted_churn_label_probs
FROM ML.PREDICT(
  MODEL `database-project-467.netflix.churn_model`,
  (
    SELECT
      region,
      plan_tier,
      total_minutes,
      avg_rating,
      num_sessions,
      age_band,
      churn_label
    FROM `database-project-467.netflix.churn_features`
  )
);


| | predicted_churn_label | predicted_churn_label_probs |
|---|---|---|
| 0 | 1 | [{'label': 1, 'prob': 0.6629066518321272}, {'l... |
| 1 | 1 | [{'label': 1, 'prob': 0.6630887968661686}, {'l... |
| 2 | 1 | [{'label': 1, 'prob': 0.6630887968661686}, {'l... |
| 3 | 1 | [{'label': 1, 'prob': 0.6630887968661686}, {'l... |
| 4 | 1 | [{'label': 1, 'prob': 0.6630887968661686}, {'l... |
| ... | ... | ... |
| 710695 | 1 | [{'label': 1, 'prob': 0.6700402667503506}, {'l... |
| 710696 | 1 | [{'label': 1, 'prob': 0.6698662529318292}, {'l... |
| 710697 | 1 | [{'label': 1, 'prob': 0.6698662529318292}, {'l... |
| 710698 | 1 | [{'label': 1, 'prob': 0.6700402667503506}, {'l... |
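
Each `predicted_churn_label_probs` value is a list of `{'label', 'prob'}` entries, so ranking rows by churn risk means pulling out the probability for label 1 first. The sketch below shows one way to do that and to validate the probabilities, as the checkpoint asks. The sample rows are hypothetical values in the same shape as the output above, not results from the model, and `churn_probability` and `probabilities_are_valid` are illustrative helper names.

```python
import math


def churn_probability(probs, positive_label=1):
    """Return the probability attached to positive_label in a list of
    {'label': ..., 'prob': ...} entries."""
    for entry in probs:
        if entry["label"] == positive_label:
            return entry["prob"]
    raise ValueError(f"label {positive_label!r} not found in {probs!r}")


def probabilities_are_valid(probs, tol=1e-9):
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    in_range = all(0.0 <= entry["prob"] <= 1.0 for entry in probs)
    total = sum(entry["prob"] for entry in probs)
    return in_range and math.isclose(total, 1.0, abs_tol=tol)


# Hypothetical rows for illustration only.
sample_rows = [
    {"row": 0, "predicted_churn_label_probs": [
        {"label": 1, "prob": 0.70}, {"label": 0, "prob": 0.30}]},
    {"row": 1, "predicted_churn_label_probs": [
        {"label": 1, "prob": 0.55}, {"label": 0, "prob": 0.45}]},
    {"row": 2, "predicted_churn_label_probs": [
        {"label": 1, "prob": 0.90}, {"label": 0, "prob": 0.10}]},
]

# Highest churn risk first.
ranked = sorted(
    sample_rows,
    key=lambda r: churn_probability(r["predicted_churn_label_probs"]),
    reverse=True,
)

for r in ranked:
    probs = r["predicted_churn_label_probs"]
    print(r["row"], churn_probability(probs), probabilities_are_valid(probs))
```

Because this dataset has no `user_id`, the ranking can only be reported by row position. Carrying an identifier column through Task 1 would make the top-risk rows traceable to customers.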
