<a href="https://colab.research.google.com/github/CalebBrunton2/mgmt467-analytics-portfolio/blob/main/Caleb_BruntonUnit2_Lab2_PromptStudio_Tasks0to4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# 🤖 MGMT 467 - Unit 2 Lab 2: Prompt Studio for AI-Assisted SQL + ML

**Date:** 2025-10-16  
**Objective:** Build and refine a complete ML pipeline for churn prediction using BigQuery — but with **Gemini-style prompts** guiding SQL generation.

You'll learn to:
- Frame SQL goals as clear prompts
- Generate, test, and debug queries with an AI assistant
- Reflect on each modeling step and your prompt design



## Task 0: Connect to BigQuery

**🎯 Goal:** Verify BigQuery access from Colab.  
**📌 Requirements:** Use `%%bigquery` to return the current date and the session user.

---

### 🧠 Prompt Template  
> Write a SQL query that returns CURRENT_DATE() and SESSION_USER(). I will run it with %%bigquery in Colab.

---

### 👩‍🏫 Example Prompt  
> Write a SQL query using BigQuery syntax that returns today’s date and the current session user.

---

### ✅ Expected SQL Output
```sql
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user;
```

---

### 🔍 Checkpoint  
The query should return a single row with today's date and the email of your signed-in account.


In [9]:
# Step 0 — Authenticate to Google Cloud
from google.colab import auth
auth.authenticate_user()

from google.cloud import bigquery
PROJECT_ID = "original-wonder-471819-n2"  # your active project
client = bigquery.Client(project=PROJECT_ID)
print("✅ BigQuery client initialized for project:", client.project)


✅ BigQuery client initialized for project: original-wonder-471819-n2


Prompt to Gemini:
Write a SQL query using BigQuery syntax that returns today’s date and the current session user.


In [5]:
%%bigquery --project original-wonder-471819-n2
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user;


Unnamed: 0,today,user
0,2025-10-27,calebbrunton2@gmail.com
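
The `client` object created in the authentication cell can run the same check without the `%%bigquery` magic. The following is a minimal sketch, not part of the original lab: `check_bigquery_access` is a hypothetical helper name, and it assumes it is handed a `bigquery.Client` (or any object whose `query(sql).result()` yields rows addressable by column name).

```python
def check_bigquery_access(client):
    """Run the Task 0 query through a BigQuery client and return (today, user).

    `client` is assumed to be a google.cloud.bigquery.Client; the helper name
    is hypothetical and only illustrates the connectivity check.
    """
    sql = "SELECT CURRENT_DATE() AS today, SESSION_USER() AS user"
    rows = list(client.query(sql).result())  # wait for the job, then fetch rows
    if len(rows) != 1:
        raise RuntimeError(f"expected exactly one row, got {len(rows)}")
    row = rows[0]
    return row["today"], row["user"]


# In Colab, after the authentication cell has defined `client`:
# today, user = check_bigquery_access(client)
# print(today, user)
```

Raising on anything other than one row makes a misconfigured project fail loudly instead of silently returning nothing.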



## Task 1: Prepare ML Table

**🎯 Goal:** Create a clean features table for modeling churn.  
**📌 Requirements:** Use cleaned_features as source, select relevant columns, filter rows with churn_label IS NOT NULL.

---

### 🧠 Prompt Template  
> Write a query that creates a new table with columns: [region, plan_tier, age_band, ...] and churn_label from [source_table]. Filter to rows where churn_label IS NOT NULL.

---

### 👩‍🏫 Example Prompt  
> Create a BigQuery table named churn_features from cleaned_features with selected features and where churn_label IS NOT NULL.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE TABLE `your_dataset.churn_features` AS
SELECT region, plan_tier, age_band, avg_rating, total_minutes, churn_label
FROM `your_dataset.cleaned_features`
WHERE churn_label IS NOT NULL;
```

---

### 🔍 Checkpoint  
The table should appear in BigQuery and contain no rows with a null label.
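
The bracketed slots in the prompt template map directly onto the parts of the expected SQL. As a rough sketch of that mapping (the helper `build_create_table_sql` and its parameter names are assumptions made for illustration, not something the lab provides), the statement can be assembled from a column list and two table names:

```python
def build_create_table_sql(dest_table, source_table, feature_cols,
                           label_col="churn_label"):
    """Return a CREATE OR REPLACE TABLE statement that keeps only labelled rows.

    Hypothetical helper: it mirrors the Task 1 template, with the label
    column placed last in the SELECT list.
    """
    if not feature_cols:
        raise ValueError("feature_cols must contain at least one column")
    select_list = ",\n  ".join(list(feature_cols) + [label_col])
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM `{source_table}`\n"
        f"WHERE {label_col} IS NOT NULL;"
    )


sql = build_create_table_sql(
    dest_table="your_dataset.churn_features",
    source_table="your_dataset.cleaned_features",
    feature_cols=["region", "plan_tier", "age_band", "avg_rating", "total_minutes"],
)
print(sql)
```

Comparing the printed statement with what the assistant returns is a quick way to spot a dropped column or a missing filter.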


Prompt to Gemini:
Write a BigQuery SQL query that creates a new table named churn_features_clean
from mgmt-467-1234.netflix.churn_features.
Select the columns region, plan_tier, age_band, avg_rating, total_minutes, and churn_label.
Filter to rows where churn_label IS NOT NULL.


In [13]:
%%bigquery --project original-wonder-471819-n2
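CREATE OR REPLACE TABLE `original-wonder-471819-n2.netflix.churn_features_clean` AS
-- Note: avg_rating, requested in the prompt, is not selected here.
SELECT
  region,
  plan_tier,
  age_band,
  total_minutes,
  -- user_id is an identifier, kept so Task 4 can report predictions per user
  user_id,
  churn_label
FROM `mgmt-467-1234.netflix.churn_features`
WHERE churn_label IS NOT NULL;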


In [14]:
%%bigquery --project original-wonder-471819-n2
SELECT
  COUNT(*) AS row_count,
  COUNTIF(churn_label IS NULL) AS null_labels
FROM `original-wonder-471819-n2.netflix.churn_features_clean`;


Unnamed: 0,row_count,null_labels
0,947600,0



## Task 2: Train Logistic Regression Model

**🎯 Goal:** Train a basic BQML logistic regression model.  
**📌 Requirements:** Use churn_features table, predict churn_label from features.

---

### 🧠 Prompt Template  
> Write a CREATE MODEL SQL for logistic regression using churn_label as label and [features] as inputs.

---

### 👩‍🏫 Example Prompt  
> Train a logistic regression model to predict churn_label using region, plan_tier, total_minutes, avg_rating.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE MODEL `your_dataset.churn_model`
OPTIONS(model_type='logistic_reg') AS
SELECT region, plan_tier, total_minutes, avg_rating, churn_label
FROM `your_dataset.churn_features`;
```

---

### 🔍 Checkpoint  
Model appears in BigQuery under Models. Training completes.
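
For intuition about what the trained model computes, a logistic regression turns a weighted sum of the features into a probability with the sigmoid function. The sketch below is illustrative only: the weights and intercept are made-up numbers, not values from any trained model, and BigQuery ML applies its own feature preprocessing that this toy version does not reproduce.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_churn_probability(row, weights, intercept):
    """Score one row.

    Categorical values look up a weight keyed by (feature, value), which is
    equivalent to one-hot encoding; numeric values are multiplied by the
    weight keyed by the feature name.
    """
    z = intercept
    for feature, value in row.items():
        if isinstance(value, str):
            z += weights.get((feature, value), 0.0)
        else:
            z += weights.get(feature, 0.0) * value
    return sigmoid(z)


# Hypothetical weights, chosen only to make the example concrete.
example_weights = {("plan_tier", "basic"): 0.8, "total_minutes": -0.002}
example_row = {"plan_tier": "basic", "total_minutes": 400}
p = predict_churn_probability(example_row, example_weights, intercept=-1.0)
print(round(p, 3))
```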


Prompt to Gemini:
Write a BigQuery CREATE MODEL statement for a logistic regression that predicts
churn_label using region, plan_tier, age_band, and total_minutes from
original-wonder-471819-n2.netflix.churn_features_clean. Explicitly set input_label_cols
to churn_label.


In [None]:
%%bigquery --project original-wonder-471819-n2
CREATE OR REPLACE MODEL `original-wonder-471819-n2.netflix.churn_model_basic`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['churn_label']
) AS
SELECT
  -- label
  CAST(churn_label AS BOOL) AS churn_label,
  -- features (exclude user_id since it's an identifier, not a predictive feature)
  region,
  plan_tier,
  age_band,
  total_minutes
FROM `original-wonder-471819-n2.netflix.churn_features_clean`;


In [None]:
%%bigquery --project original-wonder-471819-n2
SELECT *
FROM ML.TRAINING_INFO(MODEL `original-wonder-471819-n2.netflix.churn_model_basic`);



## Task 3: Evaluate Model

**🎯 Goal:** Evaluate the logistic regression model.  
**📌 Requirements:** Use ML.EVALUATE.

---

### 🧠 Prompt Template  
> Write a query to evaluate my logistic regression model using ML.EVALUATE.

---

### 👩‍🏫 Example Prompt  
> Evaluate the churn_model using ML.EVALUATE to get accuracy, precision, recall.

---

### ✅ Expected SQL Output
```sql
SELECT * FROM ML.EVALUATE(MODEL `your_dataset.churn_model`);
```

---

### 🔍 Checkpoint  
View performance metrics: accuracy, log_loss, precision, recall.
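
To make the checkpoint metrics concrete, the sketch below computes accuracy, precision, recall, and log loss by hand for a tiny set of labels and predicted probabilities. The data are invented for illustration, and the 0.5 classification threshold is an assumption; this is not a reimplementation of `ML.EVALUATE`.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute accuracy, precision, recall, and log loss for binary labels."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and equal length")
    eps = 1e-15  # keep log() away from 0
    tp = fp = tn = fn = 0
    loss = 0.0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and y:
            fn += 1
        else:
            tn += 1
        p = min(max(p, eps), 1 - eps)
        loss -= math.log(p) if y else math.log(1 - p)
    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "log_loss": loss / n,
    }


# Made-up labels and probabilities: one of each confusion-matrix outcome.
metrics = binary_metrics([True, False, True, False], [0.9, 0.2, 0.4, 0.6])
print(metrics)
```

Precision and recall both depend on the threshold, while log loss depends only on the probabilities, which is why the two kinds of metric can disagree about which model is better.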


Prompt to Gemini:
Write a BigQuery SQL query to evaluate my logistic regression model named
churn_model_basic using ML.EVALUATE, and display accuracy, precision, recall, and log_loss.


In [None]:
%%bigquery --project original-wonder-471819-n2
SELECT *
FROM ML.EVALUATE(MODEL `original-wonder-471819-n2.netflix.churn_model_basic`);


In [None]:
%%bigquery --project original-wonder-471819-n2
SELECT *
FROM ML.EVALUATE(
  MODEL `original-wonder-471819-n2.netflix.churn_model_basic`,
  (
    -- Cast the label the same way as in training so the types match
    SELECT region, plan_tier, age_band, total_minutes, CAST(churn_label AS BOOL) AS churn_label
    FROM `original-wonder-471819-n2.netflix.churn_features_clean`
  )
);



## Task 4: Predict Churn

**🎯 Goal:** Use ML.PREDICT to generate churn predictions.  
**📌 Requirements:** Apply the model to the same input table used for training.

---

### 🧠 Prompt Template  
> Generate SQL to use ML.PREDICT on churn_model and return predictions by user_id.

---

### 👩‍🏫 Example Prompt  
> Predict churn using churn_model. Include user_id, predicted_churn_label, and prediction probability.

---

### ✅ Expected SQL Output
```sql
SELECT user_id, predicted_churn_label, predicted_churn_label_probs
FROM ML.PREDICT(MODEL `your_dataset.churn_model`,
      (SELECT * FROM `your_dataset.churn_features`));
```

---

### 🔍 Checkpoint  
Inspect top churn risk users. Validate probabilities.
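
Once predictions are loaded into Python, the probability array can be read by label rather than by position, and each row can be checked to confirm its probabilities are valid. This is a sketch under assumptions: the rows below are invented, and each `predicted_churn_label_probs` value is assumed to arrive as a list of `{"label", "prob"}` dicts with boolean labels.

```python
def positive_class_prob(probs, positive_label=True):
    """Return the probability attached to the positive (churn) label."""
    for entry in probs:
        if entry["label"] == positive_label:
            return entry["prob"]
    raise KeyError(f"label {positive_label!r} not found in probability array")


def probs_are_valid(probs, tol=1e-6):
    """Each probability lies in [0, 1] and the array sums to 1 within tol."""
    in_range = all(0.0 <= e["prob"] <= 1.0 for e in probs)
    return in_range and abs(sum(e["prob"] for e in probs) - 1.0) <= tol


# Hypothetical prediction rows; note the label order differs between rows.
rows = [
    {"user_id": "u1", "predicted_churn_label_probs": [
        {"label": True, "prob": 0.8}, {"label": False, "prob": 0.2}]},
    {"user_id": "u2", "predicted_churn_label_probs": [
        {"label": False, "prob": 0.05}, {"label": True, "prob": 0.95}]},
    {"user_id": "u3", "predicted_churn_label_probs": [
        {"label": True, "prob": 0.1}, {"label": False, "prob": 0.9}]},
]

top_risk = sorted(
    rows,
    key=lambda r: positive_class_prob(r["predicted_churn_label_probs"]),
    reverse=True,
)
print([r["user_id"] for r in top_risk])
```

Looking up the entry by label avoids depending on the order in which the two classes appear in the array.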


Prompt to Gemini:
Generate a BigQuery SQL query that uses ML.PREDICT on my logistic-regression
model churn_model_basic to predict churn for each user in
original-wonder-471819-n2.netflix.churn_features_clean.
Return user_id, predicted_churn_label, and prediction probability.


In [None]:
%%bigquery --project original-wonder-471819-n2
SELECT
  user_id,
  predicted_churn_label,
  predicted_churn_label_probs AS churn_probabilities
FROM ML.PREDICT(
  MODEL `original-wonder-471819-n2.netflix.churn_model_basic`,
  (
    SELECT
      user_id,
      region,
      plan_tier,
      age_band,
      total_minutes
    FROM `original-wonder-471819-n2.netflix.churn_features_clean`
  )
);


In [None]:
%%bigquery --project original-wonder-471819-n2
SELECT
  user_id,
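  -- ML.PREDICT names this column predicted_churn_label_probs (the alias from the
  -- previous cell does not exist here). OFFSET(1) assumes the churn class is the
  -- second array element; check the label field to confirm.
  predicted_churn_label_probs[OFFSET(1)].prob AS churn_probability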
FROM ML.PREDICT(
  MODEL `original-wonder-471819-n2.netflix.churn_model_basic`,
  (
    SELECT user_id, region, plan_tier, age_band, total_minutes
    FROM `original-wonder-471819-n2.netflix.churn_features_clean`
  )
)
ORDER BY churn_probability DESC
LIMIT 10;
