
# 🤖 MGMT 467 - Unit 2 Lab 2: Prompt Studio for AI-Assisted SQL + ML

**Date:** 2025-10-16  
**Objective:** Build and refine a complete ML pipeline for churn prediction using BigQuery — but with **Gemini-style prompts** guiding SQL generation.

You'll learn to:
- Frame SQL goals as clear prompts
- Generate, test, and debug queries with an AI assistant
- Reflect on each modeling step and your prompt design



## Task 0: Connect to BigQuery

**🎯 Goal:** Verify BigQuery access from Colab.  
**📌 Requirements:** Use `%%bigquery`, get current date and user session.

---

### 🧠 Prompt Template  
> Write a SQL query that returns CURRENT_DATE() and SESSION_USER(). I will run it with %%bigquery in Colab.

---

### 👩‍🏫 Example Prompt  
> Write a SQL query using BigQuery syntax that returns today’s date and the current session user.

---

### ✅ Expected SQL Output
```sql
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user;
```

---

### 🔍 Checkpoint  
Query should return a single row with today's date and your user.


My Prompt: Write a SQL script for BigQuery, to be run in Colab, that returns CURRENT_DATE() and SESSION_USER().

In [None]:
# ✅ Step 1: Authenticate this Colab session with Google Cloud
from google.colab import auth
auth.authenticate_user()

# ✅ Step 2: Import libraries
from google.cloud import bigquery

# 🔧 Step 3: Manually set your Google Cloud project ID
# Replace the string below with your actual project ID from the Google Cloud Console
project_id = "mgmt-labs-unit-two"  # e.g. "data-analytics-123456"

# ✅ Step 4: Initialize BigQuery client
client = bigquery.Client(project=project_id)

# ✅ Step 5: Run SQL query
query = """
SELECT
  CURRENT_DATE() AS current_date,
  SESSION_USER() AS session_user
"""

# ✅ Step 6: Execute and display results
df = client.query(query).to_dataframe()
df

Unnamed: 0,current_date,session_user
0,2025-10-26,tylerwichman13@gmail.com



## Task 1: Prepare ML Table

**🎯 Goal:** Create a clean features table for modeling churn.  
**📌 Requirements:** Use cleaned_features as source, select relevant columns, filter rows with churn_label IS NOT NULL.

---

### 🧠 Prompt Template  
> Write a query that creates a new table with columns: [region, plan_tier, age_band, ...] and churn_label from [source_table]. Filter to rows where churn_label IS NOT NULL.

---

### 👩‍🏫 Example Prompt  
> Create a BigQuery table named churn_features from cleaned_features with selected features and where churn_label IS NOT NULL.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE TABLE `your_dataset.churn_features` AS
SELECT region, plan_tier, age_band, avg_rating, total_minutes, churn_label
FROM `your_dataset.cleaned_features`
WHERE churn_label IS NOT NULL;
```

---

### 🔍 Checkpoint  
Table should appear in BigQuery and contain non-null labels.


My Prompt: Write a SQL query to run in Colab that creates a new table titled churn_features from the table cleaned_features in the netflix dataset with these features: user_id (users), country (users), subscription_plan (users), age (users), user_rating (watch_history; a user may have several ratings, so average all of that user's ratings), watch_duration_minutes (watch_history; a user may have several records, so average all of that user's durations), and is_active, which must be renamed to churn_label. Keep only rows where churn_label IS NOT NULL.

In [None]:
project_id = "mgmt-labs-unit-two"     # e.g., "mgmt467-lab1"
dataset_id = "netflix"     # e.g., "mgmt467.netflix_data"
table_id = f"{project_id}.{dataset_id}.churn_features"

client = bigquery.Client(project=project_id)

# ✅ Step 4: Define the SQL query with user_id
query = f"""
CREATE OR REPLACE TABLE `{table_id}` AS
SELECT
  u.user_id AS user_id,
  u.country AS country,
  u.subscription_plan AS subscription_plan,
  u.age AS age,
  -- Average rating and watch time per user (if multiple records exist)
  AVG(w.user_rating) AS avg_rating,
  AVG(w.watch_duration_minutes) AS avg_watch_duration,
  -- Note: 1 = active user, 0 = inactive, so this label encodes is_active rather than churn
  CASE WHEN u.is_active = TRUE THEN 1 ELSE 0 END AS churn_label
FROM `{project_id}.{dataset_id}.users` AS u
LEFT JOIN `{project_id}.{dataset_id}.watch_history` AS w
  ON u.user_id = w.user_id
WHERE u.is_active IS NOT NULL
GROUP BY u.user_id, u.country, u.subscription_plan, u.age, u.is_active
"""

# Step 5: Execute query
job = client.query(query)
job.result()  # Wait for the job to complete

print("✅ Table 'churn_features' created successfully in dataset:", dataset_id)

✅ Table 'churn_features' created successfully in dataset: netflix


In [None]:
project_id = "mgmt-labs-unit-two"     # ← same as before
dataset_id = "netflix"     # ← same as before

client = bigquery.Client(project=project_id)

# Step 3: Query the churn_features table
query = f"""
SELECT *
FROM `{project_id}.{dataset_id}.churn_features`
LIMIT 10
"""  # You can adjust LIMIT to show more rows

# Step 4: Run and display as a DataFrame
df = client.query(query).to_dataframe()
df

Unnamed: 0,user_id,country,subscription_plan,age,avg_rating,avg_watch_duration,churn_label
0,user_03630,Canada,Basic,,3.8,60.9,1
1,user_03774,USA,Basic,,2.0,68.281818,1
2,user_01474,USA,Basic,,4.0,65.788889,1
3,user_02496,USA,Basic,,,61.325,1
4,user_01308,USA,Basic,,2.5,60.336364,1
5,user_05407,Canada,Basic,,,61.4,1
6,user_09997,USA,Basic,,4.0,53.52,1
7,user_05911,USA,Basic,,2.5,71.2,1
8,user_07652,USA,Basic,,,72.166667,1
9,user_03604,Canada,Basic,,3.25,59.183333,0
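Some rows above have an empty avg_rating even though avg_watch_duration is filled in. That is consistent with how SQL `AVG` behaves: it skips NULL inputs and returns NULL when no non-NULL value remains. The sketch below mimics that rule in plain Python; the three users and their ratings are hypothetical and are not taken from the dataset.

```python
from typing import Iterable, Optional


def sql_avg(values: Iterable[Optional[float]]) -> Optional[float]:
    """Mimic SQL AVG: skip NULLs (None) and return None if nothing is left."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


# Hypothetical ratings per user (illustration only).
ratings_by_user = {
    "user_a": [4.0, None, 2.0],  # the unrated viewing is ignored
    "user_b": [None, None],      # watched twice, never rated
    "user_c": [],                # no watch_history rows at all
}

avg_rating = {user: sql_avg(r) for user, r in ratings_by_user.items()}
print(avg_rating)
```

Under this rule, a user who watched but never rated, and a user kept only by the LEFT JOIN, both end up with a NULL average, which the model then has to handle as a missing value.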



## Task 2: Train Logistic Regression Model

**🎯 Goal:** Train a basic BQML logistic regression model.  
**📌 Requirements:** Use churn_features table, predict churn_label from features.

---

### 🧠 Prompt Template  
> Write a CREATE MODEL SQL for logistic regression using churn_label as label and [features] as inputs.

---

### 👩‍🏫 Example Prompt  
> Train a logistic regression model to predict churn_label using region, plan_tier, total_minutes, avg_rating.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE MODEL `your_dataset.churn_model`
OPTIONS(model_type='logistic_reg') AS
SELECT region, plan_tier, total_minutes, avg_rating, churn_label
FROM `your_dataset.churn_features`;
```

---

### 🔍 Checkpoint  
Model appears in BigQuery under Models. Training completes.


In [None]:
query = """
CREATE OR REPLACE MODEL `mgmt-labs-unit-two.netflix.churn_logistic_model`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['churn_label'],
  auto_class_weights = TRUE,
  data_split_method = 'AUTO_SPLIT'
) AS

SELECT
  country,
  subscription_plan,
  age,
  avg_rating,
  avg_watch_duration,
  churn_label
FROM `mgmt-labs-unit-two.netflix.churn_features`
WHERE churn_label IS NOT NULL
"""
query_job = client.query(query)
query_job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x79f826a0c620>


## Task 3: Evaluate Model

**🎯 Goal:** Evaluate the logistic regression model.  
**📌 Requirements:** Use ML.EVALUATE.

---

### 🧠 Prompt Template  
> Write a query to evaluate my logistic regression model using ML.EVALUATE.

---

### 👩‍🏫 Example Prompt  
> Evaluate the churn_model using ML.EVALUATE to get accuracy, precision, recall.

---

### ✅ Expected SQL Output
```sql
SELECT * FROM ML.EVALUATE(MODEL `your_dataset.churn_model`);
```

---

### 🔍 Checkpoint  
View performance metrics: accuracy, log_loss, precision, recall.


My Prompt: Evaluate the churn_model just created using ML.EVALUATE to get the accuracy, precision, recall, etc.

In [None]:
query = """
SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `mgmt-labs-unit-two.netflix.churn_logistic_model`,
    (
      SELECT
        country,
        subscription_plan,
        age,
        avg_rating,
        avg_watch_duration,
        churn_label
      FROM `mgmt-labs-unit-two.netflix.churn_features`
      WHERE churn_label IS NOT NULL
    )
  )
"""

# Run and display results
df_eval = client.query(query).to_dataframe()
df_eval

Unnamed: 0,precision,recall,accuracy,f1_score,log_loss,roc_auc
0,0.85757,0.505341,0.5071,0.635941,0.693123,0.514276
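Two of the numbers above can be sanity-checked by hand. F1 is the harmonic mean of precision and recall, and a binary classifier that assigns probability 0.5 to every row has a log loss of ln 2. The short sketch below recomputes both from the values in the output; it only uses the reported metrics and does not query BigQuery.

```python
import math

# Values copied from the ML.EVALUATE output above.
precision = 0.85757
recall = 0.505341
log_loss = 0.693123

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Log loss of a model that always predicts probability 0.5.
chance_log_loss = math.log(2)

print("recomputed f1:", round(f1, 4))
print("chance-level log loss:", round(chance_log_loss, 6))
print("gap to reported log loss:", round(chance_log_loss - log_loss, 6))
```

The recomputed F1 agrees with the reported f1_score, and the reported log loss is almost exactly ln 2, which suggests the model's predicted probabilities sit very close to 0.5 for most rows.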



## Task 4: Predict Churn

**🎯 Goal:** Use ML.PREDICT to generate churn predictions.  
**📌 Requirements:** Apply model to same input table.

---

### 🧠 Prompt Template  
> Generate SQL to use ML.PREDICT on churn_model and return predictions by user_id.

---

### 👩‍🏫 Example Prompt  
> Predict churn using churn_model. Include user_id, predicted_churn_label, and prediction probability.

---

### ✅ Expected SQL Output
```sql
SELECT user_id, predicted_churn_label, predicted_churn_label_probs
FROM ML.PREDICT(MODEL `your_dataset.churn_model`,
      (SELECT * FROM `your_dataset.churn_features`));
```

---

### 🔍 Checkpoint  
Inspect top churn risk users. Validate probabilities.


My Prompt:

In [None]:
query = """
SELECT
  user_id,
  predicted_churn_label,
  predicted_churn_label_probs[OFFSET(1)] AS predicted_churn_probability
FROM
  ML.PREDICT(
    MODEL `mgmt-labs-unit-two.netflix.churn_logistic_model`,
    (
      SELECT
        user_id,
        country,
        subscription_plan,
        age,
        avg_rating,
        avg_watch_duration
      FROM `mgmt-labs-unit-two.netflix.churn_features`
    )
  )
"""
df_predictions = client.query(query).to_dataframe()
df_predictions.head(10)

Unnamed: 0,user_id,predicted_churn_label,predicted_churn_probability
0,user_03630,1,"{'label': 0, 'prob': 0.49948246808358443}"
1,user_03774,0,"{'label': 0, 'prob': 0.5009325896541459}"
2,user_01474,1,"{'label': 0, 'prob': 0.49972983757565104}"
3,user_02496,0,"{'label': 0, 'prob': 0.5000377989452423}"
4,user_01308,0,"{'label': 0, 'prob': 0.500523447596525}"
5,user_05407,1,"{'label': 0, 'prob': 0.4997473701485142}"
6,user_09997,1,"{'label': 0, 'prob': 0.49954811441601366}"
7,user_05911,0,"{'label': 0, 'prob': 0.5006843564202219}"
8,user_07652,0,"{'label': 0, 'prob': 0.5001983825858847}"
9,user_03604,1,"{'label': 0, 'prob': 0.49977764401443747}"
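In the output above, the column named predicted_churn_probability holds a whole struct, and that struct carries `label: 0`, so the value shown is the probability of label 0 rather than of label 1. Looking the entry up by its label is less fragile than relying on an array position. The sketch below shows the lookup in plain Python on one row; the label-1 entry is an assumption, reconstructed as the complement of the displayed label-0 probability.

```python
from typing import Optional


def prob_for_label(probs: list, label: int) -> Optional[float]:
    """Return the probability stored for `label`, or None if it is absent."""
    for entry in probs:
        if entry["label"] == label:
            return entry["prob"]
    return None


# Label-0 probability displayed for user_03630 in the output above.
p0 = 0.49948246808358443

row_probs = [
    {"label": 1, "prob": 1 - p0},  # assumed: complement of the label-0 entry
    {"label": 0, "prob": p0},
]

p1 = prob_for_label(row_probs, 1)
print("P(label = 1):", p1)
```

A comparable lookup in SQL could use a scalar subquery such as `(SELECT p.prob FROM UNNEST(predicted_churn_label_probs) AS p WHERE p.label = 1)` in place of `[OFFSET(1)]`.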


# 🤖 MGMT 467 - Unit 2 Lab 2: Prompt Studio - Feature Engineering & Beyond

**Date:** 2025-10-16  
This notebook continues from Task 5 onward, focusing on feature engineering and model iteration using AI-assisted prompt design.

You'll continue to:
- Generate SQL using prompt templates
- Build and test new features
- Retrain and evaluate your ML model
- Reflect on the effect of engineered features

## Task 5.0: Bucket a Continuous Feature

**🎯 Goal:** Group 'total_minutes' into categories: low, medium, high.  
**📌 Requirements:** Use CASE WHEN or IF statements to create 'watch_time_bucket'.

---

### 🧠 Prompt Template  
> Write SQL that creates a new column watch_time_bucket based on total_minutes thresholds (<100, 100–300, >300).

---

### 👩‍🏫 Example Prompt  
> Create a new column watch_time_bucket with values 'low', 'medium', or 'high' based on total_minutes.

---

### 🔍 Exploration  
How does churn rate vary across these buckets?

My Prompt: Write a SQL query that creates a new column titled watch_time_bucket based on avg_watch_duration thresholds (<100, 100-300, >300), labeled low, medium, and high respectively.

In [None]:
query = """
SELECT
  *,
  CASE
    WHEN avg_watch_duration IS NULL THEN 'Unknown'  -- users with no watch history
    WHEN avg_watch_duration < 100 THEN 'Low'
    WHEN avg_watch_duration <= 300 THEN 'Medium'
    ELSE 'High'
  END AS watch_time_bucket
FROM `mgmt-labs-unit-two.netflix.churn_features`
"""
df = client.query(query).to_dataframe()
df

Unnamed: 0,user_id,country,subscription_plan,age,avg_rating,avg_watch_duration,churn_label,watch_time_bucket
0,user_03630,Canada,Basic,,3.8,60.900000,1,Low
1,user_03774,USA,Basic,,2.0,68.281818,1,Low
2,user_01474,USA,Basic,,4.0,65.788889,1,Low
3,user_02496,USA,Basic,,,61.325000,1,Low
4,user_01308,USA,Basic,,2.5,60.336364,1,Low
...,...,...,...,...,...,...,...,...
9995,user_07925,Canada,Standard,92.0,2.0,237.275000,1,Medium
9996,user_01372,USA,Standard,94.0,3.0,88.600000,1,Low
9997,user_07883,USA,Standard,96.0,3.0,67.850000,1,Low
9998,user_01444,USA,Standard,105.0,4.0,207.128571,1,Medium
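The bucketing rule can be mirrored in plain Python to check the thresholds before running it on the full table. In a SQL `CASE`, a NULL input fails every `WHEN` comparison and falls through to `ELSE`, so this sketch gives missing values their own 'Unknown' bucket; that choice is an assumption about the desired behaviour. The sample values are the avg_watch_duration figures visible in the output above.

```python
from collections import Counter
from typing import Optional


def watch_time_bucket(avg_minutes: Optional[float]) -> str:
    """Map an average watch duration to a bucket label."""
    if avg_minutes is None:
        return "Unknown"  # assumed handling for users without watch history
    if avg_minutes < 100:
        return "Low"
    if avg_minutes <= 300:
        return "Medium"
    return "High"


# avg_watch_duration values displayed in the output above.
shown = [60.9, 68.281818, 65.788889, 61.325, 60.336364,
         237.275, 88.6, 67.85, 207.128571]

counts = Counter(watch_time_bucket(v) for v in shown)
print(counts)
```

Only nine rows are visible here, so the counts say nothing about the full distribution; the point is to confirm the boundary behaviour at 100 and 300 and the treatment of missing values.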


## Task 5.1: Create a Binary Flag Feature

**🎯 Goal:** Add a binary column flag_binge (1 if total_minutes > 500).  
**📌 Requirements:** Use IF logic to create a binary column in SQL.

---

### 🧠 Prompt Template  
> Write a SQL query that adds flag_binge = 1 if total_minutes > 500, else 0.

---

### 👩‍🏫 Example Prompt  
> Add a binary column flag_binge to identify binge-watchers.

---

### 🔍 Exploration  
Are binge-watchers more or less likely to churn?

My Prompt: Create another binary column titled flag_binge that is 1 if avg_watch_duration is > 500, else 0.

In [None]:
query = """
SELECT
  *,
  CASE
    WHEN avg_watch_duration > 500 THEN 1
    ELSE 0
  END AS flag_binge
FROM `mgmt-labs-unit-two.netflix.churn_features`
"""
df = client.query(query).to_dataframe()
df

Unnamed: 0,user_id,country,subscription_plan,age,avg_rating,avg_watch_duration,churn_label,flag_binge
0,user_03630,Canada,Basic,,3.8,60.900000,1,0
1,user_03774,USA,Basic,,2.0,68.281818,1,0
2,user_01474,USA,Basic,,4.0,65.788889,1,0
3,user_02496,USA,Basic,,,61.325000,1,0
4,user_01308,USA,Basic,,2.5,60.336364,1,0
...,...,...,...,...,...,...,...,...
9995,user_07925,Canada,Standard,92.0,2.0,237.275000,1,0
9996,user_01372,USA,Standard,94.0,3.0,88.600000,1,0
9997,user_07883,USA,Standard,96.0,3.0,67.850000,1,0
9998,user_01444,USA,Standard,105.0,4.0,207.128571,1,0
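Every row displayed above has flag_binge = 0, and the largest avg_watch_duration shown is about 237 minutes. The threshold of 500 was written for total minutes, while this table stores a per-user average, so it is worth checking how often the flag fires before using it as a feature. The sketch below computes the flagged share for a given threshold on the displayed values only; the alternative threshold of 200 is a hypothetical value chosen for illustration.

```python
def flag_rate(values, threshold):
    """Share of non-missing values strictly above `threshold`."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(v > threshold for v in present) / len(present)


# avg_watch_duration values displayed in the output above.
shown = [60.9, 68.281818, 65.788889, 61.325, 60.336364,
         237.275, 88.6, 67.85, 207.128571]

rate_500 = flag_rate(shown, 500)  # threshold used in the lab
rate_200 = flag_rate(shown, 200)  # hypothetical alternative threshold

print("share flagged at 500:", rate_500)
print("share flagged at 200:", round(rate_200, 3))
```

A flag that is 0 for nearly every user carries almost no information for the model, so running the same check on the full table would show whether the threshold needs rescaling.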


## Task 5.2: Create an Interaction Term

**🎯 Goal:** Create plan_region_combo by combining plan_tier and region.  
**📌 Requirements:** Use CONCAT or STRING functions.

---

### 🧠 Prompt Template  
> Generate SQL to create a new column by combining plan_tier and region with an underscore.

---

### 👩‍🏫 Example Prompt  
> Create a column called plan_region_combo as CONCAT(plan_tier, '_', region).

---

### 🔍 Exploration  
Which plan-region combos have highest churn?

My Prompt: Create a SQL script that makes a new column named plan_region_combo by combining subscription_plan and country with an underscore, using CONCAT(subscription_plan, '_', country).

In [None]:
query = """
SELECT
  *,
  CONCAT(subscription_plan, '_', country) AS plan_region_combo
FROM `mgmt-labs-unit-two.netflix.churn_features`
"""
df = client.query(query).to_dataframe()
df

Unnamed: 0,user_id,country,subscription_plan,age,avg_rating,avg_watch_duration,churn_label,plan_region_combo
0,user_03630,Canada,Basic,,3.800000,60.900000,1,Basic_Canada
1,user_05407,Canada,Basic,,,61.400000,1,Basic_Canada
2,user_03604,Canada,Basic,,3.250000,59.183333,0,Basic_Canada
3,user_00008,Canada,Basic,,3.333333,45.544444,1,Basic_Canada
4,user_09004,Canada,Basic,,3.000000,83.680000,1,Basic_Canada
...,...,...,...,...,...,...,...,...
9995,user_06957,USA,Standard,90.0,3.500000,59.066667,1,Standard_USA
9996,user_01372,USA,Standard,94.0,3.000000,88.600000,1,Standard_USA
9997,user_07883,USA,Standard,96.0,3.000000,67.850000,1,Standard_USA
9998,user_01444,USA,Standard,105.0,4.000000,207.128571,1,Standard_USA
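One detail worth keeping in mind for this interaction term: BigQuery's `CONCAT` returns NULL if any of its arguments is NULL, so a missing plan or country makes the whole combo missing. The sketch below mimics that rule in Python and shows one possible workaround, substituting a placeholder first; the placeholder text 'unknown' is an assumption, not something the lab specifies.

```python
from typing import Optional


def sql_concat(*parts: Optional[str]) -> Optional[str]:
    """Mimic BigQuery CONCAT: any NULL (None) argument makes the result NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def plan_region_combo(plan: Optional[str], country: Optional[str],
                      missing: str = "unknown") -> str:
    """Combine plan and country, replacing missing parts with a placeholder."""
    safe_plan = plan if plan is not None else missing
    safe_country = country if country is not None else missing
    return sql_concat(safe_plan, "_", safe_country)


print(sql_concat("Basic", "_", "Canada"))
print(sql_concat("Basic", "_", None))
print(plan_region_combo("Basic", None))
```

In SQL, the same workaround could be written as `CONCAT(COALESCE(subscription_plan, 'unknown'), '_', COALESCE(country, 'unknown'))`.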


## Task 5.3: Add Missingness Indicator Flags

**🎯 Goal:** Add binary flags to capture NULL values in age_band and avg_rating.  
**📌 Requirements:** Use IS NULL logic to create new flag columns.

---

### 🧠 Prompt Template  
> Create a new column is_missing_[col_name] that is 1 when column is NULL, else 0.

---

### 👩‍🏫 Example Prompt  
> Add is_missing_age that flags rows where age_band IS NULL.

---

### 🔍 Exploration  
Do missing values correlate with churn?

My Prompt: Add a new column titled is_missing_age that flags rows where age is NULL.

In [None]:
query = """
SELECT
  *,
  CASE
    WHEN age IS NULL THEN 1
    ELSE 0
  END AS is_missing_age
FROM `mgmt-labs-unit-two.netflix.churn_features`
"""
df = client.query(query).to_dataframe()
df

Unnamed: 0,user_id,country,subscription_plan,age,avg_rating,avg_watch_duration,churn_label,is_missing_age
0,user_03630,Canada,Basic,,3.8,60.900000,1,1
1,user_03774,USA,Basic,,2.0,68.281818,1,1
2,user_01474,USA,Basic,,4.0,65.788889,1,1
3,user_02496,USA,Basic,,,61.325000,1,1
4,user_01308,USA,Basic,,2.5,60.336364,1,1
...,...,...,...,...,...,...,...,...
9995,user_07925,Canada,Standard,92.0,2.0,237.275000,1,0
9996,user_01372,USA,Standard,94.0,3.0,88.600000,1,0
9997,user_07883,USA,Standard,96.0,3.0,67.850000,1,0
9998,user_01444,USA,Standard,105.0,4.0,207.128571,1,0
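The goal for this task names two columns, while the cell above flags only age. Because the prompt template is parameterised by column name, the flag expressions can be generated for any list of columns. The sketch below is one way to do that; the helper names `missing_flag_sql` and `build_select` are hypothetical, and column names are interpolated directly, so it should only be used with trusted identifiers.

```python
def missing_flag_sql(columns):
    """Return one 'is_missing_<col>' expression per column name."""
    return [f"IF({col} IS NULL, 1, 0) AS is_missing_{col}" for col in columns]


def build_select(table, columns):
    """Assemble a SELECT that keeps every column and appends the flags."""
    flags = ",\n  ".join(missing_flag_sql(columns))
    return f"SELECT\n  *,\n  {flags}\nFROM `{table}`"


query = build_select(
    "mgmt-labs-unit-two.netflix.churn_features",
    ["age", "avg_rating"],
)
print(query)
```

The resulting string can be passed to `client.query(query).to_dataframe()` in the same way as the cell above.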


## Task 5.5: Assemble Enhanced Feature Table

**🎯 Goal:** Create churn_features_enhanced with all engineered columns.  
**📌 Requirements:** Include all prior features + engineered columns.

---

### 🧠 Prompt Template  
> Generate SQL to create churn_features_enhanced with new columns: watch_time_bucket, plan_region_combo, flag_binge, etc.

---

### 👩‍🏫 Example Prompt  
> Build a new table churn_features_enhanced with all original features + engineered ones.

---

### 🔍 Exploration  
Are row counts stable? Any NULLs introduced?

My Prompt: Build a new table titled churn_features_enhanced with all the original features and all the engineered ones.

In [None]:
query = """
CREATE OR REPLACE TABLE `mgmt-labs-unit-two.netflix.churn_features_enhanced` AS
WITH binge_flag AS (
  SELECT
    user_id,
    AVG(watch_duration_minutes) AS avg_watch_duration_user,
    CASE WHEN AVG(watch_duration_minutes) > 500 THEN 1 ELSE 0 END AS flag_binge
  FROM `mgmt-labs-unit-two.netflix.watch_history`
  GROUP BY user_id
)
SELECT
  c.*,
  -- Watch time bucket based on avg_watch_duration in churn_features
  CASE
    WHEN c.avg_watch_duration < 100 THEN 'low'
    WHEN c.avg_watch_duration BETWEEN 100 AND 300 THEN 'medium'
    WHEN c.avg_watch_duration > 300 THEN 'high'
    ELSE 'unknown'
  END AS watch_time_bucket,

  -- Combine plan and region
  CONCAT(c.subscription_plan, '_', c.country) AS plan_region_combo,

  -- Flag if age is missing
  CASE WHEN c.age IS NULL THEN 1 ELSE 0 END AS is_missing_age,

  -- Add binge flag (defaults to 0 if user not in watch_history)
  IFNULL(b.flag_binge, 0) AS flag_binge
FROM `mgmt-labs-unit-two.netflix.churn_features` AS c
LEFT JOIN binge_flag AS b
  ON c.user_id = b.user_id
"""
# CREATE TABLE is a DDL statement: wait for it to finish; it returns no rows to display
client.query(query).result()
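The exploration questions for this task (are row counts stable, were any NULLs introduced) can be answered with a single check query. The sketch below builds such a query as a string; the helper name `row_count_check_sql` is hypothetical, and the list of columns to inspect is an assumption based on the engineered columns created above.

```python
def row_count_check_sql(base_table, enhanced_table, columns):
    """Build a query comparing row counts and counting NULLs per column."""
    null_counts = ",\n  ".join(
        f"COUNTIF({col} IS NULL) AS null_{col}" for col in columns
    )
    return (
        "SELECT\n"
        f"  (SELECT COUNT(*) FROM `{base_table}`) AS base_rows,\n"
        "  COUNT(*) AS enhanced_rows,\n"
        f"  {null_counts}\n"
        f"FROM `{enhanced_table}`"
    )


check_query = row_count_check_sql(
    "mgmt-labs-unit-two.netflix.churn_features",
    "mgmt-labs-unit-two.netflix.churn_features_enhanced",
    ["watch_time_bucket", "plan_region_combo", "is_missing_age", "flag_binge"],
)
print(check_query)
```

If base_rows and enhanced_rows differ, the LEFT JOIN has duplicated or dropped users; a non-zero null count for plan_region_combo would point to a missing plan or country.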

## Task 6: Retrain Model on Engineered Features

**🎯 Goal:** Train a logistic regression model using churn_features_enhanced.  
**📌 Requirements:** Use BQML logistic_reg model with new feature columns.

---

### 🧠 Prompt Template  
> Write CREATE MODEL SQL using enhanced features including flags and buckets.

---

### 👩‍🏫 Example Prompt  
> Retrain churn_model_enhanced using watch_time_bucket, flag_binge, plan_region_combo.

---

### 🔍 Exploration  
Does model accuracy improve?

My Prompt: Retrain churn_model_enhanced using watch_time_bucket, flag_binge, and plan_region_combo.

In [None]:
query = """
CREATE OR REPLACE MODEL `mgmt-labs-unit-two.netflix.churn_model_enhanced`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['churn_label'],
  auto_class_weights = TRUE,
  data_split_method = 'AUTO_SPLIT'
) AS

SELECT
  country,
  subscription_plan,
  age,
  avg_rating,
  avg_watch_duration,
  watch_time_bucket,
  flag_binge,
  plan_region_combo,
  churn_label
FROM `mgmt-labs-unit-two.netflix.churn_features_enhanced`
WHERE churn_label IS NOT NULL
"""
# CREATE MODEL returns no rows: wait for training to finish instead of building a DataFrame
client.query(query).result()

## Task 7: Compare Model Performance

**🎯 Goal:** Compare base model vs enhanced model using ML.EVALUATE.  
**📌 Requirements:** Use same evaluation query for both models.

---

### 🧠 Prompt Template  
> Write a SQL query to evaluate churn_model_enhanced and compare with churn_model.

---

### 👩‍🏫 Example Prompt  
> Compare ML.EVALUATE output from both models side-by-side.

---

### 🔍 Exploration  
Which features made the most difference?

In [None]:
query = """
-- Evaluate original model
WITH original AS (
  SELECT
    'original_model' AS model_name,
    *
  FROM ML.EVALUATE(
    MODEL `mgmt-labs-unit-two.netflix.churn_logistic_model`,
    (
      SELECT
        country,
        subscription_plan,
        age,
        avg_rating,
        avg_watch_duration,
        churn_label
      FROM `mgmt-labs-unit-two.netflix.churn_features`
      WHERE churn_label IS NOT NULL
    )
  )
),
-- Evaluate enhanced model
enhanced AS (
  SELECT
    'enhanced_model' AS model_name,
    *
  FROM ML.EVALUATE(
    MODEL `mgmt-labs-unit-two.netflix.churn_model_enhanced`,
    (
      SELECT
        country,
        subscription_plan,
        age,
        avg_rating,
        avg_watch_duration,
        watch_time_bucket,
        flag_binge,
        plan_region_combo,
        churn_label
      FROM `mgmt-labs-unit-two.netflix.churn_features_enhanced`
      WHERE churn_label IS NOT NULL
    )
  )
),
-- Unpivot both result sets for easy side-by-side comparison
combined AS (
  SELECT 'original_model' AS model_name, metric, value FROM original
  UNPIVOT(value FOR metric IN (precision, recall, accuracy, f1_score, log_loss, roc_auc))
  UNION ALL
  SELECT 'enhanced_model' AS model_name, metric, value FROM enhanced
  UNPIVOT(value FOR metric IN (precision, recall, accuracy, f1_score, log_loss, roc_auc))
)
SELECT
  metric,
  ROUND(MAX(CASE WHEN model_name = 'original_model' THEN value END), 4) AS original_model_value,
  ROUND(MAX(CASE WHEN model_name = 'enhanced_model' THEN value END), 4) AS enhanced_model_value
FROM combined
GROUP BY metric
ORDER BY metric
"""
df_compare = client.query(query).to_dataframe()
df_compare

Unnamed: 0,metric,original_model_value,enhanced_model_value
0,accuracy,0.5071,0.4891
1,f1_score,0.6359,0.6146
2,log_loss,0.6931,0.693
3,precision,0.8576,0.8599
4,recall,0.5053,0.4782
5,roc_auc,0.5143,0.521
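The side-by-side table is easier to read as differences. The sketch below takes the six metric pairs from the output above, computes enhanced minus original, and lists which metrics moved in the better direction, treating log_loss as lower-is-better and every other metric as higher-is-better.

```python
# (original, enhanced) values copied from the comparison output above.
metrics = {
    "accuracy": (0.5071, 0.4891),
    "f1_score": (0.6359, 0.6146),
    "log_loss": (0.6931, 0.693),
    "precision": (0.8576, 0.8599),
    "recall": (0.5053, 0.4782),
    "roc_auc": (0.5143, 0.521),
}

LOWER_IS_BETTER = {"log_loss"}

# Enhanced minus original, rounded to the precision of the table.
deltas = {name: round(enh - orig, 4) for name, (orig, enh) in metrics.items()}

# A metric improved if it moved in its preferred direction.
improved = sorted(
    name for name, d in deltas.items()
    if (d < 0 if name in LOWER_IS_BETTER else d > 0)
)

for name, d in deltas.items():
    print(f"{name:>9}: {d:+.4f}")
print("improved:", improved)
```

The changes are all small in absolute terms, and both models stay close to 0.5 on accuracy and ROC AUC, so neither set of features separates the two classes well on this data.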


## 🤔 Chain-of-Thought Prompts: Feature Engineering

1. **Why bucket continuous values like watch time?**  
   What patterns become clearer by using categories like "low", "medium", "high"?
2. **What value do interaction terms (e.g., plan_tier_region) add?**  
   Could some plans behave differently in different regions?
3. **What’s the purpose of binary flags like flag_binge?**  
   Can these capture unique behaviors not reflected in raw totals?
4. **After evaluating the enhanced model:**  
   Which new features helped the most?  
   Did any surprise you?

✍️ Write your responses in a text cell below or in a shared doc for discussion.



1. Putting continuous features such as avg_watch_duration into buckets simplifies numerical data into interpretable behavioral groups.
2. Interaction terms can reveal context-specific patterns, for example whether a larger share of users in Canada are premium subscribers.
3. Binary flags identify distinct behavioral signals that may otherwise not be obvious from an average.
4. The results show that the engineered features did not improve the model: accuracy fell from 0.5071 to 0.4891, while ROC AUC rose only slightly, from 0.5143 to 0.5210. This surprised me, as I would have thought that including more useful information would help performance, but here the differences are too small to be meaningful.



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [61]:
import nbformat, uuid, re

path = "/content/drive/My Drive/Colab Notebooks/Unit2_Lab2_PromptStudio.ipynb"

nb = nbformat.read(path, as_version=nbformat.NO_CONVERT)

for cell in nb.cells:
    cell_id = cell.get("id", "")
    if not re.fullmatch(r"[A-Za-z0-9_-]+", cell_id or ""):
        cell["id"] = uuid.uuid4().hex

nbformat.write(nb, path)
print("✅ All invalid cell IDs replaced successfully.")

✅ All invalid cell IDs replaced successfully.
