<a href="https://colab.research.google.com/github/bulut19/mgmt467-analytics-portfolio/blob/main/Lab5_Part2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# 🤖 MGMT 467 - Unit 2 Lab 2: Prompt Studio — Feature Engineering & Beyond

**Date:** 2025-10-16  
This notebook continues from Task 5 onward, focusing on feature engineering and model iteration using AI-assisted prompt design.

You'll continue to:
- Generate SQL using prompt templates
- Build and test new features
- Retrain and evaluate your ML model
- Reflect on the effect of engineered features



## Task 5.0: Bucket a Continuous Feature

**🎯 Goal:** Group 'total_minutes' into categories: low, medium, high.  
**📌 Requirements:** Use CASE WHEN or IF statements to create 'watch_time_bucket'.

---

### 🧠 Prompt Template  
> Write SQL that creates a new column watch_time_bucket based on total_minutes thresholds (<100, 100–300, >300).

---

### 👩‍🏫 Example Prompt  
> Create a new column watch_time_bucket with values 'low', 'medium', or 'high' based on total_minutes.

---

### 🔍 Exploration  
How does churn rate vary across these buckets?


In [None]:
%%bigquery --project boxwood-veld-471119-r6
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
CREATE OR REPLACE TABLE `netflix.churn_features_bucketed` AS
SELECT
  *,
  CASE
    WHEN total_minutes < 100 THEN 'low'
    WHEN total_minutes >= 100 AND total_minutes <= 300 THEN 'medium'
    ELSE 'high'
  END AS watch_time_bucket
FROM `netflix.churn_features`;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
SELECT
  watch_time_bucket,
  COUNT(*) AS n_users,
  AVG(CAST(churn_label AS FLOAT64)) AS churn_rate
FROM `netflix.churn_features_bucketed`
GROUP BY watch_time_bucket
ORDER BY
  CASE watch_time_bucket
    WHEN 'low' THEN 1 WHEN 'medium' THEN 2 ELSE 3
  END;

**Exploration:** Users in the 'low' watch-time bucket (under 100 minutes) had the lowest churn rate, about 7.7%. Users in the 'medium' (100-300 minutes) and 'high' (over 300 minutes) buckets churned at nearly identical rates, about 14.7% and 14.8%, which is roughly double the 'low' rate. In this data, low watch time is associated with lower churn, and churn barely changes once watch time passes 100 minutes.
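
The bucket boundaries are easy to get wrong at exactly 100 and 300 minutes, so here is a minimal Python sketch of the same thresholds for checking edge cases by hand. The function name is hypothetical, and it differs from the SQL on purpose in one place: the `CASE` expression sends a NULL `total_minutes` to the `ELSE` branch and labels it 'high', while this sketch keeps a missing value missing.

```python
from typing import Optional


def watch_time_bucket(total_minutes: Optional[float]) -> Optional[str]:
    """Mirror the CASE thresholds, but keep missing values missing."""
    if total_minutes is None:
        # Deliberate difference: the SQL ELSE branch would return 'high' here.
        return None
    if total_minutes < 100:
        return "low"
    if total_minutes <= 300:
        return "medium"
    return "high"


# Check the values on both sides of each boundary.
for minutes in (99.9, 100, 300, 300.1, None):
    print(minutes, "->", watch_time_bucket(minutes))
```

If NULLs could ever appear in `total_minutes`, an explicit `WHEN total_minutes > 300 THEN 'high'` branch in the SQL would give the same behavior as this sketch.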


## Task 5.1: Create a Binary Flag Feature

**🎯 Goal:** Add a binary column flag_binge (1 if total_minutes > 500).  
**📌 Requirements:** Use IF logic to create a binary column in SQL.

---

### 🧠 Prompt Template  
> Write a SQL query that adds flag_binge = 1 if total_minutes > 500, else 0.

---

### 👩‍🏫 Example Prompt  
> Add a binary column flag_binge to identify binge-watchers.

---

### 🔍 Exploration  
Are binge-watchers more or less likely to churn?


In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Add a binary flag: 1 if total_minutes > 500 else 0
CREATE OR REPLACE TABLE `netflix.churn_features_flagged` AS
SELECT
  *,
  CASE
    WHEN total_minutes > 500 THEN 1
    ELSE 0
  END AS flag_binge
FROM `netflix.churn_features_bucketed`;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
SELECT
  *
FROM
  `netflix.churn_features_flagged`
LIMIT 10;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Compare churn between binge (1) vs non-binge (0)
SELECT
  flag_binge,
  COUNT(*) AS n_users,
  AVG(CAST(churn_label AS FLOAT64)) AS churn_rate
FROM `netflix.churn_features_flagged`
GROUP BY flag_binge
ORDER BY flag_binge DESC;

**Exploration:** Users flagged as binge-watchers (over 500 minutes) had a churn rate of about 14.9%, compared with about 14.6% for non-binge users. The gap of roughly 0.3 percentage points is small, so binge-watching is at most weakly associated with higher churn in this data.
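
One way to judge whether a gap this small is more than noise is a two-proportion z-test. The sketch below is only an illustration: the group sizes are hypothetical placeholders, because the `n_users` values from the query output are not recorded in this notebook.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


# Hypothetical group sizes; replace them with n_users from the query above.
z = two_proportion_z(0.149, 20_000, 0.146, 60_000)
print(f"z = {z:.2f}")  # |z| below 1.96 is not significant at the 5% level
```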


## Task 5.2: Create an Interaction Term

**🎯 Goal:** Create plan_region_combo by combining plan_tier and region.  
**📌 Requirements:** Use CONCAT or STRING functions.

---

### 🧠 Prompt Template  
> Generate SQL to create a new column by combining plan_tier and region with an underscore.

---

### 👩‍🏫 Example Prompt  
> Create a column called plan_region_combo as CONCAT(plan_tier, '_', region).

---

### 🔍 Exploration  
Which plan-region combos have highest churn?


In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Create plan_region_combo by combining plan_tier and region
CREATE OR REPLACE TABLE `netflix.churn_features_combined` AS
SELECT
  *,
  CONCAT(plan_tier, '_', region) AS plan_region_combo
FROM
  `netflix.churn_features_flagged`;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
SELECT
  plan_region_combo,
  COUNT(*) AS n_users,
  AVG(CAST(churn_label AS FLOAT64)) AS churn_rate
FROM
  `netflix.churn_features_combined`
GROUP BY
  plan_region_combo
ORDER BY
  churn_rate DESC;

**Exploration:** Users on the Basic plan in the USA and on the Standard plan in Canada are the most likely to churn, with churn rates of about 15.9% and 15.8%. Premium users in both the USA and Canada also have fairly high churn rates, in the 14% to 15% range. Basic plan users in Canada are the least likely to churn, at about 13.4%, and Standard and Premium+ users in the USA are also on the lower end, at about 14.0% to 14.1%. Based on this, retention efforts should focus first on Basic plan users in the USA and Standard plan users in Canada.
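
One detail of the interaction term is worth keeping in mind: BigQuery's `CONCAT` returns NULL if any argument is NULL, so a missing plan or region makes the whole combo missing. A minimal Python sketch of that behavior follows; the function name and the example values are hypothetical.

```python
from typing import Optional


def plan_region_combo(plan_tier: Optional[str], region: Optional[str]) -> Optional[str]:
    """Join plan tier and region with an underscore; None if either is missing."""
    if plan_tier is None or region is None:
        return None  # same outcome as CONCAT with a NULL argument
    return f"{plan_tier}_{region}"


print(plan_region_combo("Basic", "USA"))
print(plan_region_combo("Basic", None))
```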


## Task 5.3: Add Missingness Indicator Flags

**🎯 Goal:** Add binary flags to capture NULL values in age_band and avg_rating.  
**📌 Requirements:** Use IS NULL logic to create new flag columns.

---

### 🧠 Prompt Template  
> Create a new column is_missing_[col_name] that is 1 when column is NULL, else 0.

---

### 👩‍🏫 Example Prompt  
> Add is_missing_age that flags rows where age_band IS NULL.

---

### 🔍 Exploration  
Do missing values correlate with churn?


In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Add binary flags to capture NULL values in age_band and avg_rating
CREATE OR REPLACE TABLE `netflix.churn_features_missing_flags` AS
SELECT
  *,
  CASE
    WHEN age_band IS NULL THEN 1
    ELSE 0
  END AS is_missing_age_band,
  CASE
    WHEN avg_rating IS NULL THEN 1
    ELSE 0
  END AS is_missing_avg_rating
FROM
  `netflix.churn_features_combined`;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
SELECT
  is_missing_age_band,
  is_missing_avg_rating,
  COUNT(*) AS n_users,
  AVG(CAST(churn_label AS FLOAT64)) AS churn_rate
FROM
  `netflix.churn_features_missing_flags`
GROUP BY
  is_missing_age_band,
  is_missing_avg_rating
ORDER BY
  is_missing_age_band,
  is_missing_avg_rating;

**Exploration:** Neither `age_band` nor `avg_rating` has any missing values, so both flags are 0 for every user and this data cannot show whether missingness correlates with churn.
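
Because the real table has no NULLs, the grouped query can only return the (0, 0) group. To see what the flags would capture if values were missing, here is a small Python sketch of the same grouping on made-up rows; every value in `rows` is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (age_band, avg_rating, churn_label)
rows = [
    ("18-24", 4.1, 0),
    (None, 3.5, 1),
    ("35-44", None, 0),
    (None, None, 1),
    ("25-34", 2.9, 0),
]

# (is_missing_age_band, is_missing_avg_rating) -> [n_users, n_churned]
totals = defaultdict(lambda: [0, 0])
for age_band, avg_rating, churn_label in rows:
    flags = (int(age_band is None), int(avg_rating is None))
    totals[flags][0] += 1
    totals[flags][1] += churn_label

# Churn rate per flag combination, like AVG(CAST(churn_label AS FLOAT64)).
churn_by_flags = {flags: churned / n for flags, (n, churned) in totals.items()}
for flags in sorted(churn_by_flags):
    print(flags, totals[flags][0], churn_by_flags[flags])
```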


## Task 5.4: Create Time-Based Features (Optional)

**🎯 Goal:** Add a column days_since_last_login.  
**📌 Requirements:** Use DATE_DIFF with CURRENT_DATE and last_login_date.

---

### 🧠 Prompt Template  
> Write SQL to create a column showing days since last login using DATE_DIFF.

---

### 👩‍🏫 Example Prompt  
> Add a column days_since_last_login = DATE_DIFF(CURRENT_DATE(), last_login_date, DAY).

---

### 🔍 Exploration  
Does login recency affect churn rate?
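
This optional task was not run in this notebook. As a rough illustration of what `DATE_DIFF(CURRENT_DATE(), last_login_date, DAY)` computes, here is a minimal Python sketch. The function name and the example login date are hypothetical, and the reference day is pinned to the notebook date instead of the current date.

```python
from datetime import date
from typing import Optional


def days_since_last_login(last_login_date: Optional[date], today: date) -> Optional[int]:
    """Whole days between the last login and a reference day; None if unknown."""
    if last_login_date is None:
        return None  # DATE_DIFF also returns NULL for a NULL input
    return (today - last_login_date).days


reference_day = date(2025, 10, 16)  # notebook date, used in place of CURRENT_DATE()
print(days_since_last_login(date(2025, 10, 1), reference_day))
```

Pinning the reference day keeps the feature reproducible. With `CURRENT_DATE()`, the value changes every day the table is rebuilt.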



## Task 5.5: Assemble Enhanced Feature Table

**🎯 Goal:** Create churn_features_enhanced with all engineered columns.  
**📌 Requirements:** Include all prior features + engineered columns.

---

### 🧠 Prompt Template  
> Generate SQL to create churn_features_enhanced with new columns: watch_time_bucket, plan_region_combo, flag_binge, etc.

---

### 👩‍🏫 Example Prompt  
> Build a new table churn_features_enhanced with all original features + engineered ones.

---

### 🔍 Exploration  
Are row counts stable? Any NULLs introduced?


In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Create churn_features_enhanced with all engineered columns
CREATE OR REPLACE TABLE `netflix.churn_features_enhanced` AS
SELECT
  *
FROM
  `netflix.churn_features_missing_flags`;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
SELECT
  *
FROM
  `netflix.churn_features_enhanced`
LIMIT 10;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Check row counts
SELECT
  (SELECT COUNT(*) FROM `netflix.churn_features`)            AS n_base,
  (SELECT COUNT(*) FROM `netflix.churn_features_enhanced`)   AS n_enhanced;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Check for NULLs in engineered features
SELECT
  SUM(CASE WHEN watch_time_bucket     IS NULL THEN 1 ELSE 0 END) AS null_watch_time_bucket,
  SUM(CASE WHEN flag_binge            IS NULL THEN 1 ELSE 0 END) AS null_flag_binge,
  SUM(CASE WHEN plan_region_combo     IS NULL THEN 1 ELSE 0 END) AS null_plan_region_combo,
  SUM(CASE WHEN is_missing_age_band   IS NULL THEN 1 ELSE 0 END) AS null_is_missing_age_band,
  SUM(CASE WHEN is_missing_avg_rating IS NULL THEN 1 ELSE 0 END) AS null_is_missing_avg_rating
FROM `netflix.churn_features_enhanced`;

**Exploration:** The row count of `netflix.churn_features_enhanced` matches the original `netflix.churn_features` table, so no rows were lost or duplicated. None of the engineered feature columns contains NULL values, so the table is ready for model training.


## Task 6: Retrain Model on Engineered Features

**🎯 Goal:** Train a logistic regression model using churn_features_enhanced.  
**📌 Requirements:** Use BQML logistic_reg model with new feature columns.

---

### 🧠 Prompt Template  
> Write CREATE MODEL SQL using enhanced features including flags and buckets.

---

### 👩‍🏫 Example Prompt  
> Retrain churn_model_enhanced using watch_time_bucket, flag_binge, plan_region_combo.

---

### 🔍 Exploration  
Does model accuracy improve?


In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Train a logistic regression model using original features
CREATE OR REPLACE MODEL
  `netflix.churn_model_base`
OPTIONS
  (model_type='logistic_reg',
    input_label_cols=['churn_label']
  ) AS
SELECT
  region,
  plan_tier,
  age_band,
  avg_rating,
  total_minutes,
  churn_label
FROM
  `netflix.churn_features`;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Evaluate the base model
SELECT
  *
FROM
  ML.EVALUATE(MODEL `netflix.churn_model_base`,
    (
    SELECT
      region,
      plan_tier,
      age_band,
      avg_rating,
      total_minutes,
      churn_label
    FROM
      `netflix.churn_features`
    )
  );

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Train a logistic regression model using churn_features_enhanced
CREATE OR REPLACE MODEL
  `netflix.churn_model_enhanced`
OPTIONS
  (model_type='logistic_reg',
    input_label_cols=['churn_label']
  ) AS
SELECT
  -- Original features
  region,
  plan_tier,
  age_band,
  avg_rating,
  total_minutes,
  -- Engineered features
  watch_time_bucket,
  flag_binge,
  plan_region_combo,
  is_missing_age_band,
  is_missing_avg_rating,
  churn_label
FROM
  `netflix.churn_features_enhanced`;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Evaluate the enhanced model
SELECT
  *
FROM
  ML.EVALUATE(MODEL `netflix.churn_model_enhanced`,
    (
    SELECT
      -- Include all features used for training
      region,
      plan_tier,
      age_band,
      avg_rating,
      total_minutes,
      watch_time_bucket,
      flag_binge,
      plan_region_combo,
      is_missing_age_band,
      is_missing_avg_rating,
      churn_label
    FROM
      `netflix.churn_features_enhanced`
    )
  );

**Exploration:** Accuracy was identical for both models at 0.852039. The enhanced model did show a slight improvement in log loss, from 0.419079 to 0.418972, and in ROC AUC, from 0.517648 to 0.521578. The engineered features therefore improved the model's ability to separate churners from non-churners only marginally, and both AUC values remain close to 0.5, the level of random ranking.
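
To put these metrics in context, it helps to compare them with a trivial model that predicts the same churn probability for every user. The sketch below computes that baseline. The overall churn rate of 0.148 is an assumption inferred from the bucket and binge churn rates reported earlier, not a value queried in this notebook.

```python
import math


def constant_baseline(churn_rate: float) -> dict:
    """Metrics for a model that predicts the same churn probability for everyone."""
    p = churn_rate
    return {
        "accuracy": max(p, 1 - p),  # always predict the majority class
        "log_loss": -(p * math.log(p) + (1 - p) * math.log(1 - p)),
        "roc_auc": 0.5,  # constant scores cannot rank users
    }


# 0.148 is an assumed overall churn rate, not a measured one.
print(constant_baseline(0.148))
```

If the real overall churn rate is close to this assumption, both models sit very near the constant baseline on all three metrics.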


## Task 7: Compare Model Performance

**🎯 Goal:** Compare base model vs enhanced model using ML.EVALUATE.  
**📌 Requirements:** Use same evaluation query for both models.

---

### 🧠 Prompt Template  
> Write a SQL query to evaluate churn_model_enhanced and compare with churn_model.

---

### 👩‍🏫 Example Prompt  
> Compare ML.EVALUATE output from both models side-by-side.

---

### 🔍 Exploration  
Which features made the most difference?


In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Compare Base vs Enhanced metrics side-by-side
WITH base AS (
  SELECT
    'baseline' AS model,
    *
  FROM
    ML.EVALUATE(MODEL `netflix.churn_model_base`,
      (
      SELECT
        region,
        plan_tier,
        age_band,
        avg_rating,
        total_minutes,
        churn_label
      FROM
        `netflix.churn_features`
      )
    )
),
enhanced AS (
  SELECT
    'enhanced' AS model,
    *
  FROM
    ML.EVALUATE(MODEL `netflix.churn_model_enhanced`,
      (
      SELECT
        -- Include all features used for training
        region,
        plan_tier,
        age_band,
        avg_rating,
        total_minutes,
        watch_time_bucket,
        flag_binge,
        plan_region_combo,
        is_missing_age_band,
        is_missing_avg_rating,
        churn_label
      FROM
        `netflix.churn_features_enhanced`
      )
    )
)
SELECT * FROM base
UNION ALL
SELECT * FROM enhanced;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Get feature weights for the enhanced model
SELECT
  *
FROM
  ML.WEIGHTS(MODEL `netflix.churn_model_enhanced`);

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Unnest category weights for easier viewing
SELECT
  processed_input,
  category_weights.category,
  category_weights.weight
FROM
  ML.WEIGHTS(MODEL `netflix.churn_model_enhanced`),
  UNNEST(category_weights) AS category_weights
WHERE
  -- Defensive filter only: the UNNEST join already drops rows that have no
  -- category weights (numerical features and the intercept)
  processed_input IS NOT NULL;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Analyze weights of individual engineered features
SELECT
  processed_input,
  category_weights.category,
  category_weights.weight,  -- per-category weight from the unnested array
  ABS(category_weights.weight) AS abs_weight  -- magnitude, used for ranking
FROM
  ML.WEIGHTS(MODEL `netflix.churn_model_enhanced`),
  UNNEST(category_weights) AS category_weights
WHERE
  processed_input IN ('watch_time_bucket', 'plan_region_combo', 'flag_binge', 'is_missing_age_band', 'is_missing_avg_rating')
ORDER BY
  abs_weight DESC
LIMIT 20; -- Limit to top 20 for clarity

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Calculate total absolute weight by feature group
WITH FeatureWeights AS (
  SELECT
    processed_input,
    ABS(weight) AS abs_weight
  FROM ML.WEIGHTS(MODEL `netflix.churn_model_enhanced`)
  WHERE processed_input IS NOT NULL AND category_weights IS NULL -- For numerical and binary features
  UNION ALL
  SELECT
    processed_input,
    ABS(category_weights.weight) AS abs_weight
  FROM ML.WEIGHTS(MODEL `netflix.churn_model_enhanced`),
  UNNEST(category_weights) AS category_weights -- For categorical features
  WHERE processed_input IS NOT NULL
),
FeatureGroups AS (
  SELECT
    processed_input,
    CASE
      WHEN processed_input = 'watch_time_bucket' THEN 'watch_time_bucket'
      WHEN processed_input = 'plan_region_combo' THEN 'plan_region_combo'
      WHEN processed_input = 'flag_binge' THEN 'flag_binge'
      WHEN processed_input = 'is_missing_age_band' OR processed_input = 'is_missing_avg_rating' THEN 'missing_flags'
      ELSE 'original_features' -- Group all other features
    END AS feature_group,
    abs_weight
  FROM FeatureWeights
)
SELECT
  feature_group,
  SUM(abs_weight) AS total_absolute_weight
FROM FeatureGroups
GROUP BY
  feature_group
ORDER BY
  total_absolute_weight DESC;

In [None]:
%%bigquery --project boxwood-veld-471119-r6
-- Alternative grouping: pattern-match on processed_input and sum absolute weights
WITH W AS (
  SELECT
    CASE
      WHEN processed_input LIKE 'watch_time_bucket=%' THEN 'watch_time_bucket'
      WHEN processed_input LIKE 'plan_region_combo=%' THEN 'plan_region_combo'
      WHEN processed_input = 'flag_binge' THEN 'flag_binge'
      WHEN processed_input LIKE 'is_missing_%' THEN 'missing_flags'
      ELSE 'other'
    END AS feature_group,
    ABS(weight) AS abs_weight
  FROM ML.WEIGHTS(MODEL `netflix.churn_model_enhanced`)
  WHERE processed_input IS NOT NULL
)
SELECT feature_group, SUM(abs_weight) AS total_abs_weight
FROM W
GROUP BY feature_group
ORDER BY total_abs_weight DESC;

**Exploration:** By total absolute weight per feature group, the original features together had the largest influence on the model. Among the engineered features, the `plan_region_combo` and `watch_time_bucket` groups had the largest collective influence. At the level of individual weights, the 'low' category of `watch_time_bucket` and specific categories within `plan_region_combo` had the largest impacts, while `flag_binge` had very little influence.

Overall, the engineered features gave a slight improvement in model performance (log loss and ROC AUC), but the original features dominated the model's predictions.
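
The grouping logic behind the total-absolute-weight queries can be sketched in plain Python. All weights below are hypothetical and only mimic the shape of `ML.WEIGHTS` output (a scalar weight for numerical features, a list of category weights for categorical ones). The sketch also assumes the intercept is reported under the name `__INTERCEPT__` and leaves it out of the totals.

```python
from collections import defaultdict

# Hypothetical rows: (processed_input, weight, [(category, weight), ...])
weights = [
    ("__INTERCEPT__", -1.70, []),
    ("total_minutes", 0.0002, []),
    ("region", None, [("USA", -0.50), ("Canada", -0.45)]),
    ("flag_binge", 0.01, []),
    ("watch_time_bucket", None, [("low", -0.30), ("medium", 0.05), ("high", 0.04)]),
    ("plan_region_combo", None, [("Basic_USA", 0.08), ("Basic_Canada", -0.09)]),
]

ENGINEERED = {"watch_time_bucket", "plan_region_combo", "flag_binge"}


def feature_group(name: str) -> str:
    """Map a feature name to the group used in the summary."""
    if name in ENGINEERED:
        return name
    if name.startswith("is_missing_"):
        return "missing_flags"
    return "original_features"


totals = defaultdict(float)
for name, weight, category_weights in weights:
    if name == "__INTERCEPT__":
        continue  # assumed intercept name; the intercept is not a feature
    if weight is not None:
        totals[feature_group(name)] += abs(weight)
    for _category, cat_weight in category_weights:
        totals[feature_group(name)] += abs(cat_weight)

for group, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group}: {total:.4f}")
```

Summed absolute weights grow with the number of categories in a group, and weights for unscaled numerical features depend on the feature's units, so this ranking is a rough guide and not a strict measure of importance.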