# 📊 MGMT 467 - Unit 2 Lab 2: Churn Modeling with BigQueryML + Feature Engineering
**Date:** 2025-10-16

In this lab you will:
- Connect to BigQuery from Colab
- Create features and labels
- Engineer new features from user behavior
- Train and evaluate logistic regression models
- Reflect on modeling assumptions and interpret results

In [1]:
# ✅ Authenticate and set up GCP project
from google.colab import auth
auth.authenticate_user()

project_id = "mgmt-471819-i5"  # <-- Replace with your actual project ID
!gcloud config set project $project_id

Updated property [core/project].


In [2]:
# ✅ Verify BigQuery access
%%bigquery --project $project_id
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user

Unnamed: 0,today,user
0,2025-10-26,pattersonsean4533@gmail.com


In [3]:
# ✅ Prepare base churn features
%%bigquery --project $project_id
CREATE OR REPLACE TABLE `netflix.churn_features` AS
SELECT
  t1.user_id,
  t1.country,
  t1.subscription_plan,
  t1.age,
  t2.avg_rating,
  t2.total_minutes,
  t2.avg_progress,
  t2.num_sessions,
  t3.churn_next_month AS churn_label
FROM
  `netflix.users` AS t1
JOIN
  (
    SELECT
      user_id,
      AVG(user_rating) AS avg_rating,
      SUM(watch_duration_minutes_capped) AS total_minutes,
      AVG(progress_percentage) AS avg_progress,
      COUNT(*) AS num_sessions
    FROM
      `netflix.watch_history_robust`
    GROUP BY
      user_id
  ) AS t2
ON
  t1.user_id = t2.user_id
JOIN
  `netflix.feat_churn_lite` AS t3
ON
  t1.user_id = t3.user_id;
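
The prediction output later in this notebook repeats the same `user_id` on consecutive rows with identical probabilities, which suggests the joins above can return more than one row per user (for example, if `netflix.users` or `netflix.feat_churn_lite` holds several rows per `user_id`; the notebook does not confirm which). A minimal Python sketch of the check, using made-up rows and a hypothetical helper named `duplicated_keys`:

```python
from collections import Counter

def duplicated_keys(rows, key="user_id"):
    """Return {key_value: count} for every key that appears more than once."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}

# Hypothetical sample shaped like rows of netflix.churn_features.
sample_rows = [
    {"user_id": "user_00008", "num_sessions": 12},
    {"user_id": "user_00008", "num_sessions": 12},
    {"user_id": "user_00285", "num_sessions": 7},
]

dupes = duplicated_keys(sample_rows)
print(dupes)  # {'user_00008': 2}
```

In BigQuery the equivalent check is a `GROUP BY user_id HAVING COUNT(*) > 1` over the feature table. Duplicated users carry extra weight in training and can land on both sides of a train/evaluation split.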

In [4]:
# ✅ Train base logistic regression model
%%bigquery --project $project_id
CREATE OR REPLACE MODEL `netflix.churn_model`
OPTIONS(model_type='logistic_reg', input_label_cols=['churn_label']) AS
SELECT
  country,
  subscription_plan,
  age,
  avg_rating,
  total_minutes,
  avg_progress,
  num_sessions,
  churn_label
FROM `netflix.churn_features`;

In [5]:
# ✅ Evaluate base model
%%bigquery --project $project_id
SELECT *
FROM ML.EVALUATE(MODEL `netflix.churn_model`);

Unnamed: 0,precision,recall,accuracy,f1_score,log_loss,roc_auc
0,0.670478,0.964243,0.663143,0.790965,0.626005,0.60278
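
As a sanity check on the metrics above, F1 is the harmonic mean of precision and recall, and precision, recall, and accuracy together imply the share of churners in the evaluation data. The sketch below redoes that arithmetic. The function names are hypothetical, and the positive-rate formula assumes a binary confusion matrix at a single classification threshold.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def implied_positive_rate(precision, recall, accuracy):
    """Share of positives implied by the three rates.

    With positive rate p: TP = recall * p, FP = TP * (1/precision - 1),
    FN = p * (1 - recall), and accuracy = 1 - FP - FN.
    Solving for p gives the expression below.
    """
    return (1 - accuracy) / (1 + recall / precision - 2 * recall)

# Values copied from the ML.EVALUATE output of the base model.
precision, recall, accuracy = 0.670478, 0.964243, 0.663143

print(round(f1_score(precision, recall), 4))                         # 0.791
print(round(implied_positive_rate(precision, recall, accuracy), 3))  # 0.661
```

An implied churn rate of about 0.66 next to an accuracy of 0.663 means the base model does only slightly better than predicting churn for every user, which is consistent with its roc_auc of about 0.60.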


In [10]:
# ✅ Predict churn with base model
%%bigquery --project $project_id
SELECT
  user_id,
  predicted_churn_label,
  predicted_churn_label_probs
FROM ML.PREDICT(MODEL `netflix.churn_model`,
                (SELECT * FROM `netflix.churn_features` LIMIT 1000)); -- Pass all columns including user_id, limit for demonstration

-- ML.PREDICT scores the whole input table in a single query. For a large table,
-- write the predictions to a destination table (CREATE TABLE ... AS SELECT ...)
-- instead of downloading every row into the notebook.

Unnamed: 0,user_id,predicted_churn_label,predicted_churn_label_probs
0,user_00008,1,"[{'label': 1, 'prob': 0.6630738048919885}, {'l..."
1,user_00008,1,"[{'label': 1, 'prob': 0.6630738048919885}, {'l..."
2,user_00008,1,"[{'label': 1, 'prob': 0.6630738048919885}, {'l..."
3,user_00008,1,"[{'label': 1, 'prob': 0.6630738048919885}, {'l..."
4,user_00008,1,"[{'label': 1, 'prob': 0.6630738048919885}, {'l..."
...,...,...,...
995,user_00285,1,"[{'label': 1, 'prob': 0.5543333520059378}, {'l..."
996,user_00285,1,"[{'label': 1, 'prob': 0.5543333520059378}, {'l..."
997,user_00285,1,"[{'label': 1, 'prob': 0.5543333520059378}, {'l..."
998,user_00285,1,"[{'label': 1, 'prob': 0.5543333520059378}, {'l..."
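
`predicted_churn_label_probs` comes back as a list of label/probability records, as the truncated output above shows. A minimal sketch of pulling out the churn probability in Python, assuming each record has `label` and `prob` keys as displayed. `churn_probability` is a hypothetical helper, and the label-0 record in the example is filled in as the complement for illustration only.

```python
def churn_probability(label_probs, positive_label=1):
    """Return the probability attached to the positive label, or None if absent."""
    for record in label_probs:
        if record["label"] == positive_label:
            return record["prob"]
    return None

# First record copied from the output above; the label-0 record is assumed.
example = [
    {"label": 1, "prob": 0.6630738048919885},
    {"label": 0, "prob": 1 - 0.6630738048919885},
]

print(churn_probability(example))  # 0.6630738048919885
```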



## 🛠️ Feature Engineering Section

We will now engineer new features to improve model performance:

- Bucket continuous variables
- Create interaction terms
- Add behavioral flags


In [7]:
# ✅ Create enhanced feature set
%%bigquery --project $project_id
CREATE OR REPLACE TABLE `netflix.churn_features_enhanced` AS
SELECT
  user_id,
  country, -- Using country instead of region
  subscription_plan, -- Using subscription_plan instead of plan_tier
  CASE -- Age bands; NULL ages get their own band instead of falling into '55+'
    WHEN age IS NULL THEN 'unknown'
    WHEN age < 25 THEN '18-24'
    WHEN age < 35 THEN '25-34' -- half-open ranges also cover fractional ages
    WHEN age < 45 THEN '35-44'
    WHEN age < 55 THEN '45-54'
    ELSE '55+'
  END AS age_band,
  avg_rating,
  total_minutes,
  CASE
    WHEN total_minutes < 100 THEN 'low'
    WHEN total_minutes BETWEEN 100 AND 300 THEN 'medium'
    ELSE 'high'
  END AS watch_time_bucket,
  avg_progress,
  num_sessions,
  CONCAT(subscription_plan, '_', country) AS plan_country_combo, -- Interaction term: plan x country
  IF(total_minutes > 500, 1, 0) AS flag_binge, -- 1 when total watch time exceeds 500 minutes
  churn_label
FROM `netflix.churn_features`;

In [8]:
# ✅ Train enhanced model
%%bigquery --project $project_id
CREATE OR REPLACE MODEL `netflix.churn_model_enhanced`
OPTIONS(model_type='logistic_reg', input_label_cols=['churn_label']) AS
SELECT
  country,
  subscription_plan,
  age_band,
  watch_time_bucket,
  avg_rating,
  avg_progress,
  num_sessions,
  plan_country_combo, -- Interaction term: plan x country
  flag_binge,
  churn_label
FROM `netflix.churn_features_enhanced`;

In [9]:
# ✅ Evaluate enhanced model
%%bigquery --project $project_id
SELECT *
FROM ML.EVALUATE(MODEL `netflix.churn_model_enhanced`);

Unnamed: 0,precision,recall,accuracy,f1_score,log_loss,roc_auc
0,0.654105,0.997242,0.653931,0.790023,0.631202,0.60337



## 🤔 Chain-of-Thought Prompts: Feature Engineering

### 1. Why bucket continuous values like watch time?
- What patterns become clearer by using categories like "low", "medium", "high"?

### 2. What value do interaction terms (e.g., `plan_country_combo`) add?
- Could some plans behave differently in different countries?

### 3. What’s the purpose of binary flags like `flag_binge`?
- Can these capture unique behaviors not reflected in raw totals?

### 4. After evaluating the enhanced model:
- Which new features helped the most?
- Did any surprise you?

✍️ Write your responses in a text cell below or in a shared doc for discussion.


1. Bucketing a continuous variable like watch time into categories ("low", "medium", "high") makes its relationship with churn easier to read and lets a linear model fit a non-linear pattern. Each bucket gets its own weight, so the model can learn, for example, that users in the "low" bucket churn at a much higher rate than users in the "high" bucket even if churn does not change steadily with every extra minute. Bucketing also limits the influence of outliers, since an extreme total lands in the "high" bucket like any other large value. The cost is that differences within a bucket are discarded.
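
The bucketing rule can be sketched in Python to make the boundaries explicit. This mirrors the thresholds in the `watch_time_bucket` CASE expression (100 and 300 minutes); the separate `"unknown"` value for missing totals is an assumption added here, since the SQL sends NULL to `'high'` through its ELSE branch.

```python
def watch_time_bucket(total_minutes):
    """Mirror the SQL CASE: <100 low, 100-300 medium, otherwise high."""
    if total_minutes is None:
        return "unknown"  # assumption: the SQL ELSE branch would return 'high'
    if total_minutes < 100:
        return "low"
    if total_minutes <= 300:
        return "medium"
    return "high"

# Boundary values show which side of each threshold is inclusive.
for minutes in (99, 100, 300, 301, None):
    print(minutes, watch_time_bucket(minutes))
```

Under this rule 101 and 299 minutes map to the same value, so the model can no longer tell those users apart on watch time.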

2. Interaction terms capture how the effect of one feature depends on the value of another. The `plan_country_combo` term lets the model learn whether a subscription plan affects churn differently in one country than in another. With `subscription_plan` and `country` entered only as separate features, logistic regression adds their effects independently and cannot represent a plan that retains users well in one country but poorly in another.
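
A small Python sketch of how the combined feature is built and how quickly its category count grows. The function mirrors `CONCAT(subscription_plan, '_', country)` for non-NULL inputs; the plan and country values are hypothetical and are not taken from the dataset.

```python
def plan_country_combo(subscription_plan, country):
    """Mirror CONCAT(subscription_plan, '_', country) for non-NULL inputs."""
    return f"{subscription_plan}_{country}"

# Hypothetical plan and country values, for illustration only.
plans = ["Basic", "Standard", "Premium"]
countries = ["USA", "Canada"]

combos = sorted({plan_country_combo(p, c) for p in plans for c in countries})
print(len(combos))  # 6
print(combos[0])    # Basic_Canada
```

Each distinct combo becomes its own category, so the number of extra weights grows with plans × countries, and rare pairs end up with weights estimated from few users.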

3. Binary flags like `flag_binge` mark a specific behavior that continuous values or broad categories may blur. Here `flag_binge` is 1 when a user's total watch time exceeds 500 minutes. That could signal strong engagement and lower churn, or a user who consumes content quickly and leaves once the catalog is exhausted. A flag lets the model learn a threshold effect that a single linear term on the raw total would miss.
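
One thing worth checking is how the flag relates to the watch-time bucket, since both are derived from `total_minutes`. The sketch below mirrors the two SQL expressions for non-NULL totals; `is_high_bucket` is a hypothetical helper introduced only for this comparison.

```python
def flag_binge(total_minutes):
    """Mirror IF(total_minutes > 500, 1, 0); a NULL total yields 0."""
    return 1 if total_minutes is not None and total_minutes > 500 else 0

def is_high_bucket(total_minutes):
    """True when the SQL CASE assigns 'high' to a non-NULL total (> 300)."""
    return total_minutes > 300

# Every flagged user is also in the 'high' bucket, so the flag only
# separates the 300-500 range from totals above 500.
for minutes in (250, 400, 501):
    print(minutes, flag_binge(minutes), is_high_bucket(minutes))
```

Because the flag is a subset of the "high" bucket, it adds information only about where a user sits within that bucket.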

4. Which new features helped the most? Did any surprise you? Two checks answer this:
- **Model evaluation metrics:** Compare the roc_auc, accuracy, precision, recall, and F1 score of the base model (cell In [5]) and the enhanced model (cell In [9]). Here the two are nearly identical: roc_auc moves from 0.60278 to 0.60337 and F1 from 0.790965 to 0.790023, while accuracy drops from 0.663143 to 0.653931. Recall rises from 0.964243 to 0.997242 as precision falls from 0.670478 to 0.654105, so the enhanced model labels almost every user as a churner. On these numbers the new features added little measurable value.
- **Model coefficients:** For logistic regression in BigQuery ML, `ML.WEIGHTS` returns the learned weight for each feature and each category. Weights with larger absolute values, compared on standardized inputs, indicate more influential features, which shows whether the behavioral flag, the interaction term, or the new buckets carried the most signal.
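
The metric comparison described above can be sketched in a few lines of Python. The numbers are copied from the two `ML.EVALUATE` outputs in this notebook; `metric_deltas` is a hypothetical helper.

```python
# Metrics copied from the ML.EVALUATE outputs of the two models.
base = {"precision": 0.670478, "recall": 0.964243, "accuracy": 0.663143,
        "f1_score": 0.790965, "log_loss": 0.626005, "roc_auc": 0.60278}
enhanced = {"precision": 0.654105, "recall": 0.997242, "accuracy": 0.653931,
            "f1_score": 0.790023, "log_loss": 0.631202, "roc_auc": 0.60337}

def metric_deltas(before, after):
    """Return after - before for every metric, rounded to 6 decimals."""
    return {name: round(after[name] - before[name], 6) for name in before}

# Positive deltas mean the enhanced model scored higher on that metric
# (for log_loss, higher is worse).
for name, delta in metric_deltas(base, enhanced).items():
    print(f"{name:>9}: {delta:+.6f}")
```

Both models were evaluated with `ML.EVALUATE` and no explicit input table, so each may have been scored on a different evaluation split chosen during its own training run. Differences this small could reflect split noise rather than the features.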