# 📊 MGMT 467 - Unit 2 Lab 2: Churn Modeling with BigQueryML + Feature Engineering
**Date:** 2025-10-16

In this lab you will:
- Connect to BigQuery from Colab
- Create features and labels
- Engineer new features from user behavior
- Train and evaluate logistic regression models
- Reflect on modeling assumptions and interpret results

In [None]:
# ✅ Authenticate and set up GCP project
from google.colab import auth
auth.authenticate_user()

project_id = "sunlit-plasma-471119-s7"  # <-- Replace with your actual project ID
!gcloud config set project $project_id

In [None]:
%%bigquery --project $project_id
-- ✅ Verify BigQuery access
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user

In [None]:
%%bigquery --project $project_id
-- ✅ Prepare base churn features
CREATE OR REPLACE TABLE `netflix.churn_features` AS
SELECT
  user_id,
  age,
  gender,
  country,
  city,
  subscription_plan,
  monthly_spend,
  household_size,
  created_at,
  subscription_start_date,
  is_active
FROM `netflix.users`;

In [None]:
%%bigquery --project $project_id

-- Step 1: Add a new column called churn_label
ALTER TABLE `netflix.churn_features`
ADD COLUMN IF NOT EXISTS churn_label INT64;

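In [None]:
%%bigquery --project $project_id

-- Step 2: Populate the new column with random 0s and 1s.
-- These placeholder labels are independent of every feature, so expect the
-- models below to score near chance (ROC AUC close to 0.5).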
UPDATE `netflix.churn_features`
SET churn_label =
    CASE
        WHEN RAND() < 0.5 THEN 0  -- Assign 0 to approximately 50% of rows
        ELSE 1  -- Assign 1 to the remaining rows
    END
WHERE churn_label IS NULL; -- Only update rows where churn_label is currently NULL
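Because `churn_label` comes from `RAND()`, it carries no information about any user. A minimal pure-Python sketch of what that implies, using synthetic data (the `roc_auc` helper is written here for illustration and is not part of BigQuery ML): any score, here a made-up monthly spend, ranks churners above non-churners only about half the time, so ROC AUC lands near 0.5.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation, averaging ranks for ties."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = random.Random(467)  # fixed seed so the sketch is reproducible
n_users = 10_000
# Synthetic stand-ins: a monthly spend per user and a label drawn like RAND() < 0.5.
spend = [rng.uniform(5, 30) for _ in range(n_users)]
labels = [0 if rng.random() < 0.5 else 1 for _ in range(n_users)]

auc_random = roc_auc(labels, spend)
print(f"AUC with random labels: {auc_random:.3f}")  # close to 0.5
```

With real churn labels, the same computation is where a useful feature would show up as an AUC clearly above 0.5.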

In [None]:
%%bigquery --project $project_id
-- ✅ Train base logistic regression model
-- user_id is an identifier rather than a predictor, so it is left out of the features.
CREATE OR REPLACE MODEL `netflix.churn_model`
OPTIONS(model_type='logistic_reg', input_label_cols=['churn_label']) AS
SELECT
  age,
  gender,
  country,
  city,
  subscription_plan,
  monthly_spend,
  household_size,
  created_at,
  subscription_start_date,
  is_active,
  churn_label
FROM `netflix.churn_features`;

In [None]:
%%bigquery --project $project_id
-- ✅ Evaluate base model
SELECT *
FROM ML.EVALUATE(MODEL `netflix.churn_model`);

In [None]:
%%bigquery --project $project_id
-- ✅ Predict churn with base model
SELECT
  user_id,
  predicted_churn_label,
  predicted_churn_label_probs
FROM ML.PREDICT(MODEL `netflix.churn_model`,
                (SELECT * FROM `netflix.churn_features`));
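`ML.PREDICT` returns `predicted_churn_label_probs` as an array of (`label`, `prob`) pairs rather than a single number. A small sketch of pulling out the churn probability and ranking users by risk, assuming each cell has already been loaded into Python as a list of dicts with `label` and `prob` keys (the exact container the `%%bigquery` magic produces may differ, e.g. a NumPy array). The `churn_probability` helper, user IDs, and probabilities are hypothetical:

```python
def churn_probability(probs_cell) -> float:
    """Return the probability attached to label 1 in a predicted_churn_label_probs cell."""
    for entry in probs_cell:
        if entry["label"] == 1:
            return entry["prob"]
    raise ValueError("no entry for label 1")


# Hypothetical rows shaped like ML.PREDICT output.
rows = [
    {"user_id": 102, "predicted_churn_label_probs": [{"label": 1, "prob": 0.41}, {"label": 0, "prob": 0.59}]},
    {"user_id": 101, "predicted_churn_label_probs": [{"label": 1, "prob": 0.62}, {"label": 0, "prob": 0.38}]},
]

# Highest churn risk first.
ranked = sorted(rows, key=lambda r: churn_probability(r["predicted_churn_label_probs"]), reverse=True)
for r in ranked:
    print(r["user_id"], churn_probability(r["predicted_churn_label_probs"]))
```

Looking up the entry by `label` instead of by position avoids depending on the order of the array.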


## 🛠️ Feature Engineering Section

We will now engineer new features to improve model performance:

- Bucket continuous variables
- Create interaction terms
- Add behavioral flags
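The first two transformations can be sketched in plain Python to make their semantics explicit, mirroring the thresholds and `CONCAT` used in the SQL below. This is an illustration only: the sample countries, plans, and spends are made up, and missing spend is mapped to `'unknown'` here rather than left to fall through to the `ELSE` branch.

```python
from typing import Optional


def monthly_spend_bucket(spend: Optional[float]) -> str:
    """< 10 is low, 10-25 (inclusive, like SQL BETWEEN) is medium, above 25 is high."""
    if spend is None:
        return "unknown"  # explicit bucket for missing values
    if spend < 10:
        return "low"
    if spend <= 25:
        return "medium"
    return "high"


def country_plan_combo(country: str, plan: str) -> str:
    """Interaction term: one category per (country, plan) pair, like CONCAT(country, '_', subscription_plan)."""
    return f"{country}_{plan}"


# Made-up sample users.
users = [
    {"user_id": 1, "country": "US", "subscription_plan": "Basic", "monthly_spend": 8.99},
    {"user_id": 2, "country": "US", "subscription_plan": "Premium", "monthly_spend": 25.0},
    {"user_id": 3, "country": "India", "subscription_plan": "Premium", "monthly_spend": None},
]

for u in users:
    u["monthly_spend_bucket"] = monthly_spend_bucket(u["monthly_spend"])
    u["country_plan_combo"] = country_plan_combo(u["country"], u["subscription_plan"])
    print(u["user_id"], u["monthly_spend_bucket"], u["country_plan_combo"])
```

Note that a spend of exactly 25 is `'medium'`, because `BETWEEN` includes both endpoints.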


In [None]:
%%bigquery --project $project_id
-- ✅ Create enhanced feature set
CREATE OR REPLACE TABLE `netflix.churn_features_enhanced` AS
SELECT
  user_id,
  age,
  gender,
  country,
  city,
  subscription_plan,
  monthly_spend,
  household_size,
  created_at,
  subscription_start_date,
  is_active,
  CASE
    WHEN monthly_spend IS NULL THEN 'unknown'  -- otherwise NULLs fall through to ELSE 'high'
    WHEN monthly_spend < 10 THEN 'low'
    WHEN monthly_spend BETWEEN 10 AND 25 THEN 'medium'
    ELSE 'high'
  END AS monthly_spend_bucket,
  CONCAT(country, '_', subscription_plan) AS country_plan_combo,
  churn_label
FROM `netflix.churn_features`;


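In [None]:
%%bigquery --project $project_id
-- ✅ Train enhanced model
-- The spend bucket and the country/plan interaction stand in for the raw monthly_spend,
-- country, and subscription_plan columns; user_id is left out as an identifier.
CREATE OR REPLACE MODEL `netflix.churn_model_enhanced`
OPTIONS(model_type='logistic_reg', input_label_cols=['churn_label']) AS
SELECT
  age,
  gender,
  city,
  monthly_spend_bucket,
  country_plan_combo,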
  household_size,
  created_at,
  subscription_start_date,
  is_active,
  churn_label
FROM `netflix.churn_features_enhanced`;



In [None]:
%%bigquery --project $project_id
-- ✅ Evaluate enhanced model
SELECT *
FROM ML.EVALUATE(MODEL `netflix.churn_model_enhanced`);




## 🤔 Chain-of-Thought Prompts: Feature Engineering

### 1. Why bucket continuous values like watch time?
- What patterns become clearer by using categories like "low", "medium", "high"?

### 2. What value do interaction terms (e.g., `plan_tier_region`) add?
- Could some plans behave differently in different regions?

### 3. What’s the purpose of binary flags like `flag_binge`?
- Can these capture unique behaviors not reflected in raw totals?

### 4. After evaluating the enhanced model:
- Which new features helped the most?
- Did any surprise you?

✍️ Write your responses in a text cell below or in a shared doc for discussion.


## Answers to Chain-of-Thought Prompts: Feature Engineering

### 1. Why bucket continuous values like watch time?
Bucketing continuous values like watch time into categories like "low", "medium", and "high" can help reveal non-linear relationships that a linear model might not capture as effectively. It simplifies the data and can make the model more robust to outliers.

**What patterns become clearer by using categories like "low", "medium", "high"?**

Using categories can highlight distinct behaviors or customer segments. For example:
- **Low watch time:** Might indicate users who are trying out the service but not engaging deeply, or those who have limited free time. They might be at higher risk of churning if they don't find content they like quickly.
- **Medium watch time:** Could represent the average engaged user.
- **High watch time:** May indicate power users or "bingers." Their churn drivers might be different (e.g., content fatigue, price sensitivity after heavy usage).

Bucketing allows the model to assign different weights or probabilities to these distinct groups, potentially improving its ability to differentiate between churners and non-churners based on watch time.
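A minimal sketch of the non-linearity point, on deliberately U-shaped synthetic data (monthly spend is used because the lab's table has no watch-time column; the data and helper names are illustrative). Users at both extremes churn and mid-range users stay, so the raw value's linear correlation with churn is essentially zero, while per-bucket churn rates expose the pattern immediately.

```python
from statistics import mean


def bucket(spend: float) -> str:
    # Same thresholds as the lab's monthly_spend_bucket CASE expression.
    if spend < 10:
        return "low"
    if spend <= 25:
        return "medium"
    return "high"


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Synthetic data: spends 0..35; low and high spenders churn (1), mid-range stay (0).
spends = list(range(36))
churned = [1 if s < 10 or s > 25 else 0 for s in spends]

corr = pearson(spends, churned)

rates = {}
for name in ("low", "medium", "high"):
    group = [c for s, c in zip(spends, churned) if bucket(s) == name]
    rates[name] = sum(group) / len(group)

print(f"correlation of raw spend with churn: {corr:.3f}")
print("churn rate by bucket:", rates)
```

A single slope on raw spend has nothing to fit here, whereas one weight per bucket can represent both high-churn ends.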

### 2. What value do interaction terms (e.g., `plan_tier_region`) add?
Interaction terms capture the combined effect of two or more features that may not be evident when looking at each feature in isolation. They allow the model to account for situations where the relationship between a feature and the target variable (churn) depends on the value of another feature.

**Could some plans behave differently in different regions?**

Yes, absolutely. An interaction term like `plan_tier_region` can add significant value because the popularity, perceived value, or affordability of a particular plan tier might vary considerably from one region to another due to local market conditions, competition, economic factors, or cultural preferences. For instance, a premium plan might be very popular and have low churn in an affluent urban region but struggle with high churn in a more price-sensitive rural area. The interaction term explicitly models these region-specific plan effects.
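The premium-plan example can be made concrete with synthetic counts (all records and region names are made up, and this mirrors the lab's `country_plan_combo` idea). Grouping by plan alone shows identical churn for both plans, while grouping by the plan-region combination shows the effect reversing between regions.

```python
from collections import defaultdict

# Synthetic (region, plan, churned) records in which the plan effect reverses by region.
records = (
    [("urban", "premium", 0)] * 8 + [("urban", "premium", 1)] * 2
    + [("urban", "basic", 0)] * 2 + [("urban", "basic", 1)] * 8
    + [("rural", "premium", 0)] * 2 + [("rural", "premium", 1)] * 8
    + [("rural", "basic", 0)] * 8 + [("rural", "basic", 1)] * 2
)


def churn_rates(key_fn):
    """Churn rate per group, where key_fn maps (region, plan) to a group label."""
    totals, churns = defaultdict(int), defaultdict(int)
    for region, plan, churned in records:
        key = key_fn(region, plan)
        totals[key] += 1
        churns[key] += churned
    return {k: churns[k] / totals[k] for k in totals}


by_plan = churn_rates(lambda region, plan: plan)
by_combo = churn_rates(lambda region, plan: f"{plan}_{region}")  # interaction term

print("by plan only:", by_plan)
print("by plan x region:", by_combo)
```

In this toy data, a logistic regression with separate, additive plan and region terms cannot reproduce the four combination rates; the combined category can.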

### 3. What’s the purpose of binary flags like `flag_binge`?
Binary flags are simple, intuitive features that represent the presence or absence of a specific characteristic or behavior. They are useful for highlighting particular segments or actions that might have a strong, unique impact on the target variable.

**Can these capture unique behaviors not reflected in raw totals?**

Yes, binary flags can capture unique behaviors that might be lost or diluted in raw totals or continuous variables. For example, `total_minutes` gives a continuous measure of watch time, but `flag_binge` specifically identifies users who engage in very high-volume viewing sessions. A "binge" behavior might indicate a different type of engagement or potential risk factor (e.g., quickly consuming available content and then leaving) than simply having a high cumulative watch time spread out over many smaller sessions. The binary flag provides a clear signal for this specific behavior pattern.
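A short sketch of the difference, assuming hypothetical per-session watch data (this lab's tables have no sessions) and a hypothetical 180-minute binge threshold. Both made-up users have the same `total_minutes`, but only one triggers the flag.

```python
# Hypothetical threshold: a single session of 180+ minutes counts as a binge.
BINGE_SESSION_MINUTES = 180

# Two made-up users with identical total watch time but different habits.
sessions = {
    "steady_viewer": [45, 50, 40, 45, 50, 40, 30],  # many short sessions
    "binger": [240, 60],                            # one very long session
}

features = {}
for user, minutes in sessions.items():
    features[user] = {
        "total_minutes": sum(minutes),
        "flag_binge": int(any(m >= BINGE_SESSION_MINUTES for m in minutes)),
    }
    print(user, features[user])
```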

### 4. After evaluating the enhanced model:
*(Note: To answer this definitively, we would compare the evaluation metrics of the base model (`` ML.EVALUATE(MODEL `netflix.churn_model`) ``) and the enhanced model (`` ML.EVALUATE(MODEL `netflix.churn_model_enhanced`) ``), looking for improvements in metrics such as ROC AUC, log loss, accuracy, precision, and recall. We would also inspect `ML.WEIGHTS` for the enhanced model to see the coefficients assigned to the new features.)*
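To read the `ML.EVALUATE` output, it helps to see how the threshold-based metrics and log loss are computed. A minimal sketch on six made-up users, classifying at a 0.5 probability threshold; the dictionary keys mirror the column names `ML.EVALUATE` reports for logistic regression (it also reports `roc_auc`, omitted here), but the function itself is illustrative, not BigQuery ML's implementation.

```python
import math


def classification_metrics(labels, probs, threshold=0.5):
    """Threshold-based metrics plus log loss for a binary classifier."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 0)
    tn = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Log loss penalizes confident wrong probabilities, not just wrong labels.
    eps = 1e-15
    log_loss = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for y, p in zip(labels, probs)
    ) / len(labels)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / len(labels),
        "f1_score": f1,
        "log_loss": log_loss,
    }


# Made-up churn labels and predicted churn probabilities for six users.
labels = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.4, 0.35, 0.8, 0.6, 0.1]
metrics = classification_metrics(labels, probs)
print(metrics)
```

Precision and recall depend on the chosen threshold, while log loss and ROC AUC use the probabilities directly, which is why they are the better summary when comparing the base and enhanced models.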

Based on typical modeling outcomes and the nature of the engineered features, here's a possible discussion:

**Which new features helped the most?**

`netflix.churn_features_enhanced` adds two engineered features: `monthly_spend_bucket`, which lets the model treat low, medium, and high spenders as distinct groups, and `country_plan_combo`, which can surface country-specific differences in how each plan performs. (The SQL above does not build a watch-time bucket or a `flag_binge` feature; `netflix.churn_features` has no viewing-session columns.) With real churn labels, these would be the first features to examine. Here, however, `churn_label` was filled with `RAND()`, so no feature can carry genuine signal: both models should score close to chance (ROC AUC near 0.5), and any gap between their metrics is most likely noise.

To know for sure which helped *most*, compare the evaluation metrics of the enhanced model with those of the base model and analyze the model's weights. In logistic regression, features with larger absolute weights are generally more influential, though collinearity can complicate this interpretation, and weights on numeric features are only comparable when they share a common scale (`ML.WEIGHTS` accepts `STRUCT(TRUE AS standardize)` to standardize them).
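A logistic regression weight is an additive change in the log-odds of churn, so `exp(weight)` is the multiplicative change in the odds. A small worked sketch with hypothetical weights (not read from this lab's model) shows how weight magnitude translates into odds ratios and a predicted probability.

```python
import math


def sigmoid(z: float) -> float:
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-z))


# Hypothetical weights: one standardized numeric feature and one category of a
# one-hot encoded feature.
intercept = -1.0
weights = {"household_size (per 1 std dev)": -0.3, "monthly_spend_bucket=low": 0.7}

for name, w in weights.items():
    # exp(weight) is the factor by which the odds of churn are multiplied.
    print(f"{name}: weight={w:+.2f}, odds ratio={math.exp(w):.2f}")

# A 'low' spender with average household size (standardized value 0, so that term drops out).
p_low = sigmoid(intercept + weights["monthly_spend_bucket=low"])
print(f"churn probability: {p_low:.3f}")
```

Here the larger absolute weight (0.7) roughly doubles the odds, while -0.3 shrinks them by about a quarter, which is the sense in which larger magnitudes mean more influence.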

**Did any surprise you?**

Surprises are common in feature engineering! Perhaps `country_plan_combo` reveals that a certain plan performs surprisingly well or poorly in a country where you didn't expect it, or `monthly_spend_bucket` shows a weaker or stronger association with churn than anticipated. The evaluation process helps uncover these unexpected patterns and refine your understanding of the drivers of churn.