<a href="https://colab.research.google.com/github/MaxMatteucci/mgmt467-analytics-portfolio/blob/main/Unit2_Lab2_Churn_Modeling_FeatureEngineering_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📊 MGMT 467 - Unit 2 Lab 2: Churn Modeling with BigQueryML + Feature Engineering
**Date:** 2025-10-16

In this lab you will:
- Connect to BigQuery from Colab
- Create features and labels
- Engineer new features from user behavior
- Train and evaluate logistic regression models
- Reflect on modeling assumptions and interpret results

In [2]:
# ✅ Authenticate and set up GCP project
from google.colab import auth
auth.authenticate_user()

project_id = "database-project-467"  # <-- your actual project ID
!gcloud config set project $project_id


Updated property [core/project].


In [3]:
# ✅ Verify BigQuery access
%%bigquery --project $project_id
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user


|   | today | user |
|---|---|---|
| 0 | 2025-10-25 | 32ma2chi@gmail.com |


In [4]:
%%bigquery --project $project_id
SELECT *
FROM `database-project-467.netflix.feat_churn_lite`
LIMIT 5;



|   | user_id | month | r3_sess | r3_min | unique_days_watched | avg_watch_duration | days_since_last_month_start | subscription_plan | country | age | churn_next_month |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | user_00001 | 2025-11-01 | 3 | 0.0 | 0 | 0.0 | 31 | Basic | USA | 43.0 | 1 |
| 1 | user_00001 | 2025-08-01 | 3 | 73.2 | 0 | 0.0 | 31 | Basic | USA | 43.0 | 1 |
| 2 | user_00001 | 2025-11-01 | 3 | 0.0 | 0 | 0.0 | 31 | Basic | USA | 43.0 | 1 |
| 3 | user_00001 | 2025-04-01 | 6 | 679.8 | 0 | 0.0 | 31 | Basic | USA | 43.0 | 0 |
| 4 | user_00001 | 2025-02-01 | 6 | 226.8 | 0 | 0.0 | 31 | Basic | USA | 43.0 | 0 |


In [5]:
# ✅ Prepare base churn features (matching actual columns in feat_churn_lite)
%%bigquery --project $project_id
CREATE OR REPLACE TABLE `database-project-467.netflix.churn_features` AS
SELECT
  user_id,
  country AS region,
  subscription_plan AS plan_tier,
  age AS age_band,                   -- raw numeric age, not a bucketed band
  avg_watch_duration AS avg_rating,  -- average watch duration, not a rating
  r3_min AS total_minutes,
  r3_sess AS num_sessions,
  churn_next_month AS churn_label
FROM `database-project-467.netflix.feat_churn_lite`
WHERE churn_next_month IS NOT NULL;



In [7]:
# ✅ Train base logistic regression model (Task 2 — with label specified)
%%bigquery --project $project_id
CREATE OR REPLACE MODEL `database-project-467.netflix.churn_model`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['churn_label']
) AS
SELECT
  region,
  plan_tier,
  age_band,
  avg_rating,
  total_minutes,
  num_sessions,
  churn_label
FROM `database-project-467.netflix.churn_features`;



In [8]:
# ✅ Evaluate base model
%%bigquery --project $project_id
SELECT *
FROM ML.EVALUATE(MODEL `database-project-467.netflix.churn_model`);



|   | precision | recall | accuracy | f1_score | log_loss | roc_auc |
|---|---|---|---|---|---|---|
| 0 | 0.661717 | 1.0 | 0.661717 | 0.796426 | 0.639896 | 0.501293 |


In [9]:
# ✅ Predict churn with base model
%%bigquery --project $project_id
SELECT
  user_id,
  predicted_churn_label,
  predicted_churn_label_probs
FROM ML.PREDICT(
  MODEL `database-project-467.netflix.churn_model`,
  (
    SELECT * FROM `database-project-467.netflix.churn_features`
  )
);



|   | user_id | predicted_churn_label | predicted_churn_label_probs |
|---|---|---|---|
| 0 | user_00001 | 1 | [{'label': 1, 'prob': 0.663404769246476}, {'la... |
| 1 | user_00001 | 1 | [{'label': 1, 'prob': 0.663404769246476}, {'la... |
| 2 | user_00001 | 1 | [{'label': 1, 'prob': 0.663404769246476}, {'la... |
| 3 | user_00001 | 1 | [{'label': 1, 'prob': 0.663404769246476}, {'la... |
| 4 | user_00001 | 1 | [{'label': 1, 'prob': 0.663404769246476}, {'la... |
| ... | ... | ... | ... |
| 710695 | user_02028 | 1 | [{'label': 1, 'prob': 0.6648101929464486}, {'l... |
| 710696 | user_02028 | 1 | [{'label': 1, 'prob': 0.6648101929464486}, {'l... |
| 710697 | user_08692 | 1 | [{'label': 1, 'prob': 0.6645090775211072}, {'l... |
| 710698 | user_08692 | 1 | [{'label': 1, 'prob': 0.6645090775211072}, {'l... |
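
Here is a minimal sketch of how one might pull the churn probability out of a `predicted_churn_label_probs` value once the result is in Python. The first entry of `sample` is copied from the output above. The second entry is an assumption, because the printed value is truncated, and is filled in as the complement for the other class. `churn_probability` is a hypothetical helper name, not part of BigQuery ML.

```python
def churn_probability(label_probs, positive_label=1):
    """Return the probability attached to the positive (churn) label."""
    for entry in label_probs:
        if entry["label"] == positive_label:
            return entry["prob"]
    raise KeyError(f"label {positive_label!r} not found")


# First entry copied from the notebook output; the second entry is ASSUMED
# (the printed value is truncated) and set to the complement for label 0.
sample = [
    {"label": 1, "prob": 0.663404769246476},
    {"label": 0, "prob": 1 - 0.663404769246476},
]

# The three distinct label-1 probabilities visible in the output above.
visible_probs = [0.663404769246476, 0.6648101929464486, 0.6645090775211072]
spread = max(visible_probs) - min(visible_probs)

print(churn_probability(sample))
print(f"spread across visible rows: {spread:.4f}")
```

The visible probabilities differ by well under one percentage point, so at least for these rows the model assigns nearly the same score to every user.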



## 🛠️ Feature Engineering Section

We will now engineer new features to improve model performance:

- Bucket continuous variables
- Create interaction terms
- Add behavioral flags


In [10]:
# ✅ Create enhanced feature set (Netflix dataset)
%%bigquery --project $project_id
CREATE OR REPLACE TABLE `database-project-467.netflix.churn_features_enhanced` AS
SELECT
  user_id,
  region,
  plan_tier,
  age_band,
  avg_rating,
  total_minutes,
  CASE
    WHEN total_minutes < 100 THEN 'low'
    WHEN total_minutes BETWEEN 100 AND 300 THEN 'medium'
    ELSE 'high'
  END AS watch_time_bucket,
  num_sessions,
  CONCAT(plan_tier, '_', region) AS plan_region_combo,
  IF(total_minutes > 500, 1, 0) AS flag_binge,
  churn_label
FROM `database-project-467.netflix.churn_features`;



In [11]:
# ✅ Train enhanced logistic regression model (Netflix dataset)
%%bigquery --project $project_id
CREATE OR REPLACE MODEL `database-project-467.netflix.churn_model_enhanced`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['churn_label']
) AS
SELECT
  region,
  plan_tier,
  age_band,
  watch_time_bucket,
  avg_rating,
  num_sessions,
  plan_region_combo,
  flag_binge,
  churn_label
FROM `database-project-467.netflix.churn_features_enhanced`;



In [12]:
# ✅ Evaluate enhanced model (Netflix dataset)
%%bigquery --project $project_id
SELECT *
FROM ML.EVALUATE(MODEL `database-project-467.netflix.churn_model_enhanced`);



|   | precision | recall | accuracy | f1_score | log_loss | roc_auc |
|---|---|---|---|---|---|---|
| 0 | 0.660357 | 1.0 | 0.660357 | 0.79544 | 0.640807 | 0.501309 |
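
One way to read the two `ML.EVALUATE` rows in this notebook: a recall of 1.0 with precision equal to accuracy is the signature of a classifier that labels every row as churn. The sketch below checks that arithmetic in plain Python. The row counts are hypothetical, since the real evaluation split size is not shown. They are chosen so the churn share matches the precision reported for the base model.

```python
def metrics_if_everyone_flagged(positives, negatives):
    """Metrics for a classifier that predicts the positive class for every row."""
    tp, fp, fn, tn = positives, negatives, 0, 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1_score": f1,
    }


# HYPOTHETICAL split of 1,000,000 rows with a 66.1717% churn share,
# matching the precision reported for the base model.
base = metrics_if_everyone_flagged(positives=661_717, negatives=338_283)

for name, value in base.items():
    print(f"{name}: {value:.6f}")
```

Under that assumption the computed F1 agrees with the reported `f1_score` of 0.796426 to six decimal places, which is consistent with the base model predicting churn for every evaluated row at the default threshold.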



## 🤔 Chain-of-Thought Prompts: Feature Engineering

### 1. Why bucket continuous values like watch time?
- What patterns become clearer by using categories like "low", "medium", "high"?

### 2. What value do interaction terms (e.g., `plan_region_combo`) add?
- Could some plans behave differently in different regions?

### 3. What’s the purpose of binary flags like `flag_binge`?
- Can these capture unique behaviors not reflected in raw totals?

### 4. After evaluating the enhanced model:
- Which new features helped the most?
- Did any surprise you?

✍️ Write your responses in a text cell below or in a shared doc for discussion.


### 1. Why bucket continuous values like watch time?
Bucketing a continuous value such as watch time into “low,” “medium,” and “high” groups makes it easier to see where behavior changes meaningfully. For example, users who watch under 30 minutes might churn at similar rates, while those who watch over two hours rarely do. Categories highlight these thresholds and make the results easier to explain to others. Bucketing can also help a linear model such as logistic regression, because each bucket gets its own coefficient and the model is no longer forced to treat the effect of watch time as a straight line.

### 2. What value do interaction terms (e.g., `plan_region_combo`) add?
Interaction terms capture how two factors work together. For example, a premium plan might have high retention overall but perform differently in certain regions because of price sensitivity or content preferences. With an interaction such as `plan_region_combo`, the model can learn those differences and give a more complete picture of how plan type and region combine to affect churn. Beyond prediction, interaction terms can suggest hypotheses about why churn differs across segments, although a coefficient on its own does not establish a causal link.

### 3. What’s the purpose of binary flags like `flag_binge`?
Binary flags mark a specific behavior so the model can treat it as an on/off signal. Ideally, a binge flag would capture something raw totals miss, for instance someone with average total watch time who still watches several episodes in a row. In this lab, however, `flag_binge` is defined as `total_minutes > 500`, so it is derived from the raw total. It lets the model treat very heavy viewers as a distinct group, but it does not capture a separate viewing style.

### 4. After evaluating the enhanced model
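The evaluation metrics do not show that any of the new features helped. ROC AUC is essentially unchanged (0.501293 for the base model versus 0.501309 for the enhanced model), and accuracy slipped slightly from 0.661717 to 0.660357. In both runs recall is 1.0 and precision equals accuracy, which is what happens when a model predicts churn for every evaluated row. An ROC AUC of about 0.5 means neither model ranks churners above non-churners better than chance. What surprised me most was that the watch-time buckets, the plan-region interaction, and the binge flag, which all seemed intuitive, added essentially no predictive strength on this dataset. A reasonable next step would be to inspect the model weights and the underlying features to see whether any of them carry signal.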
