<div style="border:solid blue 2px; padding: 20px">

**Overall Summary of the Project**

Hi Joshua! You've done a solid job walking through the key stages of a customer churn prediction task — from data cleaning to model comparison. Your notebook shows a clear effort to explore different imbalance strategies and assess their impact on performance. You correctly applied **class weighting and undersampling**, and your final **Random Forest model** reached an F1 score of **0.591**, which just meets the project’s target.

---

**✅ Strengths**

- **Structured Approach**:
  - Followed a logical pipeline: data cleaning → encoding → model evaluation.
  - Split the project into meaningful steps with code and markdown.

- **Imbalance Strategy Comparison**:
  - Well-structured comparison between **undersampling** and **class weighting** across models.
  - Identified the best strategy and reported it with clarity.

- **Model Variety**:
  - You used three different algorithms and compared them under consistent conditions.

---

**⚠️ Areas for Improvement**

- **Evaluation Depth**:
  - It would have been helpful to include the **AUC-ROC** metric and **confusion matrices** to support the F1 score.
  - Presenting **precision and recall** for all models (not just the best) would strengthen your findings.

- **Threshold Tuning**:
  - Consider adjusting the decision threshold (default = 0.5) to improve the recall of churners — an important metric in churn detection.

- **Label Encoding**:
  - `LabelEncoder` assigns arbitrary integers to unordered categories, which can mislead models that treat the codes as ordered, especially linear ones such as logistic regression. Consider using `OneHotEncoder` for categorical features like Geography/Gender next time.

- **Presentation**:
  - The project could benefit from more consistent formatting and polished markdown (e.g., fix capitalization, consistent spacing, better section headers like `## Modeling`).
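
The encoding suggestion can be sketched as follows. This is a minimal illustration on made-up rows, not the project data: the `toy` frame and its values are assumptions, and `pd.get_dummies` is used here as one common way to one-hot encode (scikit-learn's `OneHotEncoder` is an alternative).

```python
import pandas as pd

# Toy rows for illustration only; the values are made up, not taken from Churn.csv.
toy = pd.DataFrame({
    'Geography': ['France', 'Spain', 'Germany', 'France'],
    'Gender': ['Female', 'Male', 'Female', 'Male'],
    'Age': [42, 41, 39, 50],
})

# One dummy column per category; drop_first=True removes one redundant
# column per feature, which matters mostly for linear models.
encoded = pd.get_dummies(toy, columns=['Geography', 'Gender'], drop_first=True)
print(encoded.columns.tolist())
```

Each category becomes its own 0/1 column, so no artificial ordering between France, Germany and Spain is implied.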

---

**✅ Required Changes for Approval**

✅ Your final model **meets the minimum requirement** (F1 ≥ 0.59), so **no changes are required for approval**. However, deeper evaluation and clearer presentation would bring it to the next level.

---

🎯 Great work, Joshua! With a few refinements in model evaluation and reporting, you’ll take your projects from solid to standout. Keep it up!

</div>

# Sprint 8

# Project description
The goal is to build a model that predicts whether a customer will leave the bank soon, with the highest possible F1 score.

In [7]:
# Import libraries
import pandas as pd 
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import f1_score, classification_report
from sklearn.utils import resample
import numpy as np

In [8]:
# Load the data file
df = pd.read_csv('/datasets/Churn.csv')

In [9]:
# Inspect the column types and the first rows
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB


| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2.0 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1.0 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8.0 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1.0 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2.0 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |


In [10]:
# Clean up the data
# Count the missing values in each column
missing_values = df.isnull().sum()

# Only the 'Tenure' column has missing values
tenure_mean = df['Tenure'].mean()

# Fill missing 'Tenure' values with the mean, rounded to the nearest integer
df['Tenure'] = df['Tenure'].fillna(round(tenure_mean))

# Confirm that there are no missing values left
missing_after_cleaning = df.isnull().sum()

display(missing_values, missing_after_cleaning)

RowNumber            0
CustomerId           0
Surname              0
CreditScore          0
Geography            0
Gender               0
Age                  0
Tenure             909
Balance              0
NumOfProducts        0
HasCrCard            0
IsActiveMember       0
EstimatedSalary      0
Exited               0
dtype: int64

RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

In [11]:
# Examine the balance of classes

# Drop unnecessary columns
df_model = df.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

# Encode categorical features
label_encoders = {}
for col in ['Geography', 'Gender']:
    le = LabelEncoder()
    df_model[col] = le.fit_transform(df_model[col])
    label_encoders[col] = le

# Define features and target
X = df_model.drop('Exited', axis=1)
y = df_model['Exited']

# Split the data
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)

# Check class balance
class_distribution = y.value_counts(normalize=True)

# Train model without handling imbalance
basic_model = RandomForestClassifier(random_state=42)
basic_model.fit(X_train, y_train)
y_pred_basic = basic_model.predict(X_valid)

# Evaluate the model
f1_basic = f1_score(y_valid, y_pred_basic)
report_basic = classification_report(y_valid, y_pred_basic, output_dict=True)

class_distribution, f1_basic, report_basic

(0    0.7963
 1    0.2037
 Name: Exited, dtype: float64,
 0.5916870415647922,
 {'0': {'precision': 0.8781378366042902,
   'recall': 0.9663485685585133,
   'f1-score': 0.9201339072214251,
   'support': 1991},
  '1': {'precision': 0.7831715210355987,
   'recall': 0.47544204322200395,
   'f1-score': 0.5916870415647922,
   'support': 509},
  'accuracy': 0.8664,
  'macro avg': {'precision': 0.8306546788199445,
   'recall': 0.7208953058902586,
   'f1-score': 0.7559104743931087,
   'support': 2500},
  'weighted avg': {'precision': 0.8588026947545045,
   'recall': 0.8664,
   'f1-score': 0.8532621253737347,
   'support': 2500}})
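
The F1 score in the report above can be cross-checked by hand, since F1 is the harmonic mean of precision and recall. The sketch below copies the class-1 precision, recall and support from the report; the true-positive, false-positive and false-negative counts are inferred from those three numbers and are not printed by the notebook itself.

```python
# Values copied from the classification report above (class '1', churned).
precision = 0.7831715210355987
recall = 0.47544204322200395
support = 509  # churned customers in the validation set

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Counts inferred from the report (an assumption-free rearrangement of the
# definitions): TP = recall * support, predicted positives = TP / precision.
tp = round(recall * support)
fp = round(tp / precision) - tp
fn = support - tp

# Equivalent count-based form of F1.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(round(f1, 4), tp, fp, fn)
```

Both forms give about 0.5917, matching `f1_basic`. The inferred counts (242 churners caught, 267 missed) make the low recall concrete.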

### Findings
**Class balance:**
- Not churned (0): 79.6%
- Churned (1): 20.4%

The classes are imbalanced: only about one customer in five churned.

**Model without imbalance handling:**
- F1 score for the churned class (1): 0.59
- Precision (1): 0.78
- Recall (1): 0.48
- Overall accuracy: 86.6%

The model is biased toward predicting the majority class (not churned). Overall accuracy is high, but recall for churned customers is low: the model finds only about 48% of them. As a result, the F1 score for churned customers, our minority class of interest, is a relatively low 0.59.
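
Because the low recall comes from predictions made at the default 0.5 probability cutoff, one option is to sweep the decision threshold and watch how precision, recall and F1 trade off. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` with a roughly 80/20 class split rather than the churn dataset, and the names `X_demo`, `y_demo`, `clf` and `rows` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with roughly an 80/20 class split (not the churn data).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
proba = clf.predict_proba(X_va)[:, 1]  # estimated probability of class 1

# Sweep candidate thresholds; a lower threshold flags more rows as positive,
# so recall can only stay the same or rise as the threshold drops.
rows = []
for threshold in np.arange(0.2, 0.65, 0.1):
    pred = (proba >= threshold).astype(int)
    rows.append({
        'threshold': round(float(threshold), 2),
        'precision': precision_score(y_va, pred, zero_division=0),
        'recall': recall_score(y_va, pred),
        'f1': f1_score(y_va, pred),
    })

for row in rows:
    print(row)
```

A threshold chosen this way is tuned to the validation split, so it would ideally be confirmed on a separate held-out test set.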

In [12]:
# Build models to find the best F1 score
# Drop identifier columns
df_model = df.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

# Encode categorical columns
label_encoders = {}
for column in ['Geography', 'Gender']:
    le = LabelEncoder()
    df_model[column] = le.fit_transform(df_model[column])
    label_encoders[column] = le

# Features and target
X = df_model.drop('Exited', axis=1)
y = df_model['Exited']

# Split the data
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Combine training features and target
train_df = pd.concat([X_train, y_train], axis=1)

# Undersample majority class
not_churn = train_df[train_df['Exited'] == 0]
churn = train_df[train_df['Exited'] == 1]
not_churn_downsampled = resample(not_churn, replace=False, n_samples=len(churn), random_state=42)
undersampled_train_df = pd.concat([not_churn_downsampled, churn])

X_undersampled = undersampled_train_df.drop('Exited', axis=1)
y_undersampled = undersampled_train_df['Exited']

# Define models
models = {
    'RandomForest': RandomForestClassifier(random_state=42),
    'LogisticRegression': LogisticRegression(max_iter=1000, random_state=42),
    'DecisionTree': DecisionTreeClassifier(random_state=42)
}

# Store results
results = []

# Train and evaluate each model with two strategies.
# clone() returns a fresh, unfitted copy, so the class-weight setting
# does not carry over into the undersampling run.
from sklearn.base import clone

for model_name, model in models.items():
    # 1. Using class weights
    if hasattr(model, 'class_weight'):
        weighted_model = clone(model).set_params(class_weight='balanced')
        weighted_model.fit(X_train, y_train)
        y_pred = weighted_model.predict(X_valid)
        f1_weighted = f1_score(y_valid, y_pred)
        results.append({'Sampling': 'ClassWeight', 'Model': model_name, 'F1 Score': f1_weighted})

    # 2. Using undersampling (default class weights)
    under_model = clone(model)
    under_model.fit(X_undersampled, y_undersampled)
    y_pred_under = under_model.predict(X_valid)
    f1_under = f1_score(y_valid, y_pred_under)
    results.append({'Sampling': 'Undersampling', 'Model': model_name, 'F1 Score': f1_under})

# Compile results
results_df = pd.DataFrame(results)
best_result = results_df.loc[results_df['F1 Score'].idxmax()]

best_result

Sampling    Undersampling
Model        RandomForest
F1 Score         0.591506
Name: 1, dtype: object

# Conclusion
I tested two methods to address class imbalance:

- Class weighting (`class_weight='balanced'`)
- Undersampling the majority class to match the number of churn cases

Each method was applied to three models:

- Random Forest
- Logistic Regression
- Decision Tree

**Best model:**
- Model: Random Forest
- Sampling strategy: Undersampling
- F1 score: 0.591

📝 **Findings:**
- The undersampled Random Forest delivered the best F1 score of the six runs, ahead of every class-weighted model.
- Models trained with class weights had competitive scores but did not surpass the undersampled Random Forest.
- Resampling is a workable way to handle the imbalance, but the best F1 (0.5915) is essentially equal to the 0.5917 of the baseline Random Forest trained without imbalance handling, so these results do not show an F1 gain from resampling on this validation split.
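
To compare the two strategies on more than F1 alone, the same loop can record precision, recall and ROC AUC for every run. The following is a rough sketch under stated assumptions: it uses synthetic data rather than `Churn.csv`, a single `LogisticRegression` for brevity, and hypothetical names such as `X_syn`, `runs` and `summary_df`.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data (a stand-in, not the churn dataset).
X_arr, y_arr = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_syn = pd.DataFrame(X_arr, columns=[f'f{i}' for i in range(10)])
y_syn = pd.Series(y_arr, name='target')
X_tr, X_va, y_tr, y_va = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=0, stratify=y_syn
)

# Undersample the majority class in the training split only.
majority_idx = y_tr[y_tr == 0].index.to_numpy()
minority_idx = y_tr[y_tr == 1].index.to_numpy()
kept_majority = resample(
    majority_idx, replace=False, n_samples=len(minority_idx), random_state=0
)
under_idx = np.concatenate([kept_majority, minority_idx])

base = LogisticRegression(max_iter=1000)
runs = {
    'ClassWeight': (clone(base).set_params(class_weight='balanced'), X_tr, y_tr),
    'Undersampling': (clone(base), X_tr.loc[under_idx], y_tr.loc[under_idx]),
}

# Fit each strategy and collect several metrics on the same validation split.
summary = []
for strategy, (estimator, X_fit, y_fit) in runs.items():
    estimator.fit(X_fit, y_fit)
    pred = estimator.predict(X_va)
    proba = estimator.predict_proba(X_va)[:, 1]
    summary.append({
        'Sampling': strategy,
        'Precision': precision_score(y_va, pred),
        'Recall': recall_score(y_va, pred),
        'F1 Score': f1_score(y_va, pred),
        'ROC AUC': roc_auc_score(y_va, proba),
    })

summary_df = pd.DataFrame(summary)
print(summary_df.round(3))
```

Seeing precision and recall side by side shows whether a strategy reaches its F1 by catching more churners or by raising fewer false alarms, which a single F1 value hides.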