<div style="border: solid blue 2px; padding: 15px; margin: 10px">
<b>Overall Summary of the Project ‚Äì Iteration 2</b>

Hi Sebastian, great job! You've successfully addressed all previous comments and brought the project to completion. Your final version is well-structured, thoughtful, and meets all the requirements ‚Äî the project is now approved. ‚úÖ

---

<b>Nice work on:</b>  
‚úîÔ∏è Thorough data preparation and handling of missing values  
‚úîÔ∏è Effective use of class balancing with both class weights and upsampling  
‚úîÔ∏è Strong model tuning with GridSearchCV and clear final evaluation using the test set only

<hr>

Just a quick reminder:  
üü¢ Green comments highlight great solutions worth keeping.  
üü° Yellow comments are suggestions for optimization.  
üî¥ Red comments must be fixed for a project to be approved.  
üîµ You can use blue to leave your own comments or questions if needed.

<hr>

Please make sure all cells run top to bottom and produce outputs before submitting.  
And don‚Äôt move, change, or delete reviewer comments ‚Äî it helps us track your progress more easily.

<br><br>
<b>Best,</b><br>
<b>Victor Camargo (Discord: camargo.victor)</b><br>
Feel free to reach out in the DS channel (<code>ds-questions</code>) if you need further help!

P.S. Don‚Äôt forget to rate your experience by leaving feedback here:  
https://form.typeform.com/to/wIDK4zE5
</div>

# Beta Bank
## Project Overview
The goal of this project is to predict whether a customer will leave the bank soon. With the data provided on clients‚Äô past behavior and termination of contracts with the bank, building a model that will complete this task with the best possible results for the bank. 

## 1. Environment Setup and Required Libraries

<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Clear and concise project overview! You‚Äôve effectively communicated the business problem and the goal of the machine learning model. Great start.
</div>

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score, roc_auc_score
from sklearn.dummy import DummyClassifier
from sklearn.utils import shuffle
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

In [2]:
data = pd.read_csv('/datasets/Churn.csv')

In [3]:
display(data)

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2.0,0.00,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1.0,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8.0,159660.80,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1.0,0.00,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2.0,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,15606229,Obijiaku,771,France,Male,39,5.0,0.00,2,1,0,96270.64,0
9996,9997,15569892,Johnstone,516,France,Male,35,10.0,57369.61,1,1,1,101699.77,0
9997,9998,15584532,Liu,709,France,Female,36,7.0,0.00,1,0,1,42085.58,1
9998,9999,15682355,Sabbatini,772,Germany,Male,42,3.0,75075.31,2,1,0,92888.52,1


In [4]:
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB
None


In [5]:
data.isnull().sum()

RowNumber            0
CustomerId           0
Surname              0
CreditScore          0
Geography            0
Gender               0
Age                  0
Tenure             909
Balance              0
NumOfProducts        0
HasCrCard            0
IsActiveMember       0
EstimatedSalary      0
Exited               0
dtype: int64

In [6]:
#Checking for missing values 
missing_percentage = (data['Tenure'].isnull().sum() / len(data['Tenure'])) * 100
print(missing_percentage)

9.09


In [7]:
#Find the median of the values in the column with missing values
print(data['Tenure'].median())

5.0


In [8]:
#Replace the missing values with the median
data['Tenure'].fillna(5.0, inplace=True)

In [9]:
data.isnull().sum()

RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Great job here ‚Äî everything is correctly implemented and clearly explained!  
  You‚Äôve successfully loaded the data, checked for null values, and handled missing values in the <code>Tenure</code> column by imputing the median. This is a solid start to your preprocessing.
</div>

In [10]:
# Remove only the irrelevant categorical columns
data_processed = data.drop(['Surname', 'RowNumber', 'CustomerId'], axis=1)

# Encode the potentially useful categorical columns
le_geography = LabelEncoder()
le_gender = LabelEncoder()

data_processed['Geography'] = le_geography.fit_transform(data_processed['Geography'])
data_processed['Gender'] = le_gender.fit_transform(data_processed['Gender'])

In [11]:
display(data_processed)

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,0,0,42,2.0,0.00,1,1,1,101348.88,1
1,608,2,0,41,1.0,83807.86,1,0,1,112542.58,0
2,502,0,0,42,8.0,159660.80,3,1,0,113931.57,1
3,699,0,0,39,1.0,0.00,2,0,0,93826.63,0
4,850,2,0,43,2.0,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...
9995,771,0,1,39,5.0,0.00,2,1,0,96270.64,0
9996,516,0,1,35,10.0,57369.61,1,1,1,101699.77,0
9997,709,0,0,36,7.0,0.00,1,0,1,42085.58,1
9998,772,1,1,42,3.0,75075.31,2,1,0,92888.52,1


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 2:</b><br>
  Well done ‚Äî everything looks great now!
</div>

<div style="border: 3px solid #d9534f; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fdf5f5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Nice work encoding the <code>Geography</code> and <code>Gender</code> columns ‚Äî LabelEncoder is a suitable choice here, and the implementation is clean.  
  <br><br>
  Dropping <code>'Surname'</code> also makes sense since it's unlikely to have predictive value. However, it‚Äôs worth questioning why <code>'RowNumber'</code> and <code>'CustomerId'</code> are being kept ‚Äî these are identifiers and typically don‚Äôt provide meaningful information for modeling. Consider dropping them to keep only relevant features.
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment‚Äì Iteration 1: </b>

The other 2 columns with irrelvant information for the model were removed from the data set using drop(). and then confirmed that they were removed using display(). 
    
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment: </b>

First I loaded all the necessary libraries to work on this project, then inspected the given data set using info() to check for missing values, data type, number and name of columns and size of the data set, after inspecting, only one column (Tenure) showed missing values, so I made the choice to replace them with the median, next checked again to confirm the changes appeared, as the final step I removed the column surname since its data type it's object and the model will only work with numerical values and the other 2 columns with object as their data type I label encoded, and created the data_processed variable to work with for the rest of the project.
    
</div>

## 2. Split the source data into a training set, a validation set, and a test set

In [12]:
# 1. Split into a temporary training+validation set and a test set
data_temp, data_test = train_test_split(data_processed, test_size=0.2, random_state=54321) 

# 2. Split the temporary set into the actual training and validation sets
data_train, data_valid = train_test_split(data_temp, test_size=0.25, random_state=54321)

# 2. Checking the sizes of each set
print(f"Training set size: {len(data_train)}")
print(f"Validation set size: {len(data_valid)}")
print(f"Test set size: {len(data_test)}")

Training set size: 6000
Validation set size: 2000
Test set size: 2000


In [13]:
# Define the variables the model will work with

X_train = data_train.drop('Exited', axis=1)
y_train = data_train['Exited']
X_valid = data_valid.drop('Exited', axis=1)
y_valid = data_valid['Exited']
X_test = data_test.drop('Exited', axis=1)
y_test = data_test['Exited']

<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Excellent work splitting the dataset into training, validation, and test sets ‚Äî the proportions are reasonable, and the separation logic is clearly implemented. Defining your feature and target variables right after the split is also a great practice.
</div>

In [14]:
# Basic statistics for all numeric columns
numeric_candidates = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']
print(data[numeric_candidates].describe())

        CreditScore           Age       Tenure        Balance  NumOfProducts  \
count  10000.000000  10000.000000  10000.00000   10000.000000   10000.000000   
mean     650.528800     38.921800      4.99790   76485.889288       1.530200   
std       96.653299     10.487806      2.76001   62397.405202       0.581654   
min      350.000000     18.000000      0.00000       0.000000       1.000000   
25%      584.000000     32.000000      3.00000       0.000000       1.000000   
50%      652.000000     37.000000      5.00000   97198.540000       1.000000   
75%      718.000000     44.000000      7.00000  127644.240000       2.000000   
max      850.000000     92.000000     10.00000  250898.090000       4.000000   

         HasCrCard  IsActiveMember  EstimatedSalary  
count  10000.00000    10000.000000     10000.000000  
mean       0.70550        0.515100    100090.239881  
std        0.45584        0.499797     57510.492818  
min        0.00000        0.000000        11.580000  
25%      

In [15]:
X_train

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary
7296,663,1,0,37,8.0,155303.71,1,1,0,118716.63
1437,670,0,0,31,5.0,0.00,1,0,1,76254.83
4734,590,0,0,54,4.0,0.00,2,1,1,93820.49
7474,704,0,1,50,4.0,165438.26,1,1,0,120770.75
5625,508,0,0,60,7.0,143262.04,1,1,1,129562.74
...,...,...,...,...,...,...,...,...,...,...
6714,825,0,0,36,3.0,146053.66,1,1,1,138344.70
5751,645,0,1,40,6.0,131411.24,1,1,1,194656.11
2885,660,2,1,42,5.0,0.00,2,1,0,115509.59
7112,670,1,0,35,2.0,79585.96,1,0,1,198802.90


In [16]:
#Standarizing all columns with wide value ranges

numeric = ['CreditScore', 
           'Geography', 
           'Gender', 
           'Age', 
           'Tenure', 
           'Balance', 
           'NumOfProducts', 
           'HasCrCard', 
           'IsActiveMember', 
           'EstimatedSalary']

scaler = StandardScaler()
scaler.fit(X_train[numeric])
X_train[numeric] = scaler.transform(X_train[numeric])
X_valid[numeric] = scaler.transform(X_valid[numeric])
X_valid[numeric] = scaler.transform(X_test[numeric])
print(X_train.head())

      CreditScore  Geography    Gender       Age    Tenure   Balance  \
7296     0.127297   0.296110 -1.108151 -0.183892  1.082399  1.254134   
1437     0.200118  -0.909222 -1.108151 -0.750537 -0.000662 -1.235448   
4734    -0.632117  -0.909222 -1.108151  1.421602 -0.361682 -1.235448   
7474     0.553817  -0.909222  0.902404  1.043838 -0.361682  1.416595   
5625    -1.485157  -0.909222 -1.108151  1.988246  0.721379  1.061101   

      NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  
7296      -0.920246   0.643501       -1.015791         0.319048  
1437      -0.920246  -1.553999        0.984454        -0.422309  
4734       0.824296   0.643501        0.984454        -0.115623  
7474      -0.920246   0.643501       -1.015791         0.354911  
5625      -0.920246   0.643501        0.984454         0.508414  


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 2:</b><br>
  Good job!
</div>

<div style="border: 3px solid #d9534f; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fdf5f5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Good job applying <code>StandardScaler</code> to several numeric columns. However, a few things still need your attention here:
  <br><br>
  The <code>X_test</code> set is not being scaled ‚Äî this is necessary to ensure consistent feature representation during final evaluation.  
  Also, consider whether columns like <code>Geography</code>, <code>Gender</code>, <code>HasCrCard</code>, and <code>IsActiveMember</code> should also be scaled. These features are numerical (after encoding) and are used directly in many models ‚Äî leaving them on a different scale may affect performance, especially for linear classifiers.
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment‚Äì Iteration 1: </b>

The correction was aplied: The test set was scaled and the columns with numerical features left out in the original code were also scaled. 
    
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment: </b>

The data set was split according to the instrucions using <b>train_test_split()</b> and the sizes of each set checked, then the variables for feautres and target are defined, next I checked the statistics of each column to see their ranges from min to max and determine which ones needed to be standarized as the last step.
    
</div>

## 3. Testing different models without fixing the class imbalance

In [17]:
# Training model without taking into account the imbalance

# Initiating LogisticRegression constructor with same random_state and solver='liblinear'
model = LogisticRegression(random_state=54321, solver='liblinear')

# Training model with training sets 
model.fit(X_train, y_train)

#Creating predictions 
predicted_valid = model.predict(X_valid)

# Finding metric scores of LogisticRegression model on validation set
accuracy_score_valid = model.score(X_valid, y_valid)

recall_score_valid = recall_score(y_valid, predicted_valid)

precision_score_valid = precision_score(y_valid, predicted_valid)

f1_score_valid = f1_score(y_valid, predicted_valid)


# Printing results

#Accuracy
print(f"Accuracy Score for Validation Set: {accuracy_score_valid}")
print()

#Recall
print(f"Recall Score for Validation Set: {recall_score_valid}")
print()

#Precision
print(f"Precision Score for Validation Set: {precision_score_valid}")
print()

#F1 Score
print(f"F1 Score for Validation Set: {f1_score_valid}")

Accuracy Score for Validation Set: 0.7535

Recall Score for Validation Set: 0.05203619909502263

Precision Score for Validation Set: 0.23711340206185566

F1 Score for Validation Set: 0.08534322820037106


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 2:</b><br>
  Great job!
</div>

<div style="border: 3px solid #d9534f; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fdf5f5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  You're correctly testing a baseline model here, but a few critical issues need to be addressed:
  <br><br>
  ‚Ä¢ Predicting on the training set is not necessary in this context ‚Äî it's better to focus on validation performance unless you're checking for overfitting.<br>
  ‚Ä¢ More importantly, getting an F1 score of 0 for both training and validation sets is a major red flag. This suggests the model is failing to predict any positive cases at all.
  <br><br>
  One likely cause is that you are still including non-informative identifier columns like <code>RowNumber</code> and <code>CustomerId</code> in your feature set. These can confuse the model and severely hurt performance. Make sure to remove any columns that don‚Äôt carry meaningful information before training.
</div>


<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment‚Äì Iteration 1: </b>
The correction was applied and the model's predictions were only done on the validation set, since the numerical columns were also properly scaled now the scores for F1, recall and precision have increased values after being 0.0. 
    
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment: </b>

As isntructed in the project's description, the first model training was done with the raw data set and no class imbalance was fixed yet at this point, which shows a high accuracy scores for both training and validation set, however this result is misleading since other metrics scores are also calculated below accuracy to compare, specifically recall, precision and F1, all show scores of 0.0 meaning the model is not predicting correctly if at all when costumers are going to leave the bank. This is most likely due to the imbalance of class in the target column, and the bias of the model toward the majority class, which is 0 = stayed. 
    
</div>

## 4. Fixing the imbalance of classes using at least 2 different methods

In [18]:
#Checking class imbalance for the entire data and the test set

print(data['Exited'].value_counts())
print()
print(data_test['Exited'].value_counts())

0    7963
1    2037
Name: Exited, dtype: int64

0    1610
1     390
Name: Exited, dtype: int64


## Method 1: Class Weight adjustment for Logistic Regression Model.

In [19]:
#Calculating F1 score for the validation set.

model = LogisticRegression(random_state=54321, solver='liblinear', class_weight='balanced')
model.fit(X_train, y_train)
predicted_valid = model.predict(X_valid)
print('F1 Score:', f1_score(y_valid, predicted_valid))

F1 Score: 0.2740676496097138


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Nicely done! Applying <code>class_weight='balanced'</code> to Logistic Regression clearly helped address the imbalance, and your resulting F1 score confirms the model is now identifying both classes. This is a strong improvement over the baseline.
</div>


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 2:</b><br>
  Issue addressed!
</div>


<div style="border: 3px solid #d9534f; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fdf5f5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  This test set evaluation should be removed ‚Äî at this stage, you're still comparing models, and the test set should be reserved strictly for evaluating your final, best-performing model.  
  <br><br>
  Using the test set too early can introduce data leakage and lead to overfitting. Stick to validation set results while tuning and comparing approaches.
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment‚Äì Iteration 1: </b>

Correction applied and the early evaluation of the test set at this point in the cell above was removed. 
    
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment: </b>

I adjusted the class_weight to = 'balanced' for both test and validation sets and compared the results of the F1 scores obtained, it slightly imoproved from 0.0 however did not meet the threshold of 0.59 so this technique did not work. 
    
</div>

## Method 2: Upsampling for Logistic Regression Model.

In [20]:
#defining the variables to be used in upsampling 

features_zeros = X_train[y_train == 0]
features_ones = X_train[y_train == 1]
target_zeros = y_train[y_train == 0]
target_ones = y_train[y_train == 1]

print(features_zeros.shape)
print(features_ones.shape)
print(target_zeros.shape)
print(target_ones.shape)

(4795, 10)
(1205, 10)
(4795,)
(1205,)


In [21]:
#Define a function to perform the upsampling on the training set

def upsample(X_train, y_train, repeat):
    features_zeros = X_train[y_train == 0]
    features_ones = X_train[y_train == 1]
    target_zeros = y_train[y_train == 0]
    target_ones = y_train[y_train == 1]

    arg1 = pd.concat([features_zeros] + [features_ones] * repeat)
    arg2 = pd.concat([target_zeros] + [target_ones] * repeat)

    features_upsampled, target_upsampled = shuffle(
        arg1, arg2, random_state=54321
    )

    return features_upsampled, target_upsampled, arg1, arg2


features_upsampled, target_upsampled, arg_1, arg_2 = upsample(
    X_train, y_train, 2
)

print(features_upsampled.shape)
print(target_upsampled.shape)

(7205, 10)
(7205,)


In [22]:
model = LogisticRegression(random_state=54321, solver='liblinear')
model.fit(features_upsampled, target_upsampled)
predicted_valid = model.predict(X_valid)

print('F1:', f1_score(y_valid, predicted_valid))

F1: 0.19121447028423774


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 2:</b><br>
  Issue addressed!
</div>


<div style="border: 3px solid #d9534f; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fdf5f5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Good job implementing upsampling ‚Äî the function is well-written and the logic is sound. However, the fact that your model is still producing an F1 score of 0 after balancing the classes strongly suggests a deeper issue.  
  <br><br>
  Most likely, the model is being negatively affected by non-informative identifier columns like <code>RowNumber</code> and <code>CustomerId</code>, which are still present in the feature set. These columns don‚Äôt provide meaningful information and may disrupt the learning process. Be sure to drop them before training your models.
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment: </b>

After getting a F1 score of 0.0 again, I came to the conlusion that the model is being overly conservative - even with upsampling, it's still reluctant to predict churn (class 1) with high confidence. So this method made the F1 score worse. 
    
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment‚Äì Iteration 1 : </b>

After fixing the issues in  the data set columns with numeric and not meaningful values/information for the model, the F1 score increased from 0.0 to 0.19, not great yet at this point but it shows improvement. 
    
</div>

### Investigating the Model's Behavior.

In [23]:
# Checking what is happening with the model's predictions. 

print("Unique values in y_valid:", np.unique(y_valid))
print("Unique values in predicted_valid:", np.unique(predicted_valid))
print("Count of each class in y_valid:", np.bincount(y_valid))
print("Count of each class in predicted_valid:", np.bincount(predicted_valid))

Unique values in y_valid: [0 1]
Unique values in predicted_valid: [0 1]
Count of each class in y_valid: [1558  442]
Count of each class in predicted_valid: [1668  332]


In [24]:
#Checking if the upsampling worked.

print("Training data after upsampling:")
print("Class distribution:", np.bincount(target_upsampled))
print("Ratio:", np.bincount(target_upsampled)[0] / np.bincount(target_upsampled)[1])

Training data after upsampling:
Class distribution: [4795 2410]
Ratio: 1.9896265560165975


In [25]:
#Checking the model's confidence when predicting classes. 

predicted_proba = model.predict_proba(X_valid)
print("Sample of prediction probabilities (first 10 rows):")
print(predicted_proba[:10])
print("Max probability for class 1:", predicted_proba[:, 1].max())

Sample of prediction probabilities (first 10 rows):
[[0.48932985 0.51067015]
 [0.65555523 0.34444477]
 [0.80324851 0.19675149]
 [0.46326845 0.53673155]
 [0.69743554 0.30256446]
 [0.74707831 0.25292169]
 [0.50105074 0.49894926]
 [0.75496351 0.24503649]
 [0.56441904 0.43558096]
 [0.65907835 0.34092165]]
Max probability for class 1: 0.9099013829348903


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Excellent work investigating the model‚Äôs behavior. You clearly checked the predicted classes, verified that upsampling was applied, and explored the output probabilities to understand the model's confidence levels. This type of diagnostic thinking is crucial when debugging poor model performance ‚Äî great job!
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment: </b>

I ran a diagnosis to find out why the F1 score went back to 0.0 after umsampling, the maximum probability for the class 1 (churn) to be predicted by the model was 33.9% and the default threshold is 50%, the model will likely always ignore class 1 when making predictions and the number of positive answers 0.0 for the F1 score.
    
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment ‚Äì Iteration 1: </b>

After applying all the corrections from the reviwer, the max probabilitie for class 1 to be predicted changed dramatically drom 33.9% to 91%
    
</div>

### Method 1: Class Weight adjustment for Random Forest Classifier Model.

<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 2:</b><br>
  Issue addressed!
</div>


<div style="border: 3px solid #d9534f; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fdf5f5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Although the model is correctly defined and trained, calculating the F1 score on the training set alone isn‚Äôt particularly useful at this stage. It's more important to focus on validation performance to understand how well the model generalizes. Unless you're specifically analyzing overfitting, you can skip evaluating on the training set. Consider removing this cell.
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment‚Äì Iteration 1: </b>

The cell above containing the F1 score for the training set was removed. 
    
</div>

In [26]:
#Define the Model for validation set
model_2 = RandomForestClassifier(random_state=54321, n_estimators=69, max_depth=10, class_weight='balanced')
model_2.fit(X_train, y_train)
predicted_valid = model_2.predict(X_valid)
print('F1 Score for Validation Set:', f1_score(y_valid, predicted_valid))

F1 Score for Validation Set: 0.1884057971014493


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Great job applying class weighting with Random Forest and properly evaluating the model on the validation set. The F1 score exceeds the project threshold, which shows this model is performing well and learning to handle the imbalance. This is exactly the kind of evaluation we want to see at this stage.
</div>

<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 2:</b><br>
  Issue addressed!
</div>


<div style="border: 3px solid #d9534f; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fdf5f5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Be careful using the test set at this stage ‚Äî it should only be used once, with the final selected model, after all training and tuning are complete.  
  <br><br>
  Evaluating on the test set too early can leak information about the data distribution and lead to biased performance estimates. For now, focus on comparing models using the validation set only. Consider removing this cell.
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment‚Äì Iteration 1: </b>

The cell above with the calculation of the F1 score for the test set was also removed to prevent data leakage and only the validation set was left in place. 
    
</div>

### Finding the best parameters to improve the model.

In [27]:
# Define the model
model_2 = RandomForestClassifier(random_state=54321)

# Define the hyperparameter grid
param_grid = {
    'n_estimators': [250, 300, 350],
    'max_depth': [30, 35, 40, 45],
    'min_samples_split': [25, 30, 35],
    'class_weight': ['balanced']
}

# Set up GridSearchCV
grid_search = GridSearchCV(estimator=model_2, param_grid=param_grid, cv=5, scoring='f1')

# Fit the model
grid_search.fit(data_train.drop('Exited', axis=1), data_train['Exited'])

# Find the best parameters
best_params = grid_search.best_params_
best_score = grid_search.best_score_

# Output
print(f"Best Parameters: {best_params}")
print(f"Best Cross-Validation Score: {best_score}")

Best Parameters: {'class_weight': 'balanced', 'max_depth': 30, 'min_samples_split': 25, 'n_estimators': 250}
Best Cross-Validation Score: 0.598603184301911


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  Excellent work setting up and running <code>GridSearchCV</code> for hyperparameter tuning. The parameter grid is well thought out, and using <code>f1</code> as the scoring metric is a smart choice given the class imbalance. Great job including the best parameters and cross-validation score in the output ‚Äî this is a strong, well-executed tuning step.
</div>


## 5. Perform the final test. 

In [28]:
# Initializing with the best parameters from GridSearch
   
model_2_best = model_2_best = RandomForestClassifier(
    n_estimators=best_params['n_estimators'],
    max_depth=best_params['max_depth'],
    min_samples_split=best_params['min_samples_split'],
    class_weight='balanced',
    random_state=54321
)

# Train the model with the best parameters
model_2_best.fit(data_train.drop('Exited', axis=1), data_train['Exited'])

# Evaluate on test set
test_score = model_2_best.score(data_test.drop('Exited', axis=1), data_test['Exited'])
print(f"Test Score: {test_score}")

Test Score: 0.859


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 2:</b><br>
  Excellent!
</div>

<div style="border: 3px solid #d9534f; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fdf5f5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  You've successfully applied the best parameters from GridSearch and retrained the model ‚Äî great! However, since this is the final test phase, you should now evaluate the model on the <code>test</code> set only.  
  <br><br>
  The validation set was already used during model selection, so including it here again doesn‚Äôt add value and could be confusing. At this point, focus strictly on reporting the final test performance.
</div>


<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Student's Comment‚Äì Iteration 1: </b>

The correction was applied and the evaluation with the best parameters was done only on the test set. 
    
</div>

In [29]:
# Making predictions on the test set. 
test_predictions = model_2_best.predict(data_test.drop('Exited', axis=1))

# Calculate F1 score
test_f1 = f1_score(data_test['Exited'], test_predictions)

#Calculate ROC curve
probabilities_test = model_2_best.predict_proba(X_test) 
probabilities_one_test = probabilities_test[:, 1]

auc_roc = roc_auc_score(y_test, probabilities_one_test)

print(f"Test F1 Score: {test_f1}")
print(f"Test ROC AUC Score: {auc_roc}")

Test F1 Score: 0.6552567237163814
Test ROC AUC Score: 0.8740149705367096


<div style="border: 3px solid #5cb85c; padding: 12px; margin: 10px; border-radius: 5px; background-color: #f5fdf5">
  <b>Reviewer's comment ‚Äì Iteration 2:</b><br>
  Perfect ‚Äî your final test is now well-isolated and correctly executed.
</div>

<div style="border: 3px solid #d9534f; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fdf5f5">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  At this stage ‚Äî performing final evaluation ‚Äî you should focus only on the <code>test</code> set. The validation set has already been used for tuning and comparisons, so including it again here is unnecessary.  
  <br><br>
  Also, there's a mismatch in the ROC AUC calculation: you're using predicted probabilities from the <code>test</code> set, but comparing them against <code>y_valid</code> (validation labels). Be sure to use the correct target values for the dataset being evaluated. This might be skewing your results.
</div>


<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b>Student's comment‚Äì Iteration 1: </b>

The missmatch in the calculation of ROC AUC score was corrected, and all the calculations in the model were done on the test set only.
    
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b> Final conclusion: </b>

After manual hyperparameter tuning and systematic optimization using GridSearchCV, the model successfully exceeded the F1 score threshold of 0.59, achieving 0.65. The combination of preparing the data properly dealing with missing and numeric values, addressing class imbalance through class_weight = 'balanced' and optimizing hyperparameters was crucial - the model now properly identifies customers likely to leave rather than ignoring the minority class as it did initially. This performance demonstrates good generalization and real business value: a bank using this model could proactively identify at-risk customers and implement retention strategies.
    
</div>

<div style="border: 3px solid #f0ad4e; padding: 12px; margin: 10px; border-radius: 5px; background-color: #fcf8e3">
  <b>Reviewer's comment ‚Äì Iteration 1:</b><br>
  This is a well-crafted and thoughtful conclusion ‚Äî great job summarizing the impact of your tuning and handling of class imbalance.  
  <br><br>
  Just make sure to revisit this summary after correcting the issues in the final evaluation (particularly the test set ROC AUC and validation/test split usage). Once those are fixed, your conclusion will align perfectly with the final results.
</div>

<div class="alert alert-info" style="border-radius: 15px; box-shadow: 4px 4px 4px; border: 1px solid ">
<b>Student's comment‚Äì Iteration 1: </b>

The final conclusion was edidted according to the new results for the model' scores after all the corrections were made.
    
</div>