INTRODUCTION
We will be using machine learning models to predict whether a customer will leave Beta Bank soon. The data we have is on clients’ past behavior and termination of contracts with the bank.
We will use various models and optimize parameters to obtain the most accruate machine learning model. Our goal is to acheive a F1 score of .59.

In [1]:
#import requred python packages and import data
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.preprocessing import OrdinalEncoder
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('Churn.csv')

#display dataframe info for preprocessing
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB


In [2]:
#fill in missing values in the 'Tenure' column with the average (rouded) and change column to int type
mean = round(df['Tenure'].mean())
df['Tenure'] = df['Tenure'].fillna(mean)
df['Tenure'] = df['Tenure'].astype(int)

#for our machine learning models, we don't need these columns
df = df.drop(['RowNumber','CustomerId','Surname'], axis=1)

In [3]:

# Select the columns to encode
cols_to_encode = ['Geography', 'Gender']

# Create a new dataframe with only those columns
data_to_encode = df[cols_to_encode]

# Apply OrdinalEncoder to the new dataframe
encoder = OrdinalEncoder()
encoded_data = encoder.fit_transform(data_to_encode)

# Convert the encoded data to a dataframe with the original column names
encoded_df = pd.DataFrame(encoded_data, columns=cols_to_encode)

# Join the encoded columns back to the original dataframe
df_encoded = df.drop(cols_to_encode, axis=1).join(encoded_df)

# Now df_encoded contains all the original columns plus the encoded columns

#checking our pre-processed and optimized for machine learning dataframe 
df_encoded.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   CreditScore      10000 non-null  int64  
 1   Age              10000 non-null  int64  
 2   Tenure           10000 non-null  int64  
 3   Balance          10000 non-null  float64
 4   NumOfProducts    10000 non-null  int64  
 5   HasCrCard        10000 non-null  int64  
 6   IsActiveMember   10000 non-null  int64  
 7   EstimatedSalary  10000 non-null  float64
 8   Exited           10000 non-null  int64  
 9   Geography        10000 non-null  float64
 10  Gender           10000 non-null  float64
dtypes: float64(4), int64(7)
memory usage: 859.5 KB


In [4]:
#create new dataframe and series for the features and target
features = df_encoded.drop(['Exited'], axis=1)
target = df_encoded['Exited']

In [5]:
#make the 20% of the dataset into a testing dating set, and 80% for training
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.2, random_state=12345) 

#split 25% of traning set towards validation. Now we have 60/20/20 split for training, testing, and validation respectively
features_train, features_val, target_train, target_val = train_test_split(
    features_train, target_train, test_size=0.25, random_state=12345)

print(features_train.shape)
print(features_test.shape)
print(features_val.shape)

print(target_train.shape)
print(target_test.shape)
print(target_val.shape)


(6000, 10)
(2000, 10)
(2000, 10)
(6000,)
(2000,)
(2000,)


In [6]:
#Standardize the continuous numerical features
numeric = ['CreditScore', 'Age', 'Tenure',
       'Balance', 'NumOfProducts', 'EstimatedSalary']

scaler = StandardScaler()
scaler.fit(features_train[numeric])
features_train[numeric] = scaler.transform(features_train[numeric])
features_test[numeric] = scaler.transform(features_test[numeric])
features_val[numeric] = scaler.transform(features_val[numeric])

#this improved the logistic Regression Model

Part 1:

First, we will use three different models (Logistic Regression, Decision Tree, and Random Forest) on our training dataset. WE WILL NOT FIX CLASS INBALANCE in this part.
However after each model, we will alter the model's threshold and obtain the best F1 Score.

In [7]:
#Logistic Regression WITHOUT any balancing

model_lg = LogisticRegression(random_state=12345, solver='liblinear')
model_lg.fit(features_train, target_train)
predicted_test = model_lg.predict(features_test)
print('F1:', f1_score(target_test, predicted_test).round(2), "without altering model's thresold")
print()

# Predict on validation set and tune threshold
best_f1_lg = 0
best_threshold_lg = 0

probs_valid_lg = model_lg.predict_proba(features_val)
probs_one_valid_lg = probs_valid_lg[:, 1]

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_lg = probs_one_valid_lg > threshold
    f1 = f1_score(target_val, predicted_valid_lg)
    if f1 > best_f1_lg:
        best_f1_lg = f1
        best_threshold_lg = threshold

# Predict on test set using best threshold
probs_test_lg = model_lg.predict_proba(features_test)
probs_one_test_lg = probs_test_lg[:, 1]
predicted_test_lg = probs_one_test_lg > best_threshold_lg
best_f1_lg_test = f1_score(target_test, predicted_test_lg)

print('Best threshold:', best_threshold_lg.round(2))
print('Best F1 score on validation set:', best_f1_lg.round(2))
print('F1 score on test set:', best_f1_lg_test.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_lg).round(2))

F1: 0.26 without altering model's thresold

Best threshold: 0.24
Best F1 score on validation set: 0.47
F1 score on test set: 0.49
AUC-ROC: 0.75


In [8]:
#Decision Tree WITHOUT any balancing

best_depth = 0
best_dt_score = 0
for depth in range(1, 11):
    dt_model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    dt_model.fit(features_train, target_train)
    dt_score_val = dt_model.score(features_val, target_val) #testing the model with the testing set
    if dt_score_val > best_dt_score:
        best_dt_score = dt_score_val#if new score is better or more accurate than previous score, we keep the new one. Otherwise, we we stick with the previous
        best_depth = depth #same idea with depth
        
best_dt_model = DecisionTreeClassifier(random_state=12345,max_depth=best_depth)
best_dt_model.fit(features_train, target_train)        

predicted_test_dt = best_dt_model.predict(features_test)

print('F1:',f1_score(target_test, predicted_test_dt).round(2), "with depth of", best_depth)
print()

# Predict on validation set and tune threshold
best_f1_dt = 0
best_threshold_dt = 0

probs_valid_dt = best_dt_model.predict_proba(features_val)
probs_one_valid_dt = probs_valid_dt[:, 1]

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_dt = probs_one_valid_dt > threshold
    f1 = f1_score(target_val, predicted_valid_dt)
    if f1 > best_f1_dt:
        best_f1_dt = f1
        best_threshold_dt = threshold        
               
# Predict on test set using best threshold
probs_test_dt = best_dt_model.predict_proba(features_test)
probs_one_test_dt = probs_test_dt[:, 1]
predicted_test_dt = probs_one_test_dt > best_threshold_dt
best_f1_dt_test = f1_score(target_test, predicted_test_dt)

print('Best threshold:', best_threshold_dt.round(2))
print('Best F1 score on validation set:', best_f1_dt.round(2))
print('F1 score on test set:', best_f1_dt_test.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_dt).round(2))


F1: 0.5 with depth of 6

Best threshold: 0.28
Best F1 score on validation set: 0.57
F1 score on test set: 0.61
AUC-ROC: 0.84


In [9]:
#Random Forest Classifier WITHOUT any balancing

best_score = 0
best_est = 0
for est in range(1, 21):
    forest_model = RandomForestClassifier(random_state=12345, n_estimators=est)
    forest_model.fit(features_train, target_train)
    forest_score_val = forest_model.score(features_val, target_val)
    if forest_score_val > best_score:
        best_score = forest_score_val #if new score is better or more accurate than previous score, we keep the new one. Otherwise, we we stick with the previous
        best_est = est #same thing with est

final_forest_model = RandomForestClassifier(random_state=12345, n_estimators=best_est)
final_forest_model.fit(features_train, target_train)

predicted_test = final_forest_model.predict(features_test)

print('F1:',f1_score(target_test, predicted_test).round(2), "with est of", best_est)
print()

# Predict on validation set and tune threshold
best_f1_rf = 0
best_threshold_rf = 0

probs_valid_rf = final_forest_model.predict_proba(features_val)
probs_one_valid_rf = probs_valid_rf[:, 1]

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_rf = probs_one_valid_rf > threshold
    f1 = f1_score(target_val, predicted_valid_rf)
    if f1 > best_f1_rf:
        best_f1_rf = f1
        best_threshold_rf = threshold         
               
# Predict on test set using best threshold
probs_test_rf = final_forest_model.predict_proba(features_test)
probs_one_test_rf = probs_test_rf[:, 1]
predicted_test_rf = probs_one_test_rf > best_threshold_rf
best_f1_rf_test = f1_score(target_test, predicted_test_rf)

print('Best threshold:', best_threshold_rf.round(2))
print('Best F1 score on validation set:', best_f1_rf.round(2))
print('F1 score on test set:', best_f1_rf_test.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_rf).round(2))


F1: 0.53 with est of 14

Best threshold: 0.44
Best F1 score on validation set: 0.55
F1 score on test set: 0.56
AUC-ROC: 0.83


Part 1 Conclusion:

Altering the thresholds of each model resulted in higher F1 scores, thus meaning, higher quality models.
Decision Tree preformed better than Random Forest. Decision Tee also had a slightly higher AUC-ROC score.
Decision Tree is the only model who acheived a F1 Score highter than .59.
Logistic Regression preformed the worst.

Part 2:

We will use the excact same models, but will use the hyperparameter class_weight ='balanced' in each model.

In [10]:
#Logistic Regression wiith class_weight='balanced'

model_lg_bal = LogisticRegression(random_state=12345, class_weight ='balanced', solver='liblinear')
model_lg_bal.fit(features_train, target_train)
predicted_test_bal = model_lg_bal.predict(features_test)
print('F1:', f1_score(target_test, predicted_test_bal).round(2), "without altering model's thresold")
print()

# Predict on validation set and tune threshold
best_f1_lg_bal = 0
best_threshold_lg_bal = 0

probs_valid_lg_bal = model_lg_bal.predict_proba(features_val)
probs_one_valid_lg_bal = probs_valid_lg_bal[:, 1]

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_lg_bal = probs_one_valid_lg_bal > threshold
    f1 = f1_score(target_val, predicted_valid_lg_bal)
    if f1 > best_f1_lg_bal:
        best_f1_lg_bal = f1
        best_threshold_lg_bal = threshold

# Predict on test set using best threshold
probs_test_lg_bal = model_lg_bal.predict_proba(features_test)
probs_one_test_lg_bal = probs_test_lg_bal[:, 1]
predicted_test_lg_bal = probs_one_test_lg_bal > best_threshold_lg
best_f1_lg_test_bal = f1_score(target_test, predicted_test_lg_bal)

print('Best threshold:', best_threshold_lg_bal.round(2))
print('Best F1 score on validation set:', best_f1_lg_bal.round(2))
print('F1 score on test set:', best_f1_lg_test_bal.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_lg_bal).round(2))

F1: 0.48 without altering model's thresold

Best threshold: 0.58
Best F1 score on validation set: 0.48
F1 score on test set: 0.4
AUC-ROC: 0.75


In [11]:
#Decision Tree with class_weight='balanced'

best_depth = 0
best_dt_score_bal = 0
for depth in range(1, 11):
    dt_model_bal = DecisionTreeClassifier(random_state=12345, class_weight ='balanced', max_depth=depth)
    dt_model_bal.fit(features_train, target_train)
    dt_score_val_bal = dt_model_bal.score(features_val, target_val) #testing the model with the testing set
    if dt_score_val_bal > best_dt_score_bal:
        best_dt_score_bal = dt_score_val_bal#if new score is better or more accurate than previous score, we keep the new one. Otherwise, we we stick with the previous
        best_depth = depth #same idea with depth
        
best_dt_model_bal = DecisionTreeClassifier(random_state=12345,max_depth=best_depth)
best_dt_model_bal.fit(features_train, target_train)        

predicted_test_dt_bal = best_dt_model_bal.predict(features_test)

print('F1:',f1_score(target_test, predicted_test_dt_bal).round(2), "with depth of", best_depth)
print()

# Predict on validation set and tune threshold
best_f1_dt_bal = 0
best_threshold_dt_bal = 0

probs_valid_dt_bal = best_dt_model_bal.predict_proba(features_val)
probs_one_valid_dt_bal = probs_valid_dt[:, 1]

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_dt_bal = probs_one_valid_dt_bal > threshold
    f1 = f1_score(target_val, predicted_valid_dt_bal)
    if f1 > best_f1_dt_bal:
        best_f1_dt_bal = f1
        best_threshold_dt_bal = threshold        
               
# Predict on test set using best threshold
probs_test_dt_bal = best_dt_model_bal.predict_proba(features_test)
probs_one_test_dt_bal = probs_test_dt_bal[:, 1]
predicted_test_dt_bal = probs_one_test_dt_bal > best_threshold_dt_bal
best_f1_dt_test_bal = f1_score(target_test, predicted_test_dt_bal)

print('Best threshold:', best_threshold_dt_bal.round(2))
print('Best F1 score on validation set:', best_f1_dt_bal.round(2))
print('F1 score on test set:', best_f1_dt_test_bal.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_dt_bal).round(2))


F1: 0.5 with depth of 6

Best threshold: 0.28
Best F1 score on validation set: 0.57
F1 score on test set: 0.61
AUC-ROC: 0.84


In [12]:
#Random Forest Classifier with class_weight='balanced'

best_score = 0
best_est = 0
for est in range(1, 21):
    forest_model_bal = RandomForestClassifier(random_state=12345, class_weight='balanced', n_estimators=est)
    forest_model_bal.fit(features_train, target_train)
    forest_score_val_bal = forest_model_bal.score(features_val, target_val)
    if forest_score_val_bal > best_score:
        best_score = forest_score_val_bal #if new score is better or more accurate than previous score, we keep the new one. Otherwise, we we stick with the previous
        best_est = est #same thing with est

final_forest_model_bal = RandomForestClassifier(random_state=12345, n_estimators=best_est)
final_forest_model_bal.fit(features_train, target_train)

predicted_test = final_forest_model_bal.predict(features_test)

print('F1:',f1_score(target_test, predicted_test_bal).round(2), "with est of", best_est)
print()

# Predict on validation set and tune threshold
best_f1_rf_bal = 0
best_threshold_rf_bal = 0

probs_valid_rf_bal = final_forest_model_bal.predict_proba(features_val)
probs_one_valid_rf_bal = probs_valid_rf_bal[:, 1]

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_rf_bal = probs_one_valid_rf_bal > threshold
    f1 = f1_score(target_val, predicted_valid_rf_bal)
    if f1 > best_f1_rf_bal:
        best_f1_rf_bal = f1
        best_threshold_rf_bal = threshold         
               
# Predict on test set using best threshold
probs_test_rf_bal = final_forest_model_bal.predict_proba(features_test)
probs_one_test_rf_bal = probs_test_rf_bal[:, 1]
predicted_test_rf_bal = probs_one_test_rf_bal > best_threshold_rf_bal
best_f1_rf_test_bal = f1_score(target_test, predicted_test_rf_bal)

print('Best threshold:', best_threshold_rf_bal.round(2))
print('Best F1 score on validation set:', best_f1_rf_bal.round(2))
print('F1 score on test set:', best_f1_rf_test_bal.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_rf_bal).round(2))


F1: 0.48 with est of 16

Best threshold: 0.38
Best F1 score on validation set: 0.56
F1 score on test set: 0.59
AUC-ROC: 0.83


Part 2 Conclusion:

Logistic Regression Model yielded much worse results. Decision Tree remained the same. Random Forest improved slightly.

In [13]:
#checking the inbalance of classes
features_zeros = features_train[target_train == 0]
features_ones = features_train[target_train == 1]
target_zeros = target_train[target_train == 0]
target_ones = target_train[target_train == 1]

print(features_zeros.shape)
print(features_ones.shape)
print(target_zeros.shape)
print(target_ones.shape)

(4781, 10)
(1219, 10)
(4781,)
(1219,)


It seems the there are almost four times more negative classes than positive.
For our upsampling section, we will increase the 'ones' or positive class by a multiplication factor of 4. 
For our downsampling section, we will decreae the 'zero' or negative class by a factor of .25. 

Part 3:

We will use the same three model, but we will fix inbalancing by upsampling.
Once again after each model, we will alter the model's threshold and obtain the best F1 Score.

In [14]:
#creating function for upsample

def upsample(features, target, repeat):

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    features_upsampled, target_upsampled = shuffle(features_upsampled, target_upsampled, random_state=12345)

    return features_upsampled, target_upsampled

#upsampled training sets
features_upsampled, target_upsampled = upsample(features_train, target_train, 4)

In [15]:
#Logistic Regression with Upsampling

model_lg_up = LogisticRegression(random_state=12345, solver='liblinear')
model_lg_up.fit(features_upsampled, target_upsampled)
predicted_test_lg_up = model_lg_up.predict(features_test)
print('F1:', f1_score(target_test, predicted_test_lg_up).round(2))
print()

# Predict on validation set and tune threshold
best_f1_lg_up = 0
best_threshold_lg_up = 0

probs_valid_lg_up = model_lg_up.predict_proba(features_val)
probs_one_valid_lg_up = probs_valid_lg_up[:, 1]

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_lg_up = probs_one_valid_lg_up > threshold
    f1 = f1_score(target_val, predicted_valid_lg_up)
    if f1 > best_f1_lg_up:
        best_f1_lg_up = f1
        best_threshold_lg_up = threshold

# Predict on test set using best threshold
probs_test_lg_up = model_lg_up.predict_proba(features_test)
probs_one_test_lg_up = probs_test_lg_up[:, 1]
predicted_test_lg_up = probs_one_test_lg_up > best_threshold_lg_up
best_f1_lg_test_up = f1_score(target_test, predicted_test_lg_up)

print('Best threshold:', best_threshold_lg_up.round(2))
print('Best F1 score on validation set:', best_f1_lg_up.round(2))
print('F1 score on test set:', best_f1_lg_test_up.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_lg_up).round(2))

F1: 0.48

Best threshold: 0.54
Best F1 score on validation set: 0.48
F1 score on test set: 0.49
AUC-ROC: 0.75


In [16]:
#Decision Tree with Upsampling 

best_depth = 0
best_dt_up_score = 0
for depth in range(1, 11):
    dt_model_up = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    dt_model_up.fit(features_upsampled, target_upsampled)
    dt_up_score_val = dt_model_up.score(features_val, target_val) #testing the model_up with the testing set
    if dt_up_score_val > best_dt_up_score:
        best_dt_up_score = dt_up_score_val#if new score is better or more accurate than previous score, we keep the new one. Otherwise, we we stick with the previous
        best_depth = depth #same idea with depth
        
best_dt_model_up = DecisionTreeClassifier(random_state=12345,max_depth=best_depth)
best_dt_model_up.fit(features_upsampled, target_upsampled)        

predicted_test_dt_up = best_dt_model_up.predict(features_test)

print('F1:',f1_score(target_test, predicted_test_dt_up).round(2), "with depth of", best_depth)
print()

# Predict on validation set and tune threshold
best_f1_dt_up = 0
best_threshold_dt_up = 0

probs_valid_dt_up = best_dt_model_up.predict_proba(features_val)
probs_one_valid_dt_up = probs_valid_dt_up[:, 1]
    
for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_dt_up = probs_one_valid_dt_up > threshold
    f1 = f1_score(target_val, predicted_valid_dt_up)
    if f1 > best_f1_dt_up:
        best_f1_dt_up = f1
        best_threshold_dt_up = threshold    
    
# Predict on test set using best threshold
probs_test_dt_up = best_dt_model_up.predict_proba(features_test)
probs_one_test_dt_up = probs_test_dt_up[:, 1]
predicted_test_dt_up = probs_one_test_dt_up > best_threshold_dt
best_f1_dt_test_up = f1_score(target_test, predicted_test_dt_up)

print('Best threshold:', best_threshold_dt_up.round(2))
print('Best F1 score on validation set:', best_f1_dt_up.round(2))
print('F1 score on test set:', best_f1_dt_test_up.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_dt_up).round(2))

      

F1: 0.59 with depth of 6

Best threshold: 0.56
Best F1 score on validation set: 0.57
F1 score on test set: 0.48
AUC-ROC: 0.83


In [17]:
#Random Forest Classifier with Upsampling 

best_score = 0
best_est = 0
for est in range(1, 31):
    forest_model_up = RandomForestClassifier(random_state=12345, n_estimators=est)
    forest_model_up.fit(features_upsampled, target_upsampled)
    forest_score_val = forest_model_up.score(features_val, target_val)
    if forest_score_val > best_score:
        best_score = forest_score_val #if new score is better or more accurate than previous score, we keep the new one. Otherwise, we we stick with the previous
        best_est = est #same thing with est

final_forest_model_up = RandomForestClassifier(random_state=12345, n_estimators=best_est)
final_forest_model_up.fit(features_train, target_train)

predicted_test_rf_up = final_forest_model_up.predict(features_test)

print('F1:',f1_score(target_test, predicted_test_rf_up).round(2), "with est of", best_est)
print()

# Predict on validation set and tune threshold
best_f1_rf_up = 0
best_threshold_rf_up = 0

probs_valid_rf_up = final_forest_model_up.predict_proba(features_val)
probs_one_valid_rf_up = probs_valid_rf_up[:, 1]
      
for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_rf_up = probs_one_valid_rf_up > threshold
    f1 = f1_score(target_val, predicted_valid_rf_up)
    if f1 > best_f1_rf_up:
        best_f1_rf_up = f1
        best_threshold_rf_up = threshold         
               
# Predict on test set using best threshold
probs_test_rf_up = final_forest_model.predict_proba(features_test)
probs_one_test_rf_up = probs_test_rf_up[:, 1]
predicted_test_rf_up = probs_one_test_rf_up > best_threshold_rf_up
best_f1_rf_test_up = f1_score(target_test, predicted_test_rf)

print('Best threshold:', best_threshold_rf_up.round(2))
print('Best F1 score on validation set:', best_f1_rf_up.round(2))
print('F1 score on test set:', best_f1_rf_test_up.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_rf_up).round(2))



F1: 0.54 with est of 18

Best threshold: 0.34
Best F1 score on validation set: 0.57
F1 score on test set: 0.56
AUC-ROC: 0.83


Part 3 Conclusion:

This time Random Forest preformed the best, but still under our desired F1 score of .59 and worse than when weight_class = 'balanced'. 
Decision Tree's performance dratiscally decreased, even slightly worse than Logistic Regression. Logistic Regression preformed the same as when the dataset was unbalanced.

Part 4:

We will use the same three model, but we will fix inbalancing by downsampling.
Once again after each model, we will alter the model's threshold and obtain the best F1 Score.

In [18]:
#creating function for downsample

def downsample(features, target, fraction):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_downsampled = pd.concat([features_zeros.sample(frac=fraction, random_state=12345)] + [features_ones])
    
    target_downsampled = pd.concat([target_zeros.sample(frac=fraction, random_state=12345)] + [target_ones])

    features_downsampled, target_downsampled = shuffle(features_downsampled, target_downsampled, random_state=12345)

    return features_downsampled, target_downsampled

#down sampled training sets
features_downsampled, target_downsampled = downsample(features_train, target_train, .25)

In [19]:
#Logistic Regression with Downsampling

model_lg_down = LogisticRegression(random_state=12345, solver='liblinear')
model_lg_down.fit(features_downsampled, target_downsampled)
predicted_test_lg_down = model_lg_down.predict(features_test)
print('F1:', f1_score(target_test, predicted_test_lg_down).round(2))
print()

# Predict on validation set and tune threshold
best_f1_lg_down = 0
best_threshold_lg_down = 0

probs_valid_lg_down = model_lg_down.predict_proba(features_val)
probs_one_valid_lg_down = probs_valid_lg_down[:, 1]

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_lg_down = probs_one_valid_lg_down > threshold
    if f1_score(target_val, predicted_valid_lg_down) > best_f1_lg_down:
        best_f1_lg_down = f1_score(target_val, predicted_valid_lg_down)
        best_threshold_lg_down = threshold

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_lg_down = probs_one_valid_lg_down > threshold
    f1 = f1_score(target_val, predicted_valid_lg_down)
    if f1 > best_f1_lg_down:
        best_f1_lg_down = f1
        best_threshold_lg_down = threshold

# Predict on test set using best threshold
probs_test_lg_down = model_lg_down.predict_proba(features_test)
probs_one_test_lg_down = probs_test_lg_down[:, 1]
predicted_test_lg_down = probs_one_test_lg_down > best_threshold_lg_down
best_f1_lg_test_down = f1_score(target_test, predicted_test_lg_down)

print('Best threshold:', best_threshold_lg_down.round(2))
print('Best F1 score on validation set:', best_f1_lg_down.round(2))
print('F1 score on test set:', best_f1_lg_test_down.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_lg_down).round(2))

F1: 0.49

Best threshold: 0.54
Best F1 score on validation set: 0.48
F1 score on test set: 0.49
AUC-ROC: 0.76


In [20]:
#Decision Tree with Downsampling 

best_depth = 0
best_dt_down_score = 0
for depth in range(1, 11):
    dt_model_down = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    dt_model_down.fit(features_downsampled, target_downsampled)
    dt_down_score_val = dt_model_down.score(features_val, target_val) #testing the model_down with the testing set
    if dt_down_score_val > best_dt_down_score:
        best_dt_down_score = dt_down_score_val#if new score is better or more accurate than previous score, we keep the new one. Otherwise, we we stick with the previous
        best_depth = depth #same idea with depth
        
best_dt_model_down = DecisionTreeClassifier(random_state=12345,max_depth=best_depth)
best_dt_model_down.fit(features_downsampled, target_downsampled)        

predicted_test_dt_down = best_dt_model_down.predict(features_test)

print('F1:',f1_score(target_test, predicted_test_dt_down), "with depth of", best_depth)
print()

# Predict on test set using best threshold
best_f1_dt_down = 0
best_threshold_dt_down = 0

probs_valid_dt_down = best_dt_model_down.predict_proba(features_val)
probs_one_valid_dt_down = probs_valid_dt_down[:, 1]
    
for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_dt_down = probs_one_valid_dt_down > threshold
    f1 = f1_score(target_val, predicted_valid_dt_down)
    if f1 > best_f1_dt_down:
        best_f1_dt_down = f1
        best_threshold_dt_down = threshold    
    
probs_test_dt_down = best_dt_model_down.predict_proba(features_test)
probs_one_test_dt_down = probs_test_dt_down[:, 1]
predicted_test_dt_down = probs_one_test_dt_down > best_threshold_dt
best_f1_dt_test_down = f1_score(target_test, predicted_test_dt_down)

print('Best threshold:', best_threshold_dt_down.round(2))
print('Best F1 score on validation set:', best_f1_dt_down.round(2))
print('F1 score on test set:', best_f1_dt_test_down.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_dt_down).round(2))

      


F1: 0.5416227608008429 with depth of 4

Best threshold: 0.5
Best F1 score on validation set: 0.51
F1 score on test set: 0.46
AUC-ROC: 0.81


In [21]:
#Random Forest Classifier with Downsampling 

best_score = 0
best_est = 0
for est in range(1, 31):
    forest_model_rf_down = RandomForestClassifier(random_state=12345, n_estimators=est)
    forest_model_rf_down.fit(features_downsampled, target_downsampled)
    forest_score_val = forest_model_rf_down.score(features_val, target_val)
    if forest_score_val > best_score:
        best_score = forest_score_val #if new score is better or more accurate than previous score, we keep the new one. Otherwise, we we stick with the previous
        best_est = est #same thing with est

final_forest_model_rf_down = RandomForestClassifier(random_state=12345, n_estimators=best_est)
final_forest_model_rf_down.fit(features_train, target_train)

predicted_test_rf_down = final_forest_model_rf_down.predict(features_test)

print('F1:',f1_score(target_test, predicted_test_rf_down).round(2), "with est of", best_est)
print()

# Predict on validation set and tune threshold
best_f1_rf_down = 0
best_threshold_rf_down = 0

probs_valid_rf_down = final_forest_model_rf_down.predict_proba(features_val)
probs_one_valid_rf_down = probs_valid_rf_down[:, 1]

for threshold in np.arange(0, 0.8, 0.02):
    predicted_valid_rf_down = probs_one_valid_rf_down > threshold
    f1 = f1_score(target_val, predicted_valid_rf_down)
    if f1 > best_f1_rf_down:
        best_f1_rf_down = f1
        best_threshold_rf_down = threshold         
               
# Predict on test set using best threshold
probs_test_rf_down = final_forest_model.predict_proba(features_test)
probs_one_test_rf_down = probs_test_rf_down[:, 1]
predicted_test_rf_down = probs_one_test_rf_down > best_threshold_rf_down
best_f1_rf_test_down = f1_score(target_test, predicted_test_rf)

print('Best threshold:', best_threshold_rf_down.round(2))
print('Best F1 score on validation set:', best_f1_rf_down.round(2))
print('F1 score on test set:', best_f1_rf_test_down.round(2))
print('AUC-ROC:', roc_auc_score(target_test, probs_one_test_rf_down).round(2))


F1: 0.56 with est of 30

Best threshold: 0.34
Best F1 score on validation set: 0.58
F1 score on test set: 0.56
AUC-ROC: 0.83


Part 4 Conclusion:

Once again Random Forest preformed the best, but still under our desired F1 score of .59 and worse than when weight_class = 'balanced'. 
Decision Tree's performance decreased. Logistic Regression's performance did not change.

In [22]:
#presenting all the results in a clear 3x3 matrix
data = {'Logistic Regression': [best_f1_lg_test, best_f1_lg_test_bal, best_f1_lg_test_up, best_f1_lg_test_down],
        'Decision Tree': [best_f1_dt_test, best_f1_dt_test_bal, best_f1_dt_test_up, best_f1_dt_test_down],
        'Random Forest': [best_f1_rf_test, best_f1_rf_test_bal, best_f1_rf_test_up, best_f1_rf_test_down]}
analysis_matrix = pd.DataFrame(data, index=['No Balancing', 'class_weight = balanced', 'Upsampled', 'Downsampled'])
analysis_matrix = analysis_matrix.round(2) 
print(analysis_matrix)

                         Logistic Regression  Decision Tree  Random Forest
No Balancing                            0.49           0.61           0.56
class_weight = balanced                 0.40           0.61           0.59
Upsampled                               0.49           0.48           0.56
Downsampled                             0.49           0.46           0.56


FINAL CONCLUSION
The winning model is Decision Tree since it's the only model which acheived our goal of a F1 score higher than .59.
These results are interesting. Decision Tree was only accurate with the orignal unbalanced dataset or when class_weight was balanced. Upsampling or Downsampling the dataset, made the model the worst one!
Random Forest is the runner up. Random Forest acheived consistent results despite balancing changes to the dataset. Maybe with more tuning or other changes to parameters, it could be prove to be more accurate than Decision Tree.

All of the AUC-ROC values are consistent for each Model, respectively. We yielded median scores of .75 for Logistic Regression and as high as .84 for Decision Tree and Random Forest.