<div class="alert alert-success">
<b>Reviewer's comment V2</b>

Thanks for taking the time to improve the project! It is now accepted. Keep up the good work on the next sprint!

</div>

**Review**

Hi, my name is Dmitry and I will be reviewing your project.
  
You can find my comments in colored markdown cells:
  
<div class="alert alert-success">
  If everything is done successfully.
</div>
  
<div class="alert alert-warning">
  If I have some (optional) suggestions, or questions to think about, or general comments.
</div>
  
<div class="alert alert-danger">
  If a section requires some corrections. Work can't be accepted with red comments.
</div>
  
Please don't remove my comments, as it will make further review iterations much harder for me.
  
Feel free to reply to my comments or ask questions using the following template:
  
<div class="alert alert-info">
  For your comments and questions.
</div>
  
First of all, thank you for turning in the project! You did a pretty good job overall, but there are some problems that need to be fixed before the project is accepted. Let me know if you have questions!

# Beta Bank Customer Retention Predictive Modeling

Having been provided data from Beta Bank, we will create a number of predictive models and select the best one to help determine whether customers will leave Beta Bank in the near future. Our threshold for minimum acceptance while creating the models and selecting one will be an F-score of 0.59 which is the relationship between `recall`, a model's abililty to correctly identify True Positive predictions while avoiding False Negatives, and `precision`, a model's ability to correctly identify True Positive predictions while avoiding False Positives.

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score 
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

### Import Data

In [2]:
df = pd.read_csv('/datasets/Churn.csv')

### Preprocessing

#### General Preprocessing

##### Data Overview

In [3]:
display(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB


None

In [4]:
display(df.head())

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2.0,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1.0,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8.0,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1.0,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2.0,125510.82,1,1,1,79084.1,0


##### Check for Missing Values

In [5]:
df.isna().sum()

RowNumber            0
CustomerId           0
Surname              0
CreditScore          0
Geography            0
Gender               0
Age                  0
Tenure             909
Balance              0
NumOfProducts        0
HasCrCard            0
IsActiveMember       0
EstimatedSalary      0
Exited               0
dtype: int64

In [6]:
df.dropna(subset=['Tenure'], inplace=True)

In [7]:
df.isna().sum()

RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

`Tenure` contained 909 missing values, and when compared to the 10,000 rows of data, they represented just over 9% of the provided data, and so rather than filling the values with any kind of approximation, the rows were removed to help maintain data integrity.

<div class="alert alert-success">
<b>Reviewer's comment</b>

Ok, that's one way to deal with missing values!

</div>

##### Check for Duplicates

In [8]:
df.duplicated().sum()

0

There are no duplicate rows.

##### Quality of Life Alterations

In [9]:
df.columns= df.columns.str.lower()

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9091 entries, 0 to 9998
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   rownumber        9091 non-null   int64  
 1   customerid       9091 non-null   int64  
 2   surname          9091 non-null   object 
 3   creditscore      9091 non-null   int64  
 4   geography        9091 non-null   object 
 5   gender           9091 non-null   object 
 6   age              9091 non-null   int64  
 7   tenure           9091 non-null   float64
 8   balance          9091 non-null   float64
 9   numofproducts    9091 non-null   int64  
 10  hascrcard        9091 non-null   int64  
 11  isactivemember   9091 non-null   int64  
 12  estimatedsalary  9091 non-null   float64
 13  exited           9091 non-null   int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.0+ MB


Column names were changed to lowercase for consistency in coding.

In [11]:
df['tenure'].value_counts()

1.0     952
2.0     950
8.0     933
3.0     928
5.0     927
7.0     925
4.0     885
9.0     882
6.0     881
10.0    446
0.0     382
Name: tenure, dtype: int64

In [12]:
df['tenure'] = df['tenure'].astype('int64')

In [13]:
df['tenure'].value_counts()

1     952
2     950
8     933
3     928
5     927
7     925
4     885
9     882
6     881
10    446
0     382
Name: tenure, dtype: int64

`tenure` values were changed to integer data type to assist with reducing the storage size and help with processing speed.

#### Machine Learning Modelling Preparations

##### Stratified split into training, validation, and testing sets

In [14]:
train, valid = train_test_split(df, test_size=0.2, random_state=12345, stratify=df['exited'] )

train, test = train_test_split(train, test_size=0.25, random_state=12345, stratify=train['exited'] )

We have split the data into three sets; training, validation, and testing, at a 3:1:1 ratio. First we took 20% of the data to create the validation set, then from the remaining 80%, we took 25% to make the testing set. This follows the ratio of 3:1:1 or 60%, 20%, and 20% since after the first split, 80% of the initial data remained, and 0.8 * 0.25 = 0.2, representing a second 20% share of the full data set.

The sets were also stratified on the `exited` column which is the target - this will ensure that there is an equal proportion as to the original dataset in the three new splits.

##### Check geography for viability of inclusion

In [15]:
df['geography'].value_counts()

France     4550
Germany    2293
Spain      2248
Name: geography, dtype: int64

With only 3 different values, `geography` will not need to get dropped but rather converted to a numeric representation for use with a ML model.

##### Create features and target sets

In [16]:
features_train = train.drop(['rownumber', 'customerid', 'surname', 'exited'] ,axis=1)
target_train = train['exited']
features_valid = valid.drop(['rownumber', 'customerid', 'surname', 'exited'] ,axis=1)
target_valid = valid['exited']
features_test = test.drop(['rownumber', 'customerid', 'surname', 'exited'] ,axis=1)
target_test = test['exited']

features_train = pd.get_dummies(features_train, drop_first=True)
features_valid = pd.get_dummies(features_valid, drop_first=True)
features_test = pd.get_dummies(features_test, drop_first=True)

Training, validation, and testing sets have been split into features and target sets. One-Hot Encoding was also used to turn columns that weren't previously usable into usable ones. The dummy trap has also been taken into account and extraneous columns created through OHE have been dropped.

<div class="alert alert-success">
<b>Reviewer's comment</b>

Categorical features were encoded; good idea to drop uninformative columns

</div>

##### Check target distributions

In [17]:
display(target_train.value_counts())
display(target_valid.value_counts())
display(target_test.value_counts())

0    4342
1    1112
Name: exited, dtype: int64

0    1448
1     371
Name: exited, dtype: int64

0    1447
1     371
Name: exited, dtype: int64

There are approximately four times the number of customers who did not leave opposed to those that did leave Beta Bank.

<div class="alert alert-success">
<b>Reviewer's comment</b>

The data was split into train,, validation and test sets reasonably

</div>

<div class="alert alert-danger">
<s><b>Reviewer's comment</b>

Could you check the target distribution? Otherwise it's not clear why we're talking about imbalance at all :)

</div>

<div class="alert alert-info">
  Checked the value counts in the newly shuffled 3.2.4 section.
    <p></p>
  Because of an immense amount of shuffling in the order above, I definitely messed with where your comments should line up properly. I did however stratify the data so there should be the proper proportion in each of the sets and verified it as well with value counts for each of the three sets.
</div>

<div class="alert alert-success">
<b>Reviewer's comment V2</b>

Ok, no problem!

</div>

### Imbalanced Models

#### Decision Tree

In [18]:
best_model = None
best_result = 0
for depth in range(1, 50):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth) # create a model with the given depth
    model.fit(features_train, target_train) # train the model
    predictions_valid = model.predict(features_valid)
    result = f1_score(target_valid, predictions_valid)
    if result > best_result:
        best_model = model
        best_result = result
        best_depth = depth
        
predictions_test = best_model.predict(features_valid)
test_result = accuracy_score(target_valid, predictions_valid)

print("Depth of best model:", best_depth)
print("F1 score of the decision tree model on the validation set:", best_result)

probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]

auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print("AUC_ROC score of the decision tree model on the validation set:", auc_roc)

Depth of best model: 7
F1 score of the decision tree model on the validation set: 0.5745682888540031
AUC_ROC score of the decision tree model on the validation set: 0.681033231076231


#### Random Forest

In [19]:
best_score = 0
best_est = 0
for est in range(1, 50): # choose hyperparameter range
    model = RandomForestClassifier(random_state=12345, n_estimators=est) # set number of trees
    model.fit(features_train, target_train) # train model on training set
    predictions_valid = model.predict(features_valid)
    score = f1_score(target_valid, predictions_valid)    
    if score > best_score:
        best_score = score
        best_est = est

print("n_estimator value of best model:", best_est)
print("F1 score of the random forest on the model validation set:", best_score)
        
best_model = RandomForestClassifier(random_state=12345, n_estimators=best_est)
best_model.fit(features_train, target_train)

probabilities_valid = best_model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]

auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print("AUC_ROC score of the random forest model on the validation set:", auc_roc)

n_estimator value of best model: 41
F1 score of the random forest on the model validation set: 0.5874799357945426
AUC_ROC score of the random forest model on the validation set: 0.8389255186073179


#### Logistic Regression

In [20]:
model = LogisticRegression(random_state=12345, solver='liblinear')  # initialize logistic regression constructor with parameters random_state=54321 and solver='liblinear'
model.fit(features_train, target_train)  # train model on training set

predictions_valid = model.predict(features_valid)
f1_score_lr = f1_score(target_valid, predictions_valid)

print("F1 score of the logistic regression model on the validation set:", f1_score_lr)

probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]

auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print("AUC_ROC score of the logistic regression model on the validation set:", auc_roc)

F1 score of the logistic regression model on the validation set: 0.005376344086021506
AUC_ROC score of the logistic regression model on the validation set: 0.646782996530208


The best unbalanced model which successfully meets the threshold of an F-score of `0.59` is the Decision Tree model with an F-score of `0.594` on the validation data. We will be refreshing the data and utilizing a few techniques such as scaling, upsampling, and downsampling to apply with the same model types to optimize and improve the model.

A cause for these models utilizing unbalanced data could be that with the greatly varying numbers in some of the features, the weight of the predictions is off, causing the relationship between precision and recall represented by the F1-score to be lower than ideal, even if the accuracy of the model is calculated to be relatively high.

<div class="alert alert-warning">
<b>Reviewer's comment</b>

Great, you tried a few different models and did some hyperparameter tuning. I would suggest instead of optimizing accuracy (which is a poor metric for imbalanced data anyway) to optimize F1 score (which is our target metric in this project). Hyperparameters which maximize one metric are not necessarily the same as hyperparameters maximizing a different metric.

</div>

<div class="alert alert-danger">
<s><b>Reviewer's comment</b>

Whether to use some kind of balancing is a hypeparameter of a model, and thus has to be decided using the validation set. The test set should only be used to evaluate the final model.

</div>

<div class="alert alert-info">
  I adjusted all the above unbalanced models to work with F1 score optimization for finding parameters. I also utilized validation data for the final scores after fitting them on the training data. The test data is reserved for the final balanced models.
    <p></p>
I also removed the Decision Tree Regression - clearly I'm still working on figuring out what models go with what.
</div>

<div class="alert alert-success">
<b>Reviewer's comment V2</b>

Great!

</div>

### Modifying Data for Balanced Models

<div class="alert alert-warning">
<b>Reviewer's comment</b>

Why repeat the same code with exactly the same results?

</div>

<div class="alert alert-info">
  I can honestly say I do not recall why I repeated previously completed steps unnecessarily other than possibly being very much out of my element with Machine Learning. Likely I did the useless repetition just to feel sure the splits were proper - kind of a measure twice, cut once concept.
    <p></p>
<p>  I did however remove the extraneous code.
</div>

<div class="alert alert-success">
<b>Reviewer's comment V2</b>

Haha, I see :)

</div>

#### Scale features for weight in training model

In [21]:
numeric = ['creditscore', 'age', 'tenure', 'balance', 'numofproducts', 'estimatedsalary']

scaler = StandardScaler()
scaler.fit(features_train[numeric]) 
features_train[numeric] = scaler.transform(features_train[numeric])
features_valid[numeric] = scaler.transform(features_valid[numeric])
features_test[numeric] = scaler.transform(features_test[numeric])

The classes defined by the `numeric` variable are transformed to be represented by a numerical expression in relation to the standard deviation, square root of the variance. This will ensure they are not weighted disproportionately in the fitting of various models.

<div class="alert alert-warning">
<b>Reviewer's comment</b>

> The classes defined by the `numeric` variable are transformed to be represented between the values of `-1` and `1`.
    
That's not quite what standard scaling does. What it does is subtract the mean value and divide by standard deviation, so the result is not necessarily a value between -1 and 1 (there can be values more than one standard deviation from the mean)

</div>

<div class="alert alert-info">
  I adjusted my short post-scaling blurb, hopefully I understand it better and showed as much in my statement. Just in case I'm not mathing this right in my head; this would mean that 99.7% of the data should then fall between -3 and 3, assuming that the rules of a standard distribution apply to the data, while outliers would fall outside of that range. Am I closer than before at least? (It's clearly been a moment since I did any significant SDA)
</div>

<div class="alert alert-success">
<b>Reviewer's comment V2</b>

Yep, that's right! 

</div>

#### Downsampling to reduce frequentcy of frequent class data

In [22]:
def downsample(features, target, fraction):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_downsampled = pd.concat([features_zeros.sample(frac=fraction, random_state=12345)] + [features_ones])
    target_downsampled = pd.concat([target_zeros.sample(frac=fraction, random_state=12345)] + [target_ones])

    features_downsampled, target_downsampled = shuffle(features_downsampled, target_downsampled, random_state=12345)

    return features_downsampled, target_downsampled


features_downsampled, target_downsampled = downsample(features_train, target_train, 0.25)

We created downsampled features and target based on the training variables in order to observe frequent classes less in the data.

#### Upsampling to make rare classes less rare

In [23]:
def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    features_upsampled, target_upsampled = shuffle(features_upsampled, target_upsampled, random_state=12345)

    return features_upsampled, target_upsampled


features_upsampled, target_upsampled = upsample(features_train, target_train, 4)

We also created upsampled features and target based on the training variables in order to have the option to observe rarer classes more in the data.

<div class="alert alert-success">
<b>Reviewer's comment</b>

Downsampling and upsampling were correctly applied only to the train set

</div>

<div class="alert alert-warning">
<b>Reviewer's comment</b>

The `repeat` and `fraction` parameter seem to be chosen arbitrarily. I would suggest looking at the target distribution to infer the values that would make the data balanced.

</div>

<div class="alert alert-info">
  Based on the distribution, I went with `4` and `0.25` for the `repeat` and `fraction` parameters, respectively. Ideally this will get me closer to a good F1 score when using the upsampled and downsampled datasets. But these numbers being off makes sense as to why I was not seeing any realy improvement across the board with my balanced models before. So thank you for the guidance.
</div>

<div class="alert alert-success">
<b>Reviewer's comment V2</b>

Indeed, this makes sense! You're welcome :)

</div>

### Balanced Models

#### Upsampled Decision Tree

In [24]:
best_model = None
best_result = 0
for depth in range(1, 50):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(features_upsampled, target_upsampled)
    predictions_valid = model.predict(features_valid)
    result = f1_score(target_valid, predictions_valid)
    if result > best_result:
        best_model = model
        best_result = result
        best_depth = depth
        
predictions_test = best_model.predict(features_test)
test_result = accuracy_score(target_test, predictions_test)

print("Depth of best model:", best_depth)
print("F1 score of the decision tree model on the validation set:", best_result)

f1_score_dt = f1_score(target_test, predictions_test)

print("F1 score of the decision tree model on the test set:", f1_score_dt)

probabilities_test = best_model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]

auc_roc = roc_auc_score(target_test, probabilities_one_test)

print("AUC_ROC score of the decision tree model on the test set:", auc_roc)


Depth of best model: 7
F1 score of the decision tree model on the validation set: 0.5599999999999999
F1 score of the decision tree model on the test set: 0.5690721649484536
AUC_ROC score of the decision tree model on the test set: 0.8248686286526451


#### Downsampled Decision Tree

In [25]:
best_model = None
best_result = 0
for depth in range(1, 50):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(features_downsampled, target_downsampled)
    predictions_valid = model.predict(features_valid)
    result = f1_score(target_valid, predictions_valid)
    if result > best_result:
        best_model = model
        best_result = result
        best_depth = depth
        
predictions_test = best_model.predict(features_test)
test_result = accuracy_score(target_test, predictions_test)

print("Depth of best model:", best_depth)
print("F1 score of the decision tree model on the validation set:", best_result)

f1_score_dt = f1_score(target_test, predictions_test)

print("F1 score of the decision tree model on the test set:", f1_score_dt)

probabilities_test = best_model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]

auc_roc = roc_auc_score(target_test, probabilities_one_test)

print("AUC_ROC score of the decision tree model on the test set:", auc_roc)


Depth of best model: 4
F1 score of the decision tree model on the validation set: 0.5586808923375364
F1 score of the decision tree model on the test set: 0.5291214215202369
AUC_ROC score of the decision tree model on the test set: 0.8084642079439381


<div class="alert alert-success">
<b>Reviewer's comment</b>

Alright, here you're maximizing F1 score, very nice!

</div>

#### Upsampled Random Forest

In [26]:
best_score = 0
best_est = 0
for est in range(1, 50): 
    model = RandomForestClassifier(random_state=12345, n_estimators=est) 
    model.fit(features_upsampled, target_upsampled) 
    predictions_valid = model.predict(features_valid)
    score = f1_score(target_valid, predictions_valid)    
    if score > best_score:
        best_score = score
        best_est = est

print("n_estimator value of best model:", best_est)
print("F1 score of the random forest on the model validation set:", best_score)
        
best_model = RandomForestClassifier(random_state=12345, n_estimators=best_est)
best_model.fit(features_upsampled, target_upsampled)

predictions_test = best_model.predict(features_test)
f1_score_rf = f1_score(target_test, predictions_test)

print("F1 score of the random forest model on the test set:", f1_score_rf)

probabilities_test = best_model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]

auc_roc = roc_auc_score(target_test, probabilities_one_test)

print("AUC_ROC score of the random forest model on the test set:", auc_roc)

n_estimator value of best model: 43
F1 score of the random forest on the model validation set: 0.6213872832369942
F1 score of the random forest model on the test set: 0.5933734939759037
AUC_ROC score of the random forest model on the test set: 0.8531602702496289


#### Downsampled Random Forest

In [27]:
best_score = 0
best_est = 0
for est in range(1, 50): 
    model = RandomForestClassifier(random_state=12345, n_estimators=est) 
    model.fit(features_downsampled, target_downsampled) 
    predictions_valid = model.predict(features_valid)
    score = f1_score(target_valid, predictions_valid)    
    if score > best_score:
        best_score = score
        best_est = est

print("n_estimator value of best model:", best_est)
print("F1 score of the random forest on the model validation set:", best_score)
        
best_model = RandomForestClassifier(random_state=12345, n_estimators=best_est)
best_model.fit(features_downsampled, target_downsampled)

predictions_test = best_model.predict(features_test)
f1_score_rf = f1_score(target_test, predictions_test)

print("F1 score of the random forest model on the test set:", f1_score_rf)

probabilities_test = best_model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]

auc_roc = roc_auc_score(target_test, probabilities_one_test)

print("AUC_ROC score of the random forest model on the test set:", auc_roc)

n_estimator value of best model: 14
F1 score of the random forest on the model validation set: 0.5846153846153845
F1 score of the random forest model on the test set: 0.5730337078651685
AUC_ROC score of the random forest model on the test set: 0.8343407775544534


<div class="alert alert-danger">
<s><b>Reviewer's comment</b>

Neither upsampled, nor downsampled data are used to train the decision tree and random forest

</div>

<div class="alert alert-warning">
<b>Reviewer's comment V2</b>

Ok, now you're working both with upsampled and downsampled data. One note: I would still suggest to reserve the test set to only evaluate the model, when you've chosen whether to use upsampling or downsampling using the validation set. Ideally, the type of the model should be selected using the validation set as well to avoid getting a biased estimate of the final model's generalization performance (so then the test set would be used just once: to evaluate the final model).

</div>

#### Upsampled Logistic Regression

In [28]:
model = LogisticRegression(random_state=12345, solver='liblinear')  # initialize logistic regression constructor with parameters random_state=54321 and solver='liblinear'
model.fit(features_upsampled, target_upsampled)  # train model on training set

predictions_test = model.predict(features_test)
f1_score_lr = f1_score(target_test, predictions_test)

print("F1 score of the logistic regression model on the test set:", f1_score_lr)

probabilities_test = model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]

auc_roc = roc_auc_score(target_test, probabilities_one_test)

print("AUC_ROC score of the logistic regression model on the test set:", auc_roc)

F1 score of the logistic regression model on the test set: 0.4938737040527804
AUC_ROC score of the logistic regression model on the test set: 0.7613186125397466


#### Downsampled Logistic Regression

In [29]:
model = LogisticRegression(random_state=12345, solver='liblinear')  # initialize logistic regression constructor with parameters random_state=54321 and solver='liblinear'
model.fit(features_downsampled, target_downsampled)  # train model on training set

predictions_test = model.predict(features_test)
f1_score_lr = f1_score(target_test, predictions_test)

print("F1 score of the logistic regression model on the test set:", f1_score_lr)

probabilities_test = model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]

auc_roc = roc_auc_score(target_test, probabilities_one_test)

print("AUC_ROC score of the logistic regression model on the test set:", auc_roc)

F1 score of the logistic regression model on the test set: 0.49056603773584906
AUC_ROC score of the logistic regression model on the test set: 0.7608976281441108


<div class="alert alert-success">
<b>Reviewer's comment</b>

Alright, the logistic regression model was trained on downsampled data

</div>

<div class="alert alert-danger">
<s><b>Reviewer's comment</b>

Why use a decision tree regression model for a classification problem? :)
    
Note that logistic regression is actually a classification model despite its unfortunate name

</div>

<div class="alert alert-info">
  I removed the Decision Tree Regression model, by this point I think I had run each of the previous models at least ten times each so was kind of lost. I'm still very uncomfortable with Machine Learning as a whole, and clearly don't have a great grasp on what models to use when. I also expanded the three remaining models to include both an upscaled and downscaled version to ideally find the best fit.
</div>

<div class="alert alert-success">
<b>Reviewer's comment V2</b>

Excellent! Yeah, don't worry, you're doing great! :)

</div>

# Conclusion

The best model for Beta Bank to use in their predictions of what customers will stay or leave is the Random Forest model, utilizing upsampled data, with a depth of 43. 

Before balancing, the Random Forest model resulted in and F1 Score of 0.587 with depth of 41. This was just under the threshold of `0.59`.

After balancing the data and fitting the model, the validation data set resulted in a F1 Score of 0.621, with depth of 43; slightly exceeding the threshold for an acceptable model and showing improvement. When the test data set was processed by the model, we recieved a F1 Score of 0.593 which still meets the minimum threshold. The AUC_ROC score, which can range from 0 to 1, is 0.853 with the testing set, indicating a relatively high quality model as the closer to 1 a model approaches, the greater the quality.

<div class="alert alert-info">
    As a note: I'm aware that I will likely need edits following my first submission, and will welcome any input and assistance I can get. At this point I'm floundering on how to get my F1 scores higher. I ran the base training sets along with downsampled and upsampled sets through all the models and played with which classes would get scaled to try and get the best results, selecting the best results for all of them at the end. I also tried eliminating various columns such as geography as well to reduce the number of variables in the modelling to no avail. I also adjusted the fraction of the downsampling and multiplier of the upsampling with no luck.
</div>

<div class="alert alert-danger">
<s><b>Reviewer's comment</b>

Yep, there are a couple of problems I noted above. Some suggestions for improving the F1 score:
    
1. Make sure that upsampled/downsampled data are actually used to train decision tree/random forest
2. Use stratification when splitting the data to make sure target distribution in train/validation/test is the same as in the original dataset
3. Maybe to enlarge the hyperparameter search space

</div>

<div class="alert alert-info">
  Hopefully this second submission is better. I was able to meet the F1 Score threshold with a model finally. So thank you for all the great advice, adjusting the multiplier and fraction of my up/downscaling along with stratifying the data sets certainly helped. So thiank you again.
</div>

<div class="alert alert-success">
<b>Reviewer's comment V2</b>

It is! You're welcome! :)

</div>