### The Lifecycle of a Data Science Project
1. Exploratory Data Analysis
2. Feature Engineering
3. Feature Selection
4. **Model Building**   -->      ***--THIS SECTION--***
5. Model Deployment

## Import Necessary Libraries

In [76]:
# Data Analysis
import numpy as np
import pandas as pd

# Machine Learning Algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
    
# Performance Metrics
from sklearn.metrics import classification_report, confusion_matrix, f1_score, precision_score, recall_score, accuracy_score
from sklearn.metrics import roc_auc_score

# Cross-Validate
from sklearn import model_selection
from sklearn.model_selection import cross_val_score

# Ignore Warnings
import warnings
warnings.filterwarnings('ignore')

## Loading the Dataset

In [2]:
df = pd.read_csv('Data.csv')
df.head()

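| | GENDER | DRIVING_EXPERIENCE | INCOME | CREDIT_SCORE | VEHICLE_OWNERSHIP | VEHICLE_YEAR | MARRIED | CHILDREN | ANNUAL_MILEAGE | PAST_ACCIDENTS | OUTCOME |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 3 | 0.629027 | 1.0 | 1 | 0.0 | 1.0 | 12000.0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0.357757 | 0.0 | 0 | 0.0 | 0.0 | 16000.0 | 0 | 1 |
| 2 | 0 | 0 | 1 | 0.493146 | 1.0 | 0 | 0.0 | 0.0 | 11000.0 | 0 | 0 |
| 3 | 1 | 0 | 1 | 0.206013 | 1.0 | 0 | 0.0 | 1.0 | 11000.0 | 0 | 0 |
| 4 | 1 | 1 | 1 | 0.388366 | 1.0 | 0 | 0.0 | 0.0 | 12000.0 | 1 | 1 |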


### Splitting the Data into Train and Test

In [3]:
X = df.drop(columns= 'OUTCOME')
y = df[['OUTCOME']]

In [4]:
from sklearn.model_selection import train_test_split

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.30, random_state= 42)

- We first train the models without any resampling, then compare them with models trained on resampled data.

In [6]:
print(X_train)
print()
print(y_train)

      GENDER  DRIVING_EXPERIENCE  INCOME  CREDIT_SCORE  VEHICLE_OWNERSHIP  \
9069       1                   2       3      0.594366                1.0   
2603       0                   1       3      0.644906                1.0   
7738       0                   0       0      0.343943                1.0   
1579       1                   1       3      0.597851                1.0   
5058       1                   0       3      0.635860                1.0   
...      ...                 ...     ...           ...                ...   
5734       1                   0       1      0.448070                0.0   
5191       0                   2       3      0.705328                1.0   
5390       0                   2       3      0.701491                1.0   
860        0                   1       3      0.646763                1.0   
7270       1                   2       2      0.594994                1.0   

      VEHICLE_YEAR  MARRIED  CHILDREN  ANNUAL_MILEAGE  PAST_ACCIDENTS  
906

#### We focus on Recall because we do not want to misclassify customers who will actually claim their insurance as customers who will not claim (false negatives).

### Without ReSampling

#### 1. Logistic Regression

In [7]:
log_model = LogisticRegression()
kfold = model_selection.KFold(n_splits= 10, random_state= 7, shuffle= True)

## Fitting the Model
model1 = log_model.fit(X_train, y_train)

In [8]:
### Performance Metrics
LOG_accuracy = model1.score(X_train, y_train)
LOG_cv_accuracy = cross_val_score(model1, X_train, y_train, cv= kfold, scoring= 'accuracy')    # Estimates out-of-sample accuracy; a clone of the model is refit on each fold.

print(f'The Initial Accuracy of the Logistic Model is: {LOG_accuracy}')
print(f'The Cross-Validated Accuracy of the Logistic Model is: {LOG_cv_accuracy.mean()}')

The Initial Accuracy of the Logistic Model is: 0.7761428571428571
The Cross-Validated Accuracy of the Logistic Model is: 0.7895714285714286


In [9]:
## Prediction
y_pred1 = model1.predict(X_test)

In [10]:
### Confusion Matrix
confusionmatrix1 = confusion_matrix(y_test, y_pred1)
print('------------Confusion Matrix--------------')
print(confusionmatrix1)

------------Confusion Matrix--------------
[[1828  204]
 [ 493  475]]
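
scikit-learn's `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, so the matrix above reads `[[TN, FP], [FN, TP]]`. The minimal sketch below recomputes precision, recall, and F1 for class 1 by hand from those four counts; the variable names are illustrative and not part of the notebook.

```python
# Counts copied from the Logistic Regression confusion matrix above.
tn, fp = 1828, 204   # true class 0: predicted 0, predicted 1
fn, tp = 493, 475    # true class 1: predicted 0, predicted 1

precision = tp / (tp + fp)   # of predicted claimers, how many really claimed
recall = tp / (tp + fn)      # of actual claimers, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

A recall of about 0.49 means that roughly half of the customers who actually claimed (493 of 968) were predicted as non-claimers.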


In [11]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred1))

              precision    recall  f1-score   support

           0       0.79      0.90      0.84      2032
           1       0.70      0.49      0.58       968

    accuracy                           0.77      3000
   macro avg       0.74      0.70      0.71      3000
weighted avg       0.76      0.77      0.75      3000



In [12]:
### Test Performance Metrics
LOG_precision = precision_score(y_test, y_pred1)
LOG_recall = recall_score(y_test, y_pred1)
LOG_f1Score = f1_score(y_test, y_pred1)

print(f'The Precision of the Logistic Model is: {LOG_precision}')
print(f'The Recall of the Logistic Model is: {LOG_recall}')
print(f'The F1-Score of the Logistic Model is: {LOG_f1Score}')

The Precision of the Logistic Model is: 0.6995581737849779
The Recall of the Logistic Model is: 0.490702479338843
The F1-Score of the Logistic Model is: 0.5768063145112325


In [13]:
### ROC-AUC
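
The ROC-AUC cell is left empty in the notebook. A minimal sketch of how the already-imported `roc_auc_score` could be used follows. It runs on synthetic data from `make_classification` as a stand-in for `Data.csv` (an assumption), and all `demo_*` names are hypothetical. ROC-AUC needs scores or probabilities rather than hard labels, hence `predict_proba`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: not the notebook's dataset).
demo_X, demo_y = make_classification(
    n_samples=1000, n_features=10, n_clusters_per_class=1,
    weights=[0.7, 0.3], random_state=42,
)
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.30, random_state=42
)

demo_model = LogisticRegression(max_iter=1000).fit(demo_X_train, demo_y_train)

# The probability of the positive class (column 1) serves as the ranking score.
demo_scores = demo_model.predict_proba(demo_X_test)[:, 1]
demo_auc = roc_auc_score(demo_y_test, demo_scores)
print(f"ROC-AUC: {demo_auc:.3f}")
```

In the notebook itself, the equivalent call would be `roc_auc_score(y_test, model1.predict_proba(X_test)[:, 1])`.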

#### 2. Support Vector Machines

In [14]:
svc_model = SVC()

## Fitting the Model
model2 = svc_model.fit(X_train, y_train)

In [15]:
### Performance Metrics
SVC_accuracy = model2.score(X_train, y_train)
SVC_cv_accuracy = cross_val_score(model2, X_train, y_train, cv= kfold, scoring= 'accuracy')    # Estimates out-of-sample accuracy; a clone of the model is refit on each fold.

print(f'The Initial Accuracy of the SVM Model is: {SVC_accuracy}')
print(f'The Cross-Validated Accuracy of the SVM Model is: {SVC_cv_accuracy.mean()}')

The Initial Accuracy of the SVM Model is: 0.6907142857142857
The Cross-Validated Accuracy of the SVM Model is: 0.6907142857142856


In [16]:
## Prediction
y_pred2 = model2.predict(X_test)

In [17]:
### Confusion Matrix
confusionmatrix2 = confusion_matrix(y_test, y_pred2)
print('------------Confusion Matrix--------------')
print(confusionmatrix2)

------------Confusion Matrix--------------
[[2032    0]
 [ 968    0]]
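
The matrix above shows that the SVM never predicts class 1, so its accuracy is simply the share of the majority class. A small arithmetic check, using the test-set counts above and the training-set class counts that the notebook prints later with `Counter(y_train['OUTCOME'])`:

```python
# Test-set support per class, from the confusion matrix above.
test_neg, test_pos = 2032, 968
# Training-set class counts, as printed later in the notebook.
train_neg, train_pos = 4835, 2165

# Accuracy of a model that always predicts class 0.
test_baseline = test_neg / (test_neg + test_pos)
train_baseline = train_neg / (train_neg + train_pos)

print(f"always-predict-0 accuracy on test:  {test_baseline:.4f}")
print(f"always-predict-0 accuracy on train: {train_baseline:.4f}")
```

The training value matches the SVM's reported training and cross-validated accuracy (0.6907), which suggests the model has learned nothing beyond the class prior.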


In [18]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred2))

              precision    recall  f1-score   support

           0       0.68      1.00      0.81      2032
           1       0.00      0.00      0.00       968

    accuracy                           0.68      3000
   macro avg       0.34      0.50      0.40      3000
weighted avg       0.46      0.68      0.55      3000



In [19]:
### Test Performance Metrics
SVC_precision = precision_score(y_test, y_pred2)
SVC_recall = recall_score(y_test, y_pred2)
SVC_f1Score = f1_score(y_test, y_pred2)

print(f'The Precision of the SVM Model is: {SVC_precision}')
print(f'The Recall of the SVM Model is: {SVC_recall}')
print(f'The F1-Score of the SVM Model is: {SVC_f1Score}')

The Precision of the SVM Model is: 0.0
The Recall of the SVM Model is: 0.0
The F1-Score of the SVM Model is: 0.0
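
One likely cause (an assumption, not verified on this dataset) is feature scale: `ANNUAL_MILEAGE` is in the tens of thousands while the other columns are small, and `SVC`'s default RBF kernel is sensitive to such differences. A hedged sketch of standardizing inside a pipeline, on synthetic stand-in data with one deliberately inflated column; all `demo_*` names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (assumption: not the notebook's dataset).
demo_X, demo_y = make_classification(
    n_samples=600, n_features=6, n_clusters_per_class=1,
    weights=[0.7, 0.3], random_state=0,
)
# Inflate one column to mimic a mileage-like feature.
demo_X[:, 0] = demo_X[:, 0] * 5000 + 12000

# Inside cross-validation the scaler is fitted on the training folds only.
scaled_svc = make_pipeline(StandardScaler(), SVC())
demo_recall = cross_val_score(scaled_svc, demo_X, demo_y, cv=5, scoring="recall")
print(f"mean CV recall with scaling: {demo_recall.mean():.3f}")
```

The same pipeline pattern applies to `KNeighborsClassifier`, which is also distance-based.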


#### 3. K-Nearest Neighbours

In [20]:
knn_model = KNeighborsClassifier()

## Fitting the Model
model3 = knn_model.fit(X_train, y_train)

In [21]:
### Performance Metrics
KNN_accuracy = model3.score(X_train, y_train)
KNN_cv_accuracy = cross_val_score(model3, X_train, y_train, cv= kfold, scoring= 'accuracy')    # Estimates out-of-sample accuracy; a clone of the model is refit on each fold.

print(f'The Initial Accuracy of the KNN Model is: {KNN_accuracy}')
print(f'The Cross-Validated Accuracy of the KNN Model is: {KNN_cv_accuracy.mean()}')

The Initial Accuracy of the KNN Model is: 0.8647142857142858
The Cross-Validated Accuracy of the KNN Model is: 0.8082857142857144


In [22]:
## Prediction
y_pred3 = model3.predict(X_test)

In [23]:
### Confusion Matrix
confusionmatrix3 = confusion_matrix(y_test, y_pred3)
print('------------Confusion Matrix--------------')
print(confusionmatrix3)

------------Confusion Matrix--------------
[[1761  271]
 [ 315  653]]


In [24]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred3))

              precision    recall  f1-score   support

           0       0.85      0.87      0.86      2032
           1       0.71      0.67      0.69       968

    accuracy                           0.80      3000
   macro avg       0.78      0.77      0.77      3000
weighted avg       0.80      0.80      0.80      3000



In [25]:
### Test Performance Metrics
KNN_precision = precision_score(y_test, y_pred3)
KNN_recall = recall_score(y_test, y_pred3)
KNN_f1Score = f1_score(y_test, y_pred3)

print(f'The Precision of the KNN Model is: {KNN_precision}')
print(f'The Recall of the KNN Model is: {KNN_recall}')
print(f'The F1-Score of the KNN Model is: {KNN_f1Score}')

The Precision of the KNN Model is: 0.7067099567099567
The Recall of the KNN Model is: 0.6745867768595041
The F1-Score of the KNN Model is: 0.6902748414376322


#### 4. Decision Trees

In [26]:
dc_model = DecisionTreeClassifier()

## Fitting the Model
model4 = dc_model.fit(X_train, y_train)

In [27]:
### Performance Metrics
DC_accuracy = model4.score(X_train, y_train)
DC_cv_accuracy = cross_val_score(model4, X_train, y_train, cv= kfold, scoring= 'accuracy')    # Estimates out-of-sample accuracy; a clone of the model is refit on each fold.

print(f'The Initial Accuracy of the DC Model is: {DC_accuracy}')
print(f'The Cross-Validated Accuracy of the DC Model is: {DC_cv_accuracy.mean()}')

The Initial Accuracy of the DC Model is: 1.0
The Cross-Validated Accuracy of the DC Model is: 0.7779999999999999


In [28]:
## Prediction
y_pred4 = model4.predict(X_test)

In [29]:
### Confusion Matrix
confusionmatrix4 = confusion_matrix(y_test, y_pred4)
print('------------Confusion Matrix--------------')
print(confusionmatrix4)

------------Confusion Matrix--------------
[[1675  357]
 [ 374  594]]


In [30]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred4))

              precision    recall  f1-score   support

           0       0.82      0.82      0.82      2032
           1       0.62      0.61      0.62       968

    accuracy                           0.76      3000
   macro avg       0.72      0.72      0.72      3000
weighted avg       0.76      0.76      0.76      3000



In [31]:
### Test Performance Metrics
DC_precision = precision_score(y_test, y_pred4)
DC_recall = recall_score(y_test, y_pred4)
DC_f1Score = f1_score(y_test, y_pred4)

print(f'The Precision of the DC Model is: {DC_precision}')
print(f'The Recall of the DC Model is: {DC_recall}')
print(f'The F1-Score of the DC Model is: {DC_f1Score}')

The Precision of the DC Model is: 0.6246056782334385
The Recall of the DC Model is: 0.6136363636363636
The F1-Score of the DC Model is: 0.6190724335591454


#### 5. Random Forest

In [32]:
rf_model = RandomForestClassifier()

## Fitting the Model
model5 = rf_model.fit(X_train, y_train)

In [33]:
### Performance Metrics
RF_accuracy = model5.score(X_train, y_train)
RF_cv_accuracy = cross_val_score(model5, X_train, y_train, cv= kfold, scoring= 'accuracy')    # Estimates out-of-sample accuracy; a clone of the model is refit on each fold.

print(f'The Initial Accuracy of the RF Model is: {RF_accuracy}')
print(f'The Cross-Validated Accuracy of the RF Model is: {RF_cv_accuracy.mean()}')

The Initial Accuracy of the RF Model is: 1.0
The Cross-Validated Accuracy of the RF Model is: 0.823857142857143


In [34]:
## Prediction
y_pred5 = model5.predict(X_test)

In [35]:
### Confusion Matrix
confusionmatrix5 = confusion_matrix(y_test, y_pred5)
print('------------Confusion Matrix--------------')
print(confusionmatrix5)

------------Confusion Matrix--------------
[[1823  209]
 [ 339  629]]


In [36]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred5))

              precision    recall  f1-score   support

           0       0.84      0.90      0.87      2032
           1       0.75      0.65      0.70       968

    accuracy                           0.82      3000
   macro avg       0.80      0.77      0.78      3000
weighted avg       0.81      0.82      0.81      3000



In [37]:
### Test Performance Metrics
RF_precision = precision_score(y_test, y_pred5)
RF_recall = recall_score(y_test, y_pred5)
RF_f1Score = f1_score(y_test, y_pred5)

print(f'The Precision of the RF Model is: {RF_precision}')
print(f'The Recall of the RF Model is: {RF_recall}')
print(f'The F1-Score of the RF Model is: {RF_f1Score}')

The Precision of the RF Model is: 0.7505966587112172
The Recall of the RF Model is: 0.6497933884297521
The F1-Score of the RF Model is: 0.6965669988925803


In [39]:
Model_Metrics = pd.DataFrame({'Model':['Logistic Regression', 'Support Vectors', 'K-Nearest Neighbours', 'Decision Trees', 'Random Forest'],
                      'Train Accuracy':[LOG_cv_accuracy.mean(), SVC_cv_accuracy.mean(), KNN_cv_accuracy.mean(), DC_cv_accuracy.mean(), RF_cv_accuracy.mean()],
                      'Test Precision':[LOG_precision, SVC_precision, KNN_precision, DC_precision, RF_precision],
                      'Test Recall':[LOG_recall, SVC_recall, KNN_recall, DC_recall, RF_recall],
                      'Test F1 Score':[LOG_f1Score, SVC_f1Score, KNN_f1Score, DC_f1Score, RF_f1Score],})

print('-----Model Metrics Analysis-----')
Model_Metrics.nlargest(5,'Test Recall')


-----Model Metrics Analysis-----


| | Model | Train Accuracy | Test Precision | Test Recall | Test F1 Score |
|---|---|---|---|---|---|
| 2 | K-Nearest Neighbours | 0.808286 | 0.70671 | 0.674587 | 0.690275 |
| 4 | Random Forest | 0.823857 | 0.750597 | 0.649793 | 0.696567 |
| 3 | Decision Trees | 0.778 | 0.624606 | 0.613636 | 0.619072 |
| 0 | Logistic Regression | 0.789571 | 0.699558 | 0.490702 | 0.576806 |
| 1 | Support Vectors | 0.690714 | 0.0 | 0.0 | 0.0 |
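
Note that the `Train Accuracy` column above holds the cross-validated means, not the scores on the full training set. Putting the two side by side, with values rounded from the outputs printed earlier, makes the overfitting gap visible. This is a small bookkeeping sketch; the `reported` dictionary is illustrative.

```python
# (training accuracy, cross-validated accuracy), rounded from the outputs above.
reported = {
    "Logistic Regression": (0.7761, 0.7896),
    "Support Vectors": (0.6907, 0.6907),
    "K-Nearest Neighbours": (0.8647, 0.8083),
    "Decision Trees": (1.0000, 0.7780),
    "Random Forest": (1.0000, 0.8239),
}

# A large positive gap means the model fits the training set far better
# than it scores on held-out folds.
gaps = {name: round(train - cv, 4) for name, (train, cv) in reported.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<22} gap = {gap:+.4f}")
```

The two tree-based models fit the training set perfectly but lose about 0.22 and 0.18 in accuracy under cross-validation, the clearest signs of overfitting among the five.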


### With ReSampling (APPLYING SMOTE-Tomek)

In [40]:
from collections import Counter
from imblearn.combine import SMOTETomek
from imblearn.under_sampling import TomekLinks

In [41]:
smote_tomek = SMOTETomek(random_state=42, tomek= TomekLinks())
X_res, y_res = smote_tomek.fit_resample(X_train, y_train['OUTCOME'])

In [42]:
print('Original dataset shape %s' % Counter(y_train['OUTCOME']))
print('Resampled dataset shape %s' % Counter(y_res))

Original dataset shape Counter({0: 4835, 1: 2165})
Resampled dataset shape Counter({0: 4835, 1: 4453})
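
SMOTE-Tomek first oversamples the minority class with synthetic points and then removes samples that take part in Tomek links (nearest-neighbour pairs from opposite classes), which is why the resampled classes are close to balanced but not exactly equal. The core SMOTE step can be sketched as linear interpolation between a minority sample and one of its minority-class neighbours. This illustrates the idea only and is not imblearn's implementation; `smote_point` is a hypothetical helper and the two sample rows are invented for the example.

```python
import random


def smote_point(x, neighbour, rng):
    """Return a synthetic point on the segment between x and neighbour."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbour)]


rng = random.Random(42)
minority_a = [0.36, 16000.0]  # illustrative (CREDIT_SCORE, ANNUAL_MILEAGE) row
minority_b = [0.39, 12000.0]  # an illustrative minority-class neighbour
synthetic = smote_point(minority_a, minority_b, rng)
print(synthetic)
```

Because interpolation yields fractional values, integer-coded columns such as `GENDER` or `INCOME` may call for a categorical-aware variant such as imblearn's `SMOTENC`.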


#### 1. Logistic Regression

In [43]:
log_model = LogisticRegression()
kfold = model_selection.KFold(n_splits= 10, random_state= 7, shuffle= True)

## Fitting the Model
model1 = log_model.fit(X_res, y_res)

In [44]:
### Performance Metrics
LOG_accuracy = model1.score(X_res, y_res)
LOG_cv_accuracy = cross_val_score(model1, X_res, y_res, cv= kfold, scoring= 'accuracy')    # k-fold CV on the resampled data; may be optimistic, since synthetic samples can land in both training and validation folds.

print(f'The Initial Accuracy of the Logistic Model is: {LOG_accuracy}')
print(f'The Cross-Validated Accuracy of the Logistic Model is: {LOG_cv_accuracy.mean()}')

The Initial Accuracy of the Logistic Model is: 0.8557278208440999
The Cross-Validated Accuracy of the Logistic Model is: 0.8250491815448573


In [45]:
## Prediction
y_pred1 = model1.predict(X_test)

In [46]:
### Confusion Matrix
confusionmatrix6 = confusion_matrix(y_test, y_pred1)
print('------------Confusion Matrix--------------')
print(confusionmatrix6)

------------Confusion Matrix--------------
[[1689  343]
 [ 193  775]]


In [47]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred1))

              precision    recall  f1-score   support

           0       0.90      0.83      0.86      2032
           1       0.69      0.80      0.74       968

    accuracy                           0.82      3000
   macro avg       0.80      0.82      0.80      3000
weighted avg       0.83      0.82      0.82      3000



In [48]:
### Test Performance Metrics
LOG_precision = precision_score(y_test, y_pred1)
LOG_recall = recall_score(y_test, y_pred1)
LOG_f1Score = f1_score(y_test, y_pred1)

print(f'The Precision of the Logistic Model is: {LOG_precision}')
print(f'The Recall of the Logistic Model is: {LOG_recall}')
print(f'The F1-Score of the Logistic Model is: {LOG_f1Score}')

The Precision of the Logistic Model is: 0.6932021466905188
The Recall of the Logistic Model is: 0.8006198347107438
The F1-Score of the Logistic Model is: 0.7430488974113136


#### 2. Support Vector Machines

In [49]:
svc_model = SVC()

## Fitting the Model
model2 = svc_model.fit(X_res, y_res)

In [50]:
### Performance Metrics
SVC_accuracy = model2.score(X_res, y_res)
SVC_cv_accuracy = cross_val_score(model2, X_res, y_res, cv= kfold, scoring= 'accuracy')    # k-fold CV on the resampled data; may be optimistic, since synthetic samples can land in both training and validation folds.

print(f'The Initial Accuracy of the SVM Model is: {SVC_accuracy}')
print(f'The Cross-Validated Accuracy of the SVM Model is: {SVC_cv_accuracy.mean()}')

The Initial Accuracy of the SVM Model is: 0.5869939707149009
The Cross-Validated Accuracy of the SVM Model is: 0.5888267417690509


In [51]:
## Prediction
y_pred2 = model2.predict(X_test)

In [52]:
### Confusion Matrix
confusionmatrix7 = confusion_matrix(y_test, y_pred2)
print('------------Confusion Matrix--------------')
print(confusionmatrix7)

------------Confusion Matrix--------------
[[1453  579]
 [ 543  425]]


In [53]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred2))

              precision    recall  f1-score   support

           0       0.73      0.72      0.72      2032
           1       0.42      0.44      0.43       968

    accuracy                           0.63      3000
   macro avg       0.58      0.58      0.58      3000
weighted avg       0.63      0.63      0.63      3000



In [54]:
### Test Performance Metrics
SVC_precision = precision_score(y_test, y_pred2)
SVC_recall = recall_score(y_test, y_pred2)
SVC_f1Score = f1_score(y_test, y_pred2)

print(f'The Precision of the SVM Model is: {SVC_precision}')
print(f'The Recall of the SVM Model is: {SVC_recall}')
print(f'The F1-Score of the SVM Model is: {SVC_f1Score}')

The Precision of the SVM Model is: 0.42330677290836655
The Recall of the SVM Model is: 0.4390495867768595
The F1-Score of the SVM Model is: 0.4310344827586207


#### 3. K-Nearest Neighbours

In [55]:
knn_model = KNeighborsClassifier()

## Fitting the Model
model3 = knn_model.fit(X_res, y_res)

In [56]:
### Performance Metrics
KNN_accuracy = model3.score(X_res, y_res)
KNN_cv_accuracy = cross_val_score(model3, X_res, y_res, cv= kfold, scoring= 'accuracy')    # k-fold CV on the resampled data; may be optimistic, since synthetic samples can land in both training and validation folds.

print(f'The Initial Accuracy of the KNN Model is: {KNN_accuracy}')
print(f'The Cross-Validated Accuracy of the KNN Model is: {KNN_cv_accuracy.mean()}')

The Initial Accuracy of the KNN Model is: 0.8996554694229113
The Cross-Validated Accuracy of the KNN Model is: 0.8575588786607773


In [57]:
## Prediction
y_pred3 = model3.predict(X_test)

In [58]:
### Confusion Matrix
confusionmatrix8 = confusion_matrix(y_test, y_pred3)
print('------------Confusion Matrix--------------')
print(confusionmatrix8)

------------Confusion Matrix--------------
[[1626  406]
 [ 238  730]]


In [59]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred3))

              precision    recall  f1-score   support

           0       0.87      0.80      0.83      2032
           1       0.64      0.75      0.69       968

    accuracy                           0.79      3000
   macro avg       0.76      0.78      0.76      3000
weighted avg       0.80      0.79      0.79      3000



In [60]:
### Test Performance Metrics
KNN_precision = precision_score(y_test, y_pred3)
KNN_recall = recall_score(y_test, y_pred3)
KNN_f1Score = f1_score(y_test, y_pred3)

print(f'The Precision of the KNN Model is: {KNN_precision}')
print(f'The Recall of the KNN Model is: {KNN_recall}')
print(f'The F1-Score of the KNN Model is: {KNN_f1Score}')

The Precision of the KNN Model is: 0.6426056338028169
The Recall of the KNN Model is: 0.7541322314049587
The F1-Score of the KNN Model is: 0.6939163498098859


#### 4. Decision Trees

In [61]:
dc_model = DecisionTreeClassifier()

## Fitting the Model
model4 = dc_model.fit(X_res, y_res)

In [62]:
### Performance Metrics
DC_accuracy = model4.score(X_res, y_res)
DC_cv_accuracy = cross_val_score(model4, X_res, y_res, cv= kfold, scoring= 'accuracy')    # k-fold CV on the resampled data; may be optimistic, since synthetic samples can land in both training and validation folds.

print(f'The Initial Accuracy of the DC Model is: {DC_accuracy}')
print(f'The Cross-Validated Accuracy of the DC Model is: {DC_cv_accuracy.mean()}')

The Initial Accuracy of the DC Model is: 1.0
The Cross-Validated Accuracy of the DC Model is: 0.8450685061801714


In [63]:
## Prediction
y_pred4 = model4.predict(X_test)

In [64]:
### Confusion Matrix
confusionmatrix9 = confusion_matrix(y_test, y_pred4)
print('------------Confusion Matrix--------------')
print(confusionmatrix9)

------------Confusion Matrix--------------
[[1673  359]
 [ 333  635]]


In [65]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred4))

              precision    recall  f1-score   support

           0       0.83      0.82      0.83      2032
           1       0.64      0.66      0.65       968

    accuracy                           0.77      3000
   macro avg       0.74      0.74      0.74      3000
weighted avg       0.77      0.77      0.77      3000



In [66]:
### Test Performance Metrics
DC_precision = precision_score(y_test, y_pred4)
DC_recall = recall_score(y_test, y_pred4)
DC_f1Score = f1_score(y_test, y_pred4)

print(f'The Precision of the DC Model is: {DC_precision}')
print(f'The Recall of the DC Model is: {DC_recall}')
print(f'The F1-Score of the DC Model is: {DC_f1Score}')

The Precision of the DC Model is: 0.6388329979879276
The Recall of the DC Model is: 0.65599173553719
The F1-Score of the DC Model is: 0.6472986748216105


#### 5. Random Forest

In [67]:
rf_model = RandomForestClassifier()

## Fitting the Model
model5 = rf_model.fit(X_res, y_res)

In [68]:
### Performance Metrics
RF_accuracy = model5.score(X_res, y_res)
RF_cv_accuracy = cross_val_score(model5, X_res, y_res, cv= kfold, scoring= 'accuracy')    # k-fold CV on the resampled data; may be optimistic, since synthetic samples can land in both training and validation folds.

print(f'The Initial Accuracy of the RF Model is: {RF_accuracy}')
print(f'The Cross-Validated Accuracy of the RF Model is: {RF_cv_accuracy.mean()}')

The Initial Accuracy of the RF Model is: 0.9997846683893196
The Cross-Validated Accuracy of the RF Model is: 0.888888102705913


In [69]:
## Prediction
y_pred5 = model5.predict(X_test)

In [70]:
### Confusion Matrix
confusionmatrix10 = confusion_matrix(y_test, y_pred5)
print('------------Confusion Matrix--------------')
print(confusionmatrix10)

------------Confusion Matrix--------------
[[1785  247]
 [ 327  641]]


In [71]:
### Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred5))

              precision    recall  f1-score   support

           0       0.85      0.88      0.86      2032
           1       0.72      0.66      0.69       968

    accuracy                           0.81      3000
   macro avg       0.78      0.77      0.78      3000
weighted avg       0.81      0.81      0.81      3000



In [72]:
### Test Performance Metrics
RF_precision = precision_score(y_test, y_pred5)
RF_recall = recall_score(y_test, y_pred5)
RF_f1Score = f1_score(y_test, y_pred5)

print(f'The Precision of the RF Model is: {RF_precision}')
print(f'The Recall of the RF Model is: {RF_recall}')
print(f'The F1-Score of the RF Model is: {RF_f1Score}')

The Precision of the RF Model is: 0.7218468468468469
The Recall of the RF Model is: 0.6621900826446281
The F1-Score of the RF Model is: 0.6907327586206897


In [73]:
Metrics = pd.DataFrame({'Model':['Logistic Regression', 'Support Vectors', 'K-Nearest Neighbours', 'Decision Trees', 'Random Forest'],
                      'Train Accuracy':[LOG_cv_accuracy.mean(), SVC_cv_accuracy.mean(), KNN_cv_accuracy.mean(), DC_cv_accuracy.mean(), RF_cv_accuracy.mean()],
                      'Test Precision':[LOG_precision, SVC_precision, KNN_precision, DC_precision, RF_precision],
                      'Test Recall':[LOG_recall, SVC_recall, KNN_recall, DC_recall, RF_recall],
                      'Test F1 Score':[LOG_f1Score, SVC_f1Score, KNN_f1Score, DC_f1Score, RF_f1Score],})

print('-----Model Metrics Analysis-----')
Metrics.nlargest(5,'Test Recall')


-----Model Metrics Analysis-----


| | Model | Train Accuracy | Test Precision | Test Recall | Test F1 Score |
|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.825049 | 0.693202 | 0.80062 | 0.743049 |
| 2 | K-Nearest Neighbours | 0.857559 | 0.642606 | 0.754132 | 0.693916 |
| 4 | Random Forest | 0.888888 | 0.721847 | 0.66219 | 0.690733 |
| 3 | Decision Trees | 0.845069 | 0.638833 | 0.655992 | 0.647299 |
| 1 | Support Vectors | 0.588827 | 0.423307 | 0.43905 | 0.431034 |


<h4 align= 'center'><strong><b>After SMOTE-Tomek resampling, Logistic Regression has the highest test Recall of all the models (0.80), and its test accuracy (0.82) remains competitive.</b></strong></h4>
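
The recall comparison behind this conclusion can be reproduced from the two summary tables. The sketch below copies the `Test Recall` values from those tables and computes the change per model; the dictionary names are illustrative.

```python
# Test recall per model, copied from the two summary tables above.
recall_before = {
    "Logistic Regression": 0.490702,
    "Support Vectors": 0.0,
    "K-Nearest Neighbours": 0.674587,
    "Decision Trees": 0.613636,
    "Random Forest": 0.649793,
}
recall_after = {
    "Logistic Regression": 0.800620,
    "Support Vectors": 0.439050,
    "K-Nearest Neighbours": 0.754132,
    "Decision Trees": 0.655992,
    "Random Forest": 0.662190,
}

# Change in test recall after SMOTE-Tomek resampling.
recall_gain = {m: round(recall_after[m] - recall_before[m], 6) for m in recall_before}
best_model = max(recall_after, key=recall_after.get)

for model, gain in recall_gain.items():
    print(f"{model:<22} {recall_before[model]:.3f} -> {recall_after[model]:.3f} ({gain:+.3f})")
print("Highest recall after resampling:", best_model)
```

Every model gains recall after resampling, with Logistic Regression improving the most among the models that predicted any positives beforehand.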

In [84]:
# accuracy_score(y_test, y_pred1)