# RESULTS

__Libraries__ 

In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from PIL import Image
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', 100)

__Results__

In [2]:
# Imports the Results Dataframes
LR =  pd.read_csv('LogisticRegression_Results.csv', index_col = 'Metrics')
LDA =  pd.read_csv('LDA_Results.csv', index_col = 'Metrics')
QDA =  pd.read_csv('QDA_Results.csv', index_col = 'Metrics')
KNN = pd.read_csv('KNN_Results.csv', index_col = 'Metrics')
GNB = pd.read_csv('GNB_Results.csv', index_col = 'Metrics')
RF = pd.read_csv('RF_Results.csv', index_col = 'Metrics')
SVC = pd.read_csv('SVC_Results.csv', index_col = 'Metrics')
XGBOOST = pd.read_csv('XGBOOST_Results.csv', index_col = 'Metrics')

In [3]:
# Concatenating
Results = pd.concat([LR,LDA, QDA, KNN, GNB, RF, SVC, XGBOOST], axis = 1 )

Results = Results.transpose()
Results.fillna('-------', inplace = True)
Results

| Metrics | Training Accuracy | Training Recall | CV Accuracy | CV Recall | Test Accuracy | Test Recall |
|---|---|---|---|---|---|---|
| Logistic Regression | 0.871 | 0.833 | 0.859 | 0.83 | 0.756 | 0.696 |
| Linear Discriminant Analysis | 0.855 | 0.781 | 0.859 | 0.8 | 0.756 | 0.696 |
| Quadratic Discriminant Analysis | 0.871 | 0.816 | 0.843 | 0.798 | 0.8 | 0.696 |
| K-Nearest Neighbors | 0.855 | 0.825 | 0.843 | 0.812 | 0.778 | 0.739 |
| Gaussian Naive Bayes | 0.843 | 0.807 | 0.827 | 0.784 | 0.844 | 0.783 |
| Random Forest Classifier | 0.937 | 0.895 | 0.827 | 0.803 | 0.778 | 0.739 |
| Support Vector Classifier | 0.808 | 0.798 | 0.804 | 0.845 | 0.8 | 0.739 |
| XGBOOST | 0.894 | 0.842 | 0.847 | 0.821 | 0.667 | 0.435 |


__Confusion Matrices__

In [4]:
# List of image file names
image_files = ['CM_te_LR.png', 'CM_te_LDA.png','CM_te_QDA.png','CM_te_KNN.png' ,
               'CM_te_GNB.png','CM_te_RF.png', 'CM_te_SVC.png', 'CM_te_XGBOOST.png']


images = [Image.open(image_file) for image_file in image_files]

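# Get the width and height of the first image
# (all images are assumed to share this size)
width, height = images[0].size

# Lay the images out in a grid with two columns
num_columns = 2
num_rows = -(-len(images) // num_columns)  # ceiling division: 4 rows for 8 images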

# Create a new blank image with enough space to accommodate all images
combined_width = width * num_columns
combined_height = height * num_rows
combined_image = Image.new('RGB', (combined_width, combined_height))

# Paste each image onto the blank image
for i, image in enumerate(images):
    row = i // num_columns
    column = i % num_columns
    combined_image.paste(image, (column * width, row * height))

# Display the combined image
combined_image.show()

# Save the combined image
combined_image.save('combined_image.png')

__CONCLUSIONS__

* XGBOOST generalized worst among the models we tested. Its accuracy drops from 0.894 in training to 0.667 on the test set, and its recall drops from 0.842 to 0.435, which suggests overfitting.

* Models sorted first by CV Accuracy and then by CV Recall
    1) Logistic Regression (0.859, 0.83), Linear Discriminant Analysis (0.859, 0.8)
    2) XGBOOST (0.847, 0.821)
    3) K-Nearest Neighbors (0.843, 0.812), Quadratic Discriminant Analysis (0.843, 0.798)
    4) Random Forest Classifier (0.827, 0.803), Gaussian Naive Bayes (0.827, 0.784)
    5) Support Vector Classifier (0.804, 0.845)

* Models sorted first by Test Accuracy and then by Test Recall
    1) Gaussian Naive Bayes (0.844, 0.783)
    2) Support Vector Classifier (0.8, 0.739), Quadratic Discriminant Analysis (0.8, 0.696)
    3) K-Nearest Neighbors (0.778, 0.739), Random Forest Classifier (0.778, 0.739)
    4) Logistic Regression (0.756, 0.696), Linear Discriminant Analysis (0.756, 0.696)
    5) XGBOOST (0.667, 0.435)

* Models sorted based on CV Recall
    1) Support Vector Classifier 0.845
    2) Logistic Regression 0.83
    3) XGBOOST 0.821
    4) K-Nearest Neighbors 0.812
    5) Random Forest Classifier 0.803

* Models sorted based on Test Recall
    1) Gaussian Naive Bayes 0.783
    2) K-Nearest Neighbors 0.739
    3) Random Forest Classifier 0.739
    4) Support Vector Classifier 0.739
    5) Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis 0.696 (tied)
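
The rankings in the lists above can be checked programmatically. The following is a minimal sketch, assuming the metric values are typed in by hand from the Results table rather than read from the CSV files; the helper name `rank_models` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Metric values copied by hand from the Results table (assumption: no CSV access here).
columns = ["Training Accuracy", "Training Recall", "CV Accuracy",
           "CV Recall", "Test Accuracy", "Test Recall"]
data = {
    "Logistic Regression":             [0.871, 0.833, 0.859, 0.830, 0.756, 0.696],
    "Linear Discriminant Analysis":    [0.855, 0.781, 0.859, 0.800, 0.756, 0.696],
    "Quadratic Discriminant Analysis": [0.871, 0.816, 0.843, 0.798, 0.800, 0.696],
    "K-Nearest Neighbors":             [0.855, 0.825, 0.843, 0.812, 0.778, 0.739],
    "Gaussian Naive Bayes":            [0.843, 0.807, 0.827, 0.784, 0.844, 0.783],
    "Random Forest Classifier":        [0.937, 0.895, 0.827, 0.803, 0.778, 0.739],
    "Support Vector Classifier":       [0.808, 0.798, 0.804, 0.845, 0.800, 0.739],
    "XGBOOST":                         [0.894, 0.842, 0.847, 0.821, 0.667, 0.435],
}
results = pd.DataFrame.from_dict(data, orient="index", columns=columns)


def rank_models(df, by):
    """Hypothetical helper: sort models by the given metric columns, best first."""
    return df.sort_values(by=by, ascending=False)[by]


# Sort by accuracy first, then use recall to break ties.
cv_ranking = rank_models(results, ["CV Accuracy", "CV Recall"])
test_ranking = rank_models(results, ["Test Accuracy", "Test Recall"])

# Single-metric rankings.
cv_recall_ranking = rank_models(results, ["CV Recall"])
test_recall_ranking = rank_models(results, ["Test Recall"])

print(cv_ranking)
print(test_ranking)
```

Sorting on two columns makes the tie-breaking rule explicit, which is easy to get wrong when ordering the models by hand.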



Choosing a single best model is difficult because the models perform similarly. We select Gaussian Naive Bayes for prediction: it has the highest test accuracy and test recall (Accuracy: 0.844, Recall: 0.783), and its cross-validated metrics (Accuracy: 0.827, Recall: 0.784) are close to the best cross-validated values observed (Accuracy: 0.859 for Logistic Regression and Linear Discriminant Analysis, Recall: 0.845 for the Support Vector Classifier).
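
The training-versus-test disparity noted for XGBOOST in the conclusions can be quantified for every model. The sketch below assumes the training and test values are copied by hand from the Results table; the column names `Accuracy Gap`, `Recall Gap`, and `Largest Abs Gap` are introduced here for illustration only.

```python
import pandas as pd

# Training and test metrics copied by hand from the Results table (assumption).
metrics = pd.DataFrame(
    {
        "Training Accuracy": [0.871, 0.855, 0.871, 0.855, 0.843, 0.937, 0.808, 0.894],
        "Training Recall":   [0.833, 0.781, 0.816, 0.825, 0.807, 0.895, 0.798, 0.842],
        "Test Accuracy":     [0.756, 0.756, 0.800, 0.778, 0.844, 0.778, 0.800, 0.667],
        "Test Recall":       [0.696, 0.696, 0.696, 0.739, 0.783, 0.739, 0.739, 0.435],
    },
    index=[
        "Logistic Regression", "Linear Discriminant Analysis",
        "Quadratic Discriminant Analysis", "K-Nearest Neighbors",
        "Gaussian Naive Bayes", "Random Forest Classifier",
        "Support Vector Classifier", "XGBOOST",
    ],
)

# A positive gap means the model scores higher on training data than on test data.
gaps = pd.DataFrame(
    {
        "Accuracy Gap": metrics["Training Accuracy"] - metrics["Test Accuracy"],
        "Recall Gap": metrics["Training Recall"] - metrics["Test Recall"],
    }
).round(3)

# Summarize each model by its larger absolute gap and list the most stable models first.
gaps["Largest Abs Gap"] = gaps[["Accuracy Gap", "Recall Gap"]].abs().max(axis=1)
gaps = gaps.sort_values("Largest Abs Gap")
print(gaps)
```

With these values, Gaussian Naive Bayes shows the smallest gap and XGBOOST the largest. The test set appears to be small, so the gaps are best read as a rough indication rather than a precise estimate.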

If cross-validated recall alone were the priority, the Support Vector Classifier (CV Recall: 0.845) would be the preferred choice, although Gaussian Naive Bayes still has the highest test recall (0.783).