### **DATA SCIENCE PROJECT ON INX FUTURE INC EMPLOYEE PERFORMANCE ANALYSIS**
### **BUSINESS CASE: BASED ON THE GIVEN FEATURES OF THE DATASET, WE NEED TO PREDICT THE PERFORMANCE RATING OF EACH EMPLOYEE**
##### MODEL CREATION & EVALUATION SUMMARY:
* Load the preprocessed data
* Define dependent & independent features
* Balancing the target feature
* Split training and testing data
* Model creation, prediction & evaluation
* Model saving

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

import warnings # Used to suppress warnings
warnings.filterwarnings('ignore')

#### LOADING PREPROCESSED DATA

In [2]:
data = pd.read_csv('employee_performance_analysis_preprocessed_data.csv')
pd.set_option('display.max_columns',None) # Display all columns
data.drop('Unnamed: 0',axis=1,inplace=True) # Drop the leftover index column
data.head()

Unnamed: 0,pca1,pca2,pca3,pca4,pca5,pca6,pca7,pca8,pca9,pca10,pca11,pca12,pca13,pca14,pca15,pca16,pca17,pca18,pca19,pca20,pca21,pca22,pca23,pca24,pca25,PerformanceRating
0,-4.474617,-1.635606,1.359876,0.96549,-1.55008,0.179717,0.843739,-1.594458,0.749029,-0.020451,-1.286037,-0.05826,0.407345,-0.249001,-0.004244,0.70779,0.106931,-0.601961,0.09751,0.039537,-0.287547,-0.451346,0.310696,-0.273623,-0.162068,3
1,-4.357388,-0.057871,2.039849,1.537299,0.291088,1.628643,0.763073,0.15587,1.045583,0.797368,-1.703516,1.157972,-0.339029,0.286234,-0.145409,0.500531,-0.3584,0.423352,-0.879996,-0.539284,-0.275408,-0.922157,-0.175002,-0.684618,-0.000295,3
2,-4.244991,2.581279,4.424786,-0.16287,-1.914806,1.10265,-1.47936,0.442567,0.838079,1.505285,0.185119,2.371751,0.82704,0.13101,-0.736234,-0.795708,0.507253,0.461541,0.188903,-0.38044,0.173953,-0.417505,-0.24745,0.745477,-0.371272,4
3,3.012637,0.735434,2.433771,3.347248,1.326405,-2.357479,1.226972,0.340809,-0.223106,-0.053884,-0.14211,-1.405722,0.581401,1.134294,1.758583,-0.218867,0.891003,-1.504801,0.590889,0.202439,0.22384,-0.577573,-0.024454,-0.471628,-0.471033,3
4,-4.249783,5.975149,-0.464801,0.783218,2.877106,0.052133,-0.434443,-0.391564,0.845524,1.203842,-1.614499,0.128538,1.115261,-0.095232,-0.130931,0.812941,-0.300831,1.104789,-1.21627,0.843609,0.101158,-0.177141,0.471097,-0.151107,-0.44797,3


#### DEFINE INDEPENDENT & DEPENDENT FEATURES

In [3]:
X = data.iloc[:,:-1]
y = data.PerformanceRating

In [7]:
X.head()

Unnamed: 0,pca1,pca2,pca3,pca4,pca5,pca6,pca7,pca8,pca9,pca10,pca11,pca12,pca13,pca14,pca15,pca16,pca17,pca18,pca19,pca20,pca21,pca22,pca23,pca24,pca25
0,-4.474617,-1.635606,1.359876,0.96549,-1.55008,0.179717,0.843739,-1.594458,0.749029,-0.020451,-1.286037,-0.05826,0.407345,-0.249001,-0.004244,0.70779,0.106931,-0.601961,0.09751,0.039537,-0.287547,-0.451346,0.310696,-0.273623,-0.162068
1,-4.357388,-0.057871,2.039849,1.537299,0.291088,1.628643,0.763073,0.15587,1.045583,0.797368,-1.703516,1.157972,-0.339029,0.286234,-0.145409,0.500531,-0.3584,0.423352,-0.879996,-0.539284,-0.275408,-0.922157,-0.175002,-0.684618,-0.000295
2,-4.244991,2.581279,4.424786,-0.16287,-1.914806,1.10265,-1.47936,0.442567,0.838079,1.505285,0.185119,2.371751,0.82704,0.13101,-0.736234,-0.795708,0.507253,0.461541,0.188903,-0.38044,0.173953,-0.417505,-0.24745,0.745477,-0.371272
3,3.012637,0.735434,2.433771,3.347248,1.326405,-2.357479,1.226972,0.340809,-0.223106,-0.053884,-0.14211,-1.405722,0.581401,1.134294,1.758583,-0.218867,0.891003,-1.504801,0.590889,0.202439,0.22384,-0.577573,-0.024454,-0.471628,-0.471033
4,-4.249783,5.975149,-0.464801,0.783218,2.877106,0.052133,-0.434443,-0.391564,0.845524,1.203842,-1.614499,0.128538,1.115261,-0.095232,-0.130931,0.812941,-0.300831,1.104789,-1.21627,0.843609,0.101158,-0.177141,0.471097,-0.151107,-0.44797


In [8]:
y.head()

0    3
1    3
2    4
3    3
4    3
Name: PerformanceRating, dtype: int64

#### BALANCING THE TARGET FEATURE
* SMOTE (Synthetic Minority Oversampling Technique) is one of the most commonly used oversampling methods for class imbalance. Rather than replicating existing minority rows, it synthesises new minority instances by interpolating between an existing minority instance and one of its nearest minority-class neighbours, until the classes are balanced.
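
The interpolation idea can be illustrated with a small NumPy sketch. This is not the `imblearn` implementation; the function name `smote_like_samples`, its parameters, and the toy points are assumptions made for illustration only.

```python
import numpy as np

def smote_like_samples(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority rows
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                  # pick a minority row
        nb = neighbours[base, rng.integers(k)]  # pick one of its k nearest neighbours
        lam = rng.random()                      # position along the segment, in [0, 1)
        synthetic[i] = X_minority[base] + lam * (X_minority[nb] - X_minority[base])
    return synthetic

# Toy minority class: the four corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_rows = smote_like_samples(X_min, n_new=6, k=2)
print(new_rows.shape)
```

Every synthetic row lies on a line segment between two real minority rows, which is why the new points stay inside the region the minority class already occupies.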

In [9]:
from collections import Counter
from imblearn.over_sampling import SMOTE # SMOTE (synthetic minority oversampling technique)
sm = SMOTE() # object creation
print("unbalanced data   :  ",Counter(y))
X_sm,y_sm = sm.fit_resample(X,y)
print("balanced data:    :",Counter(y_sm))

unbalanced data   :   Counter({3: 874, 2: 194, 4: 132})
balanced data:    : Counter({3: 874, 4: 874, 2: 874})


The target feature is now balanced.

#### SPLIT TRAINING AND TESTING DATA

In [10]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X_sm,y_sm,random_state=42,test_size=0.20) # 20% data given to testing


In [11]:
# Check shape of train and test
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((2097, 25), (525, 25), (2097,), (525,))
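
In this notebook SMOTE runs before the split, so some synthetic test rows are interpolated from rows that end up in the training set, which can make test scores optimistic. A common alternative ordering is to split the original data first, with stratification, and resample only the training portion. The sketch below shows just the stratified split; `X_demo` is random stand-in data (an assumption), while the class counts are the ones reported above.

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Hypothetical stand-in for the 25 PCA features; class counts match the notebook
y_demo = np.array([3] * 874 + [2] * 194 + [4] * 132)
X_demo = rng.normal(size=(len(y_demo), 25))

# stratify keeps the class proportions roughly equal in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42, stratify=y_demo
)
train_counts = Counter(y_tr.tolist())
test_counts = Counter(y_te.tolist())
print("train:", train_counts)
print("test :", test_counts)
# Any oversampling (e.g. SMOTE) would then be fitted on X_tr, y_tr only.
```

With this ordering the test set contains only real observations, so its class balance reflects the original data rather than the resampled one.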

#### MODEL CREATION, PREDICTION AND EVALUATION
##### AIM
* Create a sweet-spot model (low bias, low variance)
##### HERE WE WILL BE EXPERIMENTING WITH FIVE ALGORITHMS
* Support Vector Machine
* Random Forest
* XGBOOST
* Decision tree
* Artificial Neural Network [MLP Classifier]
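
One way to check the low-bias, low-variance aim is to compare training accuracy with cross-validated accuracy: a large gap suggests high variance, and a low score on both suggests high bias. The sketch below uses synthetic data from `make_classification` as a stand-in for the notebook's features (an assumption), and a decision tree only because it overfits visibly.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the notebook's features (an assumption)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=25, n_informative=10, n_classes=3, random_state=42
)
tree = DecisionTreeClassifier(random_state=42)

# Accuracy on five held-out folds, each model trained on the other four
scores = cross_val_score(tree, X_demo, y_demo, cv=5, scoring="accuracy")
# Accuracy on the same rows the model was trained on
train_acc = tree.fit(X_demo, y_demo).score(X_demo, y_demo)

print(f"train accuracy : {train_acc:.3f}")
print(f"cv accuracy    : {scores.mean():.3f} +/- {scores.std():.3f}")
```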

#### **SVM**

In [17]:
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

# Initialize the SVM model
svm_model = SVC(kernel='rbf', probability=True, random_state=42)

# Train the model
svm_model.fit(X_train, y_train)

# Predict on training and test sets
y_train_pred_svm = svm_model.predict(X_train)
y_test_pred_svm = svm_model.predict(X_test)

# Evaluate training accuracy
train_accuracy_svm = accuracy_score(y_train, y_train_pred_svm)
print(f"SVM Training Accuracy: {train_accuracy_svm:.4f}")

# Evaluate test accuracy
test_accuracy_svm = accuracy_score(y_test, y_test_pred_svm)
print(f"SVM Test Accuracy: {test_accuracy_svm:.4f}")

# Classification report and confusion matrix for test set
print("Support Vector Machine (SVM) Performance on Test Data:")
print("Classification Report:")
print(classification_report(y_test, y_test_pred_svm))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_svm))


SVM Training Accuracy: 0.9995
SVM Test Accuracy: 0.9981
Support Vector Machine (SVM) Performance on Test Data:
Classification Report:
              precision    recall  f1-score   support

           2       0.99      1.00      1.00       184
           3       1.00      0.99      1.00       173
           4       1.00      1.00      1.00       168

    accuracy                           1.00       525
   macro avg       1.00      1.00      1.00       525
weighted avg       1.00      1.00      1.00       525

Confusion Matrix:
[[184   0   0]
 [  1 172   0]
 [  0   0 168]]


#### **Random Forest**

In [18]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

# Predict on training and test sets
y_train_pred_rf = rf_model.predict(X_train)
y_test_pred_rf = rf_model.predict(X_test)

# Evaluate training accuracy
train_accuracy_rf = accuracy_score(y_train, y_train_pred_rf)
print(f"Random Forest Training Accuracy: {train_accuracy_rf:.4f}")

# Evaluate test accuracy
test_accuracy_rf = accuracy_score(y_test, y_test_pred_rf)
print(f"Random Forest Test Accuracy: {test_accuracy_rf:.4f}")

# Classification report and confusion matrix for test set
print("Random Forest Performance on Test Data:")
print("Classification Report:")
print(classification_report(y_test, y_test_pred_rf))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_rf))


Random Forest Training Accuracy: 1.0000
Random Forest Test Accuracy: 0.9886
Random Forest Performance on Test Data:
Classification Report:
              precision    recall  f1-score   support

           2       0.98      0.99      0.98       184
           3       0.99      0.98      0.98       173
           4       1.00      1.00      1.00       168

    accuracy                           0.99       525
   macro avg       0.99      0.99      0.99       525
weighted avg       0.99      0.99      0.99       525

Confusion Matrix:
[[182   2   0]
 [  4 169   0]
 [  0   0 168]]


#### **DecisionTree**

In [19]:
from sklearn.tree import DecisionTreeClassifier

# Initialize the Decision Tree model
dt_model = DecisionTreeClassifier(random_state=42)

# Train the model
dt_model.fit(X_train, y_train)

# Predict on training and test sets
y_train_pred_dt = dt_model.predict(X_train)
y_test_pred_dt = dt_model.predict(X_test)

# Evaluate training accuracy
train_accuracy_dt = accuracy_score(y_train, y_train_pred_dt)
print(f"Decision Tree Training Accuracy: {train_accuracy_dt:.4f}")

# Evaluate test accuracy
test_accuracy_dt = accuracy_score(y_test, y_test_pred_dt)
print(f"Decision Tree Test Accuracy: {test_accuracy_dt:.4f}")

# Classification report and confusion matrix for test set
print("Decision Tree Performance on Test Data:")
print("Classification Report:")
print(classification_report(y_test, y_test_pred_dt))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_dt))


Decision Tree Training Accuracy: 1.0000
Decision Tree Test Accuracy: 0.9276
Decision Tree Performance on Test Data:
Classification Report:
              precision    recall  f1-score   support

           2       0.94      0.97      0.96       184
           3       0.92      0.86      0.89       173
           4       0.92      0.95      0.94       168

    accuracy                           0.93       525
   macro avg       0.93      0.93      0.93       525
weighted avg       0.93      0.93      0.93       525

Confusion Matrix:
[[179   5   0]
 [ 11 148  14]
 [  0   8 160]]
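
The figures in the classification report follow directly from the confusion matrix. As a check, the sketch below recomputes precision, recall, and accuracy from the Decision Tree matrix printed above, assuming scikit-learn's convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# Decision Tree test confusion matrix from the output above (rows = true, columns = predicted)
cm = np.array([[179, 5, 0],
               [11, 148, 14],
               [0, 8, 160]])
labels = [2, 3, 4]

true_pos = np.diag(cm)
precision = true_pos / cm.sum(axis=0)  # column sums = predicted counts per class
recall = true_pos / cm.sum(axis=1)     # row sums = true counts per class
accuracy = true_pos.sum() / cm.sum()

for lab, p, r in zip(labels, precision, recall):
    print(f"class {lab}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.4f}")
```

Class 3 has the lowest recall because 25 of its 173 test rows are predicted as class 2 or class 4.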


#### **XGBOOST**

In [22]:
from xgboost import XGBClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Encode the target variable to have values starting from 0
y_encoded = label_encoder.fit_transform(y_sm)

# Split the encoded data into training and test sets
# (this overwrites X_train, X_test, y_train and y_test, so cells run afterwards see labels 0-2)
X_train, X_test, y_train, y_test = train_test_split(X_sm, y_encoded, random_state=42, test_size=0.20)

# Initialize the XGBoost model
xgb_model = XGBClassifier(use_label_encoder=False, eval_metric='mlogloss', random_state=42)

# Train the model
xgb_model.fit(X_train, y_train)

# Predict on training and test sets
y_train_pred_xgb = xgb_model.predict(X_train)
y_test_pred_xgb = xgb_model.predict(X_test)

# Evaluate training accuracy
train_accuracy_xgb = accuracy_score(y_train, y_train_pred_xgb)
print(f"XGBoost Training Accuracy: {train_accuracy_xgb:.4f}")

# Evaluate test accuracy
test_accuracy_xgb = accuracy_score(y_test, y_test_pred_xgb)
print(f"XGBoost Test Accuracy: {test_accuracy_xgb:.4f}")

# Classification report and confusion matrix for test set
print("XGBoost Performance on Test Data:")
print("Classification Report:")
print(classification_report(y_test, y_test_pred_xgb))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_xgb))


XGBoost Training Accuracy: 1.0000
XGBoost Test Accuracy: 0.9810
XGBoost Performance on Test Data:
Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.99      0.98       184
           1       0.99      0.95      0.97       173
           2       0.99      1.00      0.99       168

    accuracy                           0.98       525
   macro avg       0.98      0.98      0.98       525
weighted avg       0.98      0.98      0.98       525

Confusion Matrix:
[[182   2   0]
 [  6 165   2]
 [  0   0 168]]
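
The report above shows classes 0, 1, 2 instead of 2, 3, 4 because of the label encoding. A minimal sketch of that mapping and how to reverse it, using a few example rating values chosen for illustration:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

ratings = np.array([3, 3, 4, 3, 2])       # example PerformanceRating values
encoder = LabelEncoder()
encoded = encoder.fit_transform(ratings)  # classes are sorted, then numbered from 0

print(encoder.classes_)                   # original labels, in encoded order
print(encoded)
decoded = encoder.inverse_transform(encoded)  # back to the original ratings
print(decoded)
```

Keeping the fitted encoder around makes it possible to report predictions as ratings 2, 3, 4 rather than as encoded indices.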


#### **MLP**

In [21]:
from sklearn.neural_network import MLPClassifier

# Initialize the MLP model
mlp_model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=300, random_state=42)

# Train the model
mlp_model.fit(X_train, y_train)

# Predict on training and test sets
y_train_pred_mlp = mlp_model.predict(X_train)
y_test_pred_mlp = mlp_model.predict(X_test)

# Evaluate training accuracy
train_accuracy_mlp = accuracy_score(y_train, y_train_pred_mlp)
print(f"MLP Classifier Training Accuracy: {train_accuracy_mlp:.4f}")

# Evaluate test accuracy
test_accuracy_mlp = accuracy_score(y_test, y_test_pred_mlp)
print(f"MLP Classifier Test Accuracy: {test_accuracy_mlp:.4f}")

# Classification report and confusion matrix for test set
print("MLP Classifier (Neural Network) Performance on Test Data:")
print("Classification Report:")
print(classification_report(y_test, y_test_pred_mlp))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_mlp))


MLP Classifier Training Accuracy: 1.0000
MLP Classifier Test Accuracy: 0.9962
MLP Classifier (Neural Network) Performance on Test Data:
Classification Report:
              precision    recall  f1-score   support

           2       0.99      1.00      1.00       184
           3       1.00      0.99      0.99       173
           4       0.99      1.00      1.00       168

    accuracy                           1.00       525
   macro avg       1.00      1.00      1.00       525
weighted avg       1.00      1.00      1.00       525

Confusion Matrix:
[[184   0   0]
 [  1 171   1]
 [  0   0 168]]


#### Hyperparameter Tuning for All Models

#### **SVM**

In [29]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score

# SVM Hyperparameter Tuning
svm_param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.01, 0.1, 1],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid']
}

svm_model = SVC(random_state=42)
random_search_svm = RandomizedSearchCV(svm_model, svm_param_grid, n_iter=20, cv=5, verbose=1, random_state=42, n_jobs=-1)
random_search_svm.fit(X_train, y_train)

print("Best Parameters for SVM:", random_search_svm.best_params_)
print("Best Score for SVM:", random_search_svm.best_score_)

# Evaluate on training and test set
svm_best = random_search_svm.best_estimator_
y_train_pred_svm = svm_best.predict(X_train)
y_test_pred_svm = svm_best.predict(X_test)

train_accuracy_svm = accuracy_score(y_train, y_train_pred_svm)
test_accuracy_svm = accuracy_score(y_test, y_test_pred_svm)

print("SVM Training Accuracy:", train_accuracy_svm)
print("SVM Testing Accuracy:", test_accuracy_svm)
print("SVM Classification Report:\n", classification_report(y_test, y_test_pred_svm))


Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters for SVM: {'kernel': 'linear', 'gamma': 'scale', 'C': 0.1}
Best Score for SVM: 1.0
SVM Training Accuracy: 1.0
SVM Testing Accuracy: 1.0
SVM Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       184
           1       1.00      1.00      1.00       173
           2       1.00      1.00      1.00       168

    accuracy                           1.00       525
   macro avg       1.00      1.00      1.00       525
weighted avg       1.00      1.00      1.00       525
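
`RandomizedSearchCV` with `n_iter=20` does not try every combination in the grid. The sketch below counts the full SVM grid and draws 20 candidates with `ParameterSampler`, which illustrates the sampling idea; it is not a trace of the exact candidates the search above evaluated.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

svm_param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.01, 0.1, 1],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
}

# Size of the exhaustive grid: 4 * 5 * 4 combinations
n_total = len(ParameterGrid(svm_param_grid))
# A random subset of that grid, as a randomized search would draw
sampled = list(ParameterSampler(svm_param_grid, n_iter=20, random_state=42))

print(n_total, len(sampled))
print(sampled[0])
```

Only a quarter of the grid is evaluated, so the reported best parameters are the best among the sampled candidates. Note also that `gamma` has no effect when `kernel='linear'`, so the `gamma` value in the best parameters above is incidental.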



#### **RandomForest**

In [30]:
from sklearn.ensemble import RandomForestClassifier

# Random Forest Hyperparameter Tuning
rf_param_grid = {
    'n_estimators': [100, 200, 300, 400, 500],
    'max_depth': [None, 10, 20, 30, 40],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'bootstrap': [True, False]
}

rf_model = RandomForestClassifier(random_state=42)
random_search_rf = RandomizedSearchCV(rf_model, rf_param_grid, n_iter=20, cv=5, verbose=1, random_state=42, n_jobs=-1)
random_search_rf.fit(X_train, y_train)

print("Best Parameters for Random Forest:", random_search_rf.best_params_)
print("Best Score for Random Forest:", random_search_rf.best_score_)

# Evaluate on training and test set
rf_best = random_search_rf.best_estimator_
y_train_pred_rf = rf_best.predict(X_train)
y_test_pred_rf = rf_best.predict(X_test)

train_accuracy_rf = accuracy_score(y_train, y_train_pred_rf)
test_accuracy_rf = accuracy_score(y_test, y_test_pred_rf)

print("Random Forest Training Accuracy:", train_accuracy_rf)
print("Random Forest Testing Accuracy:", test_accuracy_rf)
print("Random Forest Classification Report:\n", classification_report(y_test, y_test_pred_rf))


Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters for Random Forest: {'n_estimators': 300, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_depth': 40, 'bootstrap': False}
Best Score for Random Forest: 0.9957086032503695
Random Forest Training Accuracy: 1.0
Random Forest Testing Accuracy: 0.9942857142857143
Random Forest Classification Report:
               precision    recall  f1-score   support

           0       0.99      0.99      0.99       184
           1       0.99      0.99      0.99       173
           2       1.00      1.00      1.00       168

    accuracy                           0.99       525
   macro avg       0.99      0.99      0.99       525
weighted avg       0.99      0.99      0.99       525



#### **DecisionTree**

In [31]:
from sklearn.tree import DecisionTreeClassifier

# Decision Tree Hyperparameter Tuning
dt_param_grid = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': [None, 10, 20, 30, 40],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

dt_model = DecisionTreeClassifier(random_state=42)
random_search_dt = RandomizedSearchCV(dt_model, dt_param_grid, n_iter=20, cv=5, verbose=1, random_state=42, n_jobs=-1)
random_search_dt.fit(X_train, y_train)

print("Best Parameters for Decision Tree:", random_search_dt.best_params_)
print("Best Score for Decision Tree:", random_search_dt.best_score_)

# Evaluate on training and test set
dt_best = random_search_dt.best_estimator_
y_train_pred_dt = dt_best.predict(X_train)
y_test_pred_dt = dt_best.predict(X_test)

train_accuracy_dt = accuracy_score(y_train, y_train_pred_dt)
test_accuracy_dt = accuracy_score(y_test, y_test_pred_dt)

print("Decision Tree Training Accuracy:", train_accuracy_dt)
print("Decision Tree Testing Accuracy:", test_accuracy_dt)
print("Decision Tree Classification Report:\n", classification_report(y_test, y_test_pred_dt))


Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters for Decision Tree: {'splitter': 'best', 'min_samples_split': 2, 'min_samples_leaf': 2, 'max_depth': None, 'criterion': 'entropy'}
Best Score for Decision Tree: 0.9356154108421411
Decision Tree Training Accuracy: 0.9938006676204101
Decision Tree Testing Accuracy: 0.9314285714285714
Decision Tree Classification Report:
               precision    recall  f1-score   support

           0       0.93      0.94      0.94       184
           1       0.90      0.90      0.90       173
           2       0.97      0.96      0.96       168

    accuracy                           0.93       525
   macro avg       0.93      0.93      0.93       525
weighted avg       0.93      0.93      0.93       525



#### **XGBOOST**

In [32]:
from xgboost import XGBClassifier

# XGBoost Hyperparameter Tuning
xgb_param_grid = {
    'n_estimators': [100, 200, 300, 400, 500],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'max_depth': [3, 5, 7, 10],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0]
}

xgb_model = XGBClassifier(use_label_encoder=False, eval_metric='mlogloss', random_state=42)
random_search_xgb = RandomizedSearchCV(xgb_model, xgb_param_grid, n_iter=20, cv=5, verbose=1, random_state=42, n_jobs=-1)
random_search_xgb.fit(X_train, y_train)

print("Best Parameters for XGBoost:", random_search_xgb.best_params_)
print("Best Score for XGBoost:", random_search_xgb.best_score_)

# Evaluate on training and test set
xgb_best = random_search_xgb.best_estimator_
y_train_pred_xgb = xgb_best.predict(X_train)
y_test_pred_xgb = xgb_best.predict(X_test)

train_accuracy_xgb = accuracy_score(y_train, y_train_pred_xgb)
test_accuracy_xgb = accuracy_score(y_test, y_test_pred_xgb)

print("XGBoost Training Accuracy:", train_accuracy_xgb)
print("XGBoost Testing Accuracy:", test_accuracy_xgb)
print("XGBoost Classification Report:\n", classification_report(y_test, y_test_pred_xgb))


Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters for XGBoost: {'subsample': 0.6, 'n_estimators': 200, 'max_depth': 5, 'learning_rate': 0.2, 'colsample_bytree': 0.6}
Best Score for XGBoost: 0.9961847937265599
XGBoost Training Accuracy: 1.0
XGBoost Testing Accuracy: 0.9866666666666667
XGBoost Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.99      0.98       184
           1       0.99      0.97      0.98       173
           2       0.99      1.00      1.00       168

    accuracy                           0.99       525
   macro avg       0.99      0.99      0.99       525
weighted avg       0.99      0.99      0.99       525



#### **MLP**

In [33]:
from sklearn.neural_network import MLPClassifier

# MLP Classifier Hyperparameter Tuning
mlp_param_grid = {
    'hidden_layer_sizes': [(50, 50), (100,), (100, 50)],
    'activation': ['tanh', 'relu'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.001, 0.01],
    'learning_rate': ['constant', 'adaptive']
}

mlp_model = MLPClassifier(max_iter=1000, random_state=42)
random_search_mlp = RandomizedSearchCV(mlp_model, mlp_param_grid, n_iter=20, cv=5, verbose=1, random_state=42, n_jobs=-1)
random_search_mlp.fit(X_train, y_train)

print("Best Parameters for MLP Classifier:", random_search_mlp.best_params_)
print("Best Score for MLP Classifier:", random_search_mlp.best_score_)

# Evaluate on training and test set
mlp_best = random_search_mlp.best_estimator_
y_train_pred_mlp = mlp_best.predict(X_train)
y_test_pred_mlp = mlp_best.predict(X_test)

train_accuracy_mlp = accuracy_score(y_train, y_train_pred_mlp)
test_accuracy_mlp = accuracy_score(y_test, y_test_pred_mlp)

print("MLP Classifier Training Accuracy:", train_accuracy_mlp)
print("MLP Classifier Testing Accuracy:", test_accuracy_mlp)
print("MLP Classifier Classification Report:\n", classification_report(y_test, y_test_pred_mlp))


Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters for MLP Classifier: {'solver': 'adam', 'learning_rate': 'constant', 'hidden_layer_sizes': (100, 50), 'alpha': 0.0001, 'activation': 'tanh'}
Best Score for MLP Classifier: 1.0
MLP Classifier Training Accuracy: 1.0
MLP Classifier Testing Accuracy: 1.0
MLP Classifier Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       184
           1       1.00      1.00      1.00       173
           2       1.00      1.00      1.00       168

    accuracy                           1.00       525
   macro avg       1.00      1.00      1.00       525
weighted avg       1.00      1.00      1.00       525



#### Compare Models

In [48]:
# Accuracies before tuning
train_accuracy_svm_before = 0.9995
test_accuracy_svm_before = 0.9981

train_accuracy_rf_before = 1.0000
test_accuracy_rf_before = 0.9886

train_accuracy_dt_before = 1.0000
test_accuracy_dt_before = 0.9276

train_accuracy_xgb_before = 1.0000
test_accuracy_xgb_before = 0.9810

train_accuracy_mlp_before = 1.0000
test_accuracy_mlp_before = 0.9962

# Accuracies after tuning
train_accuracy_svm_after = 1.0000
test_accuracy_svm_after = 1.0000

train_accuracy_rf_after = 1.0000
test_accuracy_rf_after = 0.9942

train_accuracy_dt_after = 0.9938
test_accuracy_dt_after = 0.9314

train_accuracy_xgb_after = 1.0000
test_accuracy_xgb_after = 0.9866

train_accuracy_mlp_after = 1.0000
test_accuracy_mlp_after = 1.0000

# Print results in a tabular format
print(f"{'Model':<20}{'Train Accuracy Before':<30}{'Test Accuracy Before':<30}{'Train Accuracy After Tuning':<30}{'Test Accuracy After Tuning':<30}")
print("="*140)

print(f"{'SVM':<20}{train_accuracy_svm_before:<30.6f}{test_accuracy_svm_before:<30.6f}{train_accuracy_svm_after:<30.6f}{test_accuracy_svm_after:<30.6f}")
print(f"{'Random Forest':<20}{train_accuracy_rf_before:<30.6f}{test_accuracy_rf_before:<30.6f}{train_accuracy_rf_after:<30.6f}{test_accuracy_rf_after:<30.6f}")
print(f"{'Decision Tree':<20}{train_accuracy_dt_before:<30.6f}{test_accuracy_dt_before:<30.6f}{train_accuracy_dt_after:<30.6f}{test_accuracy_dt_after:<30.6f}")
print(f"{'XGBoost':<20}{train_accuracy_xgb_before:<30.6f}{test_accuracy_xgb_before:<30.6f}{train_accuracy_xgb_after:<30.6f}{test_accuracy_xgb_after:<30.6f}")
print(f"{'MLP Classifier':<20}{train_accuracy_mlp_before:<30.6f}{test_accuracy_mlp_before:<30.6f}{train_accuracy_mlp_after:<30.6f}{test_accuracy_mlp_after:<30.6f}")


Model               Train Accuracy Before         Test Accuracy Before          Train Accuracy After Tuning   Test Accuracy After Tuning
============================================================================================================================================
SVM                 0.999500                      0.998100                      1.000000                      1.000000                      
Random Forest       1.000000                      0.988600                      1.000000                      0.994200                      
Decision Tree       1.000000                      0.927600                      0.993800                      0.931400                      
XGBoost             1.000000                      0.981000                      1.000000                      0.986600                      
MLP Classifier      1.000000                      0.996200                      1.000000                      1.000000                      


#### Here is a summary of the model performance before and after hyperparameter tuning:
#### Summary Before Hyperparameter Tuning:
##### SVM:
* Training Accuracy: 99.95%
* Test Accuracy: 99.81%
##### Random Forest:
* Training Accuracy: 100.00%
* Test Accuracy: 98.86%
##### Decision Tree:
* Training Accuracy: 100.00%
* Test Accuracy: 92.76%
##### XGBoost:
* Training Accuracy: 100.00%
* Test Accuracy: 98.10%
##### MLP Classifier:
* Training Accuracy: 100.00%
* Test Accuracy: 99.62%
#### Summary After Hyperparameter Tuning:
##### SVM:
* Training Accuracy: 100.00%
* Test Accuracy: 100.00%
##### Random Forest:
* Training Accuracy: 100.00%
* Test Accuracy: 99.42%
##### Decision Tree:
* Training Accuracy: 99.38%
* Test Accuracy: 93.14%
##### XGBoost:
* Training Accuracy: 100.00%
* Test Accuracy: 98.66%
##### MLP Classifier:
* Training Accuracy: 100.00%
* Test Accuracy: 100.00%
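
The same comparison can be held in a pandas DataFrame, which makes it easy to sort the models and to compute the train/test gap. The accuracy values below are copied from the summary above; the column names are chosen for this sketch.

```python
import pandas as pd

results = pd.DataFrame(
    {
        "train_before": [0.9995, 1.0000, 1.0000, 1.0000, 1.0000],
        "test_before":  [0.9981, 0.9886, 0.9276, 0.9810, 0.9962],
        "train_after":  [1.0000, 1.0000, 0.9938, 1.0000, 1.0000],
        "test_after":   [1.0000, 0.9942, 0.9314, 0.9866, 1.0000],
    },
    index=["SVM", "Random Forest", "Decision Tree", "XGBoost", "MLP Classifier"],
)
# Gap between training and test accuracy after tuning (larger = more overfitting)
results["gap_after"] = (results["train_after"] - results["test_after"]).round(4)
# Change in test accuracy brought by tuning
results["test_gain"] = (results["test_after"] - results["test_before"]).round(4)

print(results.sort_values("test_after", ascending=False))
```

The Decision Tree keeps the largest train/test gap after tuning, while SVM and the MLP Classifier show no gap on this test set.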

**Based on these results, the MLP Classifier and SVM reach the highest test accuracy after hyperparameter tuning (100.00%). Both models perform equally well on this test set, so the MLP model is saved.**



In [54]:
import pickle

# Save the best MLP model
with open('best_mlp_model.pkl', 'wb') as file:
    pickle.dump(mlp_best, file)

print("Best MLP model saved successfully as 'best_mlp_model.pkl'.")


Best MLP model saved successfully as 'best_mlp_model.pkl'.
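
To use a saved model later, it is unpickled and its predictions are mapped back to ratings. The tuned models were trained on encoded labels (0, 1, 2), so adding 2 restores the ratings 2, 3, 4. The sketch below shows the round trip in memory with a small stand-in decision tree and random data, both of which are assumptions chosen to keep the example fast; it does not load `best_mlp_model.pkl`.

```python
import pickle
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))      # hypothetical features
y_demo = rng.integers(0, 3, size=60)   # encoded labels 0, 1, 2
model = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

payload = pickle.dumps(model)          # same format pickle.dump writes to a file
restored = pickle.loads(payload)       # equivalent of pickle.load on that file

# The restored model should predict exactly what the original predicts
same = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
ratings = restored.predict(X_demo[:5]) + 2  # map encoded 0/1/2 back to ratings 2/3/4
print(same, ratings)
```

Pickle files should be loaded only from trusted sources and, ideally, with the same scikit-learn version that created them.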
