## Problem Statment (Objective, Overview of Dataset, Solution & Conclusion)

- The primary objective is to detect fraudulent transactions within the dataset accurately. This involves identifying transactions that are not legitimate and distinguishing them from legitimate ones.

- The dataset contains transaction data with 786,363 records and 29 features. It includes details such as account and customer identifiers, transaction amounts, merchant information, and indicators of fraudulent     transactions. The "isFraud" column indicates whether a transaction is fraudulent, with approximately 1.58% of transactions labeled as fraud. Missing data is observed in several columns, including "acqCountry," "posEntryMode" and "transactionType." Additionally, a subset of 12,417 records pertains to fraudulent transactions.
  
- Most businesses use rule-based systems and supervised learning models for fraud detection. These models require substantial labeled data, which can be expensive and timeconsuming to collect.

- This project proposes a semi-supervised approach, utilizing a combination of labeled fraud data and a larger pool of unlabeled data. By leveraging the unlabeled data, the model can learn broader fraud patterns and adapt to new ones efficiently.

- By analyzing transaction features and patterns, the model can help financial institutions take proactive measures to prevent fraud and minimize losses.

> Python Version

In [176]:
import sys
sys.version

'3.10.9 | packaged by Anaconda, Inc. | (main, Mar  1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)]'

## Importing Library

In [177]:
import pandas as pd #Python data Analysis Lib
import numpy as np #Numerical Python Lib
import scipy  # Scientific & Technical Python Lib
import sklearn #Supervisied Machine Learning Python Lib
import matplotlib.pyplot as plt #Visulization Python Lib
import seaborn as sns  #Visulization Python Lib
import re #Regular Expression Lib
import matplotlib.ticker as tck
import random
import statsmodels.api as sma
import plotly.express as px

In [178]:
from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor
from wordcloud import WordCloud
from IPython.display import Image
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report,log_loss,roc_curve, roc_auc_score,cohen_kappa_score,f1_score,recall_score,precision_score
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from statsmodels.stats.outliers_influence import variance_inflation_factor


In [179]:
## Prefixing the Display Asthetics
pd.set_option('display.max_columns', None)

import warnings
warnings.filterwarnings('ignore')

plt.rcParams['figure.figsize'] = [15, 10]

In [180]:
perf_score = pd.DataFrame(columns=["Model", "Accuracy", "Recall", "Precision", "F1 Score", "TN", "FN", "FP", "TP"])

def per_measures(model, test, pred):
    accuracy = accuracy_score(test, pred)
    f1score = f1_score(test, pred, average='weighted')  # Set average to 'weighted'
    recall = recall_score(test, pred, average='weighted')  # Set average to 'weighted'
    precision = precision_score(test, pred, average='weighted')  # Set average to 'weighted'
    cm = confusion_matrix(test, pred)

    # Extract values from the confusion matrix for each class
    tn = cm[0, 0]
    fn = cm[1, 0]
    fp = cm[0, 1]
    tp = cm[1, 1]

    return accuracy, recall, precision, f1score, tn, fn, fp, tp

def update_performance(name, model, test, pred):
    global perf_score
    accuracy, recall, precision, f1score, tn, fn, fp, tp = per_measures(model, test, pred)
    perf_score = perf_score.append({'Model': name,
                                    'Accuracy': accuracy,
                                    'Recall': recall,
                                    'Precision': precision,
                                    'F1 Score': f1score,
                                    'TN': tn,
                                    'FN': fn,
                                    'FP': fp,
                                    'TP': tp},
                                   ignore_index=True)

## Loading the Dataset

In [181]:
# from google.colab import drive
# drive.mount('/content/drive')

In [182]:
trxn_stratified_rs3 = pd.read_csv('trxn_stratified_rs3.csv')

## **Model building**

> Base Model

## Proof Reading if My Sample Target Variable have the Similar Porportion to Target Variable of Sample Population

In [183]:
trxn_stratified_rs3['isFraud'].value_counts(normalize=True)*100

0    98.100902
1     1.899098
Name: isFraud, dtype: float64

> yes it Is Similar to Sample Poppulation

In [184]:
trxn_stratified_rs3.shape

(97520, 17)

## Encoding Sample of Random State 3 Before SMOTE

In [185]:
LE = LabelEncoder()

In [186]:
trxn_stratified_rs3.head()

Unnamed: 0,creditLimit,availableMoney,transactionAmount,acqCountry,merchantCountryCode,posEntryMode,posConditionCode,merchantCategoryCode,transactionType,currentBalance,cardPresent,expirationDateKeyInMatch,isFraud,transcation_month,transcation_date,cvv_match,Days_after_Last Address Change
0,7500,6400.51,103.27,3,3,2,8,14,1,1099.49,0,0,0,7,7,1,450
1,2500,1701.28,23.17,3,3,5,8,14,1,798.72,0,0,0,2,9,1,21
2,20000,3282.42,66.87,3,3,9,1,14,1,16717.58,0,0,0,6,30,1,654
3,7500,7500.0,144.96,3,3,5,1,14,1,0.0,0,0,0,4,2,1,17
4,20000,14398.77,611.51,3,3,9,1,14,1,5601.23,0,0,0,3,24,1,620


In [187]:
to_encode = ['acqCountry','merchantCategoryCode','merchantCountryCode','transactionType','cardPresent','expirationDateKeyInMatch','isFraud','cvv_match']

In [188]:
for i in to_encode:
    trxn_stratified_rs3[i] = LE.fit_transform(trxn_stratified_rs3[i])

In [189]:
trxn_stratified_rs3[['acqCountry','merchantCategoryCode','merchantCountryCode','transactionType','cardPresent','expirationDateKeyInMatch','isFraud','cvv_match']].head()

Unnamed: 0,acqCountry,merchantCategoryCode,merchantCountryCode,transactionType,cardPresent,expirationDateKeyInMatch,isFraud,cvv_match
0,3,14,3,1,0,0,0,1
1,3,14,3,1,0,0,0,1
2,3,14,3,1,0,0,0,1
3,3,14,3,1,0,0,0,1
4,3,14,3,1,0,0,0,1


In [190]:
for i in to_encode:
    trxn_stratified_rs3[i] = LE.fit_transform(trxn_stratified_rs3[i])

In [191]:
trxn_stratified_rs3[['acqCountry','merchantCategoryCode','merchantCountryCode','transactionType','cardPresent','expirationDateKeyInMatch','isFraud','cvv_match']].head()

Unnamed: 0,acqCountry,merchantCategoryCode,merchantCountryCode,transactionType,cardPresent,expirationDateKeyInMatch,isFraud,cvv_match
0,3,14,3,1,0,0,0,1
1,3,14,3,1,0,0,0,1
2,3,14,3,1,0,0,0,1
3,3,14,3,1,0,0,0,1
4,3,14,3,1,0,0,0,1


> Proof Reading if all my Variables are encoded

In [192]:
trxn_stratified_rs3.head()

Unnamed: 0,creditLimit,availableMoney,transactionAmount,acqCountry,merchantCountryCode,posEntryMode,posConditionCode,merchantCategoryCode,transactionType,currentBalance,cardPresent,expirationDateKeyInMatch,isFraud,transcation_month,transcation_date,cvv_match,Days_after_Last Address Change
0,7500,6400.51,103.27,3,3,2,8,14,1,1099.49,0,0,0,7,7,1,450
1,2500,1701.28,23.17,3,3,5,8,14,1,798.72,0,0,0,2,9,1,21
2,20000,3282.42,66.87,3,3,9,1,14,1,16717.58,0,0,0,6,30,1,654
3,7500,7500.0,144.96,3,3,5,1,14,1,0.0,0,0,0,4,2,1,17
4,20000,14398.77,611.51,3,3,9,1,14,1,5601.23,0,0,0,3,24,1,620


> Since we Converted our Pre Encoded Independent Varaible to Category and Object for our Analysis purposes now we change them back to Numerical Format for Model Building

> Also Based on our Analysis and Logical understanding There is no Relationship with **Feature Engineered Varibale(Related to Date,Month,Year)** & **Target Varaible**  we will also remove them from our **Stratified Sample**

> Change the Variable to int

In [193]:
to_change = ['posEntryMode', 'posConditionCode', 'transcation_month', 'transcation_date']

for i in to_change:
  trxn_stratified_rs3[i] = trxn_stratified_rs3[i].astype('int')

In [194]:
trxn_stratified_rs3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97520 entries, 0 to 97519
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   creditLimit                     97520 non-null  int64  
 1   availableMoney                  97520 non-null  float64
 2   transactionAmount               97520 non-null  float64
 3   acqCountry                      97520 non-null  int64  
 4   merchantCountryCode             97520 non-null  int64  
 5   posEntryMode                    97520 non-null  int32  
 6   posConditionCode                97520 non-null  int32  
 7   merchantCategoryCode            97520 non-null  int64  
 8   transactionType                 97520 non-null  int64  
 9   currentBalance                  97520 non-null  float64
 10  cardPresent                     97520 non-null  int64  
 11  expirationDateKeyInMatch        97520 non-null  int64  
 12  isFraud                         

> Variable for Storing the All Model Performance Here on  

## Applying SMOTE only for Training Data for Random State 3

In [195]:
# Instantiate SMOTE
smote = SMOTE(random_state=3)

In [196]:
x3 = trxn_stratified_rs3.drop(columns=['isFraud'],axis=1)
y3 = trxn_stratified_rs3['isFraud']

> Spliting The Data

In [197]:
xtrain3,xtest3,ytrain3,ytest3 = train_test_split(x3,y3, test_size=0.2, random_state=3)

In [198]:
#Shape Of the Split Before Applying SMOTE
print(xtrain3.shape,xtest3.shape,ytrain3.shape,ytest3.shape)

(78016, 16) (19504, 16) (78016,) (19504,)


In [199]:
xtrain3,ytrain3 = smote.fit_resample(xtrain3,ytrain3)

In [200]:
#Shape Of the Split After Applying SMOTE
print(xtrain3.shape,xtest3.shape,ytrain3.shape,ytest3.shape)

(153048, 16) (19504, 16) (153048,) (19504,)


In [201]:
dtype_to_change = ['posEntryMode', 'posConditionCode', 'transcation_month', 'transcation_date','acqCountry','merchantCategoryCode','merchantCountryCode','transactionType','cardPresent','expirationDateKeyInMatch','cvv_match' ]

for i in dtype_to_change:
    xtrain3[i] = xtrain3[i].astype('category')
    xtest3[i] = xtest3[i].astype('category')

In [202]:
xtrain3.head()

Unnamed: 0,creditLimit,availableMoney,transactionAmount,acqCountry,merchantCountryCode,posEntryMode,posConditionCode,merchantCategoryCode,transactionType,currentBalance,cardPresent,expirationDateKeyInMatch,transcation_month,transcation_date,cvv_match,Days_after_Last Address Change
0,500,54.53,367.03,3,3,9,1,14,1,445.47,0,0,2,14,1,813
1,15000,3280.01,122.95,3,3,5,1,3,1,11719.99,1,0,4,28,1,44
2,5000,4405.83,13.83,3,3,90,1,14,1,594.17,0,0,12,11,1,1805
3,250,250.0,0.0,3,3,5,1,17,0,0.0,0,0,2,27,1,34
4,7500,5314.0,0.0,3,3,2,8,4,0,2186.0,0,0,3,21,1,3


In [203]:
xtest3.head()

Unnamed: 0,creditLimit,availableMoney,transactionAmount,acqCountry,merchantCountryCode,posEntryMode,posConditionCode,merchantCategoryCode,transactionType,currentBalance,cardPresent,expirationDateKeyInMatch,transcation_month,transcation_date,cvv_match,Days_after_Last Address Change
35732,5000,759.82,5.48,3,3,5,1,14,1,4240.18,0,0,11,17,1,275
73534,5000,831.58,384.35,3,3,2,8,3,1,4168.42,1,0,6,25,1,247
87688,5000,2155.17,0.0,3,3,9,8,13,0,2844.83,0,0,10,19,1,436
27625,5000,1591.01,24.7,3,3,9,1,14,1,3408.99,0,0,4,22,1,-104
59295,2500,1059.89,135.77,3,3,5,1,4,1,1440.11,1,0,6,15,1,-21


In [204]:
xtrain3.info(),xtest3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 153048 entries, 0 to 153047
Data columns (total 16 columns):
 #   Column                          Non-Null Count   Dtype   
---  ------                          --------------   -----   
 0   creditLimit                     153048 non-null  int64   
 1   availableMoney                  153048 non-null  float64 
 2   transactionAmount               153048 non-null  float64 
 3   acqCountry                      153048 non-null  category
 4   merchantCountryCode             153048 non-null  category
 5   posEntryMode                    153048 non-null  category
 6   posConditionCode                153048 non-null  category
 7   merchantCategoryCode            153048 non-null  category
 8   transactionType                 153048 non-null  category
 9   currentBalance                  153048 non-null  float64 
 10  cardPresent                     153048 non-null  category
 11  expirationDateKeyInMatch        153048 non-null  category
 12  tr

(None, None)

## Base Model Logistic Regression with Random State 3

In [205]:
LR3 = LogisticRegression()

In [206]:
model_rs3_1 = LR3.fit(xtrain3,ytrain3)

In [207]:
ypred_train_rs3_1 = model_rs3_1.predict(xtrain3)
ypred_test_rs3_1 = model_rs3_1.predict(xtest3)

In [208]:
update_performance(name='Logistic Regression RS 3 Train',model=LR3,test=ytrain3,pred=ypred_train_rs3_1)

perf_score

Unnamed: 0,Model,Accuracy,Recall,Precision,F1 Score,TN,FN,FP,TP
0,Logistic Regression RS 3 Train,0.61531,0.61531,0.618081,0.61304,52947,35299,23577,41225


In [209]:
update_performance(name='Logistic Regression RS 3 Test' ,model=LR3,test=ytest3,pred=ypred_test_rs3_1)

perf_score

Unnamed: 0,Model,Accuracy,Recall,Precision,F1 Score,TN,FN,FP,TP
0,Logistic Regression RS 3 Train,0.61531,0.61531,0.618081,0.61304,52947,35299,23577,41225
1,Logistic Regression RS 3 Test,0.681963,0.681963,0.970216,0.794779,13102,161,6042,199


## Models Build
* DT MODEL, DT WITH SOME PARAMETER
* RF MODEL, RF MODEL WITH SOME PRAMETER
* WITH AND WITHOUT SMOTE

### With Smote Data (x3,y3)

In [210]:
ytest3.shape

(19504,)

## Decision Tree

In [211]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

In [212]:
decision_tree_classification = DecisionTreeClassifier(random_state = 10)

decision_tree = decision_tree_classification.fit(xtrain3, ytrain3)

In [213]:
train_pred_dt = decision_tree_classification.predict(xtrain3)
print(classification_report(ytrain3, train_pred_dt))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     76524
           1       1.00      1.00      1.00     76524

    accuracy                           1.00    153048
   macro avg       1.00      1.00      1.00    153048
weighted avg       1.00      1.00      1.00    153048



In [214]:
test_pred_dt = decision_tree_classification.predict(xtest3)
print(classification_report(ytest3, test_pred_dt))

              precision    recall  f1-score   support

           0       0.98      0.94      0.96     19144
           1       0.03      0.10      0.05       360

    accuracy                           0.92     19504
   macro avg       0.51      0.52      0.50     19504
weighted avg       0.96      0.92      0.94     19504



In [215]:
update_performance(name='DT  Classifier Train' ,model=decision_tree_classification,test=ytrain3,pred=train_pred_dt)
update_performance(name='DT  Classifier Test' ,model=decision_tree_classification,test=ytest3,pred=test_pred_dt)

In [216]:
# Clearly Overfitting

#### Grid Search CV for Decision Tree

In [217]:
# tuned_paramaters = [{'criterion': ['entropy', 'gini'], 
#                      'max_depth': [30,35,40],
#                      'min_samples_split': range(50,100,10),
#                      'min_samples_leaf': range(10,50,10),
#                     }]
 
# decision_tree_classification = DecisionTreeClassifier(random_state = 10)

# tree_grid = GridSearchCV(estimator = decision_tree_classification, 
#                          param_grid = tuned_paramaters, 
#                          cv = 5)

# tree_grid_model = tree_grid.fit(xtrain3, ytrain3)
# print('Best parameters for decision tree classifier: ', tree_grid_model.best_params_, '\n')

Best parameters for decision tree classifier: {'criterion': 'entropy', 'max_depth': 35, 'min_samples_leaf': 10, 'min_samples_split': 50}

##### Model with best Parameter

In [218]:
decision_tree_classification = DecisionTreeClassifier(criterion = 'entropy', random_state = 10,max_depth= 35,min_samples_split = 50,min_samples_leaf = 10)

decision_tree = decision_tree_classification.fit(xtrain3, ytrain3)

train_pred_dt = decision_tree_classification.predict(xtrain3)
print(classification_report(ytrain3, train_pred_dt))

              precision    recall  f1-score   support

           0       0.95      0.95      0.95     76524
           1       0.95      0.95      0.95     76524

    accuracy                           0.95    153048
   macro avg       0.95      0.95      0.95    153048
weighted avg       0.95      0.95      0.95    153048



In [219]:
test_pred_dt = decision_tree_classification.predict(xtest3)
print(classification_report(ytest3, test_pred_dt))

              precision    recall  f1-score   support

           0       0.98      0.93      0.95     19144
           1       0.03      0.12      0.05       360

    accuracy                           0.91     19504
   macro avg       0.51      0.53      0.50     19504
weighted avg       0.96      0.91      0.94     19504



In [220]:
# updating score
update_performance(name='DT Classifier Train Best Param' ,model=decision_tree_classification,test=ytrain3,pred=train_pred_dt)
update_performance(name='DT Classifier Test Best Param' ,model=decision_tree_classification,test=ytest3,pred=test_pred_dt)

## Random Forest Model

In [221]:
from sklearn.ensemble import RandomForestClassifier

In [222]:
randomforest_classification = RandomForestClassifier()

decision_tree = randomforest_classification.fit(xtrain3, ytrain3)

train_pred_rf = randomforest_classification.predict(xtrain3)
print(classification_report(ytrain3, train_pred_rf))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     76524
           1       1.00      1.00      1.00     76524

    accuracy                           1.00    153048
   macro avg       1.00      1.00      1.00    153048
weighted avg       1.00      1.00      1.00    153048



In [223]:
test_pred_rf = randomforest_classification.predict(xtest3)
print(classification_report(ytest3, test_pred_rf))

              precision    recall  f1-score   support

           0       0.98      0.98      0.98     19144
           1       0.05      0.04      0.05       360

    accuracy                           0.97     19504
   macro avg       0.51      0.51      0.51     19504
weighted avg       0.96      0.97      0.97     19504



In [224]:
# updating score
update_performance(name='RF Classifier Train' ,model=randomforest_classification,test=ytrain3,pred=train_pred_dt)
update_performance(name='RF Classifier Test' ,model=randomforest_classification,test=ytest3,pred=test_pred_dt)

## Data without Smoting (x4,y4)

In [225]:
x4 = trxn_stratified_rs3.drop(columns=['isFraud'],axis=1)
y4 = trxn_stratified_rs3['isFraud']
xtrain4,xtest4,ytrain4,ytest4 = train_test_split(x4,y4, test_size=0.2, random_state=3)

### Decision Tree Model

In [226]:
decision_tree_classification = DecisionTreeClassifier(random_state = 10)
decision_tree = decision_tree_classification.fit(xtrain4, ytrain4)

In [227]:
train_pred_dt = decision_tree_classification.predict(xtrain4)
print(classification_report(ytrain4, train_pred_dt))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     76524
           1       1.00      1.00      1.00      1492

    accuracy                           1.00     78016
   macro avg       1.00      1.00      1.00     78016
weighted avg       1.00      1.00      1.00     78016



In [228]:
test_pred_dt = decision_tree_classification.predict(xtest4)
print(classification_report(ytest4, test_pred_dt))

              precision    recall  f1-score   support

           0       0.98      0.98      0.98     19144
           1       0.03      0.04      0.03       360

    accuracy                           0.96     19504
   macro avg       0.51      0.51      0.51     19504
weighted avg       0.96      0.96      0.96     19504



In [229]:
update_performance(name='DT  Classifier Train (Withoit SMOT)' ,model=decision_tree_classification,test=ytrain4,pred=train_pred_dt)
update_performance(name='DT  Classifier Test (Withoit SMOT)' ,model=decision_tree_classification,test=ytest4,pred=test_pred_dt)

### Grid Search CV for Decision Tree Without Smote

In [230]:
# tuned_paramaters = [{'criterion': ['entropy', 'gini'], 
#                      'max_depth': [30,35,40],
#                      'min_samples_split': range(50,100,10),
#                      'min_samples_leaf': range(10,50,10),
#                     }]
 
# decision_tree_classification = DecisionTreeClassifier(random_state = 10)

# tree_grid = GridSearchCV(estimator = decision_tree_classification, 
#                          param_grid = tuned_paramaters, 
#                          cv = 5)

# tree_grid_model = tree_grid.fit(xtrain3, ytrain3)
# print('Best parameters for decision tree classifier: ', tree_grid_model.best_params_, '\n')

Best parameters for decision tree classifier:  {'criterion': 'entropy', 'max_depth': 30, 'min_samples_leaf': 10, 'min_samples_split': 70} 

##### Model with best Parameter  (without Smote)

In [231]:
decision_tree_classification = DecisionTreeClassifier(criterion = 'entropy', random_state = 10,max_depth= 30,min_samples_split = 70,min_samples_leaf = 10)

decision_tree = decision_tree_classification.fit(xtrain4, ytrain4)

train_pred_dt = decision_tree_classification.predict(xtrain4)
print(classification_report(ytrain4, train_pred_dt))

              precision    recall  f1-score   support

           0       0.98      1.00      0.99     76524
           1       0.00      0.00      0.00      1492

    accuracy                           0.98     78016
   macro avg       0.49      0.50      0.50     78016
weighted avg       0.96      0.98      0.97     78016



In [232]:
test_pred_dt = decision_tree_classification.predict(xtest4)
print(classification_report(ytest4, test_pred_dt))

              precision    recall  f1-score   support

           0       0.98      1.00      0.99     19144
           1       0.00      0.00      0.00       360

    accuracy                           0.98     19504
   macro avg       0.49      0.50      0.50     19504
weighted avg       0.96      0.98      0.97     19504



In [233]:
update_performance(name='DT  Classifier Train Best Param (Withoit SMOT)' ,model=decision_tree_classification,test=ytrain4,pred=train_pred_dt)
update_performance(name='DT  Classifier Test Best Param (Withoit SMOT)' ,model=decision_tree_classification,test=ytest4,pred=test_pred_dt)

## Random Forest Model

In [234]:
randomforest_classification = RandomForestClassifier()

decision_tree = randomforest_classification.fit(xtrain4, ytrain4)

train_pred_rf = randomforest_classification.predict(xtrain4)
print(classification_report(ytrain4, train_pred_rf))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     76524
           1       1.00      1.00      1.00      1492

    accuracy                           1.00     78016
   macro avg       1.00      1.00      1.00     78016
weighted avg       1.00      1.00      1.00     78016



In [235]:
test_pred_rf = randomforest_classification.predict(xtest4)
print(classification_report(ytest4, test_pred_rf))

              precision    recall  f1-score   support

           0       0.98      1.00      0.99     19144
           1       0.00      0.00      0.00       360

    accuracy                           0.98     19504
   macro avg       0.49      0.50      0.50     19504
weighted avg       0.96      0.98      0.97     19504



In [236]:
# updating score
update_performance(name='RF  Classifier Train (Withoit SMOT)' ,model=randomforest_classification,test=ytrain4,pred=train_pred_dt)
update_performance(name='RF  Classifier Test (Withoit SMOT)' ,model=randomforest_classification,test=ytest4,pred=test_pred_dt)

In [240]:
perf_score

Unnamed: 0,Model,Accuracy,Recall,Precision,F1 Score,TN,FN,FP,TP
0,Logistic Regression RS 3 Train,0.61531,0.61531,0.618081,0.61304,52947,35299,23577,41225
1,Logistic Regression RS 3 Test,0.681963,0.681963,0.970216,0.794779,13102,161,6042,199
2,DT Classifier Train,1.0,1.0,1.0,1.0,76524,0,0,76524
3,DT Classifier Test,0.924682,0.924682,0.964749,0.94392,17999,324,1145,36
4,DT Classifier Train Best Param,0.947663,0.947663,0.947666,0.947663,72614,4100,3910,72424
5,DT Classifier Test Best Param,0.911916,0.911916,0.964992,0.937131,17741,315,1403,45
6,RF Classifier Train,0.947663,0.947663,0.947666,0.947663,72614,4100,3910,72424
7,RF Classifier Test,0.911916,0.911916,0.964992,0.937131,17741,315,1403,45
8,DT Classifier Train (Withoit SMOT),1.0,1.0,1.0,1.0,76524,0,0,1492
9,DT Classifier Test (Withoit SMOT),0.960316,0.960316,0.964311,0.962302,18716,346,428,14
