
Credit-Card-Fraud-Detection-ML

Credit Card Fraud Detection using Logistic Regression

  • Name - Ayush Agarwal
  • Project - Credit Card Fraud Detection
  • Skills - Logistic Regression, Support Vector Machine, K-Nearest Neighbours, F1 Score, ROC-AUC Curve, Data Visualisation, Exploratory Data Analysis, Data Science application in Finance, Machine Learning
  • Tools - Google Colab, Jupyter Notebooks, Python, NumPy, Pandas, Matplotlib, Seaborn, scikit-learn

Code:

The code is available at https://github.com/ayush-agarwal-0502/Credit-Card-Fraud-Detection-ML/blob/main/credit_card_fraud_detection_project.ipynb (in this repository itself).

The Dataset:

The data was taken from Kaggle: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud .

The columns have no directly visible physical significance because, as per the source (Kaggle), the data was transformed using Principal Component Analysis (PCA) in order to protect the privacy of the individuals while still making a realistic-scenario dataset available to the public.
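For reference, a minimal sketch of loading the dataset with pandas (assuming the CSV from the Kaggle link above has been saved locally under its default name, creditcard.csv):

```python
import pandas as pd

# Load the Kaggle credit card fraud dataset; adjust the path to
# wherever you saved the downloaded CSV.
df = pd.read_csv("creditcard.csv")

# V1..V28 are the anonymised PCA components; Time, Amount and Class
# (0 = genuine, 1 = fraud) are the only raw columns.
print(df.shape)
print(df["Class"].value_counts())
```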

Data Preprocessing and Visualisation:

(figure: overview of the dataset)

Correlations:

(figures: correlation heatmaps)

The columns do not appear to be correlated with each other, yet several correlate strongly with the Class and Time variables. This is a good indicator that simple models should be helpful here, and that neural networks will hopefully not be needed. A sketch of how such a heatmap can be produced is shown below.
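A minimal sketch of producing a correlation heatmap with Seaborn, reusing the df loaded earlier:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Pearson correlation between every pair of columns; V1..V28 come from
# PCA, so their mutual correlations should sit near zero.
corr = df.corr()

plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap="coolwarm", center=0)
plt.title("Correlation between features")
plt.show()
```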

Relation between the target variable and the columns:

(figures: scatter plots of individual columns against Amount, coloured by the target variable)

Plotting different columns against Amount, with colours distinguishing the target variable, shows that the output classes are separable by a linear boundary even when graphing pairs of variables alone. Hence LOGISTIC REGRESSION should be able to separate the multivariable data into 2 classes.
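A minimal sketch of one such pairwise plot, reusing the imports above (the choice of V11 is purely illustrative; the notebook plots many columns):

```python
# One pairwise view: a PCA component against Amount, coloured by Class.
sns.scatterplot(data=df, x="V11", y="Amount", hue="Class", alpha=0.5)
plt.title("V11 vs Amount, coloured by Class")
plt.show()
```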

Class Imbalance in the dataset:

(figure: class distribution)

This shows that we have far fewer data points for fraud cases than for non-fraud cases, which is expected for this kind of dataset.

To cure the imbalance, we can use undersampling or oversampling. Here, I decided to use SMOTE (Synthetic Minority Over-sampling Technique) to counter the class imbalance in the dataset.

(figure: class distribution after SMOTE)
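A minimal sketch of applying SMOTE with the imbalanced-learn library (my assumption for how SMOTE was applied; the split ratio and seed are illustrative):

```python
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Class"])
y = df["Class"]

# Split before resampling so synthetic points never leak into the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# SMOTE synthesises new minority-class points between existing neighbours.
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print(y_train_res.value_counts())  # both classes now equal in size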

Training the model:

(figure: model training)

I trained a Logistic Regression model here. The model was raising a convergence warning, so I read its documentation and increased the iteration limit to 150, as sketched below.
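A minimal sketch of that fix with scikit-learn, reusing the resampled training data from above:

```python
from sklearn.linear_model import LogisticRegression

# The default max_iter is 100; raising it to 150 removes the
# ConvergenceWarning the solver was printing.
model = LogisticRegression(max_iter=150)
model.fit(X_train_res, y_train_res)
```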

Results from part 1:

(figure: confusion matrix, precision, recall and F1 score)

The F1 score came out at 0.99, meaning the classifier is working well. It managed to catch 91 out of 101 frauds, thus preventing fraud roughly 90% of the time. The confusion matrix, precision, recall and F1 score have been displayed for your convenience; the confusion matrix readings and the F1 score show the success of the project.

(figure: evaluation metrics)
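A minimal sketch of how these metrics can be computed on the held-out test set:

```python
from sklearn.metrics import classification_report, confusion_matrix, f1_score

y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))   # rows: true class, columns: predicted
print(classification_report(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))
```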

Results from part 2:

I have also uploaded some raw code to this repository; here are the conclusions derived from it.

Frauds are time-independent, so we can drop the Time column:

(figure: fraud occurrence across time)
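One way to check this, as a minimal sketch: compare the Time distributions of the two classes, then drop the column.

```python
# Histograms of transaction Time for each class; the fraud distribution
# shows no pattern distinct from normal traffic, justifying dropping Time.
fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharex=True)
axes[0].hist(df.loc[df["Class"] == 0, "Time"], bins=50)
axes[0].set_title("Non-fraud")
axes[1].hist(df.loc[df["Class"] == 1, "Time"], bins=50)
axes[1].set_title("Fraud")
plt.show()

df = df.drop(columns=["Time"])  # drop Time for the part-2 experiments
```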

Lower-dimension visualisation is beautiful:

(figure: two-dimensional embedding of the data)
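The repository does not state which projection produced the figure; as one plausible sketch, here is a t-SNE embedding of a random sample (the method, sample size and seed are my assumptions):

```python
from sklearn.manifold import TSNE

# t-SNE on all ~284k rows would be very slow, so embed a random sample.
sample = df.sample(n=5000, random_state=42)
embedding = TSNE(n_components=2, random_state=42).fit_transform(
    sample.drop(columns=["Class"])
)

sns.scatterplot(x=embedding[:, 0], y=embedding[:, 1],
                hue=sample["Class"], alpha=0.6)
plt.title("2-D embedding of a 5,000-row sample")
plt.show()
```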

I also took advice from my seniors and decided to undersample the dataset instead, since the results would be more realistic if no synthetic data were involved. I also decided to choose the ML model that produces the fewest false alarms: I realised later that, as a business, labelling a non-fraud data point as fraudulent is much worse for the company, since nobody likes their card being declined, and people would simply stop using that credit card. So we must focus on keeping false positives low (i.e., on precision) rather than only blindly improving the F1 score. Logistic Regression came out as the winner again, with the following results:

(figure: results of the undersampled model comparison)
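A minimal sketch of this undersampling-and-comparison step, reusing the imports from earlier snippets (the candidate models come from the skills list above, the selection metric follows the precision reasoning just described, and all sizes and seeds are illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import precision_score

# Random undersampling: keep all frauds plus an equal-sized random
# sample of genuine transactions.
fraud = df[df["Class"] == 1]
genuine = df[df["Class"] == 0].sample(n=len(fraud), random_state=42)
balanced = pd.concat([fraud, genuine]).sample(frac=1, random_state=42)

Xb = balanced.drop(columns=["Class"])
yb = balanced["Class"]
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    Xb, yb, test_size=0.2, stratify=yb, random_state=42
)

# Compare the three candidate models on false-alarm behaviour.
for clf in (LogisticRegression(max_iter=150), SVC(), KNeighborsClassifier()):
    clf.fit(Xb_train, yb_train)
    print(type(clf).__name__, precision_score(yb_test, clf.predict(Xb_test)))
```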

The other models were not far behind in performance either, but I decided to keep the final code clean and keep the trial-and-error part in the "raw_code" file.