charumakhijani/credit-card-fraud-detection


Credit-Card-Fraud-Detection

Every year, millions of people fall victim to fraud, which costs the global economy billions of dollars. If you're a victim, it can wreak havoc on your personal finances. Fortunately, many financial institutions now use modern fraud detection techniques to help protect you from credit card fraud.

The dataset is available on Kaggle:
https://www.kaggle.com/mlg-ulb/creditcardfraud

Fraud Detection

Fraud detection is a technique for identifying unusual patterns that differ from the rest of the population and do not behave as expected. These unusual patterns are also called outliers.

Fraud detection involves in-depth data analysis and data mining to recognize these unusual patterns. In this dataset, most of that analysis has already been done and most of the features are already scaled. The feature names are withheld for privacy reasons.

Hence, our main focus will be to balance the data and perform predictive analysis.

Problem Statement

The Credit Card Fraud Detection dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
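To make the imbalance concrete, here is a small arithmetic sketch using only the two counts quoted above. The "always legitimate" classifier is a hypothetical baseline introduced for illustration; it is not a model from this project.

```python
# Counts taken from the dataset description above.
n_total = 284_807
n_fraud = 492
n_legit = n_total - n_fraud

# Share of transactions that are fraudulent (roughly 0.17%).
fraud_rate = n_fraud / n_total

# Hypothetical baseline: label every transaction as legitimate.
# It is right on every legitimate transaction and wrong on every fraud.
baseline_accuracy = n_legit / n_total
baseline_recall = 0 / n_fraud  # it catches none of the frauds

print(f"legitimate transactions: {n_legit}")
print(f"fraud rate:              {fraud_rate:.4f}")
print(f"baseline accuracy:       {baseline_accuracy:.4f}")
print(f"baseline fraud recall:   {baseline_recall:.4f}")
```

A model that never flags anything would still score above 99.8% accuracy here, which is why accuracy alone says little about how well frauds are detected.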

Goals

The goal is to identify as many fraudulent credit card transactions as possible. As suggested in the dataset's inspiration section, I will measure performance using the Area Under the Precision-Recall Curve (AUPRC). Accuracy computed from the confusion matrix is not meaningful for unbalanced classification.
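As a rough illustration of what AUPRC measures, the sketch below computes average precision, one common step-wise summary of the precision-recall curve, in plain Python. The function name `average_precision` and the toy labels and scores are assumptions made for this example and are not taken from the project's notebook. In practice, `sklearn.metrics.average_precision_score` computes this kind of summary.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    y_true: list of 0/1 labels (1 = fraud).
    scores: list of model scores, higher = more suspicious.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)

    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        # Treat all items that share a score as a single threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        precision = tp / (tp + fp)
        recall = tp / n_pos
        # Each threshold adds (gain in recall) x (precision at that point).
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Toy example (made-up values): two frauds among four transactions.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(y_true, scores)
print(f"{ap:.3f}")  # 0.833
```

Unlike accuracy, this score drops whenever a legitimate transaction is ranked above a fraud, so it stays informative even when frauds are rare.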

Table of Contents

  1. Import Libraries
  2. Read Data
  3. Understand the data
  4. Exploratory Data Analysis
  5. Label Data
  6. Cluster data using Dimensionality reduction
  7. Split into train and test sets
  8. Scaling
  9. Predictive Analysis on unbalanced data
  10. Validate Unbalanced Data
  11. Balance Data using oversampling method
  12. Predictive Analysis on Balanced Data
  13. Validate Balanced Data
  14. Feature Importance
  15. Conclusion
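
The outline splits the data (step 7) before balancing it (step 11). A minimal sketch of that balancing step follows, assuming plain random oversampling with replacement; the README does not say which oversampling method the notebook uses, and SMOTE is another common choice. The helper `random_oversample` and the toy data are hypothetical.

```python
import random


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate random minority rows until both classes are the same size.

    Assumption: simple random oversampling, not necessarily the
    method used in the original notebook.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    if not minority:
        raise ValueError("no minority examples to oversample")

    # Number of extra minority rows needed to match the majority class.
    n_extra = max(len(majority) - len(minority), 0)
    extra = [rng.choice(minority) for _ in range(n_extra)]

    idx = majority + minority + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy training split (made-up values): 8 legitimate rows, 2 fraud rows.
X_train = [[float(i)] for i in range(10)]
y_train = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X_train, y_train)
print(y_bal.count(0), y_bal.count(1))
```

Only the training split is oversampled in this sketch. Keeping the test split at its original class ratio means the validation scores reflect the imbalance a deployed model would actually face.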