# <center> **CARD FRAUD DETECTION IN MACHINE LEARNING** </center>

<p align="center">
  <img src="https://dataaspirant.com/wp-content/uploads/2020/09/3-Credit-Card-Fraud-Detection.png" alt="Credit card fraud detection illustration"/>
</p>

## Abstract
...

## About the Dataset
The dataset contains 6,362,620 transactions made through various forms of online payment. Of these, 8,213 are fraudulent, which makes the dataset highly imbalanced.
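To make the imbalance concrete, the short sketch below works out the fraud rate from the two counts quoted above. The variable names are illustrative only and do not come from the original notebook.

```python
# Counts taken from the dataset description above.
total_transactions = 6_362_620
fraud_transactions = 8_213

# Share of all transactions that are fraudulent.
fraud_rate = fraud_transactions / total_transactions

# Number of legitimate transactions for every fraudulent one.
legit_per_fraud = (total_transactions - fraud_transactions) / fraud_transactions

# Accuracy of a trivial model that always predicts "not fraud".
baseline_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Legitimate transactions per fraud: {legit_per_fraud:.0f}")
print(f"Accuracy of always predicting 'not fraud': {baseline_accuracy:.4%}")
```

The fraud rate comes out at roughly 0.13%, so a model that never flags anything would still score above 99.8% accuracy. This is why precision, recall, and F1 are more informative than accuracy for this dataset.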

This dataset includes the following fields:

-   **step**: represents a unit of time where 1 step equals 1 hour
-   **type**: type of online transaction
-   **amount**: the amount of the transaction
-   **nameOrig**: customer starting the transaction
-   **oldbalanceOrg**: customer's balance before the transaction
-   **newbalanceOrig**: customer's balance after the transaction
-   **nameDest**: recipient of the transaction
-   **oldbalanceDest**: initial balance of the recipient before the transaction
-   **newbalanceDest**: the new balance of the recipient after the transaction
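-   **isFraud**: whether the transaction is fraudulent (1) or not (0)
-   **isFlaggedFraud**: whether the transaction was flagged as suspected fraud (1) or not (0)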

Of these fields, we use only the ones needed for processing and exclude the 'nameOrig', 'nameDest', and 'isFlaggedFraud' columns.
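Because one `step` equals one hour, the field can be split into a day number and an hour within that day. The sketch below shows one way to do this on a small toy frame. The column names `day` and `hour_of_day` are hypothetical, the rows are illustrative rather than real data, and the hour is counted from the start of the recorded period, which may not match wall-clock time.

```python
import pandas as pd

# Toy frame with a few illustrative step values (not real rows).
toy = pd.DataFrame({"step": [1, 24, 25, 743]})

# Steps start at 1, so shift to zero-based hours before dividing.
toy["day"] = (toy["step"] - 1) // 24 + 1      # 1-based day since the start
toy["hour_of_day"] = (toy["step"] - 1) % 24   # 0..23 within that day

print(toy)
```

With this convention, step 743 (the largest value in the dataset) falls on day 31, so the data covers about one month.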


## Process
1. Importing Libraries & Loading Datasets.

2. Data Preprocessing & Preparing Datasets.

3. Exploratory Data Analysis (EDA) & Visualization.

4. Handling Imbalanced Datasets.

5. Conclusions.

6. Further Enhancements.

7. Acknowledgement and References.

## Step 1: Importing Libraries & Loading Dataset

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [5]:
dataset_main = pd.read_csv("Dataset/PS_20174392719_1491204439457_log.csv")

dataset_main.head(10)

|   | step | type | amount | nameOrig | oldbalanceOrg | newbalanceOrig | nameDest | oldbalanceDest | newbalanceDest | isFraud | isFlaggedFraud |
|---|------|------|--------|----------|---------------|----------------|----------|----------------|----------------|---------|----------------|
| 0 | 1 | PAYMENT | 9839.64 | C1231006815 | 170136.0 | 160296.36 | M1979787155 | 0.0 | 0.0 | 0 | 0 |
| 1 | 1 | PAYMENT | 1864.28 | C1666544295 | 21249.0 | 19384.72 | M2044282225 | 0.0 | 0.0 | 0 | 0 |
| 2 | 1 | TRANSFER | 181.0 | C1305486145 | 181.0 | 0.0 | C553264065 | 0.0 | 0.0 | 1 | 0 |
| 3 | 1 | CASH_OUT | 181.0 | C840083671 | 181.0 | 0.0 | C38997010 | 21182.0 | 0.0 | 1 | 0 |
| 4 | 1 | PAYMENT | 11668.14 | C2048537720 | 41554.0 | 29885.86 | M1230701703 | 0.0 | 0.0 | 0 | 0 |
| 5 | 1 | PAYMENT | 7817.71 | C90045638 | 53860.0 | 46042.29 | M573487274 | 0.0 | 0.0 | 0 | 0 |
| 6 | 1 | PAYMENT | 7107.77 | C154988899 | 183195.0 | 176087.23 | M408069119 | 0.0 | 0.0 | 0 | 0 |
| 7 | 1 | PAYMENT | 7861.64 | C1912850431 | 176087.23 | 168225.59 | M633326333 | 0.0 | 0.0 | 0 | 0 |
| 8 | 1 | PAYMENT | 4024.36 | C1265012928 | 2671.0 | 0.0 | M1176932104 | 0.0 | 0.0 | 0 | 0 |
| 9 | 1 | DEBIT | 5337.77 | C712410124 | 41720.0 | 36382.23 | C195600860 | 41898.0 | 40348.79 | 0 | 0 |


## Step 2: Data Exploration and Data Preparation

In [15]:
# Count missing values per column; warn if any column contains them.
missing_counts = dataset_main.isna().sum()
if missing_counts.any():
    print("Missing values in DataFrame!")
print(missing_counts)

step              0
type              0
amount            0
nameOrig          0
oldbalanceOrg     0
newbalanceOrig    0
nameDest          0
oldbalanceDest    0
newbalanceDest    0
isFraud           0
isFlaggedFraud    0
dtype: int64


In [16]:
dataset_main.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6362620 entries, 0 to 6362619
Data columns (total 11 columns):
 #   Column          Dtype  
---  ------          -----  
 0   step            int64  
 1   type            object 
 2   amount          float64
 3   nameOrig        object 
 4   oldbalanceOrg   float64
 5   newbalanceOrig  float64
 6   nameDest        object 
 7   oldbalanceDest  float64
 8   newbalanceDest  float64
 9   isFraud         int64  
 10  isFlaggedFraud  int64  
dtypes: float64(5), int64(3), object(3)
memory usage: 534.0+ MB
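The `+` in the reported memory usage means pandas did not measure the contents of the `object` columns, so the real footprint is larger. One possible way to shrink it, sketched below on a small synthetic frame, is to store `type` as a categorical and downcast the integer columns. This is an optional optimization that the original notebook does not perform, and the frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Synthetic frame mimicking a few of the dataset's columns (values are illustrative).
toy = pd.DataFrame({
    "step": [1, 1, 2, 2] * 250,
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"] * 250,
    "amount": [9839.64, 181.0, 181.0, 1864.28] * 250,
    "isFraud": [0, 1, 1, 0] * 250,
})

# deep=True also counts the memory held by the strings themselves.
before = toy.memory_usage(deep=True).sum()

compact = toy.assign(
    type=toy["type"].astype("category"),                  # few distinct values, so codes are cheap
    step=pd.to_numeric(toy["step"], downcast="integer"),  # smallest integer type that fits
    isFraud=toy["isFraud"].astype("int8"),                # only 0 and 1 occur
)
after = compact.memory_usage(deep=True).sum()

print(f"Before: {before} bytes, after: {after} bytes")
print(compact.dtypes)
```

On the full dataset the same conversions could be applied to `dataset_main`, but the savings depend on the pandas version and on which columns are kept.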


In [17]:
dataset_main.describe()

|       | step | amount | oldbalanceOrg | newbalanceOrig | oldbalanceDest | newbalanceDest | isFraud | isFlaggedFraud |
|-------|------|--------|---------------|----------------|----------------|----------------|---------|----------------|
| count | 6362620.0 | 6362620.0 | 6362620.0 | 6362620.0 | 6362620.0 | 6362620.0 | 6362620.0 | 6362620.0 |
| mean  | 243.3972 | 179861.9 | 833883.1 | 855113.7 | 1100702.0 | 1224996.0 | 0.00129082 | 2.514687e-06 |
| std   | 142.332 | 603858.2 | 2888243.0 | 2924049.0 | 3399180.0 | 3674129.0 | 0.0359048 | 0.001585775 |
| min   | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25%   | 156.0 | 13389.57 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50%   | 239.0 | 74871.94 | 14208.0 | 0.0 | 132705.7 | 214661.4 | 0.0 | 0.0 |
| 75%   | 335.0 | 208721.5 | 107315.2 | 144258.4 | 943036.7 | 1111909.0 | 0.0 | 0.0 |
| max   | 743.0 | 92445520.0 | 59585040.0 | 49585040.0 | 356015900.0 | 356179300.0 | 1.0 | 1.0 |


Next, we drop the 'nameOrig', 'nameDest', and 'isFlaggedFraud' columns, which are not needed for further processing:

In [None]:
# Column labels must be passed as a single list, not as separate positional arguments.
dataset_main.drop(columns=['nameOrig', 'nameDest', 'isFlaggedFraud'], inplace=True)
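After the drop, `type` is the only remaining non-numeric column, and most scikit-learn estimators expect numeric input. The source does not say how it encodes this column, so the following is only a sketch of one common option, one-hot encoding with `pd.get_dummies`, shown on a toy frame. The names `toy`, `encoded`, `features`, and `target` are hypothetical, and the rows are illustrative.

```python
import pandas as pd

# Toy frame with the same kind of columns that remain after the drop (illustrative rows).
toy = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "DEBIT"],
    "amount": [9839.64, 181.0, 181.0, 5337.77],
    "isFraud": [0, 1, 1, 0],
})

# Replace the categorical 'type' column with one 0/1 indicator column per category.
encoded = pd.get_dummies(toy, columns=["type"], prefix="type", dtype="int8")

# Separate the model inputs from the label.
features = encoded.drop(columns=["isFraud"])
target = encoded["isFraud"]

print(features.columns.tolist())
```

One-hot encoding avoids implying an order between transaction types, at the cost of one extra column per type.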