Dataset Link: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

# Project Title: Fraud Detection in Credit Card Transactions

## Introduction

Welcome to my project on fraud detection in credit card transactions! In this project, I aim to develop machine learning models to detect fraudulent transactions in credit card data.

## Project Information

### Developer:
- Name: Anuj Rastogi

### Date of Completion:
- 09-01-2024

### Data Source:
The dataset used in this project is obtained from Kaggle. It contains credit card transactions made by European cardholders in September 2013, where we aim to identify fraudulent transactions.

You can find the dataset here: [Credit Card Fraud Detection - Kaggle](https://www.kaggle.com/mlg-ulb/creditcardfraud)

## About the Dataset
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It includes a total of 284,807 transactions, of which 492 are fraudulent. The features in the dataset are anonymized due to privacy concerns.

### Features:
- Time: Time elapsed between each transaction and the first transaction in seconds.
- V1-V28: Anonymized features resulting from PCA transformation for confidentiality.
- Amount: Transaction amount.
- Class: Indicates whether the transaction is fraudulent (1) or not (0).

## Methodology
1. Data Preprocessing: Cleaning the data, handling missing values, and scaling features.
2. Exploratory Data Analysis: Understanding the distribution of features and identifying patterns.
3. Model Development: Training machine learning models for fraud detection.
4. Model Evaluation: Assessing model performance using metrics such as accuracy, precision, recall, and F1-score.
5. Conclusion: Summarizing findings and discussing potential areas for improvement.

Let's dive into the project and start exploring the data!

## Credit Card Kaggle Anomaly Detection

### Context
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.

### Content
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers two days of transactions, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
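To make the imbalance concrete, the short calculation below derives the fraud rate (roughly 0.17%) from the two counts quoted above and shows the accuracy a model would reach by labelling every transaction as normal. The variable names are illustrative and do not come from the notebook.

```python
# Counts quoted in the dataset description.
total_transactions = 284_807
fraud_transactions = 492

fraud_rate = fraud_transactions / total_transactions

# A "model" that always predicts class 0 (normal) is correct on every non-fraud row.
always_normal_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Accuracy of always predicting 'normal': {always_normal_accuracy:.4%}")
```

A baseline that never flags fraud already scores above 99.8% accuracy, which is why accuracy alone says little on this dataset.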

It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; it can be used, for example, for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes the value 1 in case of fraud and 0 otherwise.

### Inspiration
Identify fraudulent credit card transactions.

Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
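As a rough illustration of what AUPRC measures, the sketch below computes average precision (a common step-wise estimate of the area under the precision-recall curve) in plain Python. The function name, labels, and scores are made up for this example, and the code assumes no two rows share the same score. In scikit-learn the same quantity is available as `average_precision_score` in `sklearn.metrics`, typically fed with `predict_proba(X_test)[:, 1]` rather than hard 0/1 predictions.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Ranks rows by descending score and averages the precision measured at
    each rank where a true positive (label 1) is found.
    Simplification: tied scores are not grouped.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("y_true contains no positive labels")

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / total_positives


# Hypothetical labels and fraud scores for six transactions (made-up values).
y_true = [0, 0, 1, 0, 1, 0]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05]

ap = average_precision(y_true, scores)
print(f"Average precision: {ap:.4f}")
```

Here the two frauds are ranked first and third, so the precision values at those ranks are 1/1 and 2/3, and their mean is about 0.83.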

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('/Users/anujrastogi/Library/CloudStorage/OneDrive-Personal/Desktop/Assignments & Projects/Credit-Card-Fraudlent-master/creditcard.csv')

In [3]:
pd.options.display.max_columns = None

### 1. Display Top 5 Rows of The Dataset

In [4]:
data.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0


### 2. Check Last 5 Rows of The Dataset

In [5]:
data.tail()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
284802,172786.0,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,4.35617,-1.593105,2.711941,-0.689256,4.626942,-0.924459,1.107641,1.991691,0.510632,-0.68292,1.475829,0.213454,0.111864,1.01448,-0.509348,1.436807,0.250034,0.943651,0.823731,0.77,0
284803,172787.0,-0.732789,-0.05508,2.03503,-0.738589,0.868229,1.058415,0.02433,0.294869,0.5848,-0.975926,-0.150189,0.915802,1.214756,-0.675143,1.164931,-0.711757,-0.025693,-1.221179,-1.545556,0.059616,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,24.79,0
284804,172788.0,1.919565,-0.301254,-3.24964,-0.557828,2.630515,3.03126,-0.296827,0.708417,0.432454,-0.484782,0.411614,0.063119,-0.183699,-0.510602,1.329284,0.140716,0.313502,0.395652,-0.577252,0.001396,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,67.88,0
284805,172788.0,-0.24044,0.530483,0.70251,0.689799,-0.377961,0.623708,-0.68618,0.679145,0.392087,-0.399126,-1.933849,-0.962886,-1.042082,0.449624,1.962563,-0.608577,0.509928,1.113981,2.897849,0.127434,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,10.0,0
284806,172792.0,-0.533413,-0.189733,0.703337,-0.506271,-0.012546,-0.649617,1.577006,-0.41465,0.48618,-0.915427,-1.040458,-0.031513,-0.188093,-0.084316,0.041333,-0.30262,-0.660377,0.16743,-0.256117,0.382948,0.261057,0.643078,0.376777,0.008797,-0.473649,-0.818267,-0.002415,0.013649,217.0,0


### 3. Find Shape of Our Dataset (Number of Rows And Number of Columns)

In [6]:
data.shape

(284807, 31)

In [7]:
print("Number of Rows",data.shape[0])
print("Number of Columns",data.shape[1])

Number of Rows 284807
Number of Columns 31


### 4. Get Information About Our Dataset Like Total Number Rows, Total Number of Columns, Datatypes of Each Column And Memory Requirement

In [8]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     28

### 5. Check Null Values In The Dataset

In [9]:
data.isnull().sum()

Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

In [10]:
data.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0


In [11]:
from sklearn.preprocessing import StandardScaler

In [12]:
sc = StandardScaler()
data['Amount']=sc.fit_transform(pd.DataFrame(data['Amount']))
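For readers unfamiliar with `StandardScaler`, the following sketch shows the transformation it applies to a single column: subtract the mean and divide by the population standard deviation. The amounts are made-up values, not rows from the dataset, and the standard library is used here only to keep the example self-contained.

```python
import statistics

# Made-up transaction amounts, not taken from the dataset.
amounts = [2.0, 10.0, 50.0, 120.0, 400.0]

mean = statistics.fmean(amounts)
# Population standard deviation (divides by n), which is what StandardScaler uses.
std = statistics.pstdev(amounts)

scaled = [(a - mean) / std for a in amounts]
print([round(z, 3) for z in scaled])
```

After scaling, the column has mean 0 and standard deviation 1. Note that the notebook fits the scaler on the full dataset; a stricter workflow would fit it on the training rows only and reuse it on the test rows.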

In [13]:
data.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,0.244964,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,-0.342475,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,1.160686,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,0.140534,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,-0.073403,0


In [14]:
data = data.drop(['Time'],axis=1)

In [15]:
data.head()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,0.244964,0
1,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,-0.342475,0
2,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,1.160686,0
3,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,0.140534,0
4,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,-0.073403,0


In [16]:
data.shape

(284807, 30)

In [17]:
data.duplicated().any()

True

In [18]:
data = data.drop_duplicates()

In [19]:
data.shape

(275663, 30)

In [20]:
284807- 275663

9144
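Because `Time` was dropped in cell 14 before `duplicated()` was called, the 9,144 removed rows include any transactions that match on all remaining columns, even if they occurred at different times. The toy sketch below, using made-up rows and a hypothetical helper, shows how dropping a column can increase the number of rows counted as duplicates.

```python
# Each made-up row is (Time, V1, Amount); the values are purely illustrative.
rows = [
    (0.0, -1.36, 149.62),
    (1.0, -1.36, 149.62),   # same features as the row above, different Time
    (2.0, 1.19, 2.69),
    (2.0, 1.19, 2.69),      # exact duplicate, including Time
]


def count_duplicates(records):
    """Number of rows that repeat an earlier row (the rows a de-duplication step removes)."""
    return len(records) - len(set(records))


with_time = count_duplicates(rows)
without_time = count_duplicates([row[1:] for row in rows])

print("Duplicates with Time kept:   ", with_time)
print("Duplicates with Time dropped:", without_time)
```

Whether rows that differ only in `Time` should count as duplicates is a modelling decision; checking for duplicates before dropping the column would be the more conservative choice.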

### 6. Class Distribution Before Handling Imbalance

In [21]:
data['Class'].value_counts()

Class
0    275190
1       473
Name: count, dtype: int64

In [22]:
import seaborn as sns

In [23]:
sns.countplot(x='Class', data=data)

### 7. Store Feature Matrix In X And Response (Target) In Vector y

In [24]:
X = data.drop('Class',axis=1)
y = data['Class']

### 8. Splitting The Dataset Into The Training Set And Test Set

In [25]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.20,
                                                 random_state=42)

### 9. Handling Imbalanced Dataset

In [26]:
# Undersampling
# Oversampling

### Undersampling

In [27]:
normal = data[data['Class']==0]
fraud = data[data['Class']==1]

In [28]:
normal.shape

(275190, 30)

In [29]:
fraud.shape

(473, 30)

In [30]:
normal_sample=normal.sample(n=473)

In [31]:
normal_sample.shape

(473, 30)

In [32]:
new_data = pd.concat([normal_sample,fraud],ignore_index=True)

In [33]:
new_data['Class'].value_counts()

Class
0    473
1    473
Name: count, dtype: int64
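The undersampling step draws a random subset of the normal class, so the rows chosen (and therefore the scores further down) can change from run to run unless the random seed is fixed. The sketch below illustrates the idea with plain Python lists standing in for the two classes; the ids and the `undersample` helper are hypothetical. In pandas, `DataFrame.sample` accepts a `random_state` argument for the same purpose.

```python
import random

# Toy stand-ins for the two classes: 1,000 normal row ids and 10 fraud row ids.
normal_ids = list(range(1000))
fraud_ids = list(range(1000, 1010))


def undersample(majority, minority, seed):
    """Draw as many majority rows as there are minority rows, without replacement."""
    rng = random.Random(seed)
    return rng.sample(majority, k=len(minority)) + list(minority)


balanced_a = undersample(normal_ids, fraud_ids, seed=42)
balanced_b = undersample(normal_ids, fraud_ids, seed=42)

print(len(balanced_a))           # 20 rows: 10 normal + 10 fraud
print(balanced_a == balanced_b)  # True: same seed, same sample
```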

In [34]:
new_data.head()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,-2.543046,-2.902032,1.77039,-1.21731,1.709779,-1.853984,-1.385152,0.422041,1.359186,-1.403147,0.534442,0.939113,-0.543258,0.087318,0.050802,-0.017076,-0.74133,1.236471,0.285918,0.796617,0.568922,0.777709,0.491167,-0.002744,0.092049,-0.745734,0.058376,0.204041,0.142533,0
1,-0.998412,1.744514,-1.318196,0.692285,0.122599,0.698069,2.180317,-0.162797,-0.754212,0.012141,-1.596922,0.338533,1.205708,0.31679,-0.310708,-0.835929,-0.067566,-0.247182,1.033434,-0.270868,0.06379,0.593752,-0.269737,0.09642,0.17938,-0.404099,-0.02084,0.096011,0.640375,0
2,-0.202736,1.238621,-1.035523,-0.448453,0.168602,-0.298251,0.993855,-0.159102,0.5038,-0.816714,-0.5599,0.641493,1.292241,-1.527836,-0.013648,0.292515,0.289853,0.22351,-0.56031,-0.315595,0.049684,0.314358,0.127127,0.524997,-0.750275,-0.64153,-0.64373,-0.278064,0.061052,0
3,-1.113098,-0.577225,2.858831,0.187947,-0.607664,0.177849,-0.817502,0.257991,0.010794,-0.090632,-1.737516,0.82146,1.039663,-1.560143,-2.543463,-1.989125,0.283968,0.962656,-1.031802,-0.600516,-0.382599,-0.174207,-0.175412,0.41938,-0.112941,-0.472861,0.150488,0.159372,-0.253277,0
4,1.086121,0.258993,0.159523,2.88892,1.958164,4.679792,-1.146311,1.103353,0.557358,0.753077,0.170973,-2.850879,1.709516,1.248015,-0.76693,1.453719,-0.490767,0.761315,-1.434209,-0.024716,-0.082946,-0.146968,-0.080714,0.932757,0.490319,0.074531,0.006256,0.030038,-0.216655,0


In [35]:
X = new_data.drop('Class',axis=1)
y = new_data['Class']

In [36]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.20,
                                                 random_state=42)

### 10. Logistic Regression

In [37]:
from sklearn.linear_model import LogisticRegression
log = LogisticRegression()
log.fit(X_train,y_train)

In [38]:
y_pred1 = log.predict(X_test)

In [39]:
from sklearn.metrics import accuracy_score

In [40]:
accuracy_score(y_test,y_pred1)

0.9315789473684211

In [42]:
from sklearn.metrics import precision_score,recall_score,f1_score

In [43]:
precision_score(y_test,y_pred1)

0.9587628865979382

In [45]:
recall_score(y_test,y_pred1)

0.9117647058823529

In [47]:
f1_score(y_test,y_pred1)

0.9346733668341709
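The accuracy, precision, recall, and F1 values reported for logistic regression are all functions of the confusion-matrix counts. The counts used below are not printed in the notebook; they are inferred here as integers consistent with the reported scores on a 190-row test set (20% of the 946 undersampled rows, rounded up), so treat them as an assumption.

```python
# Inferred confusion-matrix counts (assumption: not printed in the notebook).
tp, fp, fn, tn = 93, 4, 9, 84

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of the rows flagged as fraud, how many were fraud
recall = tp / (tp + fn)      # of the actual frauds, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Under these counts, 9 of the 102 frauds in the test set are missed. These scores are measured on a balanced test set; on data with the original fraud rate, precision in particular would likely be much lower.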

### 11. Decision Tree Classifier

In [49]:
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
dt.fit(X_train,y_train)

In [50]:
y_pred2 = dt.predict(X_test)

In [51]:
accuracy_score(y_test,y_pred2)

0.9157894736842105

In [52]:
precision_score(y_test,y_pred2)

0.9056603773584906

In [53]:
recall_score(y_test,y_pred2)

0.9411764705882353

In [54]:
f1_score(y_test,y_pred2)

0.9230769230769231

### 12. Random Forest Classifier

In [55]:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train,y_train)

In [56]:
y_pred3 = rf.predict(X_test)

In [57]:
accuracy_score(y_test,y_pred3)

0.9210526315789473

In [58]:
precision_score(y_test,y_pred3)

0.9484536082474226

In [59]:
recall_score(y_test,y_pred3)

0.9019607843137255

In [60]:
f1_score(y_test,y_pred3)

0.9246231155778895

In [61]:
final_data = pd.DataFrame({'Models':['LR','DT','RF'],
              "ACC":[accuracy_score(y_test,y_pred1)*100,
                     accuracy_score(y_test,y_pred2)*100,
                     accuracy_score(y_test,y_pred3)*100
                    ]})

In [62]:
final_data

|   | Models | ACC |
|---|--------|-----|
| 0 | LR | 93.157895 |
| 1 | DT | 91.578947 |
| 2 | RF | 92.105263 |


In [66]:
sns.barplot(x='Models', y='ACC', data=final_data)

### Oversampling

In [67]:
X = data.drop('Class',axis=1)
y = data['Class']

In [68]:
X.shape

(275663, 29)

In [69]:
y.shape

(275663,)

In [70]:
from imblearn.over_sampling import SMOTE

In [71]:
X_res,y_res = SMOTE().fit_resample(X,y)

In [72]:
y_res.value_counts()

Class
0    275190
1    275190
Name: count, dtype: int64
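SMOTE balances the classes by creating synthetic fraud rows rather than copying existing ones. The sketch below is a simplified, standard-library illustration of the core idea (interpolating between a minority point and one of its nearest minority neighbours). It is not imbalanced-learn's implementation, and the function name and 2-D points are made up for this example.

```python
import math
import random


def smote_like_sample(minority, k, rng):
    """Create one synthetic point between a minority row and a near neighbour."""
    base = rng.choice(minority)
    # k nearest minority neighbours of the base point (excluding itself).
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment, in [0, 1)
    synthetic = tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
    return base, neighbour, synthetic


# Made-up 2-D minority points, not taken from the dataset.
minority_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (5.0, 5.0)]
rng = random.Random(0)
base, neighbour, synthetic = smote_like_sample(minority_points, k=2, rng=rng)
print(base, neighbour, synthetic)
```

Each synthetic row lies on the line segment between two real fraud rows, so it stays close to the originals it was built from.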

In [73]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X_res,y_res,test_size=0.20,
                                                 random_state=42)
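In cells 71 and 73 the data are resampled first and split second, so synthetic fraud rows derived from the same original can end up on both sides of the split, which can make test scores look better than they would on untouched data. The toy sketch below illustrates the effect under simplifying assumptions: it uses plain duplication instead of SMOTE and a fixed every-fifth-row split instead of a random one so the result is deterministic. All names and data are hypothetical.

```python
def stride_split(rows, step=5):
    """Deterministic 80/20 split: every `step`-th row goes to the test set."""
    test = rows[::step]
    train = [row for i, row in enumerate(rows) if i % step != 0]
    return train, test


def oversample(rows):
    """Repeat minority rows (label 1) until both classes have the same count."""
    majority = [r for r in rows if r[1] == 0]
    minority = [r for r in rows if r[1] == 1]
    copies = [minority[i % len(minority)] for i in range(len(majority))]
    return majority + copies


def leaked(train, test):
    """Count minority test rows whose original id also appears in training."""
    train_ids = {row_id for row_id, label in train if label == 1}
    return sum(1 for row_id, label in test if label == 1 and row_id in train_ids)


# Toy data: (row_id, label) with 40 normal rows and 2 fraud rows.
original = [(i, 0) for i in range(40)] + [(40, 1), (41, 1)]

# Order used in the notebook: resample, then split.
train_a, test_a = stride_split(oversample(original))

# Alternative order: split, then resample the training rows only.
train_b, test_b = stride_split(original)
train_b = oversample(train_b)

print("resample -> split, leaked test rows:", leaked(train_a, test_a))
print("split -> resample, leaked test rows:", leaked(train_b, test_b))
```

With imbalanced-learn, the second order corresponds to calling `train_test_split` on the original `X` and `y` first and then `SMOTE().fit_resample(X_train, y_train)`, leaving the test set at its original class ratio.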

### 10. Logistic Regression

In [74]:
log = LogisticRegression()
log.fit(X_train,y_train)

In [75]:
y_pred1 = log.predict(X_test)

In [76]:
accuracy_score(y_test,y_pred1)

0.9442839492714125

In [77]:
precision_score(y_test,y_pred1)

0.9729782044829856

In [78]:
recall_score(y_test,y_pred1)

0.9138774248677345

In [79]:
f1_score(y_test,y_pred1)

0.9425022265972905

### 11. Decision Tree Classifier

In [80]:
dt=DecisionTreeClassifier()
dt.fit(X_train,y_train)

In [81]:
y_pred2 = dt.predict(X_test)

In [82]:
accuracy_score(y_test,y_pred2)

0.9983102583669464

In [83]:
precision_score(y_test,y_pred2)

0.9975493310581444

In [84]:
recall_score(y_test,y_pred2)

0.999072777848481

In [85]:
f1_score(y_test,y_pred2)

0.9983104732491598

### 12. Random Forest Classifier

In [86]:
rf = RandomForestClassifier()
rf.fit(X_train,y_train)

In [87]:
y_pred3 = rf.predict(X_test)

In [88]:
accuracy_score(y_test,y_pred3)

0.9999364075729495

In [89]:
precision_score(y_test,y_pred3)

0.9998727504090166

In [90]:
recall_score(y_test,y_pred3)

1.0

In [91]:
f1_score(y_test,y_pred3)

0.9999363711561361

In [92]:
final_data = pd.DataFrame({'Models':['LR','DT','RF'],
              "ACC":[accuracy_score(y_test,y_pred1)*100,
                     accuracy_score(y_test,y_pred2)*100,
                     accuracy_score(y_test,y_pred3)*100
                    ]})

In [93]:
final_data

|   | Models | ACC |
|---|--------|-----|
| 0 | LR | 94.428395 |
| 1 | DT | 99.831026 |
| 2 | RF | 99.993641 |


In [94]:
sns.barplot(x='Models', y='ACC', data=final_data)

### 13. Save The Model

In [95]:
rf1 = RandomForestClassifier()
rf1.fit(X_res,y_res)

In [96]:
import joblib

In [97]:
joblib.dump(rf1,"credit_card_model")

['credit_card_model']

In [98]:
model = joblib.load("credit_card_model")

In [99]:
pred = model.predict([[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]])



In [100]:
if pred[0] == 0:
    print("Normal Transaction")
else:
    print("Fraudulent Transaction")

Normal Transaction
