
# Online Payments Fraud Detection with Machine Learning

For this task, I use a dataset from Kaggle containing historical online payment transactions, each labeled as fraudulent or not, which can be used to train a fraud detection model. Below are the columns from the [dataset](https://www.kaggle.com/datasets/ealaxi/paysim1?resource=download) used here:

* step: represents a unit of time where 1 step equals 1 hour
* type: type of online transaction
* amount: the amount of the transaction
* nameOrig: customer who started the transaction
* oldbalanceOrg: customer's balance before the transaction
* newbalanceOrig: customer's balance after the transaction
* nameDest: recipient of the transaction
* oldbalanceDest: recipient's balance before the transaction
* newbalanceDest: recipient's balance after the transaction
* isFraud: whether the transaction is fraudulent (1) or not (0)

In [5]:
import pandas as pd
import numpy as np
data = pd.read_csv("credit card.csv")
print(data.head())

   step      type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1   PAYMENT   9839.64  C1231006815       170136.0       160296.36   
1     1   PAYMENT   1864.28  C1666544295        21249.0        19384.72   
2     1  TRANSFER    181.00  C1305486145          181.0            0.00   
3     1  CASH_OUT    181.00   C840083671          181.0            0.00   
4     1   PAYMENT  11668.14  C2048537720        41554.0        29885.86   

      nameDest  oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud  
0  M1979787155             0.0             0.0      0.0             0.0  
1  M2044282225             0.0             0.0      0.0             0.0  
2   C553264065             0.0             0.0      1.0             0.0  
3    C38997010         21182.0             0.0      1.0             0.0  
4  M1230701703             0.0             0.0      0.0             0.0  


In [6]:
# Check for null values in each column
print(data.isna().sum())

step              0
type              0
amount            0
nameOrig          0
oldbalanceOrg     0
newbalanceOrig    0
nameDest          0
oldbalanceDest    0
newbalanceDest    1
isFraud           1
isFlaggedFraud    1
dtype: int64


In [7]:
data[data['isFlaggedFraud'].isnull()]

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
110812,11,CASH_OUT,118777.61,C1510313227,0.0,0.0,C601893033,1019293.0,,,


In [8]:
data.dropna(axis = 0, inplace = True)
data[data['isFlaggedFraud'].isnull()]

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
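The null-handling step above can be sketched on a small made-up frame. The values are illustrative, not taken from the dataset. The sketch uses the non-inplace form of `dropna` and adds an optional cast of the label to integers, which the original notebook does not do.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values): the last row is incomplete, like row 110812 above.
toy = pd.DataFrame({
    "amount": [9839.64, 181.00, 118777.61],
    "newbalanceDest": [0.0, 0.0, np.nan],
    "isFraud": [0.0, 1.0, np.nan],
})

# dropna() returns a new frame by default; rows containing any NaN are removed.
clean = toy.dropna(axis=0)

# Optional: once the NaNs are gone, the label can be stored as an integer.
clean = clean.astype({"isFraud": int})

print(len(toy), "->", len(clean))
print(clean.dtypes)
```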


In [9]:
data.head()

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0.0,0.0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0.0,0.0
2,1,TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,1.0,0.0
3,1,CASH_OUT,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,1.0,0.0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0.0,0.0


In [10]:
# Exploring transaction type
print(data.type.value_counts())

PAYMENT     43315
CASH_OUT    34553
CASH_IN     22385
TRANSFER     9472
DEBIT        1087
Name: type, dtype: int64


In [17]:
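import plotly.express as px

# Count transactions per type (named type_counts to avoid shadowing the built-in `type`)
type_counts = data["type"].value_counts()
transactions = type_counts.index
quantity = type_counts.values

# Donut chart of the share of each transaction type
figure = px.pie(names=transactions,
                values=quantity, hole=0.5,
                title="Distribution of Transaction Type")

figure.show()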


In [21]:
data.dtypes

step                int64
type               object
amount            float64
nameOrig           object
oldbalanceOrg     float64
newbalanceOrig    float64
nameDest           object
oldbalanceDest    float64
newbalanceDest    float64
isFraud           float64
isFlaggedFraud    float64
dtype: object

In [22]:
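# numeric_only=True skips the string columns (required on pandas 2.x, available from pandas 1.5)
correlation = data.corr(numeric_only=True)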
print(correlation['isFraud'].sort_values(ascending = False))

isFraud           1.000000
amount            0.037206
oldbalanceOrg    -0.003736
newbalanceDest   -0.006218
oldbalanceDest   -0.009178
newbalanceOrig   -0.010312
step             -0.050454
isFlaggedFraud         NaN
Name: isFraud, dtype: float64
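The `NaN` for `isFlaggedFraud` is what Pearson correlation returns for a column with zero variance, which suggests that this column holds a single value in this sample. A small sketch with made-up numbers shows the effect and one way to spot such columns:

```python
import pandas as pd

# Toy frame (made-up values) with one constant column.
toy = pd.DataFrame({
    "amount": [10.0, 250.0, 40.0, 900.0],
    "isFlaggedFraud": [0.0, 0.0, 0.0, 0.0],  # never changes
    "isFraud": [0.0, 1.0, 0.0, 1.0],
})

# Correlation with a zero-variance column is undefined, so pandas reports NaN.
corr = toy.corr()["isFraud"]
print(corr)

# Columns with a single distinct value cannot correlate with anything.
constant_cols = [c for c in toy.columns if toy[c].nunique() == 1]
print(constant_cols)
```

A constant column carries no information for a classifier, so it can be left out of the feature set.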






In [33]:
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()

data['type'] = label_encoder.fit_transform(data['type'])
data['isFraud'] = data['isFraud'].map({0: 'No Fraud', 1 : 'Fraud'})

data['type'].unique()

array([3, 4, 1, 2, 0])
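`LabelEncoder` assigns integer codes in sorted order of the labels, so the codes above can be mapped back to transaction types through the fitted encoder's `classes_` attribute. A minimal sketch using the five type names from the dataset:

```python
from sklearn.preprocessing import LabelEncoder

# The five transaction types, in the order they first appear in the data.
types = ["PAYMENT", "TRANSFER", "CASH_OUT", "DEBIT", "CASH_IN"]

encoder = LabelEncoder()
codes = encoder.fit_transform(types)

# classes_ holds the labels in sorted order; the position is the code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)
# {'CASH_IN': 0, 'CASH_OUT': 1, 'DEBIT': 2, 'PAYMENT': 3, 'TRANSFER': 4}

# inverse_transform converts codes back to the original labels.
print(encoder.inverse_transform([4]))  # ['TRANSFER']
```

This is why the prediction cells further down pass `4` as the first feature when describing a transfer.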

In [34]:
# Split the data into features (x) and label (y)
from sklearn.model_selection import train_test_split
x = np.array(data[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])
y = np.array(data["isFraud"])  # 1-D label array, the shape scikit-learn expects

# Hold out 10% of the rows for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
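Fraud labels in datasets like this are typically rare, so a plain random split can leave the test set with very few positives. One option, not used in the original notebook, is to pass `stratify=y` so both splits keep the same fraud ratio. A sketch on synthetic labels (the 2% fraud rate is an assumption for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
x = rng.normal(size=(1000, 4))       # synthetic features
y = np.array([1] * 20 + [0] * 980)   # synthetic labels, 2% positives

# stratify=y keeps the class proportions equal in the train and test splits.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.10, random_state=42, stratify=y
)

print(ytrain.mean(), ytest.mean())
```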


In [35]:

# Train a decision tree classifier and report its accuracy on the test set
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))

0.9989171629669734
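Accuracy alone can look excellent on imbalanced data: a model that never predicts fraud still scores close to 1.0 when fraud is rare. The sketch below uses made-up labels to show how a confusion matrix and recall expose this. It is an illustration, not a result from the notebook's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up ground truth: 5 fraud cases among 1000 transactions.
y_true = np.array(["Fraud"] * 5 + ["No Fraud"] * 995)

# A trivial "model" that always answers "No Fraud".
y_pred = np.array(["No Fraud"] * 1000)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred, pos_label="Fraud")
# Rows are true classes, columns are predicted classes, in the given label order.
cm = confusion_matrix(y_true, y_pred, labels=["Fraud", "No Fraud"])

print(acc)  # 0.995
print(rec)  # 0.0
print(cm)
```

The same three calls can be applied to `ytest` and `model.predict(xtest)` to check how many fraud cases the trained model actually catches.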


In [36]:
# Predict the class of a single transaction
# features = [type, amount, oldbalanceOrg, newbalanceOrig]; type 4 encodes TRANSFER
features = np.array([[4, 9000.60, 9000.60, 0.0]])
print(model.predict(features))

['Fraud']


In [39]:
# Train a random forest classifier and report its accuracy on the test set
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(xtrain, ytrain.ravel())  # .ravel() flattens the labels to shape (n,)
print(model.score(xtest, ytest))

0.9991878722252301


In [None]:
# Predict the class of a single transaction
# features = [type, amount, oldbalanceOrg, newbalanceOrig]; type 4 encodes TRANSFER
features = np.array([[4, 9000.60, 9000.60, 0.0]])
print(model.predict(features))

In [40]:
# Train a logistic regression classifier and report its accuracy on the test set
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(xtrain, ytrain.ravel())  # .ravel() flattens the labels to shape (n,)
print(model.score(xtest, ytest))

0.9996390543223245


In [41]:
# Predict the class of a single transaction
# features = [type, amount, oldbalanceOrg, newbalanceOrig]; type 4 encodes TRANSFER
features = np.array([[4, 9000.60, 9000.60, 0.0]])
print(model.predict(features))

['Fraud']
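Logistic regression is sensitive to feature scale, and amounts and balances can span several orders of magnitude. One common option, not part of the original notebook, is to standardize the features inside a pipeline; `predict_proba` then gives a score instead of a hard label. A sketch on synthetic data (the labeling rule below is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic features: [amount, oldbalanceOrg]. The label is 1 when the
# transaction takes most of the account balance (an invented rule).
amount = rng.uniform(1, 100_000, size=2000)
old_balance = rng.uniform(1, 100_000, size=2000)
x = np.column_stack([amount, old_balance])
y = (amount > 0.9 * old_balance).astype(int)

# The scaler is fitted on the training data and reapplied at prediction time.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(x, y)

# Column order of predict_proba follows model.classes_.
proba = model.predict_proba(np.array([[9000.60, 9000.60]]))
print(model.classes_, proba)
```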
