
**Online Payments Fraud Detection**

Online payment fraud can affect anyone using any payment system, especially when paying by credit card. Detecting it is therefore important for credit card companies, so that customers are not charged for products and services they never purchased. To identify online payment fraud with machine learning, we train a classification model that separates fraudulent from non-fraudulent payments. For this, we need a dataset of labeled online transactions, so that we can understand which types of transactions lead to fraud. The dataset used here has the following columns:

**step**: represents a unit of time where 1 step equals 1 hour

**type**: type of online transaction

**amount**: the amount of the transaction

**nameOrig**: customer who initiates the transaction

**oldbalanceOrg**: sender's balance before the transaction

**newbalanceOrig**: sender's balance after the transaction

**nameDest**: recipient of the transaction

**oldbalanceDest**: recipient's balance before the transaction

**newbalanceDest**: recipient's balance after the transaction

**isFraud**: whether the transaction is fraudulent (1) or not (0)

In [1]:
import pandas as pd
import numpy as np

# Load the transaction log from the mounted Google Drive folder
data = pd.read_csv("/content/drive/MyDrive/Projects/Online Payments Fraud Detection/PS_20174392719_1491204439457_log.csv")
data

index,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.00,0,0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.00,0,0
2,1,TRANSFER,181.00,C1305486145,181.0,0.00,C553264065,0.0,0.00,1,0
3,1,CASH_OUT,181.00,C840083671,181.0,0.00,C38997010,21182.0,0.00,1,0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.00,0,0
...,...,...,...,...,...,...,...,...,...,...,...
15199,8,PAYMENT,2308.49,C1094400145,132118.0,129809.51,M904572440,0.0,0.00,0,0
15200,8,CASH_IN,149098.18,C1863749488,30730.0,179828.18,C1695946783,0.0,1014751.42,0,0
15201,8,TRANSFER,297838.67,C1185653283,175.0,0.00,C1414763289,13216.0,434053.18,0,0
15202,8,PAYMENT,9896.65,C1590486487,50.0,0.00,M1052117624,0.0,0.00,0,0
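The balance columns described above can be sanity-checked against the amount. The following is a minimal sketch, assuming the simple rule that a debit reduces the sender's balance by the full amount (types such as CASH_IN add to the balance instead, so the rule is not meant for them). The rows are copied from the preview above; the `balance_consistent` column name is made up for this illustration.

```python
import numpy as np
import pandas as pd

# Rows copied from the preview above (only the columns needed here).
sample = pd.DataFrame(
    {
        "type": ["PAYMENT", "PAYMENT", "TRANSFER", "TRANSFER"],
        "amount": [9839.64, 1864.28, 181.00, 297838.67],
        "oldbalanceOrg": [170136.0, 21249.0, 181.0, 175.0],
        "newbalanceOrig": [160296.36, 19384.72, 0.0, 0.0],
    }
)

# Sender balance we would expect if the full amount were debited.
expected = sample["oldbalanceOrg"] - sample["amount"]

# Compare with a tolerance, because the values are floating-point numbers.
sample["balance_consistent"] = np.isclose(expected, sample["newbalanceOrig"])
print(sample)
```

The last row moves more money than the sender's recorded balance, so the simple debit rule does not hold for it. Rows like this are worth inspecting before deciding which balance columns to trust as features.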


In [2]:
data.isnull().sum()

step              0
type              0
amount            0
nameOrig          0
oldbalanceOrg     0
newbalanceOrig    0
nameDest          0
oldbalanceDest    0
newbalanceDest    0
isFraud           0
isFlaggedFraud    0
dtype: int64

In [3]:
# Exploring the transaction types
data["type"].value_counts()

PAYMENT     8388
CASH_IN     2537
CASH_OUT    2329
TRANSFER    1531
DEBIT        419
Name: type, dtype: int64

In [6]:
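import plotly.express as px

# Count the transactions per type (avoid naming the variable "type", which shadows the Python built-in)
type_counts = data["type"].value_counts()
transactions = type_counts.index
quantity = type_counts.values

# Donut chart of the share of each transaction type
figure = px.pie(values=quantity, names=transactions, hole=0.5, title="Distribution of Transaction Types")
figure.show()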

In [7]:
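# Checking the correlation of the numeric columns with the target
# numeric_only=True (pandas >= 1.5) skips string columns such as type, nameOrig, and nameDest
correlation = data.corr(numeric_only=True)
correlation["isFraud"].sort_values(ascending=False)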

isFraud           1.000000
amount            0.126599
oldbalanceOrg    -0.004007
newbalanceDest   -0.009809
oldbalanceDest   -0.017412
newbalanceOrig   -0.026227
step             -0.027427
isFlaggedFraud         NaN
Name: isFraud, dtype: float64
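The NaN for isFlaggedFraud in the output above is what a correlation looks like when one column never changes: its variance is zero, so the coefficient is undefined. Here is a minimal sketch on a made-up toy frame (the values are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy data: isFlaggedFraud is constant, the other columns vary.
toy = pd.DataFrame(
    {
        "amount": [10.0, 250.0, 40.0, 900.0],
        "isFraud": [0, 1, 0, 1],
        "isFlaggedFraud": [0, 0, 0, 0],
    }
)

# Correlation of every column with the target.
corr_with_target = toy.corr()["isFraud"]
print(corr_with_target)

# A quick way to confirm that a column carries no variation at all.
flag_is_constant = toy["isFlaggedFraud"].nunique() == 1
print(flag_is_constant)
```

A constant column cannot help a model separate the classes, which fits with leaving isFlaggedFraud out of the feature set later on.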

Now let’s convert the categorical type column into numeric codes. I will also map the values of the isFraud column to No Fraud and Fraud labels, so that the model's output is easier to read:

In [8]:
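# Encode each transaction type as an integer code
data["type"] = data["type"].map({"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5})
# Replace the 0/1 target with readable labels
data["isFraud"] = data["isFraud"].map({0: "No Fraud", 1: "Fraud"})
data.head()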

index,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,2,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,No Fraud,0
1,1,2,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,No Fraud,0
2,1,4,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,Fraud,0
3,1,1,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,Fraud,0
4,1,2,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,No Fraud,0
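One caveat with a hand-written mapping like the one above: `Series.map` returns NaN for any value that is missing from the dictionary. The sketch below checks for such values and shows one-hot encoding with `pd.get_dummies` as a possible alternative that does not impose an order on the categories. The "REFUND" label is made up for illustration and does not appear in the dataset.

```python
import pandas as pd

# Same codes as in the notebook cell above.
type_codes = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5}

# Toy column; "REFUND" is a hypothetical label that the mapping does not cover.
toy = pd.DataFrame({"type": ["PAYMENT", "TRANSFER", "REFUND"]})

# Values missing from the dictionary become NaN, so list them explicitly.
encoded = toy["type"].map(type_codes)
unmapped = toy.loc[encoded.isna(), "type"].unique().tolist()
print(unmapped)

# Alternative: one indicator column per category, with no implied ordering.
one_hot = pd.get_dummies(toy["type"], prefix="type")
print(one_hot.columns.tolist())
```

For a decision tree the integer codes are usually workable, but checking for unmapped values first avoids silently feeding NaN into the model.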


Now let’s train a classification model to classify fraud and non-fraud transactions. Before training the model, I will select the features and the target, and then split the data into training and test sets:

In [11]:
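# Selecting the features and the target
from sklearn.model_selection import train_test_split
x = np.array(data[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])
# Use a 1-D target array; a 2-D column vector triggers a DataConversionWarning in scikit-learn
y = np.array(data["isFraud"])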
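Because fraud cases are rare, a purely random split can leave very few of them in the test set. A possible refinement, not used in this notebook, is a stratified split via the `stratify` argument of `train_test_split`. The sketch below uses synthetic data (random features, 10 fraud labels out of 100) purely for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 4 features, 10% labeled "Fraud".
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array(["Fraud"] * 10 + ["No Fraud"] * 90)

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

fraud_share_train = np.mean(y_train == "Fraud")
fraud_share_test = np.mean(y_test == "Fraud")
print(fraud_share_train, fraud_share_test)
```

With stratification, both splits keep the same share of fraud cases as the full data, which makes the test score less dependent on the luck of the split.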

In [14]:
# Training the machine learning model

from sklearn.tree import DecisionTreeClassifier

# Hold out 10% of the rows for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
# score() reports plain accuracy on the test set
print(model.score(xtest, ytest))

0.9986850756081526
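An accuracy this high should be read with care, since fraud is a small minority class. The sketch below, using made-up labels rather than the notebook's predictions, shows that a baseline which always answers "No Fraud" can reach a similar accuracy while catching no fraud at all. Recall on the "Fraud" class and the confusion matrix make that visible.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up ground truth: 2 fraud cases among 1000 transactions.
y_true = ["Fraud"] * 2 + ["No Fraud"] * 998
# A trivial baseline that never predicts fraud.
y_pred = ["No Fraud"] * 1000

accuracy = accuracy_score(y_true, y_pred)
# Share of actual fraud cases that the baseline catches.
fraud_recall = recall_score(y_true, y_pred, pos_label="Fraud")
# Rows are true labels, columns are predicted labels, in the order given.
matrix = confusion_matrix(y_true, y_pred, labels=["Fraud", "No Fraud"])

print(accuracy, fraud_recall)
print(matrix)
```

Applied to the notebook's model, the same calls would take `ytest` and `model.predict(xtest)` as inputs.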


In [15]:
# Prediction for a single transaction
# features = [type, amount, oldbalanceOrg, newbalanceOrig]
# type 4 is TRANSFER in the mapping above

features = np.array([[4, 9000.60, 9000.60, 0.0]])
model.predict(features)

array(['No Fraud'], dtype=object)
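Building the feature array by hand is easy to get wrong, because the order of the values and the type codes both have to match the training data. One way to guard against that is a small wrapper. In the sketch below, `predict_transaction`, `TYPE_CODES`, and the tiny toy model are all hypothetical and only illustrate the idea; the toy model is not the model trained in this notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Same codes as the mapping used earlier in the notebook.
TYPE_CODES = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5}


def predict_transaction(model, tx_type, amount, old_balance, new_balance):
    """Return the predicted label for one transaction (hypothetical helper)."""
    # Feature order must match training: type, amount, oldbalanceOrg, newbalanceOrig.
    # An unknown tx_type raises KeyError instead of producing a wrong code.
    features = np.array([[TYPE_CODES[tx_type], amount, old_balance, new_balance]])
    return model.predict(features)[0]


# Tiny made-up training set, only so that the example runs on its own.
X_toy = np.array(
    [
        [2, 100.0, 500.0, 400.0],
        [4, 500.0, 500.0, 0.0],
        [2, 50.0, 300.0, 250.0],
        [4, 900.0, 900.0, 0.0],
    ]
)
y_toy = np.array(["No Fraud", "Fraud", "No Fraud", "Fraud"])
toy_model = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

label = predict_transaction(toy_model, "TRANSFER", 9000.60, 9000.60, 0.0)
print(label)
```

With the notebook's own `model` passed in place of `toy_model`, the helper would return the same kind of label as the cell above.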

**Summary**

This is how we can detect online payment fraud with machine learning in Python. Fraud detection is one of the applications of data science in finance.