<a href="https://colab.research.google.com/github/ScriptSherpa/demoprojects/blob/main/Online_Payments_Fraud_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NIKHIL MALVI




# **Introduction**
This project analyzes financial transactions to detect fraud. Using a publicly available dataset of online payment transactions, we build a machine learning model that flags potentially fraudulent transactions from features such as transaction type, amount, and account balances.

In [1]:

import pandas as pd
import numpy as np
data = pd.read_csv("/content/archive (3).zip")
data.head()


Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0,0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0,0
2,1,TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,1,0
3,1,CASH_OUT,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,1,0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0,0


In [2]:
data.isnull().sum()

Unnamed: 0,0
step,0
type,0
amount,0
nameOrig,0
oldbalanceOrg,0
newbalanceOrig,0
nameDest,0
oldbalanceDest,0
newbalanceDest,0
isFraud,0


In [3]:
# Exploring transaction type
data.type.value_counts()

type,count
CASH_OUT,2237500
PAYMENT,2151495
CASH_IN,1399284
TRANSFER,532909
DEBIT,41432
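
Counts alone do not show where fraud concentrates. The following is a minimal sketch of how the fraud share per transaction type could be computed while `isFraud` is still stored as 0/1. It assumes a small made-up DataFrame (`toy` is hypothetical, not the project data) with the same `type` and `isFraud` column names:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the dataset.
toy = pd.DataFrame({
    "type": ["PAYMENT", "PAYMENT", "TRANSFER", "TRANSFER", "CASH_OUT", "CASH_OUT"],
    "isFraud": [0, 0, 1, 0, 1, 1],
})

# The mean of a 0/1 column is the fraction of rows equal to 1,
# so the group mean is the fraud share of each transaction type.
fraud_share = toy.groupby("type")["isFraud"].mean().sort_values(ascending=False)
print(fraud_share)
```

On the real data, the same `groupby` call would be applied to `data` before `isFraud` is mapped to text labels.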


The project followed these steps:

Data Exploration: We loaded the dataset with pandas and checked it for missing values. We then examined how often each transaction type ('CASH_OUT', 'PAYMENT', 'CASH_IN', 'TRANSFER', 'DEBIT') occurs and plotted the distribution with plotly.express.

Correlation Analysis: We computed the correlation between each numeric feature and the target variable, 'isFraud', which marks whether a transaction is fraudulent, to see which features relate most strongly to fraud.

Data Preprocessing: We used the pandas .map() function to encode the categorical 'type' column as integers for model compatibility and to replace the 0/1 values of 'isFraud' with the labels 'No Fraud' and 'Fraud'.

Model Training: We trained a Decision Tree Classifier from the sklearn library on 80% of the data, using the transaction type, the amount, and the originating account's old and new balances as features.

Model Evaluation: We measured the model's accuracy on the held-out 20% test split to estimate how well it classifies transactions it has not seen.

Prediction: We showed how to classify a new transaction by passing its feature values to the trained model, which is how the model could be used inside a real-time fraud detection system.

In [4]:

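import plotly.express as px

# Count transactions per type (named type_counts to avoid shadowing the built-in `type`)
type_counts = data["type"].value_counts()
transactions = type_counts.index
quantity = type_counts.values

# Donut chart of the share of each transaction type
figure = px.pie(values=quantity,
                names=transactions, hole=0.5,
                title="Distribution of Transaction Type")
figure.show()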

In [6]:
# Correlation of each numeric column with isFraud
correlation = data.corr(numeric_only=True)  # non-numeric columns such as type are skipped
print(correlation["isFraud"].sort_values(ascending=False))

isFraud           1.000000
amount            0.076688
isFlaggedFraud    0.044109
step              0.031578
oldbalanceOrg     0.010154
newbalanceDest    0.000535
oldbalanceDest   -0.005885
newbalanceOrig   -0.008148
Name: isFraud, dtype: float64
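
The `type` column does not appear in this output. With `numeric_only=True`, pandas leaves non-numeric columns out of the correlation, so string-valued columns such as `type`, `nameOrig`, and `nameDest` are skipped at this stage. A small sketch on made-up data (the `toy` frame and its values are hypothetical) illustrates the behavior:

```python
import pandas as pd

# Hypothetical toy frame mixing a string column with numeric columns.
toy = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [100.0, 5000.0, 7000.0, 50.0],
    "isFraud": [0, 1, 1, 0],
})

# Only numeric columns take part in the correlation matrix.
correlation = toy.corr(numeric_only=True)
print(correlation["isFraud"].sort_values(ascending=False))
print("type" in correlation.columns)
```

The default method is Pearson correlation, which measures linear association only, so a low value here does not by itself rule a feature out as a predictor.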


In [7]:

data["type"] = data["type"].map({"CASH_OUT": 1, "PAYMENT": 2,
                                 "CASH_IN": 3, "TRANSFER": 4,
                                 "DEBIT": 5})
data["isFraud"] = data["isFraud"].map({0: "No Fraud", 1: "Fraud"})
data.head()

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,2,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,No Fraud,0
1,1,2,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,No Fraud,0
2,1,4,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,Fraud,0
3,1,1,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,Fraud,0
4,1,2,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,No Fraud,0
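
One property of `.map()` with a dictionary is that any value missing from the dictionary becomes `NaN` rather than raising an error. The sketch below shows a quick check for that, using a hypothetical extra category (`"REFUND"`) that is not in the dataset:

```python
import pandas as pd

# Same encoding as in the cell above.
type_codes = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5}

# Hypothetical column containing one category ("REFUND") missing from the mapping.
types = pd.Series(["PAYMENT", "TRANSFER", "REFUND"])
encoded = types.map(type_codes)

# Values absent from the dictionary come back as NaN; list them explicitly.
unmapped = types[encoded.isna()].unique()
print(encoded.tolist())
print(unmapped)
```

Running the same `isna()` check on `data["type"]` after mapping would confirm that every transaction type was covered.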


In [8]:

# selecting the features and the target
from sklearn.model_selection import train_test_split
x = np.array(data[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])
y = np.array(data["isFraud"])  # 1-D label array, the shape scikit-learn expects for a single target
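
When one class is rare, a purely random split can leave the test set with a somewhat different share of fraud cases than the training set. `train_test_split` accepts a `stratify` argument that keeps class proportions equal in both parts. The notebook does not use it; the sketch below only illustrates the option on made-up labels (`x_toy` and `y_toy` are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 10 fraud cases among 100 transactions.
y_toy = np.array(["Fraud"] * 10 + ["No Fraud"] * 90)
x_toy = np.arange(100).reshape(-1, 1)  # placeholder feature column

# stratify=y_toy keeps the 10% fraud share in both the train and test parts.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, test_size=0.20, random_state=42, stratify=y_toy
)

train_fraud_share = np.mean(y_tr == "Fraud")
test_fraud_share = np.mean(y_te == "Fraud")
print(train_fraud_share, test_fraud_share)
```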


In [9]:

# split 80/20, train a decision tree, and report accuracy on the test split
from sklearn.tree import DecisionTreeClassifier
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.20, random_state=42)
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
model.score(xtest, ytest)

0.9997092392756443
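
A high accuracy score needs context when one class is rare. The sketch below uses made-up labels (not the project data) to show that a "model" which never predicts fraud can still score 99% accuracy while catching no fraud at all, which is why recall on the fraud class and the confusion matrix are useful companions to `model.score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical test labels: 10 fraud cases among 1,000 transactions.
y_true = np.array(["Fraud"] * 10 + ["No Fraud"] * 990)

# A useless baseline that labels every transaction "No Fraud".
y_pred = np.array(["No Fraud"] * 1000)

accuracy = accuracy_score(y_true, y_pred)
# Share of the true fraud cases that the predictions recover.
fraud_recall = recall_score(y_true, y_pred, pos_label="Fraud")
# Rows are true classes, columns are predicted classes, in the order given by labels.
matrix = confusion_matrix(y_true, y_pred, labels=["No Fraud", "Fraud"])

print(accuracy)
print(fraud_recall)
print(matrix)
```

With the notebook's own variables, the same functions would take `ytest` and `model.predict(xtest)` as arguments.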

# **Conclusion**
This project built a decision tree classifier for detecting fraud in financial transactions. The workflow covered data exploration, correlation analysis, preprocessing, model training, evaluation, and prediction, and the trained model reached an accuracy of about 99.97% on the held-out test split. Because fraudulent transactions are usually a small minority of all transactions, accuracy alone can overstate how well fraud is caught, so precision and recall on the fraud class are worth checking before relying on the model.

In [14]:
# prediction
# features = [type, amount, oldbalanceOrg, newbalanceOrig]
features = np.array([[4, 12000.60, 90200.60, 222222.02222]])
print(model.predict(features))

['No Fraud']
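
For repeated use, the encoding and the feature order can be wrapped in a small helper so callers pass a readable type name instead of its integer code. This is only a sketch: `predict_transaction` and `TYPE_CODES` are hypothetical names, and `toy_model` is a stand-in fitted on the first four rows shown by `data.head()`, not the model trained above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Same type encoding as in the preprocessing step.
TYPE_CODES = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5}

def predict_transaction(model, type_name, amount, old_balance, new_balance):
    """Encode one transaction in the training column order and classify it."""
    features = np.array([[TYPE_CODES[type_name], amount, old_balance, new_balance]])
    return model.predict(features)[0]

# Hypothetical stand-in model fitted on four rows only (illustration, not a real detector).
x_toy = np.array([
    [2, 9839.64, 170136.0, 160296.36],
    [2, 1864.28, 21249.0, 19384.72],
    [4, 181.0, 181.0, 0.0],
    [1, 181.0, 181.0, 0.0],
])
y_toy = np.array(["No Fraud", "No Fraud", "Fraud", "Fraud"])
toy_model = DecisionTreeClassifier(random_state=0).fit(x_toy, y_toy)

print(predict_transaction(toy_model, "TRANSFER", 181.0, 181.0, 0.0))
```

Passing the notebook's `model` in place of `toy_model` would classify a transaction with the classifier trained on the full dataset.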
