# Online Payments Fraud Detection
Online payments have become central to how individuals and organizations move money. They are fast and convenient, but they also bring their own challenges, and one of the most critical is online payment fraud.

Online payment fraud refers to any unauthorized or illicit activity within the payment process that directly or indirectly leads to financial losses for individuals or organizations. Examples include theft of financial information, credit card misuse, collusion in online buying and selling, and fraudulent interbank transactions.

In this project, we aim to identify online payment fraud with a machine learning model, so that suspicious payments can be caught and acted on. The model learns from the characteristics of past transactions, such as the transaction type, the amount, and the account balances before and after the transaction.

To detect online payment fraud with machine learning, we need to train a model that can distinguish fraudulent payments from legitimate ones. This requires a dataset of past transactions in which the fraudulent ones are labeled, so the model can learn which characteristics make a transaction more likely to be fraudulent. I use a dataset from Kaggle (https://www.kaggle.com/ealaxi/paysim1/download) that contains historical transaction records, including fraudulent ones. Its columns include:

    step: represents a unit of time where 1 step equals 1 hour
    type: type of online transaction
    amount: the amount of the transaction
    nameOrig: customer starting the transaction
    oldbalanceOrg: balance before the transaction
    newbalanceOrig: balance after the transaction
    nameDest: recipient of the transaction
    oldbalanceDest: initial balance of recipient before the transaction
    newbalanceDest: the new balance of recipient after the transaction
    isFraud: whether the transaction is fraudulent (1) or not (0)
    
The model is a decision tree classifier. It takes the type, amount, oldbalanceOrg and newbalanceOrig of a transaction as input and predicts whether the payment is fraudulent or not.

<<< let's start >>>

We start by loading the dataset from a CSV file named "credit card.csv" with the pandas library and displaying its first few rows:

In [10]:
import pandas as pd
import numpy as np
data = pd.read_csv("credit card.csv")
print(data.head())

   step      type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1   PAYMENT   9839.64  C1231006815       170136.0       160296.36   
1     1   PAYMENT   1864.28  C1666544295        21249.0        19384.72   
2     1  TRANSFER    181.00  C1305486145          181.0            0.00   
3     1  CASH_OUT    181.00   C840083671          181.0            0.00   
4     1   PAYMENT  11668.14  C2048537720        41554.0        29885.86   

      nameDest  oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud  
0  M1979787155             0.0             0.0        0               0  
1  M2044282225             0.0             0.0        0               0  
2   C553264065             0.0             0.0        1               0  
3    C38997010         21182.0             0.0        1               0  
4  M1230701703             0.0             0.0        0               0  


Next, we count the missing (null) values in each column of the DataFrame:

In [11]:
print(data.isnull().sum())

step              0
type              0
amount            0
nameOrig          0
oldbalanceOrg     0
newbalanceOrig    0
nameDest          0
oldbalanceDest    0
newbalanceDest    0
isFraud           0
isFlaggedFraud    0
dtype: int64


Next, we count how many transactions of each type appear in the 'type' column:

In [12]:
print(data.type.value_counts())

type
CASH_OUT    2237500
PAYMENT     2151495
CASH_IN     1399284
TRANSFER     532909
DEBIT         41432
Name: count, dtype: int64


This code segment creates a pie chart to visualize the distribution of transaction types from the 'data' DataFrame. It counts the occurrences of different transaction types and displays them with Plotly Express:

In [13]:
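import plotly.express as px

# Count transactions per type. The variable is not called `type`,
# because that name would shadow the Python built-in.
type_counts = data["type"].value_counts()
transactions = type_counts.index
quantity = type_counts.values

# hole=0.4 leaves the center empty, which turns the pie into a donut chart.
figure = px.pie(data,
             values=quantity,
             names=transactions, hole=0.4,
             title="Distribution of Transaction Type")
figure.show()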
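
For readers who cannot render the interactive chart, the same distribution can be computed as percentages. This is a minimal sketch that rebuilds the counts by hand from the `value_counts()` output shown above; the names `type_counts`, `total` and `shares` are only illustrative:

```python
import pandas as pd

# Counts copied from the value_counts() output above.
type_counts = pd.Series({
    "CASH_OUT": 2237500,
    "PAYMENT": 2151495,
    "CASH_IN": 1399284,
    "TRANSFER": 532909,
    "DEBIT": 41432,
})

total = type_counts.sum()

# Share of each transaction type, in percent, rounded to two decimals.
shares = (type_counts / total * 100).round(2)
print(shares)
```

With the real DataFrame, `data["type"].value_counts(normalize=True)` should give the same fractions directly.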

Next, we encode the 'type' column as integers and replace the 0/1 values in 'isFraud' with the labels "No Fraud" and "Fraud", then display the updated DataFrame:

In [14]:
data["type"] = data["type"].map({"CASH_OUT": 1, "PAYMENT": 2, 
                                 "CASH_IN": 3, "TRANSFER": 4,
                                 "DEBIT": 5})
data["isFraud"] = data["isFraud"].map({0: "No Fraud", 1: "Fraud"})
print(data.head())

   step  type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1     2   9839.64  C1231006815       170136.0       160296.36   
1     1     2   1864.28  C1666544295        21249.0        19384.72   
2     1     4    181.00  C1305486145          181.0            0.00   
3     1     1    181.00   C840083671          181.0            0.00   
4     1     2  11668.14  C2048537720        41554.0        29885.86   

      nameDest  oldbalanceDest  newbalanceDest   isFraud  isFlaggedFraud  
0  M1979787155             0.0             0.0  No Fraud               0  
1  M2044282225             0.0             0.0  No Fraud               0  
2   C553264065             0.0             0.0     Fraud               0  
3    C38997010         21182.0             0.0     Fraud               0  
4  M1230701703             0.0             0.0  No Fraud               0  
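
One thing worth knowing about encoding with `map` and a dictionary: any value that is missing from the dictionary becomes NaN rather than raising an error. The sketch below shows this on a toy Series, not on the real dataset; "REFUND" is a made-up type added only to trigger the case:

```python
import pandas as pd

type_codes = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3,
              "TRANSFER": 4, "DEBIT": 5}

# Toy data. "REFUND" is a hypothetical type that is not in the mapping.
types = pd.Series(["PAYMENT", "TRANSFER", "REFUND", "CASH_OUT"])
encoded = types.map(type_codes)

# Values that could not be encoded show up as NaN.
unmapped = types[encoded.isna()].unique()
print(encoded.tolist())
print("unmapped types:", list(unmapped))
```

Checking `encoded.isna().sum()` after the mapping is a cheap way to confirm that every transaction type was covered before training.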


Next, we separate the dataset into input features ('x') and the target variable ('y') for the machine learning model:

In [15]:
from sklearn.model_selection import train_test_split
x = np.array(data[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])
y = np.array(data[["isFraud"]])

We split the data into training and test sets (10% is held out for testing), train a Decision Tree Classifier on the training set, and print the model's accuracy on the test set:

In [16]:
from sklearn.tree import DecisionTreeClassifier
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))

0.999732814469511
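
Accuracy alone can be misleading if fraud is rare in the data, because a model that always answers "No Fraud" would also score high. The sketch below shows one way to look closer, using a confusion matrix together with precision and recall for the "Fraud" class. It runs on synthetic data, not on the Kaggle dataset: the 2% fraud rate and the rule that fraud empties the account are assumptions made only for illustration, and `stratify=y` is an addition that is not part of the split above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features (NOT the Kaggle data).
rng = np.random.default_rng(0)
n = 5000
amount = rng.uniform(1, 10_000, size=n)
is_fraud = rng.random(n) < 0.02  # assumed fraud rate of about 2%

# Toy rule: a fraudulent transaction takes the whole balance.
old_balance = np.where(is_fraud, amount, amount + rng.uniform(0, 10_000, size=n))
new_balance = old_balance - amount

X = np.column_stack([amount, old_balance, new_balance])
y = np.where(is_fraud, "Fraud", "No Fraud")

# stratify=y keeps the fraud share similar in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
pred = model.predict(X_test)

# Rows are true labels, columns are predicted labels, in the order of `labels`.
labels = ["Fraud", "No Fraud"]
cm = confusion_matrix(y_test, pred, labels=labels)
precision = precision_score(y_test, pred, pos_label="Fraud")
recall = recall_score(y_test, pred, pos_label="Fraud")

# Accuracy of a trivial model that always predicts "No Fraud".
baseline = np.mean(y_test == "No Fraud")

print(cm)
print("precision:", precision, "recall:", recall, "baseline accuracy:", baseline)
```

Comparing the model's accuracy with the baseline, and reading recall for the "Fraud" class, gives a better picture of how many fraudulent payments are actually caught.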


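Finally, we test the model on a single example. The features are given in the order type, amount, oldbalanceOrg, newbalanceOrig, so this is a TRANSFER (type 4) of 9000.60 that empties the originating account. The model classifies it as fraud: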

In [17]:
features = np.array([[4, 9000.60, 9000.60, 0.0]])
print(model.predict(features))

['Fraud']
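
The first value in the feature row is the integer code of the transaction type. A small helper could build the row from the type name instead, so that callers do not have to remember the codes. This is only a sketch: `TYPE_CODES` and `make_features` are hypothetical names, and the dictionary repeats the mapping used in the encoding step.

```python
import numpy as np

# Same codes as in the encoding step.
TYPE_CODES = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3,
              "TRANSFER": 4, "DEBIT": 5}


def make_features(type_name, amount, old_balance, new_balance):
    """Return a 1 x 4 array in the order type, amount, oldbalanceOrg, newbalanceOrig."""
    if type_name not in TYPE_CODES:
        raise ValueError(f"unknown transaction type: {type_name!r}")
    return np.array([[TYPE_CODES[type_name], amount, old_balance, new_balance]])


features = make_features("TRANSFER", 9000.60, 9000.60, 0.0)
print(features)
```

The resulting array has the same shape and values as the hand-written example, so it can be passed to `model.predict(features)` in the same way.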
