
## Fraud Detection Dataset Overview
The dataset contains financial transactions, one per row.
As the file name `transactions_modified.csv` suggests, the data appears to have been preprocessed for fraud detection.
Features include transaction details such as the amount, the transaction type, and the origin and destination balances before and after the transaction, plus the engineered features `isPayment`, `isMovement`, and `accountDiff`.
The target variable (label) is `isFraud`, which is 1 for a fraudulent transaction and 0 otherwise.

The code first imports necessary libraries such as pandas, numpy, and scikit-learn modules.
Data is loaded from a CSV file, and basic information about the dataset is displayed.
Features and labels are defined based on the dataset columns.
The dataset is split into training and testing sets.
Feature scaling is performed using StandardScaler.
A Logistic Regression model is trained on the training data.
The model's performance is evaluated on both training and testing sets.
Finally, the model is used to predict fraud on new transaction data, and probabilities of fraud are displayed.
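
The scaling step described above standardizes each feature to zero mean and unit variance using statistics learned from the training set only; the same statistics are then reused for the test set and for any new transactions. As a rough illustration of that arithmetic, here is a minimal plain-Python sketch. The helper names `fit_standardizer` and `standardize` and the sample amounts are hypothetical and not part of the notebook; `StandardScaler` performs the equivalent calculation per column.

```python
from statistics import fmean, pstdev

def fit_standardizer(values):
    """Learn the mean and population standard deviation of one feature."""
    mean = fmean(values)
    std = pstdev(values)
    # Guard against constant features, which would otherwise divide by zero
    return mean, (std if std > 0 else 1.0)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(x - mean) / std for x in values]

# Hypothetical transaction amounts, for illustration only
train_amounts = [100.0, 200.0, 300.0]
test_amounts = [400.0]

mean, std = fit_standardizer(train_amounts)          # statistics come from training data only
train_scaled = standardize(train_amounts, mean, std)
test_scaled = standardize(test_amounts, mean, std)   # reuse the training statistics
```

Fitting on the training data alone keeps information about the test set from leaking into the model, which is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.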

In [17]:
# loading initial libraries
import seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler



In [18]:
# Load the transaction data from a CSV file
transactions = pd.read_csv('transactions_modified.csv', delimiter=';')

# Display the first few rows and information about the dataset
print(transactions.head())
print(transactions.info())

# Summary statistics for the amount column
transactions['amount'].describe()

   step      type      amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0   206  CASH_OUT    62927.08   C473782114           0.00            0.00   
1   380   PAYMENT    32851.57  C1915112886           0.00            0.00   
2   570  CASH_OUT  1131750.38  C1396198422     1131750.38            0.00   
3   184  CASH_OUT    60519.74   C982551468       60519.74            0.00   
4   162   CASH_IN    46716.01  C1759889425     7668050.60      7714766.61   

      nameDest  oldbalanceDest  newbalanceDest  isFraud  isPayment  \
0  C2096898696       649420.67       712347.75        0          0   
1   M916879292            0.00            0.00        0          1   
2  C1612235515       313070.53      1444820.92        1          0   
3  C1378644910        54295.32       182654.50        1          0   
4  C2059152908      2125468.75      2078752.75        0          0   

   isMovement  accountDiff  
0           1    649420.67  
1           0         0.00  
2           1    818679.85  


count    1.000000e+03
mean     5.373080e+05
std      1.423692e+06
min      0.000000e+00
25%      2.933705e+04
50%      1.265305e+05
75%      3.010378e+05
max      1.000000e+07
Name: amount, dtype: float64

In [19]:

# Create a new feature 'isPayment': 1 for PAYMENT or DEBIT transactions, else 0.
# Assigning the whole column at once avoids the chained indexing
# (transactions['isPayment'][mask] = 1) that raises pandas' SettingWithCopyWarning.
transactions['isPayment'] = transactions['type'].isin(['PAYMENT', 'DEBIT']).astype(int)

# Create a new feature 'isMovement': 1 for CASH_OUT or TRANSFER transactions, else 0
transactions['isMovement'] = transactions['type'].isin(['CASH_OUT', 'TRANSFER']).astype(int)

# Create a new feature 'accountDiff': the absolute difference between the
# destination and origin balances before the transaction
transactions['accountDiff'] = (transactions['oldbalanceDest'] - transactions['oldbalanceOrg']).abs()


In [20]:
#  Define features (independent variables) and labels (dependent variable)
features = transactions[['amount','isPayment','isMovement','accountDiff']]
label = transactions['isFraud']


In [21]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, label, test_size=0.3)

# Standardize the features with StandardScaler (fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize a Logistic Regression model and fit it to the training data
model = LogisticRegression()
model.fit(X_train, y_train)

# Score the model on the training data
print(model.score(X_train, y_train))

# Score the model on the test data
print(model.score(X_test, y_test))

# Print the model coefficients
print(model.coef_)


0.8285714285714286
0.87
[[ 2.41674683 -0.62539346  2.04023623 -1.06798595]]
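
The coefficients printed above are weights on the standardized features, in the order `amount`, `isPayment`, `isMovement`, `accountDiff`. Logistic regression turns the weighted sum into a fraud probability with the sigmoid function, so the positive weights (here `amount` and `isMovement`) push the probability up and the negative ones push it down. A minimal sketch of that calculation follows. The intercept is not printed in the notebook, so the value used here is a placeholder assumption, as are the helper name `fraud_probability` and the example inputs.

```python
import math

# Coefficients as printed above (order: amount, isPayment, isMovement, accountDiff)
COEF = [2.41674683, -0.62539346, 2.04023623, -1.06798595]
# Placeholder: the notebook does not print model.intercept_, so 0.0 is an assumption
INTERCEPT = 0.0

def fraud_probability(scaled_features, coef=COEF, intercept=INTERCEPT):
    """Return sigmoid(w . x + b) for one already-standardized feature vector."""
    z = intercept + sum(w * x for w, x in zip(coef, scaled_features))
    return 1.0 / (1.0 + math.exp(-z))

# A vector of zeros corresponds to a transaction whose features all sit at the training mean
baseline = fraud_probability([0.0, 0.0, 0.0, 0.0])

# Raising only the standardized amount by one standard deviation raises the probability
higher_amount = fraud_probability([1.0, 0.0, 0.0, 0.0])
```

Because `train_test_split` is called without a fixed `random_state`, the exact coefficients will differ from run to run; the sketch only illustrates how any such set of weights is turned into a probability.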


In [22]:
# Define new transaction data (column order: amount, isPayment, isMovement, accountDiff)
transaction1 = np.array([123456.78, 0.0, 1.0, 54670.1])
transaction2 = np.array([98765.43, 1.0, 0.0, 8524.75])
transaction3 = np.array([543678.31, 1.0, 0.0, 510025.5])
transaction4 = np.array([6472.54, 1.0, 0.0, 55901.23])


In [23]:
# Combine new transactions into a single array
sample_transactions = np.stack((transaction1,transaction2,transaction3,transaction4))

# Scale the new transactions with the scaler fitted on the training data
sample_transactions = scaler.transform(sample_transactions)

# Predict fraud on the new transactions
print("Predicted Fraud Status:",model.predict(sample_transactions))

# Show probabilities on the new transactions
print("Fraud Probabilities:",model.predict_proba(sample_transactions))

Predicted Fraud Status: [0 0 0 0]
Fraud Probabilities: [[0.61687566 0.38312434]
 [0.99801698 0.00198302]
 [0.99635574 0.00364426]
 [0.99832496 0.00167504]]
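
`model.predict` labels a transaction as fraud only when the second column of `predict_proba` (the probability of class 1) exceeds 0.5, which is why all four samples come back as 0 even though the first has a fraud probability of about 0.38. If missing a fraud is costlier than raising a false alarm, a lower cutoff can be applied to the probabilities directly. The sketch below reuses the probabilities printed above; the 0.3 cutoff and the helper name `flag_fraud` are illustrative assumptions, not values from the notebook.

```python
# Fraud probabilities (class 1 column) copied from the output above
fraud_probs = [0.38312434, 0.00198302, 0.00364426, 0.00167504]

def flag_fraud(probabilities, threshold=0.5):
    """Return 1 where the fraud probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]

default_flags = flag_fraud(fraud_probs)         # same decision rule as model.predict
stricter_flags = flag_fraud(fraud_probs, 0.3)   # hypothetical, more cautious cutoff
```

With the fitted model in hand, the same list could be obtained from `model.predict_proba(sample_transactions)[:, 1]`.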


