<a href="https://colab.research.google.com/github/Farazmghm/fraud_detection/blob/main/Credit_card_fraud_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Project Description
This project develops a machine learning model to detect fraudulent banking transactions. In fast-moving financial systems, identifying suspicious activity in real time is crucial for preventing fraud and safeguarding user assets. The goal is to build a reliable fraud detection system that distinguishes legitimate transactions from fraudulent ones.

### 📂 Dataset Overview
The dataset contains 50,000 transactions, each described by 20 features plus the target label. The features capture information such as:

Transaction details: ID, amount, type, timestamp

Account data: balance, user ID

Device & location data: device type, geographic location, IP flag

Historical behavior: previous fraud history, daily count, failed transactions

Card & authentication: card type, card age, auth method

Contextual features: risk score, transaction distance, weekend flag

Target Variable: Fraud_Label

1: Fraudulent transaction

0: Genuine transaction

### 🧠 Tasks and Workflow
✅ 1. Data Preprocessing
Encode categorical features using LabelEncoder

Handle invalid or missing data

Normalize numerical features so they share a comparable scale
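
The preprocessing steps above can be sketched on a toy frame as follows. The column names come from the dataset, but the values are made up, and the imputation strategy (median for numeric columns, most frequent value for categorical ones) and the use of `StandardScaler` are assumptions for illustration, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy frame with a few of the dataset's columns (values are illustrative only).
toy = pd.DataFrame({
    "Transaction_Amount": [39.79, np.nan, 28.96, 254.32],
    "Account_Balance": [93213.17, 75725.25, np.nan, 76807.20],
    "Device_Type": ["Laptop", "Mobile", None, "Tablet"],
    "Card_Type": ["Amex", "Mastercard", "Visa", "Visa"],
})

num_cols = ["Transaction_Amount", "Account_Balance"]
cat_cols = ["Device_Type", "Card_Type"]

# Assumed strategy: median for numeric gaps, most frequent value for categorical gaps.
for col in num_cols:
    toy[col] = toy[col].fillna(toy[col].median())
for col in cat_cols:
    toy[col] = toy[col].fillna(toy[col].mode()[0])

# One LabelEncoder per column, kept so the same mapping can be reused at prediction time.
encoders = {}
for col in cat_cols:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

# Standardize numeric columns to zero mean and unit variance.
scaler = StandardScaler()
toy[num_cols] = scaler.fit_transform(toy[num_cols])

print(toy)
```

Keeping the fitted `encoders` and `scaler` objects around matters because new transactions must be transformed with the same mappings that were used for training.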

✅ 2. Model Training
Trained and compared several ensemble-based machine learning algorithms:

Gradient Boosting

XGBoost

LightGBM

AdaBoost
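
A minimal sketch of such a comparison is shown below, using synthetic data from `make_classification` as a stand-in for the preprocessed features and `Fraud_Label`. Only the two scikit-learn models are included to keep the block self-contained. `XGBClassifier` (XGBoost) and `LGBMClassifier` (LightGBM) follow the same `fit`/`predict` interface and could be added to the dictionary. The split ratio, random seeds, and the choice of F1 as the comparison metric are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed feature matrix and Fraud_Label.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

# Report models from best to worst F1.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {score:.3f}")
```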

✅ 3. Model Evaluation
Evaluated the models using the following metrics:

Accuracy

Precision

Recall

F1-Score

ROC curve and AUC
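
To make the first four metrics concrete, here is a small worked example that computes them from confusion-matrix counts in plain Python. The labels are invented for illustration, and the helper names are hypothetical. In practice `sklearn.metrics` provides equivalent functions. ROC AUC is left out because it needs predicted scores or probabilities rather than hard 0/1 labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the fraud class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 fraudulent and 5 genuine transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here two of the three frauds are caught (recall 2/3) and two of the three fraud alerts are correct (precision 2/3), while accuracy is 6/8 = 0.75. On imbalanced fraud data, accuracy alone can look good even when many frauds are missed, which is why precision and recall are reported alongside it.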

✅ 4. Interactive Web Interface with Streamlit
Developed a Streamlit web app where users can input transaction data

The model predicts whether the transaction is fraudulent or genuine

✅ 5. Model Saving and Deployment
Saved the trained model using joblib

Loaded the saved model in the Streamlit app for real-time predictions
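
The save-and-reload step can be sketched as below. The model is trained on synthetic data, and the file name `fraud_model.joblib` and the helper `predict_transaction` are hypothetical. The notebook's actual path and app code are not shown here.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the preprocessed training data.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Hypothetical file name, written to a temporary directory for this sketch.
model_path = os.path.join(tempfile.mkdtemp(), "fraud_model.joblib")
joblib.dump(model, model_path)

# Later (for example, when the web app starts) the model is loaded from disk.
loaded = joblib.load(model_path)


def predict_transaction(features):
    """Return (label, fraud_probability) for one already-preprocessed feature vector."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    proba = loaded.predict_proba(row)[0, 1]
    return int(loaded.predict(row)[0]), float(proba)


label, proba = predict_transaction(X[0])
print(f"label={label}, fraud probability={proba:.3f}")
```

In a Streamlit app, a function like this would presumably be called with the values collected from the input widgets, after applying the same encoding and scaling used during training.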



In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

In [3]:
df = pd.read_csv('/content/drive/MyDrive/fraud_dataset_mod.csv')

In [4]:
df.head()

Unnamed: 0,Transaction_ID,User_ID,Transaction_Amount,Transaction_Type,Timestamp,Account_Balance,Device_Type,Location,Merchant_Category,IP_Address_Flag,...,Daily_Transaction_Count,Avg_Transaction_Amount_7d,Failed_Transaction_Count_7d,Card_Type,Card_Age,Transaction_Distance,Authentication_Method,Risk_Score,Is_Weekend,Fraud_Label
0,TXN_33553,USER_1834,39.79,POS,2023-08-14 19:30:00,93213.17,Laptop,Sydney,,0.0,...,7.0,437.63,3.0,Amex,65.0,883.17,Biometric,0.8494,0.0,0.0
1,TXN_9427,USER_7875,1.19,Bank Transfer,2023-06-07 04:01:00,75725.25,Mobile,New York,Clothing,0.0,...,13.0,478.76,4.0,Mastercard,186.0,2203.36,Password,0.0959,0.0,1.0
2,TXN_199,USER_2734,28.96,Online,2023-06-20 15:25:00,1588.96,Tablet,Mumbai,Restaurants,0.0,...,14.0,50.01,4.0,Visa,226.0,1909.29,Biometric,0.84,,1.0
3,TXN_12447,USER_2617,254.32,ATM Withdrawal,2023-12-07 00:31:00,76807.2,Tablet,New York,Clothing,0.0,...,8.0,182.48,4.0,Visa,,1311.86,OTP,0.7935,0.0,1.0
4,TXN_39489,USER_2014,31.28,POS,2023-11-11 23:44:00,92354.66,Mobile,Mumbai,Electronics,0.0,...,14.0,328.69,4.0,Mastercard,140.0,966.98,Password,0.3819,1.0,1.0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 21 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Transaction_ID                48896 non-null  object 
 1   User_ID                       49052 non-null  object 
 2   Transaction_Amount            49018 non-null  float64
 3   Transaction_Type              49018 non-null  object 
 4   Timestamp                     49008 non-null  object 
 5   Account_Balance               48984 non-null  float64
 6   Device_Type                   48977 non-null  object 
 7   Location                      49000 non-null  object 
 8   Merchant_Category             49025 non-null  object 
 9   IP_Address_Flag               49076 non-null  float64
 10  Previous_Fraudulent_Activity  48969 non-null  float64
 11  Daily_Transaction_Count       48971 non-null  float64
 12  Avg_Transaction_Amount_7d     49004 non-null  float64
 13  F

In [6]:
df.isna().sum()


Transaction_ID,1104
User_ID,948
Transaction_Amount,982
Transaction_Type,982
Timestamp,992
Account_Balance,1016
Device_Type,1023
Location,1000
Merchant_Category,975
IP_Address_Flag,924


In [7]:
df.duplicated().sum()

np.int64(0)

In [8]:
df.shape

(50000, 21)