### **Credit Card Fraud Detection Using Machine Learning**

Credit card fraud is a persistent problem in financial security, and machine learning models can help flag fraudulent transactions. This project builds and evaluates several classifiers (logistic regression, random forest, and XGBoost) that label transactions as fraudulent or legitimate, using a dataset from Kaggle.




#### Dataset Overview
- **Total Entries**: 100,000 transactions
- **Columns**:
  - `TransactionID` (int): Unique identifier for each transaction
  - `TransactionDate` (object): Timestamp of the transaction
  - `Amount` (float): Transaction amount
  - `MerchantID` (int): ID of the merchant
  - `TransactionType` (object): Type of transaction (e.g., purchase, refund)
  - `Location` (object): Location where the transaction occurred
  - `IsFraud` (int): **Target variable** (0 = legitimate, 1 = fraud)
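
Before any preprocessing, it can help to confirm that the loaded frame actually matches the column list above. The following is a minimal sketch: the helper `check_schema`, the `EXPECTED_KINDS` mapping, and the sample rows are hypothetical and only illustrate the idea; the dtype kinds are an assumption about how pandas parses the CSV.

```python
import pandas as pd

# Columns come from the overview above. The dtype "kinds" are an assumption
# about how pandas parses the CSV (i = integer, f = float, O = object/string).
EXPECTED_KINDS = {
    "TransactionID": "i",
    "TransactionDate": "O",
    "Amount": "f",
    "MerchantID": "i",
    "TransactionType": "O",
    "Location": "O",
    "IsFraud": "i",
}


def check_schema(frame: pd.DataFrame) -> list:
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    for column, kind in EXPECTED_KINDS.items():
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
        elif frame[column].dtype.kind != kind:
            problems.append(
                f"{column}: expected kind {kind!r}, got {frame[column].dtype.kind!r}"
            )
    return problems


# Tiny hand-made frame standing in for the real CSV (values are illustrative only)
sample = pd.DataFrame({
    "TransactionID": [1, 2],
    "TransactionDate": ["2024-01-01 10:00:00", "2024-01-02 11:30:00"],
    "Amount": [25.0, 310.5],
    "MerchantID": [101, 102],
    "TransactionType": ["purchase", "refund"],
    "Location": ["Dallas", "Houston"],
    "IsFraud": [0, 1],
})

schema_problems = check_schema(sample)
print(schema_problems or "Schema looks as expected")
```

On the real data, the same call would be `check_schema(df)` right after loading, before `TransactionDate` is converted to datetime.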

Let's proceed! 🚀

#### Project Workflow

1. Data Preprocessing
2. Exploratory Data Analysis (EDA)
3. Handling Class Imbalance
4. Feature Scaling and Transformation
5. Model Training & Evaluation
6. Hyperparameter Tuning
7. Conclusion & Future Improvements
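
The order of steps 3 to 5 matters: resampling or reweighting and scaling should be fitted on the training split only, so that no information from the test split leaks into the model. The sketch below illustrates that ordering on synthetic data. It is not the notebook's final pipeline: the data is randomly generated, and `class_weight="balanced"` is swapped in for SMOTE to keep the example dependent on scikit-learn alone.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 4 numeric features, rare positive class
rng = np.random.default_rng(42)
n_samples = 2000
X_demo = rng.normal(size=(n_samples, 4))
# Fraud probability rises with the first feature, but stays low on average
fraud_prob = 1 / (1 + np.exp(-(2.0 * X_demo[:, 0] - 3.5)))
y_demo = (rng.random(n_samples) < fraud_prob).astype(int)

# Stratify so both splits keep the same fraud proportion
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Class reweighting used here in place of SMOTE (a deliberate simplification)
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train_scaled, y_train)

# ROC AUC is less misleading than accuracy when the classes are imbalanced
test_scores = model.predict_proba(X_test_scaled)[:, 1]
demo_auc = roc_auc_score(y_test, test_scores)
print(f"ROC AUC on held-out split: {demo_auc:.3f}")
```

If SMOTE is used instead, the same rule applies: call `fit_resample` on the training split only, never on the test split.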

#### 1. Data Loading and Preprocessing

1.1 Import Required Libraries




In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
import warnings
import gc


# Enable inline plotting for Jupyter notebooks
%matplotlib inline

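# Set the default seaborn color palette
sns.set_palette("viridis")

# Suppress warnings to keep the output readable (note: this also hides deprecation notices)
warnings.filterwarnings('ignore')

gc.enable()  # Garbage collection is on by default in CPython; this only makes it explicit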

1.2 Load Dataset

In [None]:
# Load the dataset
df = pd.read_csv('credit_card_fraud_dataset.csv')
print("Dataset loaded successfully!")
print("Shape of dataset:", df.shape)



1.3 Check for Missing Values

In [None]:
print("Total missing values:", df.isnull().sum().sum())  # No missing values expected

# Display basic info and first few rows
df.info()  # prints its summary directly and returns None, so there is nothing to assign
display(df.head())
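
A single total hides which columns are affected. As a rough sketch, a per-column report with counts and percentages can be built as follows; the `toy` frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with deliberately missing entries (illustrative values only)
toy = pd.DataFrame({
    "Amount": [12.5, np.nan, 80.0, 43.1],
    "Location": ["Dallas", "Houston", None, "Dallas"],
    "IsFraud": [0, 0, 1, 0],
})

# One row per column: how many values are missing and what share that is
missing_report = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_pct": toy.isnull().mean() * 100,
})
print(missing_report)
```

Replacing `toy` with `df` gives the same report for the full dataset.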




1.4 Data Cleaning and Feature Engineering

In [None]:
# Convert 'TransactionDate' to datetime
df['TransactionDate'] = pd.to_datetime(df['TransactionDate'], errors='coerce')

# Check for missing values; dates that failed to parse above now appear as NaT
print("Missing values in each column:")
print(df.isnull().sum())

# Extract date and time features from TransactionDate
df['Hour'] = df['TransactionDate'].dt.hour
df['Day'] = df['TransactionDate'].dt.day
df['Month'] = df['TransactionDate'].dt.month

# Encode categorical variables ('TransactionType' and 'Location').
# Each column gets its own LabelEncoder so both fitted mappings stay available.
from sklearn.preprocessing import LabelEncoder
encoders = {}
for col in ['TransactionType', 'Location']:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Drop TransactionID and TransactionDate after extracting necessary features
df.drop(['TransactionID', 'TransactionDate'], axis=1, inplace=True)

# Verify the transformed DataFrame
print("DataFrame after preprocessing:")
display(df.head())
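
To see what `errors='coerce'` does in the cell above, here is a small sketch on made-up timestamps: an unparseable string becomes `NaT`, and the derived `Hour`, `Day`, and `Month` values for that row become `NaN`. The sample strings are assumptions, not rows from the dataset.

```python
import pandas as pd

# Made-up timestamps; the second entry is deliberately invalid
raw_dates = pd.Series(
    ["2024-03-15 14:30:00", "not a date", "2024-11-02 08:05:00"]
)

# Invalid entries are converted to NaT instead of raising an error
parsed = pd.to_datetime(raw_dates, errors="coerce")
n_unparseable = int(parsed.isna().sum())
print("Unparseable timestamps:", n_unparseable)

# The .dt accessors propagate NaT as NaN, so these columns become floats
date_features = pd.DataFrame({
    "Hour": parsed.dt.hour,
    "Day": parsed.dt.day,
    "Month": parsed.dt.month,
})
print(date_features)
```

Because many models cannot handle `NaN`, rows with unparseable dates would need to be dropped or imputed before training.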


#### 2. Exploratory Data Analysis (EDA)

2.1 Check for Class Imbalance

In [None]:
# The target column in this dataset is 'IsFraud' (there is no 'Class' column)
sns.countplot(x=df["IsFraud"])
plt.title("Class Distribution")
plt.show()

fraud_percentage = df["IsFraud"].mean() * 100
print(f"Fraud Percentage: {fraud_percentage:.4f}%")
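
The fraud percentage also determines how misleading plain accuracy can be. The sketch below uses a made-up label vector with 3% fraud to show the imbalance ratio and the accuracy of a model that always predicts "legitimate"; the numbers are illustrative, not taken from the dataset.

```python
import pandas as pd

# Made-up labels: 97 legitimate transactions, 3 fraudulent ones
labels = pd.Series([0] * 97 + [1] * 3, name="IsFraud")

class_counts = labels.value_counts().sort_index()
fraud_pct = labels.mean() * 100

# Number of legitimate transactions per fraudulent one
imbalance_ratio = class_counts.loc[0] / class_counts.loc[1]

# Accuracy achieved by always predicting the majority class
majority_baseline_accuracy = class_counts.max() / class_counts.sum()

print(f"Fraud percentage: {fraud_pct:.2f}%")
print(f"Imbalance ratio (legitimate : fraud): {imbalance_ratio:.1f} : 1")
print(f"Majority-class baseline accuracy: {majority_baseline_accuracy:.2%}")
```

Any model should be compared against that baseline, which is why recall, precision, and ROC AUC are more informative here than accuracy.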


2.2 Descriptive Statistics and Distributions

In [None]:
# Descriptive statistics of the dataset
print("Descriptive statistics:")
display(df.describe())

# Visualizing the distribution of transaction amounts
plt.figure(figsize=(8, 6))
sns.histplot(df['Amount'], bins=50, kde=True)
plt.title('Transaction Amount Distribution')
plt.xlabel('Amount')
plt.ylabel('Frequency')
plt.show()
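
Transaction amounts are often right-skewed, which can make the histogram hard to read. One common option, sketched below on synthetic data, is to inspect `log1p(Amount)` instead. This assumes amounts are non-negative; the log-normal sample is generated for illustration and is not the dataset's distribution.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed amounts (assumption: real amounts are non-negative)
rng = np.random.default_rng(7)
amounts = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=1000), name="Amount")

# log1p(x) = log(1 + x) is defined at zero and compresses the long right tail
log_amounts = np.log1p(amounts)

raw_skew = amounts.skew()
log_skew = log_amounts.skew()
print(f"Skewness before: {raw_skew:.2f}, after log1p: {log_skew:.2f}")
```

Passing `log_amounts` to `sns.histplot` would then show the bulk of the distribution rather than a single tall bar near zero.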

2.3 Correlation Analysis

In [None]:
# Plot correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
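
The full heatmap can be hard to scan, so it may help to rank features by their correlation with the target alone. The sketch below does this on synthetic data in which `Amount` is linked to `IsFraud` by construction; the frame and that relationship are assumptions for illustration. Note that Pearson correlation only captures linear association.

```python
import numpy as np
import pandas as pd

# Synthetic frame (assumption): fraud is tied to unusually large amounts
rng = np.random.default_rng(0)
n_rows = 500
toy = pd.DataFrame({
    "Amount": rng.gamma(2.0, 50.0, n_rows),
    "Hour": rng.integers(0, 24, n_rows),
})
toy["IsFraud"] = (toy["Amount"] > toy["Amount"].quantile(0.95)).astype(int)

# Correlation of each feature with the target, strongest (by magnitude) first
target_corr = (
    toy.corr()["IsFraud"]
    .drop("IsFraud")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```

On the real data, the same expression with `df` in place of `toy` lists which encoded features move most with `IsFraud`.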


2.4 Fraud Distribution and Categorical Analysis

In [None]:
# Fraud distribution
plt.figure(figsize=(6, 4))
sns.countplot(x='IsFraud', data=df)
plt.title('Fraudulent vs Non-Fraudulent Transactions')
plt.xlabel('Is Fraud')
plt.ylabel('Count')
plt.show()

# Transaction Type vs Fraud
plt.figure(figsize=(6, 4))
sns.countplot(x='TransactionType', hue='IsFraud', data=df)
plt.title('Transaction Type and Fraud Occurrence')
plt.xlabel('Transaction Type (Encoded)')
plt.ylabel('Count')
plt.show()

# Location vs Fraud (all locations plotted; tick labels are rotated for readability)
plt.figure(figsize=(12, 6))
sns.countplot(x='Location', hue='IsFraud', data=df)
plt.title('Location and Fraud Occurrence')
plt.xlabel('Location (Encoded)')
plt.ylabel('Count')
plt.xticks(rotation=90)
plt.show()
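
Count plots show volume, but with imbalanced classes the fraud bars are easy to miss. A fraud rate per category is often easier to compare, and restricting the view to the busiest locations keeps the chart readable. The sketch below uses a made-up frame; the location names and labels are illustrative only.

```python
import pandas as pd

# Made-up transactions (illustrative values only)
toy = pd.DataFrame({
    "Location": ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D"],
    "IsFraud": [0, 1, 0, 0, 0, 1, 1, 0, 0, 0],
})

# Fraud rate and transaction count per location, highest rate first
rate_by_location = (
    toy.groupby("Location")["IsFraud"]
    .agg(fraud_rate="mean", n_transactions="size")
    .sort_values("fraud_rate", ascending=False)
)
print(rate_by_location)

# The three busiest locations, useful for limiting a plot to fewer categories
top_locations = toy["Location"].value_counts().head(3).index.tolist()
top_only = toy[toy["Location"].isin(top_locations)]
print(top_locations)
```

Rates based on very few transactions are noisy, so `n_transactions` is worth reading alongside `fraud_rate`.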
