Credit Card Fraud Detection System

Project Overview

This project implements a machine learning system for detecting fraudulent credit card transactions. It trains and compares several classification algorithms and evaluates each one on precision and recall for the fraud class, since both missed fraud and false alarms are costly.

Deployment Instructions

Render Deployment

Sign up for a Render account at render.com
Create a new Web Service and connect your GitHub repository
Set the following configuration:

Build Command:

pip install -r requirements.txt

Start Command:

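uvicorn api.index:app --host 0.0.0.0 --port $PORT --lifespan on

Note: Render installs the dependencies listed in requirements.txt through the build command above. A Render web service must listen on 0.0.0.0 and on the port supplied in the PORT environment variable. The --reload flag is a development option and is not needed in deployment.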

API Endpoints

Make a prediction by providing transformed transaction data

The response body is as follows:

Make a prediction by providing raw transaction data

The response body is as follows:

Installation

# Clone the repository
git clone https://github.com/brabentil/Model.git
cd Model

# Set up virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the application
python main.py

Dataset

The project uses the credit card transaction dataset (data/creditcard.csv) with the following characteristics:

284,807 transactions with 31 columns (30 input features plus the Class label)
Highly imbalanced: 492 fraudulent transactions (0.17%) vs 284,315 legitimate transactions
Input features are Time, Amount, and 28 anonymized features (V1-V28) derived from a PCA transformation
No missing values in the dataset

Key statistics:

Transaction amounts range from $0 to $25,691.16
Average transaction amount: $88.35
Class distribution: 284,315 legitimate (Class 0) vs 492 fraudulent (Class 1)
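
The imbalance figures above can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in this section; the variable names are illustrative. It also computes the accuracy of a trivial classifier that labels every transaction as legitimate, which is a useful baseline when reading the accuracy numbers later in this document.

```python
# Counts taken from the dataset description above.
TOTAL = 284_807
FRAUD = 492
LEGIT = TOTAL - FRAUD

fraud_rate = FRAUD / TOTAL
# Accuracy of a trivial model that predicts "legitimate" for every row.
baseline_accuracy = LEGIT / TOTAL

print(f"legitimate transactions: {LEGIT:,}")
print(f"fraud rate: {fraud_rate:.2%}")
print(f"always-legitimate baseline accuracy: {baseline_accuracy:.2%}")
```

Any model worth deploying has to beat that baseline, which is why the fraud-class precision and recall are reported alongside accuracy.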

Methodology

Data Preprocessing

Exploratory Data Analysis (EDA)
- Examined class distribution and feature statistics
- Verified no missing values
Handling Class Imbalance
- Applied Synthetic Minority Over-sampling Technique (SMOTE)
- Balanced class distribution to 227,451 samples per class (a count consistent with oversampling only the 80% training split)
Feature Engineering
- Standardized features to improve model performance
- Split data into training (80%) and testing (20%) sets
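
To make the oversampling step concrete, here is a simplified, standard-library sketch of the idea behind SMOTE: each synthetic fraud sample is placed at a random point on the line segment between a real fraud sample and one of its nearest fraud neighbours. This is an illustration only, not the imbalanced-learn implementation and not necessarily how this project calls it; the function name `smote_oversample` and the toy data are assumptions.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Simplified illustration of the SMOTE idea, not a library implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data: 3 minority (fraud) rows vs 8 majority rows, two features each.
fraud_rows = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
n_majority = 8
new_rows = smote_oversample(fraud_rows, n_new=n_majority - len(fraud_rows), k=2)
balanced_fraud = fraud_rows + new_rows
print(len(balanced_fraud))  # same size as the majority class
```

Oversampling is normally applied to the training split only, so that the test set keeps the original class ratio and the reported metrics reflect real-world conditions.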

Models Implemented

Four classification algorithms were implemented and compared:

Logistic Regression
K-Nearest Neighbors (KNN)
Decision Tree
Random Forest

Model Performance

Summary of model performance metrics:

Model	Accuracy	Precision (Fraud)	Recall (Fraud)	F1-Score (Fraud)	ROC AUC
Logistic Regression	98.80%	0.12	0.90	0.20	0.943
KNN	94.53%	0.02	0.53	0.03	0.738
Decision Tree	99.78%	0.42	0.80	0.55	0.897
Random Forest	99.94%	0.83	0.83	0.83	0.913
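
The F1-Score column is the harmonic mean of the precision and recall columns. As a quick sanity check, the sketch below recomputes it from the rounded values in the table; the helper name `f1_score` is illustrative and is not taken from the project code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) for the fraud class, copied from the table above.
reported = {
    "Logistic Regression": (0.12, 0.90),
    "KNN": (0.02, 0.53),
    "Decision Tree": (0.42, 0.80),
    "Random Forest": (0.83, 0.83),
}

for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}")
```

Because the inputs are already rounded to two decimals, a recomputed value can differ from the table in the last digit.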

Visualizations

Class Distribution

Visual representation of the class distribution before and after preprocessing.

Confusion Matrices

Decision Tree

Confusion matrix for the Decision Tree model showing classification performance.

K-Nearest Neighbors (KNN)

Confusion matrix for the KNN model highlighting classification results.

Logistic Regression

Confusion matrix for the Logistic Regression model.

Random Forest

Confusion matrix for the Random Forest model, displaying prediction performance.

Correlation Matrix

Heatmap showing feature correlations within the dataset.

Feature Importance

Key features contributing to fraud detection as determined by the model.

PCA Visualization

Principal Component Analysis (PCA) representation of the dataset for dimensionality reduction.

Usage

# Start the API server
python main.py

# Example API request
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [0.0, -1.35, -0.07, 2.53, 1.37, -0.33, 0.46, 0.23, ..., 149.62]}'
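
The same request can be made from Python using only the standard library. The sketch below mirrors the curl example: the `/predict` path, port 5000, and the `features` key come from that example, while the helper name `build_predict_request` is hypothetical. The response format is not documented in this section, so the commented-out call simply prints whatever JSON comes back.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = json.dumps({"features": list(features)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values only; a real request needs the full feature vector.
request = build_predict_request([0.0, -1.35, -0.07])
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```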

Results Interpretation

Analysis of Model Performance

The Random Forest model emerged as the best-performing model with:

High Accuracy (99.94%): The highest of the four models. Because only 0.17% of transactions are fraudulent, accuracy alone says little, so the fraud-class metrics below carry more weight.
Balanced Precision and Recall (both 0.83): Critical for fraud detection, where both false positives and false negatives carry significant costs.
ROC AUC (0.913): Indicates good discrimination between classes, although Logistic Regression scores higher on this metric (0.943).
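
ROC AUC can be read as the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one. The sketch below computes it directly from that definition on made-up scores. It is a small illustration rather than the evaluation code used in this project, and the pairwise loop is only practical for small inputs.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (fraud, legitimate) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg) time.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = fraud, 0 = legitimate; scores are made-up fraud probabilities.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))  # 3 of 4 pairs are ordered correctly -> 0.75
```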

Comparing Model Behaviors

Logistic Regression: Despite high recall (0.90) and AUC (0.943), its precision is very poor (0.12), meaning many legitimate transactions would be falsely flagged as fraudulent.
KNN: Underperformed with low precision (0.02) and moderate recall (0.53).
Decision Tree: Good recall (0.80) but moderate precision (0.42).
Random Forest: Provides the best balance of precision and recall (F1-Score 0.83) together with the highest accuracy, making it the preferred choice for deployment.
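
One way to make the precision figures tangible is to convert them into the expected number of false alarms per correctly detected fraud, which is (1 - precision) / precision. The sketch below applies that formula to the precision values from the performance table; the function name is illustrative.

```python
def false_alarms_per_detection(precision):
    """Expected false positives for each true positive at a given precision."""
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    return (1 - precision) / precision


# Fraud-class precision values from the performance table.
precision_by_model = {
    "Logistic Regression": 0.12,
    "KNN": 0.02,
    "Decision Tree": 0.42,
    "Random Forest": 0.83,
}

for model, precision in precision_by_model.items():
    ratio = false_alarms_per_detection(precision)
    print(f"{model}: {ratio:.1f} false alarms per detected fraud")
```

At a precision of 0.12, Logistic Regression raises roughly seven false alarms for every fraud it catches, whereas Random Forest raises about one false alarm for every five detections.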

Conclusion

This project successfully developed a machine learning system capable of detecting fraudulent credit card transactions with high accuracy and balanced precision-recall metrics. The Random Forest algorithm proved most effective for this task, demonstrating the power of ensemble methods for complex classification problems with imbalanced data.

Future Enhancements

Implement real-time transaction scoring
Explore deep learning approaches (LSTM, Autoencoders)
Add explainability features for model decisions
Develop an interactive dashboard for fraud monitoring
Implement model monitoring for performance drift detection
