This is my academic project, and it is the best demonstration of my machine learning skills.
This project presents a complete, end-to-end system for detecting fraudulent financial transactions using a machine learning model that is both highly accurate and fully explainable. The system is built using a Python-based stack and features a back-end API that serves the model and a user-friendly web interface for interaction and demonstration. An alternative, simplified deployment method using a single Streamlit application is also provided.
The core of the project is an XGBoost classifier trained on the PaySim dataset from Kaggle. A key focus of this work was to build a system under realistic data constraints, meaning it does not rely on private recipient information. Its high performance is achieved through advanced feature engineering and a focus on model interpretability using SHAP.
- High-Recall Model: The final XGBoost model achieves 99% recall on the fraud class, demonstrating its effectiveness at the primary business goal: catching fraudsters.
- Advanced Feature Engineering: The model's intelligence is driven by custom-built features like `senderBalanceError` and `isOrigAccountEmpty` that capture the behavioral signatures of fraud.
- Explainable AI (XAI): The system is not a "black box." By integrating SHAP, we can show why the model makes each decision, making it transparent and trustworthy.
- Interactive Web UI: A user-friendly front-end built with Streamlit allows for easy manual transaction checks and live demonstrations.
- Comprehensive Simulation: The UI includes a multi-scenario simulation that demonstrates the system's ability to detect various fraud patterns, including velocity attacks and account takeovers.
- API-Based Architecture (Primary): The machine learning model is served via a robust Flask API, separating the model logic from the user interface and allowing for easy integration.
- Simplified Streamlit Deployment (Alternative): An option to run the entire application as a single Streamlit script, embedding model loading and prediction directly within the UI.
The application offers two deployment options:
The primary option (Flask API + Streamlit UI) operates on a client-server architecture:
- Back-End (Flask API):
  - Loads the pre-trained Isolation Forest and XGBoost models.
  - Exposes a `/predict` endpoint that receives transaction data.
  - Implements the two-stage detection logic: a fast anomaly scan followed by a deep analysis.
  - Returns the final prediction as a JSON response.
- Front-End (Streamlit UI):
  - Provides a clean user interface for entering transaction details.
  - Sends the user input to the Flask API.
  - Receives the prediction and displays the result in a clear, user-friendly format, including visual alerts for the detection stage and velocity.
+----------------+ +---------------------+ +----------------------+
| Streamlit UI | <--> | Flask API | <--> | ML Models (joblib) |
| (app_ui.py) | | (app.py) | | (XGBoost, IsoForest) |
+----------------+ +---------------------+ +----------------------+
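The two-stage detection logic described above could look roughly like the sketch below. It is not taken from `app.py`: the function name, the threshold, the response keys, and the rule for combining the two stages are all assumptions made for illustration. The only library behavior relied on is that scikit-learn's `IsolationForest.predict` returns `-1` for anomalies and that a classifier's `predict_proba` returns one `[P(legitimate), P(fraud)]` pair per row.

```python
from typing import Any, Dict, Sequence

# Hypothetical cut-off for the classifier's fraud probability; the project's
# real threshold is not stated in this README.
FRAUD_THRESHOLD = 0.5


def two_stage_predict(
    features: Sequence[float],
    iso_model: Any,
    xgb_model: Any,
    threshold: float = FRAUD_THRESHOLD,
) -> Dict[str, Any]:
    """Run a fast anomaly scan, then the classifier, on one transaction."""
    # Both models expect a 2-D batch, so wrap the single row in a list.
    # If the models were fitted on a DataFrame, pass a one-row DataFrame with
    # the same column names instead of a plain list.
    row = [list(features)]

    # Stage 1 (fast anomaly scan): -1 means "anomaly", 1 means "inlier".
    is_anomaly = bool(iso_model.predict(row)[0] == -1)

    # Stage 2 (deep analysis): probability of the positive (fraud) class.
    fraud_probability = float(xgb_model.predict_proba(row)[0][1])
    is_fraud = fraud_probability >= threshold

    # Assumed combination rule: the classifier makes the final call and the
    # anomaly flag is reported alongside it.
    if is_fraud:
        stage = "deep_analysis"
    elif is_anomaly:
        stage = "anomaly_scan"
    else:
        stage = "none"

    return {
        "is_fraud": is_fraud,
        "is_anomaly": is_anomaly,
        "fraud_probability": fraud_probability,
        "stage": stage,
    }
```

Keeping this logic in one function means the Flask route and the standalone Streamlit script could share it instead of duplicating the decision rule.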
In the alternative, Streamlit-only setup, the Streamlit application directly loads the machine learning models and performs predictions, eliminating the need for a separate Flask API.
+-------------------------------------------------+
| Streamlit UI (app_ui.py) |
| (Loads ML Models directly, performs prediction)|
+-------------------------------------------------+
| |
V V
+----------------------+ +----------------------+
| ML Models (joblib) | | ML Models (joblib) |
| (XGBoost, IsoForest) | | (XGBoost, IsoForest) |
+----------------------+ +----------------------+
The project followed a comprehensive machine learning pipeline:
- Data Analysis: The PaySim dataset was analyzed, revealing that fraud only occurred in `TRANSFER` and `CASH_OUT` transactions. The data was filtered accordingly.
- Feature Engineering: New, highly predictive features were created from the raw data to capture behavioral patterns without using private recipient information.
- Model Training: A comparative analysis was performed between a Random Forest baseline and an XGBoost classifier. The models were trained to handle the extreme class imbalance by using class weights (`scale_pos_weight`), prioritizing recall.
- Model Evaluation: The XGBoost model was selected as the final model due to its superior recall (99%) on the unseen test set.
- Explainability Analysis: SHAP was used to analyze the final XGBoost model, confirming that our engineered features were the most important drivers of its predictions.
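The first three pipeline steps can be sketched in plain Python as follows. The feature formulas are assumptions: this README names `senderBalanceError` and `isOrigAccountEmpty` but does not define them, so the definitions below are one plausible reading. The column names (`type`, `amount`, `oldbalanceOrg`, `newbalanceOrig`) follow the public PaySim dataset, and the weight formula is the common negatives-over-positives heuristic for `scale_pos_weight`, not necessarily the value used in the notebook.

```python
from typing import Dict, Iterable, List

# Per the data analysis step, fraud only appears in these transaction types.
FRAUD_PRONE_TYPES = {"TRANSFER", "CASH_OUT"}


def engineer_features(tx: Dict) -> Dict:
    """Return a copy of one PaySim-style row with two engineered features."""
    out = dict(tx)
    # Assumed definition: how far the sender's reported new balance deviates
    # from "old balance minus amount". Zero means the books are consistent.
    out["senderBalanceError"] = (
        tx["newbalanceOrig"] + tx["amount"] - tx["oldbalanceOrg"]
    )
    # Assumed definition: 1 if the sender's account is empty afterwards.
    out["isOrigAccountEmpty"] = int(tx["newbalanceOrig"] == 0)
    return out


def prepare(rows: Iterable[Dict]) -> List[Dict]:
    """Keep only fraud-prone transaction types and add engineered features."""
    return [engineer_features(r) for r in rows if r["type"] in FRAUD_PRONE_TYPES]


def scale_pos_weight(labels: Iterable[int]) -> float:
    """Ratio of negative to positive labels, for an imbalanced binary target."""
    labels = list(labels)
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive (fraud) examples in labels")
    return negatives / positives
```

The resulting ratio would be passed as the `scale_pos_weight` parameter when constructing the XGBoost classifier, so that errors on the rare fraud class weigh more during training.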
The final XGBoost model was tuned for the critical task of catching fraud, and its results on the test set reflect that priority.
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Fraud | 0.27 | 0.99 | 0.43 |
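A recall of 0.99 means the model catches nearly all fraudulent transactions in the test set. The precision of 0.27 means that only about one in four flagged transactions is actually fraud. This is an accepted and well-understood trade-off in fraud detection, where minimizing missed frauds is the top priority.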
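To make the trade-off concrete, the short calculation below derives the F1-score and the number of false alarms per correctly flagged fraud from the table's precision and recall. The helper names are illustrative. Note that the rounded inputs give an F1 of about 0.42; the 0.43 in the table was presumably computed from unrounded precision and recall.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def false_alarms_per_hit(precision: float) -> float:
    """Legitimate transactions flagged for every true fraud flagged."""
    return (1 - precision) / precision


precision, recall = 0.27, 0.99  # fraud-class values from the table above

print(round(f1_from(precision, recall), 2))        # 0.42
print(round(false_alarms_per_hit(precision), 1))   # 2.7
```

In other words, at this operating point a reviewer would see roughly 2.7 legitimate transactions for every real fraud case in the flagged queue.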
- Back-End: Python, Flask (for original architecture)
- Machine Learning: Pandas, Scikit-learn, XGBoost, SHAP
- Front-End: Streamlit
- Data Analysis: Jupyter Notebook (or Google Colab)
.
├── fraud_api/
│ ├── app.py # The Flask API server
│ ├── final_fraud_model.joblib # The trained XGBoost model
│ └── isolation_forest_model.joblib # The trained Isolation Forest model
│
├── fraud_ui/
│ └── app_ui.py # The Streamlit UI application (connects to Flask API)
│
├── streamlit_deployment/
│ ├── app_ui.py # The Streamlit UI application (standalone, loads models directly)
│ ├── final_fraud_model.joblib # Copy of the trained XGBoost model
│ ├── isolation_forest_model.joblib # Copy of the trained Isolation Forest model
│ └── requirements.txt # Dependencies for Streamlit-only deployment
│
├── notebook/
│ └── Fraud_Detection_Analysis.ipynb # Your analysis notebook (optional)
│
└── README.md # This file
- Clone the repository:
    git clone https://github.com/agp-369/Fraud-Detection-Using-Machine-Learning.git
    cd Fraud-Detection-Using-Machine-Learning
- Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install the required packages: The required packages depend on which deployment method you choose.
  For the Flask API + Streamlit UI setup, create a `requirements.txt` file in your project root (or use the ones in `fraud_api` and `fraud_ui` if they exist) with the following content and run `pip install -r requirements.txt`.
  `requirements.txt` (for Flask API + Streamlit UI):
    flask
    pandas
    scikit-learn
    xgboost
    streamlit
    requests
  For the Streamlit-only setup, navigate to the `streamlit_deployment` directory and install its specific requirements.
  `streamlit_deployment/requirements.txt`:
    pandas
    scikit-learn
    xgboost
    streamlit
  Then run:
    cd streamlit_deployment
    pip install -r requirements.txt
    cd .. # Go back to project root if needed
You have two options for running the application:
The primary setup (Flask API + Streamlit UI) requires two terminals running simultaneously.
- Start the Back-End API: Open a terminal, navigate to the `fraud_api` directory, and run:
    cd fraud_api
    python app.py
  You should see output indicating the server is running on `http://127.0.0.1:5000`.
- Start the Front-End UI: Open a second terminal, navigate to the `fraud_ui` directory, and run:
    cd fraud_ui
    streamlit run app_ui.py
  This will automatically open a new tab in your web browser with the user interface, usually at `http://localhost:8501`.
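Once the API is running, it can also be called without the UI. The sketch below uses only the Python standard library and assumes that `/predict` accepts a JSON body via POST; the HTTP method, the payload field names, and the shape of the response are assumptions, so check `app.py` for the actual contract.

```python
import json
import urllib.request

# Address printed by the Flask server, plus the /predict endpoint.
API_URL = "http://127.0.0.1:5000/predict"


def build_request(transaction: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for one transaction (assumed API contract)."""
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_transaction(transaction: dict, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send the transaction to the API and return the decoded JSON response."""
    request = build_request(transaction, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the Flask server to be running; field names are
# hypothetical):
# result = check_transaction({"type": "TRANSFER", "amount": 9000.0})
# print(result)
```

Separating `build_request` from `check_transaction` makes it possible to inspect or test the request without a running server.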
The alternative, Streamlit-only method runs the entire application from a single Streamlit script.
- Navigate to the `streamlit_deployment` directory:
    cd streamlit_deployment
- Ensure model files are present: Make sure `isolation_forest_model.joblib` and `final_fraud_model.joblib` are copied into the `streamlit_deployment` directory.
- Run the Streamlit application:
    streamlit run app_ui.py
  This will automatically open a new tab in your web browser with the user interface, usually at `http://localhost:8501`.
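If the standalone app fails at startup, a missing model file is a likely cause. The helper below is a small, hypothetical pre-flight check (it is not part of the repository) that reports which of the two expected model files are absent from a directory.

```python
from pathlib import Path
from typing import List

# File names taken from the project structure listed in this README.
REQUIRED_MODEL_FILES = (
    "isolation_forest_model.joblib",
    "final_fraud_model.joblib",
)


def missing_model_files(directory: str = ".") -> List[str]:
    """Return the required model files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_MODEL_FILES if not (base / name).is_file()]


# Example: run from the project root before starting Streamlit.
# print(missing_model_files("streamlit_deployment"))
```

An empty list means both files are in place and `streamlit run app_ui.py` should be able to load them.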
Experience the Fraud Detection System live: [Launch Streamlit App](https://frauddetection-system.streamlit.app/)
You can now use the web interface to check transactions manually or run the comprehensive simulation to see the hybrid detection system in action.

