
# 🛡️ Financial Fraud Detection System

A comprehensive project developed during my data analytics internship. This system focuses on **real-time detection of financial fraud** using a combination of **supervised and unsupervised machine learning**, **graph-based analysis**, **adaptive learning**, **email alerts**, and a fully interactive **Power BI dashboard**.

---

## 📁 Table of Contents

- [🧨 Problem Statement](#-problem-statement)  
- [📊 Dataset](#-dataset)  
- [🗂️ Project Structure](#️-project-structure)  
- [🤖 Models Used](#-models-used)  
- [🌐 Graph-Based Fraud Analysis](#-graph-based-fraud-analysis)  
- [📩 Email Alert System](#-email-alert-system)  
- [🔁 Adaptive Learning](#-adaptive-learning)  
- [⏱️ Real-Time Fraud Monitoring (Simulation)](#️-real-time-fraud-monitoring-simulation)  
- [📊 Power BI Dashboard](#-power-bi-dashboard)  
- [⚙️ How to Run](#️-how-to-run)  
- [🔮 Future Enhancements](#-future-enhancements)  
- [🙌 Credits](#-credits)  

---

## 🧨 Problem Statement

> **Online financial fraud** is increasing rapidly. The goal of this project is to develop a **robust, multi-model detection system** that can:
- Detect fraud in real time,
- Adapt to new fraud patterns,
- Trigger email alerts,
- Visualize insights using a **Power BI Dashboard**.

---

## 📊 Dataset

- **Source**: [PaySim simulated financial dataset](https://data.mendeley.com/datasets/5fgyxf6tgs/1)
- **Records**: ~6 million transactions
- **Target Column**: `isFraud`

### Key Columns:
- `type` – Transaction type  
- `amount` – Transferred amount  
- `nameOrig`, `nameDest` – Sender & receiver IDs  
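- `oldbalanceOrg`, `newbalanceOrig` – Sender balance before and after the transaction  
- `oldbalanceDest`, `newbalanceDest` – Receiver balance before and after the transaction  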

---

## 🗂️ Project Structure

```
📁 data/
    ├── paysim_cleaned.csv
    ├── df_with_predictions.csv

📁 models/
    ├── RandomForest.pkl
    ├── LogisticRegression.pkl
    └── ...

📁 notebooks/
    ├── 01_Preprocessing_Visualization.ipynb
    ├── 02_Model_Training.ipynb
    ├── 03_Graph_Analysis.ipynb
    ├── 04_Email_Alert_System.ipynb
    ├── 05_Adaptive_Learning.ipynb
    └── 06_Stream_Simulation.ipynb

📁 dashboard/
    └── PowerBI_Report.pbix

📄 README.md
```

---

## 🤖 Models Used

| Model               | Type          | Tuned | Adaptive Learning |
|--------------------|---------------|:-----:|:-----------------:|
| Logistic Regression| Supervised    | ✅    | ✅                |
| Random Forest      | Supervised    | ✅    | ✅                |
| XGBoost            | Supervised    | ✅    | ❌                |
| ETS                | Time-Series   | ✅    | ❌                |
| ARIMA              | Time-Series   | ✅    | ❌                |
| Prophet            | Time-Series   | ✅    | ❌                |
| Isolation Forest   | Unsupervised  | ✅    | ❌                |

Model performance metrics (Accuracy, Precision, Recall, F1-Score) are displayed on the Power BI dashboard.
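
For reference, the four metrics can be computed from raw labels as in the sketch below. The function name and the toy labels are illustrative only and do not come from the project notebooks. The example also shows why accuracy alone can mislead when fraud is rare.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 2 fraud cases out of 10 transactions (illustrative values)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.8 even though only half of the fraud cases are caught, which is why precision and recall are reported alongside it.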

---

## 🌐 Graph-Based Fraud Analysis

- Used **NetworkX** to model transactions as a directed graph.
- **Nodes** = Accounts  
- **Edges** = Transactions  
- Applied **centrality** and **community detection algorithms** to identify:
  - Suspicious accounts sending funds to many others
  - High-risk transaction clusters
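
The graph construction described above can be sketched as follows. This README does not say which centrality measure or community algorithm the notebook uses, so out-degree centrality and greedy modularity communities are assumptions here, and the toy transactions are made up for illustration.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical toy transactions: (sender, receiver, amount)
transactions = [
    ("C1", "M1", 500.0),
    ("C1", "M2", 700.0),
    ("C1", "M3", 900.0),
    ("C2", "M1", 50.0),
    ("C3", "C2", 20.0),
]

# Nodes are accounts, directed edges are transactions.
# Note: a DiGraph keeps one edge per (sender, receiver) pair.
graph = nx.DiGraph()
for sender, receiver, amount in transactions:
    graph.add_edge(sender, receiver, amount=amount)

# Accounts that send funds to many distinct receivers score highest.
out_centrality = nx.out_degree_centrality(graph)
most_central = max(out_centrality, key=out_centrality.get)

# Community detection is run on the undirected view of the graph.
communities = greedy_modularity_communities(graph.to_undirected())

print("Highest out-degree centrality:", most_central)
print("Communities:", [sorted(c) for c in communities])
```

On the full dataset, the same loop would read `nameOrig`, `nameDest`, and `amount` from each row instead of the toy tuples.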

---

## 📩 Email Alert System

- Sends **email alerts** when a selected model flags a transaction as fraudulent.
- Uses Gmail SMTP with App Password authentication.
- Email content is generated per transaction and includes the sender, receiver, and amount.
- Can be pointed at any of the trained models.
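
A minimal sketch of such an alert, using only the Python standard library. The function names, subject line, and recipient address are hypothetical, and the notebook may structure this differently. `EMAIL_SENDER` and `EMAIL_APP_PASSWORD` are the variables described under How to Run.

```python
import os
import smtplib
from email.message import EmailMessage


def build_fraud_alert(sender_id, receiver_id, amount, email_from, email_to):
    """Build an alert message containing the sender, receiver and amount."""
    message = EmailMessage()
    message["Subject"] = f"Fraud alert: transaction from {sender_id}"
    message["From"] = email_from
    message["To"] = email_to
    message.set_content(
        "A transaction was flagged as potentially fraudulent.\n"
        f"Sender:   {sender_id}\n"
        f"Receiver: {receiver_id}\n"
        f"Amount:   {amount:,.2f}\n"
    )
    return message


def send_alert(message, app_password):
    """Send the message through Gmail's SMTP server over SSL (port 465)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(message["From"], app_password)
        server.send_message(message)


# Build (but do not send) an example alert with made-up transaction values.
alert = build_fraud_alert(
    sender_id="C123",
    receiver_id="C456",
    amount=181000.0,
    email_from=os.environ.get("EMAIL_SENDER", "your_email@gmail.com"),
    email_to="analyst@example.com",  # hypothetical recipient
)
print(alert.get_content())
```

Calling `send_alert(alert, os.environ["EMAIL_APP_PASSWORD"])` would deliver the message. It is left out of the example so the snippet runs without credentials.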

---

## 🔁 Adaptive Learning

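- Logistic Regression and Random Forest are retrained incrementally on new batches. In scikit-learn, **`partial_fit()`** is provided by `SGDClassifier` (with logistic loss) rather than by `LogisticRegression`, and `RandomForestClassifier` has no `partial_fit()`; it can grow additional trees on new data with `warm_start=True`.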
- Simulates **concept drift** (changing fraud patterns).
- Real-time learning from incoming batches of new transactions.

---

## ⏱️ Real-Time Fraud Monitoring (Simulation)

- Simulates a **live transaction stream** in Python.
- Mimics the batch-by-batch flow of a streaming pipeline such as **Kafka/Spark**.
- For each simulated batch:
  - Predict fraud,
  - Trigger alert,
  - Update logs and visualization.
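
The per-batch loop above can be sketched with the standard library alone. The names `stream_batches` and `monitor` are hypothetical, and the amount threshold is a placeholder rule standing in for a trained model's prediction.

```python
import time


def stream_batches(transactions, batch_size, delay_seconds=0.0):
    """Yield fixed-size batches, optionally pausing to imitate arrival time."""
    for start in range(0, len(transactions), batch_size):
        yield transactions[start:start + batch_size]
        time.sleep(delay_seconds)


def monitor(batches, predict, alert):
    """For each batch: predict fraud, trigger alerts, and record a log entry."""
    log = []
    for batch_number, batch in enumerate(batches, start=1):
        flagged = [tx for tx in batch if predict(tx)]
        for tx in flagged:
            alert(tx)
        log.append({"batch": batch_number, "size": len(batch), "flagged": len(flagged)})
    return log


# Toy transactions with amounts 100, 200, ..., 1000 (illustrative values)
transactions = [
    {"nameOrig": f"C{i}", "nameDest": f"M{i}", "amount": 100.0 * i}
    for i in range(1, 11)
]

alerts = []  # stands in for the email alert; collects flagged transactions
log = monitor(
    stream_batches(transactions, batch_size=4),
    predict=lambda tx: tx["amount"] > 750,  # placeholder for model.predict
    alert=alerts.append,
)
print(log)
```

Passing `predict` and `alert` as callables keeps the loop independent of any particular model or notification channel.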

---

## 📊 Power BI Dashboard

An interactive, professional dashboard created in **Power BI** featuring:

- Clustered bar charts of model performance
- Slicers to filter by model or time
- Fraud statistics and precision-recall comparisons
- Clean UI/UX for stakeholders and decision-makers

---

## ⚙️ How to Run

1. **Clone the repository**

```bash
git clone https://github.com/yourusername/financial-fraud-detection.git
cd financial-fraud-detection
```

2. **Install required packages**

```bash
pip install -r requirements.txt
```

3. **Run notebooks in sequence**

- `01_Preprocessing_Visualization.ipynb`
- `02_Model_Training.ipynb`
- `03_Graph_Analysis.ipynb`
- `04_Email_Alert_System.ipynb`
- `05_Adaptive_Learning.ipynb`
- `06_Stream_Simulation.ipynb`

4. **Configure Email Settings**

Before running `04_Email_Alert_System.ipynb`, create a `.env` file with:

```ini
EMAIL_SENDER=your_email@gmail.com
EMAIL_APP_PASSWORD=your_app_password
```
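
How the notebooks read this file is not specified here. One way to do it without extra dependencies is sketched below. `parse_env_text` and `load_env_file` are hypothetical helpers, not part of the project.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Return the settings from a .env file, or an empty dict if it is missing."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))


example = parse_env_text(
    "EMAIL_SENDER=your_email@gmail.com\n"
    "# comment lines are ignored\n"
    "EMAIL_APP_PASSWORD='your_app_password'\n"
)
print(example["EMAIL_SENDER"])
```

Because the file holds a credential, it should be kept out of version control, for example by listing `.env` in `.gitignore`.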

---

## 🔮 Future Enhancements

- Real-time streaming with **Kafka** or **Apache Spark**
- Deploy model using **Flask** or **Streamlit**
- Add anomaly score maps to **Power BI**
- Build **REST API** for on-demand fraud prediction

---

## 🙌 Credits

- Interned under the guidance of **[Mentor's Name]**
- Thanks to the creators of the **PaySim Dataset**
- Built using: `Python`, `Scikit-learn`, `NetworkX`, `Power BI`

---
