TTEH LAB · School of Engineering, Dayananda Sagar University
Bangalore – 562112, Karnataka, India
"Advanced Deep Learning for Real-Time Fraud Detection in Banking"
This project presents an AI-driven fraud detection system that uses graph-based modeling and deep learning to identify fraudulent financial transactions more effectively than traditional approaches. Instead of analyzing transactions individually, the system represents users and transactions as a connected graph, which lets it capture hidden relationships and complex fraud patterns. By combining Graph Neural Networks (GNNs) with attention mechanisms, the model learns both local and global interaction patterns in the data. The system is trained with a hybrid loss function to improve accuracy and robustness, and its real-time fraud predictions are evaluated with precision, recall, and F1-score. Overall, the approach aims to improve detection performance, scalability, and adaptability for modern digital payment systems.
- Problem Statement
- Tech Stack
- Methodology & Key Components
- System Architecture
- Mathematical Modeling & Core Equations
- Model Design
- Transaction Graph Visualization
- Results & Analysis
- Confusion Matrix Analysis
- Conclusion
- Contributors & Details
- IEEE Paper
"How to detect banking fraud in real time accurately?"
The banking industry faces increasingly sophisticated fraud, causing significant financial losses and reducing customer trust. Traditional rule-based and statistical systems are often reactive, struggle to adapt to evolving fraud patterns, and generate high false positives, disrupting legitimate transactions. The scale and speed of modern financial data demand a more intelligent, real-time fraud detection approach.
The goal of this project is to develop an advanced deep learning framework for real-time fraud detection that improves accuracy and efficiency using modern AI techniques. The system aims to:
- Minimize Financial Losses through precise fraud detection
- Improve Detection Speed with real-time analysis
- Reduce False Positives to avoid disrupting genuine users
- Enhance Adaptability to evolving fraud patterns
- Leverage Advanced Models such as RNNs/Transformers, GNNs, and anomaly detection techniques
Fraud Detection · Deep Learning · Real-Time Systems · Banking Security · AI Models
| Layer | Technologies |
|---|---|
| Language | Python |
| Data Processing | Pandas, NumPy |
| Imbalance Handling | SMOTE |
| ML Models | GNN, Transformers |
| Frameworks | PyTorch / TensorFlow |
| Security | Zero Trust Architecture |
| Visualization | Matplotlib, Seaborn |
| Tools | Jupyter, GitHub |
- Built using Python, enabling seamless integration of data processing, machine learning, and deep learning components.
- Efficient data handling achieved with Pandas and NumPy for preprocessing and transformation.
- Addressed class imbalance using SMOTE, improving detection of the minority (fraud) class.
- Leveraged Graph Neural Networks (GNN) and Transformer models for capturing complex relationships and sequential patterns.
- Implemented with PyTorch / TensorFlow for scalable deep learning.
- Designed with a Zero Trust Architecture, enhancing system security and resilience.
- Data insights and results visualized using Matplotlib and Seaborn.
- Developed in Jupyter notebooks on Google Colab and version-controlled with GitHub.
- Data Collection: Transaction dataset (CSV with fraud & legitimate cases)
- Preprocessing: Cleaning, feature selection, normalization
- Imbalance Handling: SMOTE applied to balance fraud class
- EDA: Pattern analysis & visualization
- Model Development: Hybrid model (GNN + Transformer)
- Adversarial Training: Improves robustness against attacks/noise
- Evaluation: Accuracy, Precision, Recall, F1-score, ROC-AUC
- Security: Zero Trust principles for secure predictions
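The imbalance-handling step above can be illustrated with a small, dependency-free sketch of the interpolation idea behind SMOTE: each synthetic fraud sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_like_oversample`, its parameters, and the toy feature vectors are hypothetical; a real pipeline would more likely call a library implementation, and the source does not say which one was used.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up fraud feature vectors (for example scaled amount and scaled hour)
fraud = [[0.90, 0.10], [0.80, 0.20], [0.95, 0.15], [0.70, 0.30]]
new_points = smote_like_oversample(fraud, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real fraud samples, it stays inside the region already occupied by the minority class. Oversampling of this kind should be applied to the training split only, so that evaluation still reflects the real class distribution.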
- Data Layer: Input dataset & preprocessing
- Processing Layer: Cleaning + SMOTE balancing
- Model Layer: GNN (relationships) + Transformer (sequences)
- Training Layer: Adversarial learning & optimization
- Evaluation Layer: Metrics & performance analysis
- Security Layer: Zero Trust validation
- Output Layer: Fraud detection results & insights
flowchart TD
A[Raw Dataset CSV Input] --> B[Data Ingestion and EDA]
B --> C[Data Preprocessing]
C --> C1[Cleaning]
C --> C2[Feature Split]
C --> C3[SMOTE Balancing]
C --> D[Model Training Random Forest]
D --> E[Model Evaluation Metrics and Visualization]
E --> F[Model Serialization joblib pkl]
F --> G[Inference Engine New Data Prediction]
➡️ Purpose: Model transactions as a network
➡️ Used in: Capturing relationships between users/accounts
➡️ Purpose: Learn features from connected nodes
➡️ Used in: Detecting suspicious patterns in transaction graphs
➡️ Purpose: Focus on important interactions
➡️ Used in: Capturing global dependencies in data
➡️ Purpose: Classify transaction (fraud / non-fraud)
➡️ Used in: Final decision output
➡️ Purpose: Minimize error & improve model robustness
➡️ Used in: Training phase
➡️ Purpose: Balance precision & recall
➡️ Used in: Measuring fraud detection performance
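The six purpose/usage pairs above appear to correspond, in order, to graph construction, graph convolution, attention, classification, the training loss, and the F1-score. The equations themselves are not reproduced in this text, so the forms below are the standard textbook versions, offered only as a hedged reference. All symbols are assumptions and may differ from the notation in the accompanying paper; in particular, the exact composition of the hybrid loss is not stated here, so only the usual binary cross-entropy term is shown.

$$
\begin{aligned}
G &= (\mathcal{V}, \mathcal{E}) \\
H^{(l+1)} &= \phi\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right), \qquad \tilde{A} = A + I \\
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \\
\hat{y} &= \sigma\!\left(w^{\top}h + b\right) \\
\mathcal{L}_{\mathrm{BCE}} &= -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\log\hat{y}_i + (1-y_i)\log\left(1-\hat{y}_i\right)\Big] \\
F_1 &= \frac{2PR}{P+R}, \qquad P = \frac{TP}{TP+FP}, \quad R = \frac{TP}{TP+FN}
\end{aligned}
$$

Here $\mathcal{V}$ and $\mathcal{E}$ are the nodes and edges of the transaction graph, $A$ is its adjacency matrix with degree matrix $\tilde{D}$ computed from $\tilde{A}$, $H^{(l)}$ holds the node features at layer $l$, $\phi$ is a nonlinearity such as ReLU, $\sigma$ is the logistic sigmoid, $h$ is the fused representation of a transaction, and $y_i \in \{0, 1\}$ is the fraud label.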
- Hybrid deep learning architecture combining Graph Neural Networks (GNN) + Transformer Models
- GNN Layer
  - Captures relationships between entities (graph-structured data)
  - Learns connectivity patterns and hidden dependencies
- Transformer Layer
  - Processes sequential data (logs / events)
  - Captures long-range dependencies using attention mechanism
- Feature Fusion
  - Outputs from GNN and Transformer are combined
  - Creates a richer, context-aware representation
- Adversarial Training
  - Introduces perturbed inputs during training
  - Improves robustness against attacks and noise
- Output Layer
  - Classification / prediction (e.g., anomaly detection)
Input Data → Preprocessing → GNN → Transformer → Fusion → Prediction
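The adversarial-training step is described only as introducing perturbed inputs during training. One common way to build such inputs is the fast gradient sign method (FGSM), sketched below for a plain logistic scorer that stands in for the real model. The source does not say which attack was used, so FGSM, the weights, the sample, and the `epsilon` value are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Fraud probability from a logistic scorer (stand-in for the real model)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm_perturb(weights, bias, x, y, epsilon):
    """Shift x by epsilon in the direction that increases the cross-entropy loss."""
    p = predict(weights, bias, x)
    # For binary cross-entropy with a logistic output, dLoss/dx_i = (p - y) * w_i
    grad = [(p - y) * w for w in weights]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, sign)]


# Hypothetical weights and one fraudulent sample (label 1)
weights, bias = [2.0, -1.0], -0.5
x, y = [1.0, 0.2], 1
x_adv = fgsm_perturb(weights, bias, x, y, epsilon=0.1)

print(predict(weights, bias, x), predict(weights, bias, x_adv))
```

The perturbed sample receives a lower fraud probability than the original even though its label is unchanged. Adversarial training would feed both `x` and `x_adv` to the model with the same label so that small input shifts are less likely to flip the prediction.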
graph TD
%% Users
U1[User A]
U2[User B]
U3[User C]
%% Accounts
A1[Account 1]
A2[Account 2]
A3[Account 3]
%% Devices
D1[Device X]
D2[Device Y]
%% Merchants
M1[Merchant 1]
M2[Merchant 2]
%% User-Account Mapping
U1 --> A1
U2 --> A2
U3 --> A3
%% Transactions (Edges with amount)
A1 -->|500| M1
A2 -->|700| M1
A3 -->|1200| M2
%% Shared Device Relationships (Fraud Indicator)
A1 --- D1
A2 --- D1
A3 --- D2
%% Suspicious Pattern (Fraud Link)
A1 -.-> A2
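As a rough illustration of how the graph above could be held in code, the sketch below stores the diagram's edges as plain tuples and looks for devices used by more than one account, which is the shared-device indicator behind the dotted link between Account 1 and Account 2. The relation labels (`owns`, `pays`, `uses`) and the function name are hypothetical and do not come from the project.

```python
from collections import defaultdict

# Edges copied from the diagram above: (source, target, relation, amount)
edges = [
    ("User A", "Account 1", "owns", None),
    ("User B", "Account 2", "owns", None),
    ("User C", "Account 3", "owns", None),
    ("Account 1", "Merchant 1", "pays", 500),
    ("Account 2", "Merchant 1", "pays", 700),
    ("Account 3", "Merchant 2", "pays", 1200),
    ("Account 1", "Device X", "uses", None),
    ("Account 2", "Device X", "uses", None),
    ("Account 3", "Device Y", "uses", None),
]


def accounts_sharing_devices(edge_list):
    """Return {device: sorted accounts} for devices used by more than one account."""
    by_device = defaultdict(set)
    for source, target, relation, _amount in edge_list:
        if relation == "uses":
            by_device[target].add(source)
    return {d: sorted(a) for d, a in by_device.items() if len(a) > 1}


shared = accounts_sharing_devices(edges)
print(shared)  # {'Device X': ['Account 1', 'Account 2']}
```

A hand-written rule like this only covers one pattern. The point of the GNN layer is to learn such structural signals from the graph rather than enumerate them.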
| Metric / Finding | Value / Result | Analysis & Implications |
|---|---|---|
| Initial Class Distribution | Legitimate (0): 150,337; Fraudulent (1): 294 | 🚨 Severe Imbalance: The dataset is highly skewed, causing models to favor the majority class and overlook fraud cases. |
| Overall Accuracy | 99.95% | |
| Precision (Fraud Class) | 0.96 (96%) | ✅ High Confidence: Fraud predictions are highly reliable, minimizing inconvenience to legitimate users. |
| Recall (Fraud Class) | 0.80 (80%) | ❗ Critical Weakness: 20% of fraud cases are missed, leading to potential financial losses. |
| F1-Score (Fraud Class) | 0.87 (87%) | ⚖️ Balanced Performance: Indicates decent trade-off, but affected by lower recall. |
| ROC-AUC Score | ~0.898 | 📈 Strong Discrimination: Good ability to distinguish classes, but not optimal for high-security systems. |
| Confusion Matrix Breakdown | TN: 30,061; FP: 2; FN: 13; TP: 51 | 🔍 Conservative Model Behavior: Minimizes false alarms but allows some fraud cases to go undetected. |
| Pipeline Optimization Applied | SMOTE Integration | 🔧 Improvement Strategy: Balances dataset by generating synthetic fraud samples, enhancing recall and detection capability. |
- Model prioritizes precision over recall, ensuring fewer false alerts
- Class imbalance significantly impacts performance metrics
- SMOTE improves minority class detection, but further tuning is needed
- Trade-off exists between security (recall) and user experience (precision)
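The precision/recall trade-off noted above can be made concrete by moving the decision threshold on the model's fraud score. The sketch below uses made-up scores and labels purely for illustration; none of these numbers come from the project's dataset, and the function name is hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for the fraud class at a given decision threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up model scores and true labels (1 = fraud), for illustration only
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.7, 0.35):
    print(threshold, precision_recall(scores, labels, threshold))
```

On this toy data, lowering the threshold from 0.7 to 0.35 catches the remaining fraud case at the cost of one false alarm. That is the same lever a deployment could use to trade user experience for security.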
- The confusion matrix evaluates the performance of the fraud detection model by comparing actual vs predicted classifications.
- True Negatives (TN = 30,061)
  - Correctly identified legitimate transactions
  - Indicates strong performance in recognizing normal activity
- False Positives (FP = 2)
  - Legitimate transactions incorrectly flagged as fraud
  - Very low value → ensures minimal disruption to users
- False Negatives (FN = 13) ❗
  - Fraud transactions missed by the model
  - Critical issue as it may lead to financial loss
- True Positives (TP = 51) ✅
  - Correctly detected fraud cases
  - Shows the model is effective in identifying fraudulent behavior
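The reported fraud-class metrics follow directly from these four counts. The short check below recomputes them from the standard definitions; the function name `fraud_metrics` is hypothetical.

```python
def fraud_metrics(tn, fp, fn, tp):
    """Derive the headline metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts as reported in the confusion matrix above
metrics = fraud_metrics(tn=30_061, fp=2, fn=13, tp=51)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The results round to the reported precision of 0.96, recall of 0.80, F1-score of 0.87, and accuracy of 99.95%. Accuracy stays that high even with 13 of 64 fraud cases missed, which shows why it is a weak metric on data this imbalanced.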
- The hybrid model combining GNN and Transformer architectures achieved high overall accuracy (~99.95%)
- Strong precision (96%) indicates reliable fraud detection with minimal false alarms
- However, recall (80%) reveals that some fraud cases remain undetected
- Severe class imbalance significantly influenced model behavior and evaluation metrics
- The model is effective in minimizing false positives, ensuring better user experience
- Missed fraud cases highlight a critical risk in real-world financial systems
- Demonstrates the importance of using appropriate metrics (Precision, Recall, F1) instead of relying solely on accuracy
- Integration of SMOTE and adversarial training improves robustness and fairness
- Improve recall through hyperparameter tuning and advanced sampling techniques
- Experiment with ensemble or more advanced deep learning models
- Optimize the system for real-time deployment and scalability
- Further strengthen the security layer with advanced zero-trust and quantum-resilient mechanisms
- Harshitha B R · ENG23CY0018 · harshisuma1805@gmail.com
- Pragna G · ENG23CY0031 · pragna122004@gmail.com
- Akshata · ENG23CY0003 · tattiakshata@gmail.com
- Sunay N · ENG23CY0039 · Rajsunay1@gmail.com
- Druthu Katna · ENG23CY0014 · druthukatna51@gmail.com
Department of Computer Science and Engineering (Cyber Security)
School of Engineering, Dayananda Sagar University
Dr. Prajwalasimha S N
Ph.D., Postdoc. (NewRIIS)
Associate Professor
Department of Computer Science and Engineering (Cyber Security)
School of Engineering, Dayananda Sagar University
