The Fraud Detection in Financial Transactions project addresses the growing problem of fraudulent transactions at "FinTech Innovations." It implements a system that analyzes transactional data, customer information, and external data to detect suspicious activity in near real time while minimizing false alerts.
The key objectives of the project include:
- API Development: Creation of APIs for transaction data, customer data, and external data to facilitate seamless access.
- Data Collection and Integration: Utilizing APIs to collect transactional, customer, and external data and ensuring data cleanliness and relevance.
- Storage and Data Management with Hive: Designing and implementing Hive tables for efficient storage and management of transaction, customer, and external data.
- Rule-Based Fraud Detection System: Development of HiveQL queries for rule-based fraud detection, including the identification of unusually high transactions, high-frequency transactions, and transactions involving blacklisted customers.
- Deployment: Implementation of an Airflow DAG for orchestration of data collection, processing, and alerting, along with CI/CD integration using GitHub Actions.
The data_generator.py script generates synthetic transaction, customer, and external data, implements APIs for data access, and saves the generated data to JSON files.
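As a rough illustration of the generation step, the sketch below builds a few synthetic customers and transactions with the standard library and provides a helper that writes them to JSON. The field names (`customer_id`, `home_location`, `amount`, `location`), the value ranges, and the function names are assumptions made for this example, not the project's actual schema.

```python
import json
import random

CITIES = ["New York", "London", "Berlin", "Tokyo"]  # illustrative values


def generate_customers(n, rng):
    """Return n synthetic customer records (hypothetical fields)."""
    return [
        {"customer_id": f"C{i:04d}", "home_location": rng.choice(CITIES)}
        for i in range(1, n + 1)
    ]


def generate_transactions(customers, n, rng):
    """Return n synthetic transactions, each tied to an existing customer."""
    transactions = []
    for i in range(1, n + 1):
        customer = rng.choice(customers)
        transactions.append({
            "transaction_id": f"T{i:06d}",
            "customer_id": customer["customer_id"],
            "amount": round(rng.uniform(1.0, 5000.0), 2),
            "location": rng.choice(CITIES),
        })
    return transactions


def save_json(records, path):
    """Write a list of records to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)


rng = random.Random(42)  # fixed seed so repeated runs give the same data
customers = generate_customers(5, rng)
transactions = generate_transactions(customers, 20, rng)
```

Calling `save_json(transactions, "transactions.json")` would then persist the data; the file name here is a placeholder.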
The api.py script implements Flask APIs for transactions, customers, and external data. It loads the data from JSON files to build the API responses.
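A minimal sketch of such an API, assuming one JSON file per dataset, might look like the following. The route pattern `/api/<name>`, the file names, and the `DATA_DIR` config key are hypothetical choices for this example and may differ from the project's actual endpoints.

```python
import json
from pathlib import Path

from flask import Flask, abort, jsonify

# Assumed mapping from dataset name to the JSON file that backs it.
DATASETS = {
    "transactions": "transactions.json",
    "customers": "customers.json",
    "external": "external_data.json",
}

app = Flask(__name__)
app.config["DATA_DIR"] = "."  # assumed directory holding the JSON files


def load_dataset(name):
    """Read the JSON file behind a dataset and return its parsed content."""
    path = Path(app.config["DATA_DIR"]) / DATASETS[name]
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


@app.route("/api/<name>")
def get_dataset(name):
    """Serve a dataset as JSON, or 404 if it is unknown or not generated yet."""
    if name not in DATASETS:
        abort(404)
    try:
        return jsonify(load_dataset(name))
    except FileNotFoundError:
        abort(404)
```

Reading the file on every request keeps the sketch simple and always reflects the latest generated data; caching the parsed JSON would be a reasonable change for larger files.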
The load.ipynb notebook fetches data from the APIs using the Python requests library, connects to Hive, creates tables for transactions, customers, blacklist, and external info, and loads the data into those tables.
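The table-creation and loading steps could be sketched as below. The records are assumed to have been fetched already (for example with `requests.get(url).json()`), and the cursor is assumed to be a DB-API cursor that accepts `%s` placeholders, as a PyHive cursor does. The column list and the helper names are illustrative, not the project's actual schema.

```python
# Hypothetical column layout for the transactions table: (name, Hive type).
TRANSACTION_COLUMNS = [
    ("transaction_id", "STRING"),
    ("customer_id", "STRING"),
    ("amount", "DOUBLE"),
    ("location", "STRING"),
]


def create_table_sql(table, columns):
    """Build a HiveQL CREATE TABLE statement from (name, type) pairs."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"


def load_records(cursor, table, columns, records):
    """Create the table if needed, then insert each record as one row.

    Missing fields are inserted as NULL. Returns the number of rows sent.
    """
    cursor.execute(create_table_sql(table, columns))
    placeholders = ", ".join(["%s"] * len(columns))
    insert_sql = f"INSERT INTO {table} VALUES ({placeholders})"
    for record in records:
        row = tuple(record.get(name) for name, _ in columns)
        cursor.execute(insert_sql, row)
    return len(records)
```

Row-by-row inserts are slow in Hive, so this shape only suits small synthetic datasets; for larger volumes, writing a file and using `LOAD DATA` would likely be a better fit.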
The fraud_detection_HiveQL.ipynb notebook connects to Hive and runs HiveQL fraud detection queries that identify transactions with unusually high amounts, transactions involving blacklisted customers, and transactions from unusual locations.
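The three rules could be expressed roughly as the queries below. The table and column names, the amount threshold, and the definition of an "unusual" location (one that differs from the customer's home location) are all assumptions for this sketch. To keep it runnable without a Hive cluster, the demo executes the queries against an in-memory SQLite database as a stand-in; the statements use only syntax that both engines accept.

```python
import sqlite3

HIGH_AMOUNT_THRESHOLD = 10000  # hypothetical cutoff for "unusually high"

RULES = {
    "high_amount": f"""
        SELECT transaction_id FROM transactions
        WHERE amount > {HIGH_AMOUNT_THRESHOLD}
    """,
    "blacklisted_customer": """
        SELECT t.transaction_id
        FROM transactions t
        JOIN blacklist b ON t.customer_id = b.customer_id
    """,
    "unusual_location": """
        SELECT t.transaction_id
        FROM transactions t
        JOIN customers c ON t.customer_id = c.customer_id
        WHERE t.location <> c.home_location
    """,
}


def run_rules(cursor):
    """Run every rule and return {rule name: sorted flagged transaction ids}."""
    flagged = {}
    for rule, query in RULES.items():
        cursor.execute(query)
        flagged[rule] = sorted(row[0] for row in cursor.fetchall())
    return flagged


# Demo data in SQLite (stand-in for the Hive tables).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE transactions (transaction_id TEXT, customer_id TEXT, amount REAL, location TEXT)")
cur.execute("CREATE TABLE customers (customer_id TEXT, home_location TEXT)")
cur.execute("CREATE TABLE blacklist (customer_id TEXT)")
cur.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    ("T1", "C1", 50.0, "Berlin"),
    ("T2", "C1", 25000.0, "Berlin"),
    ("T3", "C2", 80.0, "Tokyo"),
    ("T4", "C3", 15.0, "London"),
])
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [("C1", "Berlin"), ("C2", "London"), ("C3", "London")])
cur.execute("INSERT INTO blacklist VALUES ('C3')")

flagged = run_rules(cur)
```

In this demo, `T2` exceeds the threshold, `T4` belongs to the blacklisted customer `C3`, and `T3` was made in Tokyo by a customer whose home location is London. Against Hive, the same `run_rules` function should work with a cursor obtained from a Hive connection instead of SQLite.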
An Airflow DAG orchestrates the entire workflow:
- Task 1 - Data Generation: Generates synthetic data using data_generator.py.
- Task 2 - API: Launches the Flask API server using api.py.
- Task 3 - Load Data: Executes a Jupyter Notebook (load.ipynb) to fetch and load data into Hive tables.
- Task 4 - Fraud Detection: Executes a Jupyter Notebook (fraud_detection_HiveQL.ipynb) for fraud detection using HiveQL queries.
- Python
- Flask
- Apache Hive
- Apache Airflow