This project focuses on detecting fraudulent activities using machine learning techniques. It includes data preprocessing, feature engineering, model training, and evaluation. The goal is to build a robust system that can identify fraudulent transactions or behaviors with high accuracy.
- Data Preprocessing: Handles missing values, outliers, and categorical variables.
- Feature Engineering: Creates meaningful features to improve model performance.
- Model Training: Implements various machine learning algorithms (e.g., Logistic Regression, Random Forest, XGBoost).
- Model Evaluation: Evaluates models using metrics like accuracy, precision, recall, and F1-score.
- Deployment: Includes a script to deploy the trained model for real-time predictions.
To set up the project locally, follow these steps:
-
Clone the repository:
git clone https://github.com/lijoks/Fraud_detection.git
-
Navigate to the project directory: cd Fraud_detection
-
Create a virtual environment and activate it: python -m venv fraud_detection_env source fraud_detection_env/bin/activate # On Windows, use
fraud_detection_env\Scripts\activate -
Install the required dependencies: pip install -r requirements.txt
Dataset
The dataset used in this project is Kaggle's Credit Card Fraud Detection Dataset. It contains transactions made by credit cards in September 2013 by European cardholders. The dataset is highly imbalanced, with only 0.172% of transactions being fraudulent.
Acknowledgments
Kaggle for providing the dataset. The open-source community for libraries like scikit-learn, pandas, and numpy.