A machine learning project for detecting fraudulent transactions, built around automated data validation and a baseline model training pipeline.
This project implements a comprehensive fraud detection system that includes:
- Data Validation Layer: Automated validation, cleaning, and quality checks for financial transaction data
- Model Training: Baseline machine learning models for fraud classification
- Reporting: Detailed validation reports and model performance metrics
- Testing: Comprehensive test suite for data validation components
- Python 3.8+
- Required packages listed in requirements.txt
- Clone the repository:
git clone https://github.com/dshak1/hackML.git
cd hackML
- Install dependencies:
pip install -r requirements.txt

Place your fraud detection datasets in the fraud/ directory:
- fraud/train.csv: Training data with target column
- fraud/test.csv: Test data without target column
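
Before running the pipeline, it can help to confirm that the two files follow the layout described above. The following is a minimal sketch using only the Python standard library; it is not part of the repository. The helper names and the label column name `target` are assumptions, so substitute the actual label column of your dataset.

```python
import csv
from pathlib import Path


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])


def check_dataset_layout(data_dir="fraud", target="target"):
    """Return a list of layout problems; an empty list means the layout looks right.

    `target` is a placeholder name (an assumption): replace it with the
    label column used in your training data.
    """
    problems = []
    train_path = Path(data_dir) / "train.csv"
    test_path = Path(data_dir) / "test.csv"

    # Both files must exist before their headers can be compared.
    for path in (train_path, test_path):
        if not path.is_file():
            problems.append(f"missing file: {path}")
    if problems:
        return problems

    train_cols = read_header(train_path)
    test_cols = read_header(test_path)

    # Training data carries the label; test data must not.
    if target not in train_cols:
        problems.append(f"{train_path} has no '{target}' column")
    if target in test_cols:
        problems.append(f"{test_path} should not contain '{target}'")

    # Every feature column in the training data should also exist in the test data.
    missing = [c for c in train_cols if c != target and c not in test_cols]
    if missing:
        problems.append(f"{test_path} is missing columns: {missing}")
    return problems


if __name__ == "__main__":
    for problem in check_dataset_layout():
        print(problem)
```

When run from the repository root, the script prints one line per problem and prints nothing if the layout matches.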
Run data validation to check data quality and generate reports:
python scripts/validate_data.py \
--train fraud/train.csv \
--test fraud/test.csv \
--out_dir runs \
--mode warn

Train a baseline fraud detection model:
python scripts/train_model.py \
--train fraud/train.csv \
--test fraud/test.csv \
--out_dir runs

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the full test suite
- Submit a pull request
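
As an illustration of the "add tests for new functionality" step, a new validation check and its tests might look like the sketch below. Everything in it is hypothetical: `find_negative_amounts`, the `amount` column, and the pytest-style `test_` naming are assumptions for the example, not names taken from this repository's validation layer or test suite.

```python
def find_negative_amounts(rows, column="amount"):
    """Hypothetical validation check: return the indices of rows whose amount is negative."""
    return [i for i, row in enumerate(rows) if float(row[column]) < 0]


def test_flags_negative_amounts():
    # One row out of three has a negative amount and should be reported.
    rows = [{"amount": "12.50"}, {"amount": "-3.00"}, {"amount": "0"}]
    assert find_negative_amounts(rows) == [1]


def test_clean_data_passes():
    # No negative amounts, so nothing is reported.
    rows = [{"amount": "1"}, {"amount": "2"}]
    assert find_negative_amounts(rows) == []
```

Keeping each check a small pure function over rows makes it testable without reading files from disk.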