Specter842/ChainGuard

Anti-Money Laundering System using Blockchain & Machine Learning

A full-stack AML detection platform that combines a trained machine learning fraud classifier with an Ethereum smart contract to flag and immutably log suspicious financial transactions. Built on Django with Web3.py integration for on-chain interaction.


Overview

This system mirrors how modern financial institutions are beginning to augment compliance operations with AI. It combines two independent detection mechanisms:

  1. ML-based fraud classifier: a Random Forest model, trained on real Ethereum transaction data, that scores wallet behavior and predicts fraudulent activity
  2. On-chain smart contract: a Solidity contract deployed to the Ethereum network that checks transfers made through it and emits SuspiciousTransaction events when the configured threshold is violated

Both layers feed into a Django backend that serves predictions and maintains a local compliance record.


Architecture

Ethereum Network
      │
      ▼
Anti-Money-Laundering.sol ──── SuspiciousTransaction events
      │
      ▼
Web3.py Integration (Django)
      │
      ├── ML Inference Layer (finalized_model.sav)
      │       └── Random Forest Classifier
      │               └── Trained on Ethereum Fraud Detection Dataset
      │
      └── Django REST API
              └── SQLite / PostgreSQL

Features

  • Random Forest fraud classifier trained on 10,000+ Ethereum wallet transactions with feature engineering, outlier removal, and correlation analysis
  • Solidity smart contract with deposit, withdraw, and transfer functions; emits on-chain events for transactions exceeding a configurable ETH threshold
  • Smart contract compilation and deployment pipeline using py-solc-x and Web3.py
  • Django backend serving ML predictions via REST API
  • Model comparison — evaluated Logistic Regression, SVM (RBF kernel with GridSearchCV), and Random Forest; Random Forest selected as final model based on superior recall and F1 on fraud class
  • PCA experimentation — dimensionality reduction tested against full feature set; full features retained for better performance
  • ROC curve analysis for threshold tuning
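
The ROC-based threshold tuning mentioned above can be illustrated with a small sketch. It assumes the cutoff is chosen by maximizing Youden's J statistic (true positive rate minus false positive rate); the README does not say which criterion the notebook uses, so treat that choice, the helper name pick_threshold, and the toy scores as hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_curve


def pick_threshold(y_true, fraud_scores):
    """Return the score cutoff that maximizes Youden's J (TPR - FPR)."""
    fpr, tpr, thresholds = roc_curve(y_true, fraud_scores)
    best = int(np.argmax(tpr - fpr))
    return float(thresholds[best])


# Toy labels and scores (hypothetical, not taken from the dataset).
y_true = [0, 0, 1, 1]
fraud_scores = [0.1, 0.2, 0.7, 0.9]

cutoff = pick_threshold(y_true, fraud_scores)
# A wallet is flagged when its fraud score reaches the cutoff.
flags = [score >= cutoff for score in fraud_scores]
```

In practice the scores would come from predict_proba on a held-out split, and a compliance team might prefer a lower cutoff that trades precision for recall on the fraud class.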

Tech Stack

Layer               Technology
ML & Data           Python, scikit-learn, Pandas, NumPy, Matplotlib, Seaborn, Plotly
Blockchain          Solidity, Web3.py, py-solc-x, Infura
Backend             Django 4.1, Django REST Framework, Web3.py
Database            SQLite (dev), PostgreSQL-ready
Model Persistence   Pickle

ML Pipeline

Dataset

Ethereum Fraud Detection Dataset by Vagif A., published on Kaggle (see Dataset Credit).

Preprocessing

  • Dropped zero-variance features and highly correlated feature pairs (threshold > 0.9)
  • Mean imputation for numeric nulls; mode imputation for categorical nulls
  • Label encoding for ERC20 token type categorical features
  • MinMax normalization (0–1 range) applied to train/test splits independently
  • Outlier removal on extreme values in Time Diff and Avg min between received tnx
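
The feature-pruning, imputation, and scaling steps above could look roughly like the following. This is a minimal sketch, not the notebook's code: the function names and the toy DataFrame are hypothetical. One deliberate difference is that the sketch fits the scaler on the training split and reuses it on the test split, a common way to keep test statistics out of training, whereas the list above describes scaling the splits independently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def drop_uninformative(df, corr_threshold=0.9):
    """Drop zero-variance columns, then one column from each highly correlated pair."""
    numeric = df.select_dtypes(include="number")
    zero_var = [c for c in numeric.columns if numeric[c].nunique(dropna=True) <= 1]
    df = df.drop(columns=zero_var)

    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return df.drop(columns=redundant)


def impute(df):
    """Mean-impute numeric columns and mode-impute everything else."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            mode = out[col].mode()
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


def scale_splits(X_train, X_test):
    """Fit MinMax scaling on the training split only, then apply it to both."""
    scaler = MinMaxScaler()
    return scaler.fit_transform(X_train), scaler.transform(X_test)


# Toy frame (hypothetical columns): "b" duplicates "a", "c" is constant.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [5.0, 5.0, 5.0, 5.0],
    "d": [1.0, 0.0, 1.0, 0.0],
    "tok": ["x", None, "x", "y"],
})
cleaned = impute(drop_uninformative(toy))
train_scaled, test_scaled = scale_splits([[0.0], [10.0]], [[5.0]])
```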

Models Evaluated

Model                  Notes
Logistic Regression    Baseline; lower recall on fraud class
SVM (RBF, C=9)         GridSearchCV tuned; slower, marginal gains
Random Forest          Selected; best F1 and recall on fraud class
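
A comparison like the one in the table can be reproduced in outline as follows. The sketch runs on synthetic, imbalanced data from make_classification rather than the Kaggle dataset, and the C grid, the F1 scoring choice, and the split sizes are assumptions, so the numbers it produces say nothing about the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the wallet features: roughly 20% positive (fraud) class.
X, y = make_classification(
    n_samples=600, n_features=12, weights=[0.8, 0.2], random_state=101
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=101
)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    # Assumed search grid; the README only reports the selected C=9.
    "SVM (RBF)": GridSearchCV(SVC(kernel="rbf"), {"C": [1, 3, 9]}, scoring="f1", cv=3),
    "Random Forest": RandomForestClassifier(
        n_estimators=25, max_features=5, random_state=101
    ),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Both metrics are computed for the positive (fraud) class.
    results[name] = {
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
```

Comparing recall and F1 on the fraud class, rather than overall accuracy, matters here because the fraud class is the minority and a model can score high accuracy while missing most of it.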

Final Model

  • RandomForestClassifier(n_estimators=25, max_features=5, random_state=101)
  • Persisted to finalized_model.sav via Pickle
  • Loaded at runtime by Django for inference
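
Persisting and reloading the model can be sketched as below. The hyperparameters match the ones listed above, but the helper names save_model and load_model, the synthetic training data, and the use of lru_cache to load the file once per process are assumptions about how the Django side might be wired, not a description of the actual backend.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def save_model(model, path):
    """Serialize a fitted estimator to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


@lru_cache(maxsize=1)
def load_model(path):
    """Unpickle once per process; later calls with the same path reuse the object."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Synthetic data stands in for the preprocessed wallet features.
X, y = make_classification(n_samples=200, n_features=10, random_state=101)
model = RandomForestClassifier(
    n_estimators=25, max_features=5, random_state=101
).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = str(Path(tmp) / "finalized_model.sav")
    save_model(model, model_path)
    restored = load_model(model_path)

same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```

Pickle files can execute arbitrary code when loaded, so a .sav file should only be unpickled if it comes from a trusted source, and ideally with the same scikit-learn version that wrote it.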

Smart Contract

File: Anti-Money-Laundering.sol

contract AntiMoneyLaundering {
    address public admin;
    uint public threshold;          // Default: 100 ETH
    mapping(address => uint) public balances;

    function transfer(address to, uint amount) public { ... }

    event SuspiciousTransaction(address from, address to, uint amount);
}

The contract emits a SuspiciousTransaction event whenever a sender's post-transfer balance exceeds the configured threshold. Because event logs are part of the chain's history, this creates an on-chain audit trail that cannot be altered or deleted.
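
To make the flagging rule concrete, here is an off-chain Python model of it. It follows the sentence above (the check is on the sender's balance after the transfer) and assumes balances are held in wei with a 100 ETH default threshold. The class AmlLedger and its methods are hypothetical stand-ins, since the body of transfer is elided in the excerpt.

```python
from dataclasses import dataclass, field

WEI_PER_ETH = 10**18


@dataclass
class AmlLedger:
    """In-memory stand-in for the contract's balances and event log."""
    threshold: int = 100 * WEI_PER_ETH
    balances: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount

    def transfer(self, sender, to, amount):
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount
        # Assumed rule: flag when the sender still holds more than the threshold.
        if self.balances[sender] > self.threshold:
            self.events.append((sender, to, amount))


ledger = AmlLedger()
ledger.deposit("alice", 250 * WEI_PER_ETH)
ledger.transfer("alice", "bob", 50 * WEI_PER_ETH)   # alice keeps 200 ETH: flagged
ledger.transfer("alice", "bob", 150 * WEI_PER_ETH)  # alice keeps 50 ETH: not flagged
```

The Features list describes the rule in terms of the transaction amount instead; if that is what the contract checks, only the condition inside transfer would change.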

Compiled with py-solc-x (Solidity 0.8.0) and deployed to the Ethereum network via Web3.py through an Infura RPC endpoint.
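
Before deployment, the ABI and bytecode have to be pulled out of the compiler output. The sketch below assumes compiled_code.json holds the Solidity compiler's standard JSON output (as produced by py-solc-x's compile_standard with abi and evm.bytecode selected). The helper name load_artifact and the stub values are hypothetical.

```python
import json


def load_artifact(compiled, source_name, contract_name):
    """Return (abi, bytecode_hex) from solc standard JSON output."""
    contract = compiled["contracts"][source_name][contract_name]
    return contract["abi"], contract["evm"]["bytecode"]["object"]


# Stub mimicking the layout of standard JSON output; the values are placeholders.
compiled_stub = json.loads("""
{
  "contracts": {
    "Anti-Money-Laundering.sol": {
      "AntiMoneyLaundering": {
        "abi": [{"type": "event", "name": "SuspiciousTransaction"}],
        "evm": {"bytecode": {"object": "6080"}}
      }
    }
  }
}
""")

abi, bytecode = load_artifact(
    compiled_stub, "Anti-Money-Laundering.sol", "AntiMoneyLaundering"
)
```

With the real file, compiled_stub would be replaced by json.load on compiled_code.json, and the resulting abi and bytecode passed to Web3.py when building the deployment transaction.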


Project Structure

├── model_training.ipynb        # Full ML pipeline: EDA → preprocessing → training → evaluation
├── AML_Smart_Contract.ipynb    # Smart contract compilation and Ethereum deployment
├── Anti-Money-Laundering.sol   # Solidity contract source
├── compiled_code.json          # ABI + bytecode output
├── finalized_model.sav         # Trained Random Forest model (Pickle)
├── manage.py                   # Django entry point
├── requirements.txt            # Python dependencies
└── db.sqlite3                  # Local development database

Setup & Installation

Prerequisites

  • Python 3.9+
  • Node.js (optional, for local Ethereum node)
  • An Infura account for Ethereum RPC access

Install dependencies

pip install -r requirements.txt

Run Django server

python manage.py migrate
python manage.py runserver

Compile and deploy the smart contract

Open AML_Smart_Contract.ipynb and run all cells. Make sure to update the Infura provider URL with your own project key before deploying.

Train the model (optional — pretrained model included)

Open model_training.ipynb and run all cells. The final cell saves finalized_model.sav to disk.


Key Concepts Demonstrated

  • End-to-end ML pipeline: raw data ingestion through to a pickled model that the backend loads at runtime
  • Blockchain + AI integration: two independent fraud detection mechanisms working in tandem
  • On-chain immutability: suspicious transaction events written permanently to the Ethereum ledger
  • Smart contract development: Solidity authoring, compilation, ABI extraction, and deployment via Python
  • Compliance-aware design: threshold-based flagging modeled on the reporting thresholds used in real-world AML regimes (FATF recommendations, FinCEN rules)

Dataset Credit

Vagif A. — Ethereum Fraud Detection Dataset on Kaggle.


License

MIT License
