This project is a fraud detection system for government schemes. It uses a RandomForestClassifier to detect potential fraud based on applicant data and serves predictions via a FastAPI backend.
data/: Contains the synthetic dataset (fraud_dataset.csv).
models/: Stores the trained machine learning model (fraud_model.pkl).
backend/: Contains the FastAPI application (main.py).
frontend/: Contains the Streamlit dashboard (app.py).
utils/: Contains utility scripts:
    generate_data.py: Script to generate the 300-row synthetic dataset.
    train_model.py: Script to train the RandomForestClassifier on the generated data.
Make sure you have Python installed. Install the required packages:

    pip install -r requirements.txt

Run the data generation script to create a synthetic dataset of 300 scheme applicants:

    python utils/generate_data.py

This will output data/fraud_dataset.csv.
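To give a feel for what a generator like utils/generate_data.py might do, here is a minimal standard-library sketch that produces 300 synthetic rows. The column names are borrowed from the example /predict payload in this README; the value pools, the value ranges, the toy `fraud` labelling rule, and the function names `generate_rows` and `write_csv` are all assumptions for illustration, not the project's actual logic.

```python
import csv
import io
import random

# Hypothetical value pools; only "Delhi" and "PM_Kisan" appear in this README.
LOCATIONS = ["Delhi", "Mumbai", "Kolkata"]
SCHEMES = ["PM_Kisan", "Other_Scheme"]


def generate_rows(n=300, seed=42):
    """Return n synthetic applicant records as dictionaries."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    rows = []
    for _ in range(n):
        income = rng.randint(10_000, 100_000)
        rows.append({
            "income": income,
            "family_size": rng.randint(1, 8),
            "location": rng.choice(LOCATIONS),
            "scheme_type": rng.choice(SCHEMES),
            # Small account pool so that some accounts are shared by applicants.
            "bank_account": f"ACC{rng.randint(100, 199)}",
            "phone_number": str(rng.randint(9_000_000_000, 9_999_999_999)),
            # Assumed toy label: a high declared income is flagged as fraud.
            "fraud": int(income > 80_000),
        })
    return rows


def write_csv(rows, handle):
    """Write the records to an open text handle with a header row."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()  # in-memory here; a real script would open a file
    write_csv(generate_rows(), buffer)
    print(buffer.getvalue().splitlines()[0])  # header line
```

To write to disk instead, pass a handle such as `open("data/fraud_dataset.csv", "w", newline="")` to `write_csv`.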
Train the fraud detection model:

    python utils/train_model.py

This will train the model and save it to models/fraud_model.pkl.
To use the AI explanation feature, create a .env file from the example:

    cp .env.example .env

Open .env and add your OpenAI API key.
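For readers unfamiliar with .env files, the sketch below shows one way a backend could read the key from such a file using only the standard library. The variable name `OPENAI_API_KEY` and the helpers `parse_env` and `load_api_key` are assumptions; check .env.example for the name this project actually expects. Many projects use the python-dotenv package for this step instead, and this README does not say which approach the backend takes.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_text, name="OPENAI_API_KEY"):
    """Prefer a real environment variable, then fall back to the .env text."""
    return os.environ.get(name) or parse_env(env_text).get(name)


# Placeholder content standing in for the real .env file.
sample = "# example\nOPENAI_API_KEY=sk-placeholder\n"
print(parse_env(sample))
```

Checking the process environment first lets a deployment override the file without editing it.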
Run the FastAPI application using Uvicorn:

    uvicorn backend.main:app --reload

The API will be accessible at: http://127.0.0.1:8000.
Run the Streamlit UI to get a user-friendly dashboard:

    streamlit run frontend/app.py

The dashboard will open automatically in your browser.
GET /: Health check endpoint.

POST /predict: Used to check an applicant for fraud. Example payload:

    {
      "income": 25000,
      "family_size": 4,
      "location": "Delhi",
      "scheme_type": "PM_Kisan",
      "bank_account": "ACC999",
      "phone_number": "9998887776"
    }

POST /explain: Submits a prediction result to generate a human-readable AI explanation. Example payload:

    {
      "income": 95000,
      "family_size": 1,
      "location": "Delhi",
      "scheme_type": "PM_Kisan",
      "fraud": 1,
      "confidence": 87.5,
      "anomaly_score": -0.12,
      "graph_cluster_size": 8,
      "graph_connected_fraud_rate": 75.0
    }
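
As a usage illustration, the /predict example payload above could be sent from Python roughly as follows. This is a minimal sketch using only the standard library; the helper names `build_predict_request` and `predict` are hypothetical, and the shape of the JSON response is not documented here, so the code returns whatever the server sends without assuming field names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local address given in this README


def build_predict_request(payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, base_url=BASE_URL, timeout=10):
    """Send the request and decode the JSON reply; the API must be running."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example payload copied from the /predict description.
applicant = {
    "income": 25000,
    "family_size": 4,
    "location": "Delhi",
    "scheme_type": "PM_Kisan",
    "bank_account": "ACC999",
    "phone_number": "9998887776",
}

request = build_predict_request(applicant)
print(request.get_method(), request.full_url)
```

With the Uvicorn server running, `predict(applicant)` sends the request and returns the decoded response. Keeping request construction separate from sending makes the payload easy to inspect without a live server.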