This project implements an end-to-end machine learning system for credit risk scoring using the German Credit dataset. It covers data preprocessing, model training, evaluation, explainability, experiment tracking, and deployment via a REST API.
The system compares multiple machine learning models and provides interpretability using SHAP and LIME.
The goal is to predict credit risk:
- 0 → Bad credit risk
- 1 → Good credit risk
The following models are implemented:
- Logistic Regression (baseline model)
- Random Forest
- XGBoost
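The comparison between these models could look roughly like the sketch below. It is not the project's training code: the data is a synthetic stand-in generated with `make_classification` (1000 rows and 20 features, matching the shape of the German Credit dataset), the function name `compare_models` and all hyperparameters are illustrative assumptions, and XGBoost is only included when the package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(random_state=42):
    """Fit each candidate model and return its test-set ROC-AUC (hypothetical helper)."""
    # Synthetic stand-in for the preprocessed features; NOT real credit data.
    X, y = make_classification(
        n_samples=1000, n_features=20, n_informative=8, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )

    models = {
        # Baseline: scaled inputs help the linear model converge.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=200, random_state=random_state
        ),
    }
    # XGBoost is optional so the sketch still runs without it installed.
    try:
        from xgboost import XGBClassifier

        models["xgboost"] = XGBClassifier(
            n_estimators=200, max_depth=4, learning_rate=0.1, random_state=random_state
        )
    except ImportError:
        pass

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Column 1 is the predicted probability of class 1 (good credit risk).
        proba_good = model.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, proba_good)
    return scores


if __name__ == "__main__":
    for name, auc in compare_models().items():
        print(f"{name}: ROC-AUC = {auc:.3f}")
```

Returning a plain dictionary of scores keeps the comparison easy to log or tabulate later.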
Model performance is evaluated using:
- Accuracy
- ROC-AUC
- Confusion Matrix
- ROC Curve
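As a minimal sketch of how these four metrics can be computed with scikit-learn: the helper name `evaluate`, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration, not results from this project.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve


def evaluate(y_true, y_score, threshold=0.5):
    """Compute the evaluation metrics from true labels and predicted probabilities."""
    # Accuracy and the confusion matrix need hard labels, so threshold the scores.
    y_pred = [int(score >= threshold) for score in y_score]
    # ROC-AUC and the ROC curve are threshold-free and use the raw scores.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
        # Rows are true classes (0 = bad, 1 = good), columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        "roc_curve": (fpr, tpr),
    }


if __name__ == "__main__":
    # Toy values for illustration only.
    y_true = [0, 0, 1, 1, 1, 0, 1, 1]
    y_score = [0.2, 0.6, 0.7, 0.9, 0.4, 0.1, 0.8, 0.55]
    report = evaluate(y_true, y_score)
    print(report["accuracy"], report["roc_auc"])
    print(report["confusion_matrix"])
```

The `(fpr, tpr)` pair can be passed directly to a plotting library to draw the ROC curve.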
Model interpretability is provided using:
- SHAP (global and local explanations)
- LIME (local explanation for individual predictions)
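SHAP attributes a single prediction to its input features using Shapley values. The sketch below does not call the `shap` library; it computes exact Shapley values by brute force, to show what a local explanation contains. The scoring function `toy_score`, its coefficients, and the baseline values are hypothetical and are not taken from the trained models.

```python
from itertools import combinations
from math import factorial


def toy_score(features):
    """Hypothetical linear score over (duration_months, credit_amount, age)."""
    duration_months, credit_amount, age = features
    return 0.02 * duration_months + 0.0001 * credit_amount - 0.01 * age


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for a handful of them.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                # Marginal contribution of feature i when it joins the coalition.
                total += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(total)
    return contributions


if __name__ == "__main__":
    instance = [12, 2500, 35]   # duration_months, credit_amount, age
    baseline = [24, 3000, 40]   # hypothetical reference applicant
    print(shapley_values(toy_score, instance, baseline))
```

The contributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes these explanations additive.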
The project also provides:
- MLflow for experiment tracking:
  - parameters
  - metrics
  - artifacts
  - trained models
- Reproducible training pipeline
- Structured modular codebase
A FastAPI-based inference service is implemented.
Endpoint: POST /predict

Example request body:

    {
      "status_checking_account": "A11",
      "duration_months": 12,
      "credit_history": "A34",
      "credit_amount": 2500,
      "age": 35
    }

Example response:

    {
      "risk_score": 0.73,
      "decision": "reject"
    }
The project is built with:
- Python
- Scikit-learn
- XGBoost
- MLflow
- SHAP
- LIME
- FastAPI
- Pandas
- NumPy
The system has three main components that must be run separately:
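Training pipeline. This runs model training, evaluation, and MLflow logging:

    python main.py

MLflow UI, available at http://127.0.0.1:5000 once started:

    mlflow ui --host 127.0.0.1 --port 5000

API server, served at http://127.0.0.1:8000 with interactive documentation at http://127.0.0.1:8000/docs:

    uvicorn api_:app --reload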
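Once the API is running, the prediction endpoint can be called from Python. The sketch below uses only the standard library; the helper names `build_predict_request` and `call_predict` are illustrative, and the payload reuses the example request fields.

```python
import json
import urllib.request


def build_predict_request(payload, base_url="http://127.0.0.1:8000"):
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(payload, base_url="http://127.0.0.1:8000"):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the API server to be running locally.
    application = {
        "status_checking_account": "A11",
        "duration_months": 12,
        "credit_history": "A34",
        "credit_amount": 2500,
        "age": 35,
    }
    print(call_predict(application))
```

Separating request construction from sending makes it possible to inspect the request without a live server.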