This project builds a Deep Learning Neural Network model to predict a customer's credit score category (Good / Standard / Poor) based on financial and behavioural attributes.
Credit scoring is used by banks and financial institutions to determine whether a customer is eligible for loans or credit cards. This model analyses customer financial history and predicts their creditworthiness, helping reduce financial risk.
The objective is to build a machine learning pipeline that predicts a person's creditworthiness from their financial history and demographic details.
Why it matters:
- ✅ Reduce loan defaults
- ✅ Improve credit approval decisions
- ✅ Manage financial risk effectively
- ✅ Provide explainable, confidence-backed predictions
The model uses scikit-learn's MLPClassifier with the following design:
| Component | Details |
|---|---|
| Architecture | 4 hidden layers: 256 → 128 → 64 → 32 neurons |
| Activation | ReLU (non-linearity) |
| Optimizer | Adam with adaptive learning rate |
| Regularisation | L2 (alpha = 0.001 → 0.005 after tuning) |
| Early Stopping | Yes — patience = 20 epochs |
| Batch Size | 64 |
| Validation Split | 10% held out during training |
| Cross-Validation | 2-fold Stratified K-Fold |
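
The design table maps fairly directly onto `MLPClassifier` arguments. The following is a minimal sketch under stated assumptions, not the code in `credit_scoring_model.py`: the helper name `build_mlp`, the values of `max_iter` and `random_state`, the initial `learning_rate_init` (scikit-learn's default), and the synthetic data are all illustrative. In scikit-learn, `learning_rate="adaptive"` only applies to the SGD solver, so "adaptive learning rate" is read here as Adam's own per-parameter step adaptation.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier


def build_mlp(alpha=0.001, learning_rate_init=0.001, random_state=42):
    """Return an MLPClassifier configured like the design table (hypothetical helper)."""
    return MLPClassifier(
        hidden_layer_sizes=(256, 128, 64, 32),  # 4 hidden layers
        activation="relu",
        solver="adam",
        alpha=alpha,                  # L2 regularisation strength
        batch_size=64,
        learning_rate_init=learning_rate_init,
        early_stopping=True,          # hold out part of the training data
        validation_fraction=0.1,      # 10% validation split
        n_iter_no_change=20,          # patience of 20 epochs
        max_iter=300,                 # assumption: not stated in the README
        random_state=random_state,    # assumption: fixed for repeatability
    )


# Synthetic 3-class data standing in for the real credit dataset.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)

baseline = build_mlp().fit(X, y)
# Values reported after tuning in the README: alpha=0.005, lr=0.0005.
tuned = build_mlp(alpha=0.005, learning_rate_init=0.0005).fit(X, y)

print("epochs run:", len(baseline.loss_curve_))
print("last validation accuracy:", baseline.validation_scores_[-1])
```

With `early_stopping=True`, `loss_curve_` and `validation_scores_` hold one entry per epoch, which is the kind of data a training loss and validation accuracy plot can be drawn from.
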

    Raw CSV Data
               │
               ▼
    ┌──────────────────────┐
    │ 1. Load Data         │ ── train.csv & test.csv
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │ 2. EDA               │ ── Correlation Heatmap, Feature Distributions
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │ 3. Preprocessing     │ ── Fill NaN, Encode, Scale (StandardScaler)
    │    + SMOTE (opt.)    │ ── Handle class imbalance if available
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │ 4. Train MLP         │ ── Deep Neural Network (256→128→64→32)
    │    + Cross-Val       │ ── 2-Fold Stratified CV
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │ 5. Hyperparameter    │ ── Refined alpha=0.005, lr=0.0005
    │    Tuning            │
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │ 6. Evaluation        │ ── Accuracy, ROC-AUC, Confusion Matrix
    │    + Plots           │ ── Loss Curve, Feature Importance, ROC Curve
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │ 7. Save Artifacts    │ ── best_model.pkl, scaler.pkl, metadata.pkl
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │ 8. Predict           │ ── final_predictions.csv from test.csv
    └──────────────────────┘

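
Steps 3 and 4 of the pipeline (fill NaN, encode, scale, optional SMOTE, 2-fold stratified cross-validation) could look roughly like the sketch below. It is an illustration under assumptions: the column names, the target name `Credit_Score`, the median and mode fill strategy, and the helpers `preprocess` and `balance` are hypothetical and may differ from the actual script.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

try:  # SMOTE is optional, matching the "(opt.)" note in the diagram
    from imblearn.over_sampling import SMOTE
except ImportError:
    SMOTE = None


def preprocess(df, target="Credit_Score"):
    """Fill NaN and label-encode categoricals; return features, labels, encoders."""
    df = df.copy()
    encoders = {target: LabelEncoder()}
    y = encoders[target].fit_transform(df.pop(target))
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode()[0])
            encoders[col] = LabelEncoder()
            df[col] = encoders[col].fit_transform(df[col])
    return df.to_numpy(dtype=float), y, encoders


def balance(X, y):
    """Oversample minority classes with SMOTE when imbalanced-learn is installed."""
    if SMOTE is None:
        return X, y
    return SMOTE(random_state=42).fit_resample(X, y)


# Small synthetic frame standing in for train.csv (column names are hypothetical).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Age": rng.integers(18, 70, n).astype(float),
    "Outstanding_Debt": rng.normal(1500, 400, n),
    "Credit_Mix": rng.choice(["Good", "Standard", "Bad"], n),
    "Credit_Score": rng.choice(["Good", "Standard", "Poor"], n, p=[0.2, 0.5, 0.3]),
})
df.loc[::25, "Age"] = np.nan         # inject a few missing values
df.loc[::40, "Credit_Mix"] = np.nan

X, y, encoders = preprocess(df)

# Scaling lives inside the pipeline so each CV fold fits its own StandardScaler.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(256, 128, 64, 32), max_iter=100, random_state=42),
)
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("fold accuracies:", scores)

# Oversampling belongs on training data only, never on validation folds.
X_bal, y_bal = balance(X, y)
```

Keeping the scaler inside the pipeline is a design choice in this sketch: it prevents statistics from the validation fold leaking into training. The synthetic labels are random, so the printed accuracies carry no meaning beyond showing the mechanics.
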
| Output File | Description |
|---|---|
| correlation_heatmap.png | Heatmap showing feature correlations |
| feature_distributions.png | Distribution plots for top numeric features |
| feature_importance.png | Permutation importance bar chart (top 20) |
| training_loss_curve.png | Training loss + validation accuracy over epochs |
| roc_curve.png | ROC curve (binary classification) |
| final_predictions.csv | Predicted credit scores for test dataset |
| ID | Customer_ID | Predicted_Score |
|---|---|---|
| 0x160a | CUS_0xd40 | Good |
| 0x160b | CUS_0xd40 | Standard |
| 0x160c | CUS_0xd41 | Poor |
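
A file with this layout can be assembled with pandas. The sketch below assumes `test.csv` carries `ID` and `Customer_ID` columns, as the sample rows suggest; `build_predictions` is a hypothetical helper, and the labels are hard-coded here where the real pipeline would use the model's predictions.

```python
import pandas as pd


def build_predictions(test_df, predicted_labels):
    """Pair each test row's identifiers with its predicted credit score."""
    if len(test_df) != len(predicted_labels):
        raise ValueError("one prediction is required per test row")
    return pd.DataFrame({
        "ID": test_df["ID"].to_numpy(),
        "Customer_ID": test_df["Customer_ID"].to_numpy(),
        "Predicted_Score": list(predicted_labels),
    })


# Identifiers taken from the sample rows; labels would come from model.predict().
test_df = pd.DataFrame({
    "ID": ["0x160a", "0x160b", "0x160c"],
    "Customer_ID": ["CUS_0xd40", "CUS_0xd40", "CUS_0xd41"],
})
predictions = build_predictions(test_df, ["Good", "Standard", "Poor"])

# Passing a path such as "final_predictions.csv" to to_csv writes the file instead.
csv_text = predictions.to_csv(index=False)
print(csv_text)
```
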
The project includes a Gradio-powered web dashboard (app.py) with:
- 🎛️ Input sliders and dropdowns for all financial features
- 🎯 Real-time credit score prediction (Good / Standard / Poor)
- 📊 Confidence chart — probability breakdown per class
- 📌 Feature importance chart — top 10 most influential features
- 🌙 Dark theme with gradient UI design
| Section | Fields |
|---|---|
| 👤 Personal Info | Name, Age, Monthly Income, Occupation |
| 💳 Credit Behaviour | Credit Mix, Min Amount Paid, Delayed Payments, Days Past Due |
| 📊 Financial Details | Outstanding Debt, Credit Utilization %, Interest Rate, Credit History Age, No. of Credit Inquiries |
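
As a rough idea of how such a dashboard can be wired up, here is a minimal Gradio sketch covering a handful of the fields listed. It is not the project's `app.py`: the helper names, default values, slider ranges, and the `Credit Mix` categories are assumptions, and a real app would reuse the saved label encoders instead of a hard-coded list.

```python
import numpy as np

# Assumed categories, listed in the alphabetical order a LabelEncoder would produce.
CREDIT_MIX_CHOICES = ["Bad", "Good", "Standard"]


def to_confidences(classes, probabilities):
    """Build the {label: confidence} dict that gr.Label renders as a bar chart."""
    return {str(label): float(p) for label, p in zip(classes, probabilities)}


def make_predict_fn(model, scaler):
    """Wrap a fitted model and scaler into a function Gradio can call."""
    def predict(age, monthly_income, outstanding_debt, credit_utilization, credit_mix):
        row = np.array([[
            age, monthly_income, outstanding_debt, credit_utilization,
            CREDIT_MIX_CHOICES.index(credit_mix),
        ]], dtype=float)
        probabilities = model.predict_proba(scaler.transform(row))[0]
        return to_confidences(model.classes_, probabilities)
    return predict


def build_demo(predict):
    """Create the interface; Gradio is imported lazily so the helpers work without it."""
    import gradio as gr

    return gr.Interface(
        fn=predict,
        inputs=[
            gr.Slider(minimum=18, maximum=100, value=30, step=1, label="Age"),
            gr.Number(value=5000, label="Monthly Income"),
            gr.Number(value=1000, label="Outstanding Debt"),
            gr.Slider(minimum=0, maximum=100, value=30, label="Credit Utilization %"),
            gr.Dropdown(choices=CREDIT_MIX_CHOICES, value="Standard", label="Credit Mix"),
        ],
        outputs=gr.Label(num_top_classes=3, label="Predicted Credit Score"),
        title="Credit Scoring Model",
    )
```

Calling `build_demo(make_predict_fn(model, scaler)).launch()` with a fitted model and scaler starts a local server, by default on port 7860. The input order in `predict` must match the feature order used during training.
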

    CodeAlpha_Credit_Scoring_Model/
    │
    ├── dataset/
    │   ├── train.csv                # Training data
    │   └── test.csv                 # Test data
    │
    ├── model/                       # Auto-generated after training
    │   ├── best_model.pkl           # Trained MLP model
    │   ├── scaler.pkl               # StandardScaler
    │   ├── label_encoders.pkl       # LabelEncoders for categorical features
    │   └── metadata.pkl             # Feature names, accuracy, importances
    │
    ├── credit_scoring_model.py      # Full deep learning training pipeline
    ├── app.py                       # Gradio web UI
    ├── requirements.txt             # Python dependencies
    ├── README.md                    # Project documentation
    │
    ├── correlation_heatmap.png      # EDA output
    ├── feature_distributions.png    # EDA output
    ├── feature_importance.png       # Permutation importance chart
    ├── training_loss_curve.png      # Loss curve during training
    ├── roc_curve.png                # ROC curve (binary)
    └── final_predictions.csv        # Test set predictions

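
The four files under `model/` can be written and read back with Joblib. The sketch below shows one possible round trip; `save_artifacts` and `load_artifacts` are hypothetical helpers, the tiny model and the metadata contents are placeholders, and a temporary directory is used so nothing in a real `model/` folder is overwritten.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# File stems matching the model/ directory in the project tree.
ARTIFACTS = ("best_model", "scaler", "label_encoders", "metadata")


def save_artifacts(model_dir, **objects):
    """Dump each artifact to <model_dir>/<name>.pkl."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    for name in ARTIFACTS:
        joblib.dump(objects[name], model_dir / f"{name}.pkl")


def load_artifacts(model_dir):
    """Load every artifact back into a dict keyed by file stem."""
    model_dir = Path(model_dir)
    return {name: joblib.load(model_dir / f"{name}.pkl") for name in ARTIFACTS}


# Placeholder model and data, only to demonstrate the round trip.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = rng.integers(0, 3, size=120)
scaler = StandardScaler().fit(X)
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=50, random_state=0)
model.fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "model"
    save_artifacts(
        target,
        best_model=model,
        scaler=scaler,
        label_encoders={},  # would hold one fitted LabelEncoder per categorical column
        metadata={"feature_names": ["f0", "f1", "f2", "f3"]},
    )
    saved_files = sorted(p.name for p in target.iterdir())
    loaded = load_artifacts(target)

# The reloaded model and scaler should reproduce the original predictions.
restored = loaded["best_model"].predict(loaded["scaler"].transform(X))
same = np.array_equal(restored, model.predict(scaler.transform(X)))
print(saved_files, same)
```

Pickle-based files can execute code when loaded, so `.pkl` artifacts should only be loaded from sources you trust.
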

    git clone https://github.com/Venu200723/CodeAlpha_Credit_Scoring_Model.git
    cd CodeAlpha_Credit_Scoring_Model

    pip install -r requirements.txt

    python credit_scoring_model.py

This will:
- Load and preprocess the dataset
- Train the Deep Neural Network
- Save model artifacts to model/
- Generate all visualisation plots
- Output final_predictions.csv

    python app.py

Open your browser at: http://127.0.0.1:7860

| Library | Purpose |
|---|---|
| Python 3.8+ | Core language |
| Pandas | Data manipulation |
| NumPy | Numerical operations |
| Scikit-learn | MLPClassifier, preprocessing, evaluation |
| Imbalanced-learn | SMOTE for class balancing (optional) |
| Matplotlib | Plotting and visualisations |
| Seaborn | Statistical visualisations |
| Gradio | Interactive web UI |
| Joblib | Model serialisation |
- ✅ Accuracy Score
- ✅ ROC-AUC Score (macro-average for multi-class)
- ✅ Confusion Matrix
- ✅ Classification Report (Precision, Recall, F1)
- ✅ 2-Fold Stratified Cross-Validation
- ✅ Permutation Feature Importance
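
The metrics listed above are all available in scikit-learn. The following sketch computes them on synthetic 3-class data with a deliberately small network; the data, the split sizes, the network size, and the mapping of integer classes to the names Good, Standard, and Poor are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

scaler = StandardScaler().fit(X_train)
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
model.fit(scaler.transform(X_train), y_train)

X_val_scaled = scaler.transform(X_val)
y_pred = model.predict(X_val_scaled)
y_proba = model.predict_proba(X_val_scaled)

accuracy = accuracy_score(y_val, y_pred)
# Macro-averaged one-vs-rest ROC-AUC for the multi-class case.
auc = roc_auc_score(y_val, y_proba, multi_class="ovr", average="macro")
cm = confusion_matrix(y_val, y_pred)
report = classification_report(
    y_val, y_pred, target_names=["Good", "Standard", "Poor"]  # illustrative mapping
)

# Permutation importance: drop in score when one feature's values are shuffled.
perm = permutation_importance(model, X_val_scaled, y_val, n_repeats=5, random_state=0)
top_features = np.argsort(perm.importances_mean)[::-1][:5]

print(f"accuracy={accuracy:.3f}  macro ROC-AUC={auc:.3f}")
print(cm)
print(report)
print("top feature indices:", top_features)
```
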
Venu Gopal R
B.Tech – Artificial Intelligence & Data Science
This project is licensed under the MIT License — feel free to use, modify, and distribute.