Banks and financial institutions lose money when borrowers default on their loans.
Before approving a loan, it is important to assess how likely an applicant is to default so that risk can be managed effectively.
The objective of this project is to predict the Probability of Default (PD) for a loan applicant in a transparent and business-friendly manner.
This project builds a credit risk scorecard that estimates the probability of default from historical loan application data. It uses logistic regression on Weight of Evidence (WOE) transformed features, an approach widely used in the banking industry for its stability and interpretability.
Instead of directly approving or rejecting a loan, the model outputs a probability of default.
This allows the business to define its own risk thresholds, for example:
- Reject applications with PD above 20%
- Review applications with PD between 15% and 20%
- Approve applications with PD below the selected cutoff (15% in this example)
The decision threshold is intentionally left to the business.
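As a rough illustration of how a business might layer its own cutoffs on top of the model output, here is a minimal sketch. The function name `decide` and the cutoff values in the usage lines are hypothetical and chosen only for illustration; they are not part of the project code.

```python
def decide(pd_estimate, review_from, reject_above):
    """Map a probability of default (0 to 1) to a business action.

    review_from and reject_above are business-defined cutoffs (assumed names).
    """
    if not 0.0 <= pd_estimate <= 1.0:
        raise ValueError("pd_estimate must be between 0 and 1")
    if review_from > reject_above:
        raise ValueError("review_from must not exceed reject_above")
    if pd_estimate > reject_above:
        return "reject"
    if pd_estimate >= review_from:
        return "review"
    return "approve"


# Illustrative cutoffs only: review from 15%, reject above 20%.
for pd_estimate in (0.05, 0.17, 0.30):
    print(pd_estimate, decide(pd_estimate, review_from=0.15, reject_above=0.20))
```

Keeping the cutoffs as arguments rather than constants reflects the point above: the model supplies a probability, and the decision rule can change without retraining.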
- Feature engineering using Weight of Evidence (WOE)
- Feature selection using Information Value (IV)
- Monotonic WOE enforcement for key variables such as Credit Score and LTV
- Logistic Regression with hyperparameter tuning
- Model evaluation using AUC, Gini, and KS statistics
This approach favors stable model behavior and keeps each feature's contribution to risk easy to interpret.
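The WOE, IV, and monotonicity steps listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's implementation: it assumes the convention WOE = ln(share of non-defaulters / share of defaulters), adds a small smoothing constant to avoid division by zero in empty bins, and uses made-up bin labels and toy data. The helper names `woe_iv` and `is_monotonic` are hypothetical.

```python
import math
from collections import Counter


def woe_iv(bins, defaults, smoothing=0.5):
    """Return ({bin: woe}, iv) for one already-binned feature.

    bins: bin label per applicant; defaults: 1 = default, 0 = no default.
    Assumed convention: WOE = ln(% non-defaulters / % defaulters) per bin.
    """
    good, bad = Counter(), Counter()
    for label, flag in zip(bins, defaults):
        (bad if flag == 1 else good)[label] += 1
    labels = set(good) | set(bad)
    # Smoothed class totals, so every bin has a non-zero share of each class.
    total_good = sum(good[b] + smoothing for b in labels)
    total_bad = sum(bad[b] + smoothing for b in labels)
    woe, iv = {}, 0.0
    for b in labels:
        share_good = (good[b] + smoothing) / total_good
        share_bad = (bad[b] + smoothing) / total_bad
        woe[b] = math.log(share_good / share_bad)
        iv += (share_good - share_bad) * woe[b]
    return woe, iv


def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Toy data: hypothetical credit score bands, ordered from low to high.
bands = ["low"] * 5 + ["mid"] * 5 + ["high"] * 5
flags = [1, 1, 1, 0, 0] + [1, 1, 0, 0, 0] + [1, 0, 0, 0, 0]
woe, iv = woe_iv(bands, flags)
ordered_woe = [woe[b] for b in ("low", "mid", "high")]
print(ordered_woe, iv, is_monotonic(ordered_woe))
```

Under this convention a higher WOE means a lower-risk bin, so a monotonic Credit Score feature would show WOE rising steadily across score bands. IV sums the per-bin contributions and can be used to rank candidate features.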
The final model is trained on the following eight features:
- Credit Score
- Loan Tenure
- Loan-to-Value (LTV)
- Number of Years at Present Address
- Loan Amount
- Residence Category
- Income
- Equated Monthly Installment (EMI)
All features are transformed into WOE values before model training and prediction.
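To make the WOE-then-predict flow concrete, the sketch below bins a raw value, looks up its WOE, and applies the logistic function. Every number here (bin edges, WOE values, coefficients, intercept) is a made-up placeholder, only two of the eight features are shown, and the names `WOE_TABLES`, `to_woe`, and `predict_pd` are hypothetical rather than taken from the project.

```python
import bisect
import math

# Placeholder bin edges and per-bin WOE values (not the fitted ones).
# A table with N edges has N + 1 bins.
WOE_TABLES = {
    "credit_score": {"edges": [600, 700], "woe": [-0.8, 0.1, 0.9]},
    "ltv": {"edges": [60, 80], "woe": [0.6, 0.0, -0.7]},
}
# Placeholder coefficients. They are negative because this sketch assumes
# higher WOE means lower risk (WOE = ln(% non-defaulters / % defaulters)).
COEFFICIENTS = {"credit_score": -1.0, "ltv": -1.0}
INTERCEPT = -2.0


def to_woe(feature, value):
    """Replace a raw feature value with the WOE of the bin it falls into."""
    table = WOE_TABLES[feature]
    index = bisect.bisect_right(table["edges"], value)
    return table["woe"][index]


def predict_pd(applicant):
    """Logistic regression on WOE inputs: PD = 1 / (1 + exp(-z))."""
    z = INTERCEPT + sum(
        COEFFICIENTS[name] * to_woe(name, applicant[name]) for name in WOE_TABLES
    )
    return 1.0 / (1.0 + math.exp(-z))


safer = {"credit_score": 750, "ltv": 50}
riskier = {"credit_score": 550, "ltv": 90}
print(predict_pd(safer), predict_pd(riskier))
```

Because the same lookup tables are used at training and prediction time, new applicants are scored on exactly the scale the model was fitted on.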
- AUC ≈ 0.70
- Gini ≈ 0.33
- KS indicates acceptable separation between defaulters and non-defaulters
The model performs better than random prediction (AUC of 0.50) and provides a usable rank-ordering of applicants by risk.
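For readers unfamiliar with these metrics, here is a small dependency-free sketch of how they can be computed from true labels and predicted probabilities. It relies on the standard definitions: AUC is the probability that a randomly chosen defaulter scores higher than a randomly chosen non-defaulter, Gini = 2 × AUC − 1, and KS is the maximum gap between the cumulative score distributions of the two groups. The function name `auc_gini_ks` and the toy data are hypothetical; the project itself may compute these with Scikit-learn.

```python
def auc_gini_ks(y_true, scores):
    """Return (auc, gini, ks) for binary labels (1 = default) and PD scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # AUC: share of (defaulter, non-defaulter) pairs ranked correctly,
    # counting ties as half. Quadratic in sample size, fine for a sketch.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    gini = 2 * auc - 1
    # KS: largest gap between the two empirical cumulative distributions.
    ks = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(p <= threshold for p in pos) / len(pos)
        cdf_neg = sum(n <= threshold for n in neg) / len(neg)
        ks = max(ks, abs(cdf_neg - cdf_pos))
    return auc, gini, ks


# Toy labels and scores, for illustration only.
labels = [0, 0, 0, 1, 1]
predicted = [0.1, 0.2, 0.6, 0.5, 0.9]
auc, gini, ks = auc_gini_ks(labels, predicted)
print(auc, gini, ks)
```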
A Streamlit web application demonstrates the model. In the application:
- The user enters applicant details
- Inputs are converted to WOE values internally
- The estimated probability of default is displayed
The app can be accessed here: View the Streamlit App
- Provides a clear probability of default instead of a hard decision
- Supports business-defined risk cutoffs
- Uses an interpretable scorecard approach suitable for regulated environments
- Demonstrates an end-to-end credit risk modeling workflow
- Python
- Pandas, NumPy
- Scikit-learn
- Streamlit