Credit risk refers to the potential loss a lender may face when a borrower fails to repay a loan or meet contractual obligations. The goal of a credit risk assessment is to determine whether potential borrowers are creditworthy and have the means to repay their debts, so that the lender can minimize its credit risk. This project builds a classification model for default prediction using LightGBM, tunes its hyperparameters with Hyperopt, and uses SHAP for model explainability.
The aim of this project is to predict loan defaulters and reduce the risk of financial loss by analyzing credit history, employment, and demographic data.
The dataset covers 143,727 borrowers, with attributes such as employment type, work experience, income, dependents, total loans, and total payments.
- Language: `Python`
- Libraries: `pandas`, `numpy`, `matplotlib`, `seaborn`, `scikit-learn`, `lightgbm`, `hyperopt`, `shap`
- Data Reading
- Data Processing
  - Drop Columns
  - Split Data
- Define Label
  - Roll Rate Analysis
  - Window Roll Analysis
- Feature Engineering
  - Label
  - % Amount Paid as interest in past Loan Repayment
  - % of Loans defaulted in the last 2 years
- Exploratory Data Analysis (EDA)
  - Univariate Analysis
    - Numerical Summary: Min, Max, Mean, Median, etc.
    - Categorical Summary: Top, Unique, Count, etc.
  - Bivariate Analysis
    - Correlation Plot
    - Box Plots
  - Univariate Analysis
- Target Encoding
- Feature Selection
  - Random Forest
  - Decision Tree
- ML Model Development
  - LightGBM
  - Hyperparameter Tuning using Hyperopt
- Model Evaluation
  - ROC AUC
  - PRAUC
  - Score Distribution
- Feature Importance
  - Split and Gain
  - SHAP
- Class Rate Curve and Right Threshold
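
The Roll Rate Analysis step listed above is not spelled out in this README, so here is a minimal sketch of the general idea: count how accounts move between delinquency buckets from one period to the next, then look at the share that "rolls forward" into a worse bucket. The bucket names, the sample transitions, and the helper names `roll_rate_matrix` and `roll_forward_rate` are all assumptions made for illustration. The project's own code may define buckets differently and would likely use pandas rather than the standard library used here.

```python
from collections import Counter, defaultdict

# Hypothetical delinquency buckets, ordered from best to worst.
BUCKETS = ["current", "1-30", "31-60", "61-90", "90+"]


def roll_rate_matrix(transitions):
    """Return {from_bucket: {to_bucket: share}} from (from, to) pairs."""
    counts = defaultdict(Counter)
    for start, end in transitions:
        counts[start][end] += 1

    matrix = {}
    for start, row in counts.items():
        total = sum(row.values())
        matrix[start] = {end: n / total for end, n in row.items()}
    return matrix


def roll_forward_rate(matrix, bucket):
    """Share of accounts in `bucket` that moved to a strictly worse bucket."""
    idx = BUCKETS.index(bucket)
    return sum(
        share
        for end, share in matrix.get(bucket, {}).items()
        if BUCKETS.index(end) > idx
    )


# Made-up transitions: (bucket last period, bucket this period).
transitions = (
    [("current", "current")] * 3 + [("current", "1-30")]
    + [("1-30", "current"), ("1-30", "31-60")]
    + [("31-60", "current")] + [("31-60", "61-90")] * 3
    + [("61-90", "61-90")] + [("61-90", "90+")] * 4
)

matrix = roll_rate_matrix(transitions)
forward = {b: roll_forward_rate(matrix, b) for b in BUCKETS[:-1]}

for bucket, rate in forward.items():
    print(f"{bucket:>8} -> worse: {rate:.0%}")
```

A common way to use such a table is to pick the "bad" label at the bucket beyond which most accounts keep rolling forward rather than curing. Whether this project applies that rule, and at which bucket, is not stated here.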
- `input`: Contains the raw data for analysis, in this case, `credit_risk_data.csv`.
- `documents`: Contains supporting learning material.
- `lib`: A reference folder containing the original iPython notebook used in lectures.
- `ml_pipeline`: Contains functions in various Python files for processing the data and training the model.
- `output`: The folder where the trained model is saved.
- `engine.py`: A script that calls the functions in the `ml_pipeline` to run the entire process and save the model.
- `requirements.txt`: Lists required libraries and versions.
- `readme.md`: Instructions for running the code.