Credit Risk Prediction with LightGBM, Hyperopt and SHAP

Project Overview

Business Context

Credit risk is the potential loss a lender faces when a borrower fails to repay a loan or meet contractual obligations. A credit risk assessment determines whether potential borrowers are creditworthy and have the means to repay their debts, so that the lender can minimize that loss. This project builds a classification model for default prediction using LightGBM, tunes its hyperparameters with Hyperopt, and uses SHAP for model explainability.

Aim

The aim of this project is to predict loan defaulters and reduce the risk of financial loss by analyzing credit history, employment, and demographic data.

Data Description

The dataset includes information on 143,727 borrowers, including attributes such as employment type, work experience, income, dependents, total loans, and total payments.

Tech Stack

Language: Python
Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, lightgbm, hyperopt, shap

Approach

Data Reading
Data Processing
- Drop Columns
- Split Data
- Define Label
- Roll Rate Analysis
- Window Roll Analysis
Feature Engineering
- Label
- % of Amount Paid as Interest in Past Loan Repayments
- % of Loans Defaulted in the Last 2 Years
Exploratory Data Analysis (EDA)
- Univariate Analysis
  - Numerical Summary: Min, Max, Mean, Median, etc.
  - Categorical Summary: Top, Unique, Count, etc.
- Bivariate Analysis
  - Correlation Plot
  - Box Plots
Target Encoding
Feature Selection
- Random Forest
- Decision Tree
ML Model Development
- LightGBM
- Hyperparameter Tuning using Hyperopt
Model Evaluation
- ROC AUC
- PR AUC (area under the precision-recall curve)
- Score Distribution
Feature Importance
- Split and Gain
- SHAP
Class Rate Curve and Right Threshold

Modular Code Overview

input: Contains the raw data for analysis, in this case, credit_risk_data.csv.
documents: Contains supporting learning material.
lib: A reference folder containing the original IPython notebook used in lectures.
ml_pipeline: Contains functions in various Python files for processing the data and training the model.
output: The folder where the trained model is saved.
engine.py: A script that calls the functions in the ml_pipeline to run the entire process and save the model.
requirements.txt: Lists required libraries and versions.
readme.md: Instructions for running the code.

AjNavneet/Credit-Risk-Prediction-LightGBM-Hyperopt-SHAP

Folders and files

Latest commit

History

Repository files navigation

Credit Risk Prediction with LightGBM, Hyperopt and SHAP

Project Overview

Business Context

Aim

Data Description

Tech Stack

Approach

Modular Code Overview

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages