
ds-souvik/Predictive-model-to-classify-profitable-borrowers



Note: If GitHub shows an error while opening the Jupyter notebooks, open this link and paste the notebook's URL.


Project Objective

The bank is facing significant financial losses due to loan defaults, which directly impacts profitability and risk management. To address this issue, the bank aims to develop a predictive model that can accurately identify borrowers who are likely to fully repay their loans versus those who are at risk of defaulting (Charged Off). By leveraging comprehensive data on loan applications, borrower demographics, and historical payment behavior, this project seeks to enhance the bank's loan approval process, minimize default rates, and optimize lending decisions.

Project Goal

To improve the bank's overall financial health and customer satisfaction through data-driven insights and predictive analytics.


Build a robust model that predicts whether a borrower will be profitable for the bank. The dataset consists of:

  • Borrower’s identification data
  • Borrower’s loan data
  • Borrower’s demographic data
  • Borrower’s verification data
  • Date/time data
  • Borrower’s transaction data
  • Borrower’s hardship data
  • Borrower’s debt settlement data
  • Borrower’s application data

Key Steps to Achieve the Objective:

1. Data Quality Analysis and Data Quality Report

The first step is to assess the data quality and generate a comprehensive Data Quality report. This report includes:

  • Missing value %
  • Number of Unique values
  • Data Types
  • Descriptive statistical measures
  • Type of distribution

The Data Quality report and the Data Cleaning and Feature Engineering report are generated from this assessment. The script that generates the report is here. The data_quality_report.xlsx file can be downloaded from here, and the raw descriptive statistics file from here. A minimal sketch of how such a report can be assembled follows.
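As an illustration only (the linked script is the authoritative version), here is a minimal pandas sketch that assembles these per-column checks; the input filename in the usage comment is a placeholder:

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary: missingness, cardinality, dtype, basic statistics."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_pct": df.isna().mean() * 100,
        "n_unique": df.nunique(dropna=True),
    })
    # Descriptive statistics and skewness only apply to numeric columns;
    # non-numeric rows are left as NaN by the join
    numeric = df.select_dtypes(include="number")
    report = report.join(numeric.describe().T[["mean", "std", "min", "50%", "max"]])
    report["skew"] = numeric.skew()  # rough indicator of the distribution's shape
    return report

# Hypothetical usage -- "loan_data.csv" is a placeholder filename:
# report = data_quality_report(pd.read_csv("loan_data.csv"))
# report.to_excel("data_quality_report.xlsx")
```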

2. Exploratory Data Analysis and Feature Engineering

Exploratory Data Analysis (EDA) focuses on handling missing values and outliers to prepare the dataset for modeling. The script that performs the EDA is here. Based on the raw descriptive statistics file generated in the previous step and the EDA in Python, the Data Cleaning and Feature Engineering reports are generated. You can download those reports from here.

Feature Engineering:

You can find the Data Preprocessing script here. Based on the Data Cleaning and Feature Engineering reports, data preprocessing is performed to (a short sketch follows this list):

  • Drop irrelevant columns
  • Impute missing values
  • Transform feature distribution
  • Convert numerical variables to categorical variables using percentile distribution, binning, etc.
  • Derive new features from existing features
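A minimal sketch of these preprocessing steps, assuming placeholder column names (member_id, url, annual_inc, and loan_amnt are illustrative, not necessarily the dataset's exact fields):

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Drop columns flagged as irrelevant in the cleaning report
    df = df.drop(columns=["member_id", "url"], errors="ignore")

    # Impute: median for numeric columns, mode for categorical columns
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        elif df[col].isna().any():
            df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Transform a right-skewed feature toward a more symmetric distribution
    if "annual_inc" in df.columns:
        df["annual_inc_log"] = np.log1p(df["annual_inc"])
        # Convert the numerical variable to categories via percentile binning
        df["income_band"] = pd.qcut(df["annual_inc"], q=4,
                                    labels=["low", "mid_low", "mid_high", "high"])

    # Derive a new feature from existing ones
    if {"loan_amnt", "annual_inc"}.issubset(df.columns):
        df["loan_to_income"] = df["loan_amnt"] / df["annual_inc"].clip(lower=1)
    return df
```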

3. Segmentation

Segmentation involves dividing the borrowers based on various characteristics to identify patterns and group-specific behaviors. The entire script can be found here.

Steps for Segmentation (a consolidated code sketch follows this list):

  1. Data preparation for clustering:

    • Create Dummy Variables
    • Scale the numerical variables
    • Feature elimination using the variance threshold method and the correlation matrix
  2. Validate that the dataset has a clustering tendency using the Hopkins test

  3. Export the prepared data so that it can also be used for the classification model: processed_data_for_models.csv

  4. Run KMeans++ iterations on three variants of the data:

    • processed_data_for_models
    • Only numerical variables from processed_data_for_models, dropping categorical and dummy variables
    • Reduced dimensions: the dimensions of processed_data_for_models are reduced using Principal Component Analysis (PCA)
  5. For each of the three datasets:

    • Determine the optimum number of clusters using the elbow method
    • Generate silhouette scores for optimum_k-1, optimum_k, and optimum_k+1 clusters
  6. Export segmented data:

    • segmented_processed_data.csv: Based on the optimum cluster number for the problem statement and the best silhouette score, the final KMeans++ model is created. Segments are assigned to each data point, and the final dataset is exported to Drive.
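For illustration, here is a consolidated sketch of the steps above, assuming the exported filenames mentioned in this section. The variance threshold, correlation cutoff, candidate k range, and optimum_k are placeholders (the real optimum is read off the elbow plot), and the linked notebook remains the authoritative version; the notebook also repeats the elbow/silhouette step for all three dataset variants, while the sketch shows only the PCA-reduced one:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("processed_data_for_models.csv")

# Step 1: dummy-encode categoricals, eliminate low-variance and highly
# correlated features, then scale
X = pd.get_dummies(df, drop_first=True).astype(float)
X = X.loc[:, VarianceThreshold(threshold=0.01).fit(X).get_support()]
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
X = X.drop(columns=[c for c in upper.columns if (upper[c] > 0.9).any()])
X = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

# Step 2: Hopkins statistic -- values near 1 indicate a clustering tendency,
# values near 0.5 indicate uniformly random data
def hopkins(data, m=200, seed=42):
    rng = np.random.default_rng(seed)
    sample = data[rng.choice(len(data), size=min(m, len(data)), replace=False)]
    uniform = rng.uniform(data.min(axis=0), data.max(axis=0), size=sample.shape)
    w = NearestNeighbors(n_neighbors=2).fit(data).kneighbors(sample)[0][:, 1]
    u = NearestNeighbors(n_neighbors=1).fit(data).kneighbors(uniform)[0][:, 0]
    return u.sum() / (u.sum() + w.sum())

print("Hopkins statistic:", hopkins(X.to_numpy()))

# Step 4: one of the three dataset variants -- here, the PCA-reduced one
X_pca = PCA(n_components=0.95, random_state=42).fit_transform(X)

# Step 5: elbow method (plot inertia vs. k), then silhouette scores
inertias = {k: KMeans(n_clusters=k, init="k-means++", n_init=10,
                      random_state=42).fit(X_pca).inertia_ for k in range(2, 11)}
optimum_k = 4  # placeholder: read the real value off the elbow plot
for k in (optimum_k - 1, optimum_k, optimum_k + 1):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=42).fit_predict(X_pca)
    print(f"k={k}: silhouette={silhouette_score(X_pca, labels):.3f}")

# Step 6: fit the final model and export the segmented dataset
df["segment"] = KMeans(n_clusters=optimum_k, init="k-means++", n_init=10,
                       random_state=42).fit_predict(X_pca)
df.to_csv("segmented_processed_data.csv", index=False)
```

Standardizing before PCA and KMeans matters here because both are distance-based: without scaling, features with large ranges would dominate both the principal components and the cluster assignments.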

Next Steps:

  1. Classification based on selected features
  2. Model Monitoring framework

Skills Demonstrated

  • Technical Skills: Data Preprocessing, Feature Engineering, Clustering (KMeans++, PCA), Classification Modeling, Model Evaluation, Model Monitoring, Python, Jupyter Notebooks, Data Visualization.
  • Non-Technical Skills: Analytical Thinking, Problem Solving, Communication, Project Management.

Important Python Packages Used

  • Pandas: Data manipulation and analysis
  • NumPy: Numerical computing
  • scikit-learn: Machine learning algorithms and tools
  • Seaborn & Matplotlib: Data visualization
  • Joblib: Parallel processing
  • SciPy: Scientific computing

Author

Souvik Ganguly


Connect with me: If you would like to connect with me, feel free to reach out via LinkedIn or email me at souvik.ganguly.ds@gmail.com.


For more details, refer to the project presentation here.
