credit-defaults-forecasting

One of the most simple and popular consumer finance products is an Unsecured Personal Loan. Competition in the market is strong and margins are slim so the product has to be well prepared. To be able to offer the right price for the customer The Lender has to know what is the cost of lending and a big part of that cost comes from non-performing loans. The purpose of this notebook is to build a contemporary credit scoring model to forecast credit defaults for unsecured lending, by employing machine learning techniques.

Installation:

To run this notebook interactively:

Download this repository in a zip file by clicking on this link or execute this from the terminal: git clone https://github.com/Mieczmik/credit-defaults-forecasting.git
Install virtualenv.
Navigate to the directory where you unzipped or cloned the repo and create a virtual environment with virtualenv env.
Activate the environment with source env/bin/activate
Install the required dependencies with pip install -r requirements.txt.
Execute ipython notebook from the command line or terminal.
Click on credit-defaults-forecasting.ipynb on the IPython Notebook dashboard and enjoy!
When you're done deactivate the virtual environment with deactivate.

The analysis was made in the following steps:

Data Preprocessing

In order to avoid overfitting models (without feature selection xgboost classifier was overfitted), achieve greater computational efficiency, reduce the occupied memory space, eliminate unnecessary and duplicate information, one of the feature selection methods was used: SelectFromModel. In addition resample dataset (generate synthetic samples with SMOTE and undersampling data with RandomUnderSampler) was used to combat imbalanced training.

Importing Data with Pandas
Data Review using pandas-profiling library
Cleaning Data
Creating Pipelines for Numerical and Categorical Data assembled from
- Numeric/Categorical Column Selector
- Missing Data Replacement has benn done with SimpleImputer
- Data Standarization with StandardScaler
- Join both pipelines using ColumnTransformer
- Resampling dataset: generate synthetic samples with SMOTE and undersampling data with RandomUnderSampler,
- Feature Selection with SelectFromModel

Data Analysis

Griding solutions with cross-validation (Stratified k-fold) together with F1-score for imbalanced data and Pipeline assembled with preprocessing pipeline and the following models:

Linear Support Vector Classifier
Logistic Regression
k-Nearest Neighbors
Decission Tree Classifier
Random Forest
Bagging Classifier
Extra Trees Classifier
Adaptive Boosting (AdaBoost)
Gradient Boosting
Extreme Gradient Boosting (xgboost)

Summary

Feature importance
Plotting results
Export results

Conclusions

The solution work with a pretty good total accuracy of 88% and f1-score of 76% for Adaboost Classifier,
Recall-score, which represents the ratio of people expressed as a percentage that the algorithm predicted would default to the number of people who defaulted was 76%,
Trees algorithms such as AdaBoost and xgboost achieved the best results. One reason is the lower sensitivity to outliers.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
datasets		datasets
output		output
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
TODO.md		TODO.md
credit-defaults-forecasting.ipynb		credit-defaults-forecasting.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

credit-defaults-forecasting

Installation:

The analysis was made in the following steps:

Data Preprocessing

Data Analysis

Summary

Conclusions

About

Releases

Packages

Languages

Mieczmik/credit-defaults-forecasting

Folders and files

Latest commit

History

Repository files navigation

credit-defaults-forecasting

Installation:

The analysis was made in the following steps:

Data Preprocessing

Data Analysis

Summary

Conclusions

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages