Credit_Risk_Analysis

Overview of the analysis:

In this analysis we use machine learning to evaluate credit risk based on the data provided. We rely on multiple methods to produce our models, and we address class imbalance with oversampling and undersampling so that our results are as accurate as possible. For this analysis we find that favoring a precise model is the safer bet for a bank, because giving out too many bad loans that default could lead to the bank's bankruptcy. We used six different resampling and ensemble algorithms to reduce as much bias as possible. We used RandomOverSampler and SMOTE to oversample the data, and ClusterCentroids to undersample the data. SMOTEENN was then used to combine both approaches. To combat bias further we used BalancedRandomForestClassifier and EasyEnsembleClassifier.
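The idea behind oversampling and undersampling can be sketched in a few lines of plain Python. This is only an illustration, not the code from the notebooks: it uses the simplest random variants, whereas SMOTE creates synthetic minority rows by interpolation and ClusterCentroids replaces majority rows with cluster centroids. The function names, the toy data, and the `low_risk` / `high_risk` labels are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def group_by_label(X, y):
    """Group feature rows by their class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_oversample(X, y, seed=1):
    """Duplicate random rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(X, y)
    target = max(len(rows) for rows in groups.values())
    X_res, y_res = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_res.extend(rows + extra)
        y_res.extend([label] * target)
    return X_res, y_res


def random_undersample(X, y, seed=1):
    """Keep a random subset of larger classes so every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(X, y)
    target = min(len(rows) for rows in groups.values())
    X_res, y_res = [], []
    for label, rows in groups.items():
        X_res.extend(rng.sample(rows, target))
        y_res.extend([label] * target)
    return X_res, y_res


# Hypothetical toy data: 8 low-risk loans and 2 high-risk loans.
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(Counter(y_over))   # both classes now have 8 rows
print(Counter(y_under))  # both classes now have 2 rows
```

Oversampling keeps every original row but repeats minority rows, while undersampling discards majority rows, which is why it leaves the model with much less data to learn from.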

Results:

RandomOverSampler:

accuracy score: 0.64
precision score: 1.0
recall score: 0.66

SMOTE:

accuracy score: 0.615
precision score: 1.0
recall score: 0.59

Undersampling (ClusterCentroids):

accuracy score: 0.52
precision score: 1.0
recall score: 0.43

SMOTEENN:

accuracy score: 0.65
precision score: 1.0
recall score: 0.59

BalancedRandomForestClassifier:

accuracy score: 0.795
precision score: 1.0
recall score: 0.9

EasyEnsembleClassifier:

accuracy score: 0.925
precision score: 1.0
recall score: 0.94
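For readers unfamiliar with these scores, the sketch below shows how accuracy, balanced accuracy, precision, and recall are derived from the four cells of a confusion matrix. The counts are hypothetical and are not taken from the notebooks, and the README does not state whether the reported accuracy is plain or balanced accuracy, so both are shown.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, with "high risk" treated as the positive class.
tp = 8    # high-risk loans predicted high risk
fn = 2    # high-risk loans predicted low risk
fp = 30   # low-risk loans predicted high risk
tn = 60   # low-risk loans predicted low risk

accuracy = safe_div(tp + tn, tp + tn + fp + fn)
precision = safe_div(tp, tp + fp)      # share of predicted high risk that is correct
recall = safe_div(tp, tp + fn)         # share of actual high risk that is found
specificity = safe_div(tn, tn + fp)    # share of actual low risk that is found

# Balanced accuracy averages the recall of both classes, so the
# majority class cannot dominate the score.
balanced_accuracy = (recall + specificity) / 2

print(f"accuracy:          {accuracy:.3f}")
print(f"balanced accuracy: {balanced_accuracy:.3f}")
print(f"precision:         {precision:.3f}")
print(f"recall:            {recall:.3f}")
```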

Summary:

Based on our results, every model has a precision score of 1.0. Our highest accuracy scores came from Easy Ensemble Classifier at 0.925 and Balanced Random Forest Classifier at 0.795. Our lowest accuracy score was 0.52, from the undersampling method, which is not a surprise, as undersampling trims the data down until both outcomes are almost equally represented. The highest scores came from the models that are designed to reduce bias. I would err on the side of caution and avoid using Easy Ensemble Classifier despite its 0.925 accuracy, as it can be harder to interpret and its score is so much higher than the other results. Balanced Random Forest Classifier is easier to interpret and gives a more believable result at 0.795, and its recall score is still very comparable to that of Easy Ensemble Classifier (0.9 vs. 0.94).
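A precision of 1.0 for every model is worth a second look. The README does not say how the precision figures were averaged. One possibility, stated here purely as an assumption, is that they are support-weighted averages across both classes. The sketch below uses hypothetical counts (not the project's data) to show how such an average can sit close to 1 on heavily imbalanced data even when precision for the high-risk class alone is very low.

```python
def precision(true_pos, false_pos):
    """Share of positive predictions that are correct (0.0 if none were made)."""
    predicted = true_pos + false_pos
    return true_pos / predicted if predicted else 0.0


# Hypothetical confusion-matrix counts for a heavily imbalanced test set.
tp, fn = 70, 30         # actual high-risk loans: found / missed
fp, tn = 5000, 12000    # actual low-risk loans: flagged in error / correctly passed

# Per-class precision: each class is treated as "positive" in turn.
precision_high = precision(tp, fp)
precision_low = precision(tn, fn)

# Support = number of actual rows in each class.
support_high = tp + fn
support_low = tn + fp

# The support-weighted average is dominated by the large low-risk class.
weighted_precision = (
    precision_high * support_high + precision_low * support_low
) / (support_high + support_low)

print(f"high-risk precision: {precision_high:.3f}")
print(f"low-risk precision:  {precision_low:.3f}")
print(f"weighted precision:  {weighted_precision:.3f}")
```

If the notebooks report per-class figures, comparing the high-risk precision and recall across models would give a clearer picture than the averaged value.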
