Credit_Risk_Analysis

Introduction

This analysis applies data preparation, statistical reasoning, and machine learning to assess credit risk on loans. Credit risk is an inherently unbalanced classification problem, because good loans far outnumber risky ones. I therefore use the imbalanced-learn and scikit-learn libraries to train and evaluate models on resampled data. Using the credit card credit dataset from LendingClub, a peer-to-peer lending services company, I oversample the data with the RandomOverSampler and SMOTE algorithms and undersample it with the ClusterCentroids algorithm. I then apply a combined over- and undersampling approach with the SMOTEENN algorithm. Finally, I compare two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, for predicting credit risk.
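To make the oversampling idea concrete, here is a minimal pure-Python sketch of random oversampling: rows from the smaller classes are drawn with replacement until every class matches the majority count. It only illustrates the idea and is not imbalanced-learn's RandomOverSampler implementation; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of the smaller classes until all classes
    have as many rows as the largest class (illustrative sketch only)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())

    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original rows that belong to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw with replacement to fill the gap up to the majority size
        for _ in range(target - count):
            i = rng.choice(idx)
            X_res.append(X[i])
            y_res.append(label)
    return X_res, y_res


# Hypothetical toy data: 8 low-risk loans and 2 high-risk loans
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 8 rows
```

Resampling is normally applied to the training split only, so that the test set keeps the original class ratio and the scores reflect real-world imbalance.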

Results

Random Over Sampler

Balanced Accuracy Score: 0.6249984891886339
Precision(avg/total): 0.99
Recall(avg/total): 0.65
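
For reference, balanced accuracy is the average of the recall obtained on each class, so in the binary case a model that ignores the minority class scores 0.5 rather than close to 1. The sketch below computes it from true and predicted labels; the helper name `balanced_accuracy` and the labels are made up for illustration and are not taken from the notebooks.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (illustrative helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: 8 low-risk and 2 high-risk loans
y_true = ["low"] * 8 + ["high"] * 2
# Predicting "low" for everything: no high-risk loan is caught
y_pred = ["low"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```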

SMOTE

Balanced Accuracy Score: 0.6512584051472337
Precision(avg/total): 0.99
Recall(avg/total): 0.66

Cluster Centroids

Balanced Accuracy Score: 0.5103309281216384
Precision(avg/total): 0.99
Recall(avg/total): 0.44

SMOTEENN

Balanced Accuracy Score: 0.6400726134353378
Precision(avg/total): 0.99
Recall(avg/total): 0.44

Balanced Random Forest Classifier

Balanced Accuracy Score: 0.7885466545953005
Precision(avg/total): 0.99
Recall(avg/total): 0.87

Easy Ensemble Classifier

Balanced Accuracy Score: 0.9316600714093861
Precision(avg/total): 0.99
Recall(avg/total): 0.94

Summary

To summarize, none of these machine learning models predicts high-risk credit precisely enough to be used. The Easy Ensemble Classifier had the highest balanced accuracy (0.93) and recall (0.94) of the six models, but it still had low precision and a low F1 score for the high-risk class, which would cause many low-risk credits to be falsely flagged as high risk. I would not recommend any of these models, because the risk is still too high and the accuracy in predicting which credit is high risk is too low.
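
A high average precision can sit alongside a low high-risk precision because of class imbalance: assuming the avg/total figure is a support-weighted average, it is dominated by the much larger low-risk class. The sketch below uses hypothetical confusion-matrix counts (not the counts from the notebooks) to show how a model can catch most high-risk loans while most of its high-risk flags are false alarms.

```python
# Hypothetical counts for a test set of 10,000 loans, 50 of them high risk
tp = 45      # high-risk loans correctly flagged
fn = 5       # high-risk loans missed
fp = 600     # low-risk loans wrongly flagged as high risk
tn = 9350    # low-risk loans correctly passed

precision_high = tp / (tp + fp)   # share of high-risk flags that are right
recall_high = tp / (tp + fn)      # share of high-risk loans that are caught
precision_low = tn / (tn + fn)    # share of low-risk predictions that are right

# Support-weighted average precision over both classes
n_high, n_low = tp + fn, tn + fp
precision_avg = (precision_high * n_high + precision_low * n_low) / (n_high + n_low)

print(f"high-risk precision: {precision_high:.2f}")  # 0.07
print(f"high-risk recall:    {recall_high:.2f}")     # 0.90
print(f"weighted precision:  {precision_avg:.2f}")   # 0.99
```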

