
Credit_Risk_Analysis

8-Credit Risk

Challenge Overview

Purpose:

The purpose of this analysis is to predict credit risk with machine learning models by using different techniques to train and evaluate models with unbalanced classes.

  • Resampling models

    • Oversampling: the RandomOverSampler and SMOTE algorithms
    • Undersampling: the ClusterCentroids algorithm
    • Combination sampling (oversampling followed by undersampling): the SMOTEENN algorithm
  • Ensemble classifiers

    • the BalancedRandomForestClassifier and EasyEnsembleClassifier algorithms

All six models share the same train-and-evaluate workflow; a sketch of it follows this list.
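The sketch assumes the loan data has already been cleaned and encoded into a feature matrix X and a binary high_risk/low_risk target y; the evaluate helper, the variable names, and the logistic regression classifier are illustrative assumptions, not the notebook's exact code.

```python
# Minimal sketch of the shared train/evaluate workflow (illustrative names, not the notebook's code).
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from imblearn.metrics import classification_report_imbalanced

def evaluate(sampler, X, y):
    """Resample the training split with `sampler`, fit a logistic regression,
    and report imbalance-aware metrics on the untouched test split."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)
    X_res, y_res = sampler.fit_resample(X_train, y_train)        # resample the training data only
    model = LogisticRegression(solver='lbfgs', max_iter=1000, random_state=1)
    model.fit(X_res, y_res)
    y_pred = model.predict(X_test)
    print(balanced_accuracy_score(y_test, y_pred))               # accuracy averaged over both classes
    print(confusion_matrix(y_test, y_pred))
    print(classification_report_imbalanced(y_test, y_pred))      # per-class precision, recall, F1
```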

Resources

  • Software:

    • Jupyter Notebook 6.4.6
    • Python machine learning libraries:
      • scikit-learn
      • imbalanced-learn
  • Data source:

Results

1 Random_Oversampling

  • Random Oversampling (RandomOverSampler)
    • Balanced accuracy score: 66%
    • Precision and recall for the high_risk class:
      • Recall (71%) is much higher than precision (1%)
      • The low precision means many false positives: loans predicted high risk that are actually low risk
      • This makes it a poor model for this dataset (see the sketch below)
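How this model would plug into the hypothetical evaluate helper from the workflow sketch above (RandomOverSampler is the algorithm named here; evaluate, X, and y are assumptions):

```python
from imblearn.over_sampling import RandomOverSampler

# Naive random oversampling: duplicate minority-class (high_risk) rows until the classes are even.
evaluate(RandomOverSampler(random_state=1), X, y)
```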

2 SMOTE

  • Synthetic Minority Oversampling Technique (SMOTE)
    • Balanced accuracy score: 66%
    • Precision and recall for the high_risk class:
      • Recall (63%) is much higher than precision (1%)
      • The low precision means many false positives: loans predicted high risk that are actually low risk
      • This makes it a poor model for this dataset (see the sketch below)
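The same pattern with SMOTE (again using the assumed evaluate, X, and y from the workflow sketch):

```python
from imblearn.over_sampling import SMOTE

# SMOTE synthesizes new high_risk rows by interpolating between existing minority-class neighbors.
evaluate(SMOTE(random_state=1, sampling_strategy='auto'), X, y)
```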

3 Undersampling_Cluster_Centroid

  • Cluster Centroid Undersampling (ClusterCentroids)
    • Balanced accuracy score: 54%
    • Precision and recall for the high_risk class:
      • Recall (69%) is much higher than precision (1%)
      • The low precision means many false positives: loans predicted high risk that are actually low risk
      • This makes it a poor model for this dataset (see the sketch below)
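The undersampling counterpart, with the same assumed evaluate, X, and y:

```python
from imblearn.under_sampling import ClusterCentroids

# ClusterCentroids shrinks the majority (low_risk) class down to synthetic cluster centroids
# instead of adding minority-class rows.
evaluate(ClusterCentroids(random_state=1), X, y)
```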

4 Combination

  • Combination Sampling with SMOTEENN
    • Balanced accuracy score: 64%
    • Precision and recall for the high_risk class:
      • Recall (72%) is much higher than precision (1%)
      • The low precision means many false positives: loans predicted high risk that are actually low risk
      • This makes it a poor model for this dataset (see the sketch below)
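The combination-sampling variant, with the same assumed evaluate, X, and y:

```python
from imblearn.combine import SMOTEENN

# SMOTEENN oversamples high_risk with SMOTE, then removes noisy or ambiguous rows
# with Edited Nearest Neighbours.
evaluate(SMOTEENN(random_state=1), X, y)
```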

5 Balanced_Random_Forest_Classifier

  • BalancedRandomForestClassifier
    • Balanced accuracy score: 79%
    • Precision and recall for the high_risk class:
      • Recall (70%) is much higher than precision (3%)
      • The low precision means many false positives: loans predicted high risk that are actually low risk
      • This still makes it a poor model for this dataset (see the sketch below)
    • total_rec_prncp and total_pymnt are the most important features (columns) in the credit dataset
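The ensemble classifiers balance the classes internally, so no separate resampler is needed. A sketch assuming the same X and y as above; the train/test split, 100 estimators, and the pandas ranking are illustrative choices, not necessarily the notebook's:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
from imblearn.metrics import classification_report_imbalanced
from imblearn.ensemble import BalancedRandomForestClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)

# Each tree is trained on a bootstrap sample that is undersampled to balance the classes.
brf = BalancedRandomForestClassifier(n_estimators=100, random_state=1)
brf.fit(X_train, y_train)
y_pred = brf.predict(X_test)

print(balanced_accuracy_score(y_test, y_pred))
print(classification_report_imbalanced(y_test, y_pred))

# Rank features by importance; total_rec_prncp and total_pymnt top the list in this analysis.
importances = pd.Series(brf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(10))
```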

6 Easy_Ensemble_AdaBoost_Classifier

  • EasyEnsembleClassifier
    • Balanced accuracy score: 93%
    • Precision and recall for the high_risk class:
      • Recall (92%) is much higher than precision (9%)
      • The low precision means many false positives: loans predicted high risk that are actually low risk
      • Despite the best scores of the six models, it is still a poor model for this dataset (see the sketch below)
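A matching sketch for the boosted ensemble, under the same assumptions:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
from imblearn.metrics import classification_report_imbalanced
from imblearn.ensemble import EasyEnsembleClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)

# EasyEnsemble fits many AdaBoost learners, each on a balanced bootstrap of the training data.
eec = EasyEnsembleClassifier(n_estimators=100, random_state=1)
eec.fit(X_train, y_train)
y_pred = eec.predict(X_test)

print(balanced_accuracy_score(y_test, y_pred))
print(classification_report_imbalanced(y_test, y_pred))
```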

Summary:

Even though the EasyEnsembleClassifier has the highest balanced accuracy score (93%), neither it nor any of the other algorithms is good enough to determine whether a loan is high risk: recall is high, but precision is very low, which means many false positives (loans predicted high risk that are actually low risk). A lender acting on these predictions would reject far more good loans than bad ones. Therefore, I would not recommend using any of these models to predict credit risk as they stand. A dataset with more observations, particularly more high-risk examples, might produce better results.
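To make the precision/recall gap concrete, here is a toy calculation with made-up counts (not the notebook's actual confusion matrix) showing how a model can catch almost every high-risk loan while most of its high-risk predictions are still wrong:

```python
# Hypothetical counts chosen only to illustrate ~92% recall with ~9% precision.
tp = 92    # high_risk loans correctly flagged
fn = 8     # high_risk loans missed
fp = 930   # low_risk loans wrongly flagged as high_risk

precision = tp / (tp + fp)   # 92 / 1022 ≈ 0.09 → most flagged loans are actually low risk
recall = tp / (tp + fn)      # 92 / 100  = 0.92 → nearly all truly high-risk loans are caught
print(f"precision={precision:.2f}, recall={recall:.2f}")
```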
