To apply machine learning to a real-world challenge: assessing credit card credit risk.
To employ different techniques for training and evaluating models with unbalanced classes, since credit risk is an inherently unbalanced classification problem: good loans easily outnumber risky loans.
In random oversampling, instances of the minority class are randomly selected and added to the training set until the majority and minority classes are balanced.
Precision for high-risk loans is very low, around 1%, but the model is very good at predicting low-risk loans, with precision of almost 100%.
Recall is around 70% for high-risk loans; that is, the model identifies about 70% of risky loans, but only about 63% of good ones.
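The oversampling step described above can be sketched with plain numpy (the project itself presumably used imbalanced-learn's RandomOverSampler; the function and toy data here are hypothetical):

```python
import numpy as np

def random_oversample(X, y, minority_label, rng=None):
    # Randomly duplicate minority-class rows (with replacement) until
    # the two classes are the same size.
    rng = np.random.default_rng(rng)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    n_extra = len(majority_idx) - len(minority_idx)
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([majority_idx, minority_idx, extra])
    return X[keep], y[keep]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)       # 8 low-risk rows vs 2 high-risk rows
X_res, y_res = random_oversample(X, y, minority_label=1, rng=42)
print(np.bincount(y_res))             # → [8 8]
```

Only the labels of the sampled rows change in frequency; no new feature values are invented, which is what distinguishes random oversampling from SMOTE.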
In the synthetic minority oversampling technique (SMOTE), new instances are interpolated between existing minority-class points, increasing the size of the minority class.
Precision for high-risk loans is again very low, around 1%, but the model is very good at predicting low-risk loans, with precision of almost 100%.
Recall is around 63% for high-risk loans; that is, the model identifies about 63% of risky loans, and about 69% of good ones.
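The interpolation idea behind SMOTE can be sketched in a few lines (a minimal toy version, not imbalanced-learn's implementation; the function name and sample points are hypothetical):

```python
import numpy as np

def smote_sample(X_min, rng=None, k=1):
    # Create one synthetic point per minority row by stepping a random
    # fraction of the way toward one of its k nearest minority neighbours.
    rng = np.random.default_rng(rng)
    synthetic = []
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_min - x, axis=1)   # distances to other rows
        d[i] = np.inf                           # exclude the point itself
        neighbour = X_min[np.argsort(d)[rng.integers(k)]]
        gap = rng.random()                      # position along the segment
        synthetic.append(x + gap * (neighbour - x))
    return np.vstack(synthetic)

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new_pts = smote_sample(X_min, rng=0)
print(new_pts.shape)    # → (3, 2)
```

Each synthetic row lies on a line segment between two real minority points, so SMOTE never duplicates an existing row the way random oversampling does.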
Cluster centroid undersampling is akin to SMOTE, but works on the majority class. The algorithm identifies clusters of the majority class, then generates synthetic data points, called centroids, that are representative of those clusters.
Precision for high-risk loans is again very low, around 1%, but the model is very good at predicting low-risk loans, with precision of almost 100%.
Recall is around 69% for high-risk loans; that is, the model identifies about 69% of risky loans, but only about 40% of good ones.
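The centroid step can be sketched with scikit-learn's KMeans (a simplified stand-in for imbalanced-learn's ClusterCentroids; the helper and data below are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_centroid_undersample(X_maj, n_minority, seed=0):
    # Replace the majority class with one KMeans centroid per
    # minority-class row, so both classes end up the same size.
    km = KMeans(n_clusters=n_minority, n_init=10, random_state=seed)
    km.fit(X_maj)
    return km.cluster_centers_

X_maj = np.random.default_rng(0).normal(size=(100, 2))  # 100 "low-risk" rows
centroids = cluster_centroid_undersample(X_maj, n_minority=5)
print(centroids.shape)    # → (5, 2)
```

Note that the majority class is now represented by synthetic averages rather than real loans, which is why undersampling can cost recall on the good-loan class.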
SMOTEENN combines the SMOTE and Edited Nearest Neighbors (ENN) algorithms: SMOTE oversamples the minority class, then ENN cleans the result by dropping points whose nearest neighbors mostly belong to the other class.
Precision for high-risk loans is again very low, around 1%, but the model is very good at predicting low-risk loans, with precision of almost 100%.
Recall is around 72% for high-risk loans; that is, the model identifies about 72% of risky loans, but only about 57% of good ones.
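The ENN cleaning half of SMOTEENN can be sketched on its own (a toy version with hypothetical data; imbalanced-learn's SMOTEENN chains this after SMOTE):

```python
import numpy as np

def enn_clean(X, y, k=3):
    # Edited Nearest Neighbours: drop any row whose label disagrees
    # with the majority vote of its k nearest neighbours.
    keep = []
    for i, x in enumerate(X):
        d = np.linalg.norm(X - x, axis=1)
        d[i] = np.inf                       # a point is not its own neighbour
        nn = np.argsort(d)[:k]
        if np.bincount(y[nn], minlength=2).argmax() == y[i]:
            keep.append(i)
    keep = np.array(keep)
    return X[keep], y[keep]

# Two tight clusters plus one mislabelled point sitting inside cluster 0.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [0.05, 0.05]])
y = np.array([0, 0, 0, 1, 1, 1])    # last row: label 1 deep inside cluster 0
X_c, y_c = enn_clean(X, y, k=3)
print(len(y_c))    # → 5
```

The mislabelled point is removed because all of its neighbors carry the other label; this pruning of ambiguous points is what ENN adds on top of SMOTE.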
A balanced random forest randomly under-samples each bootstrap sample, so every tree in the forest is trained on a balanced subset of the data.
Precision for high-risk loans is low, around 3%, but the model is very good at predicting good loans, with precision of almost 100%.
Recall is around 70% for high-risk loans; that is, the model identifies about 70% of risky loans, and 87% of good loans.
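The per-tree balanced bootstrap can be sketched as follows (the helper and label vector are hypothetical; imbalanced-learn's BalancedRandomForestClassifier does this internally for every tree):

```python
import numpy as np

def balanced_bootstrap(y, rng):
    # Draw a bootstrap sample that takes equally many rows from each
    # class, i.e. the majority class is under-sampled for this tree.
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()
    idx = [rng.choice(np.flatnonzero(y == c), size=n, replace=True)
           for c in classes]
    return np.concatenate(idx)

rng = np.random.default_rng(1)
y = np.array([0] * 90 + [1] * 10)     # 90 low-risk rows vs 10 high-risk rows
sample = balanced_bootstrap(y, rng)
print(np.bincount(y[sample]))         # → [10 10]
```

Because each tree sees a different balanced draw, the forest as a whole still uses most of the majority-class data, unlike a single global undersample.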
Ensemble learning is the process of combining multiple models, such as decision trees, to improve accuracy and robustness and to decrease the variance of the model.
Precision for high-risk loans is low, around 9%, but the model is very good at predicting good loans, with precision of almost 100%.
Recall is around 92% for high-risk loans; that is, the model identifies about 92% of risky loans, and about 94% of good ones.
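An Easy-Ensemble-style combination can be sketched like this (hypothetical data and helper; the project presumably used imblearn's EasyEnsembleClassifier, which additionally boosts each balanced bag with AdaBoost):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(180, 2)),           # "good" loans
               rng.normal(loc=2.0, size=(20, 2))])  # rare "risky" loans
y = np.array([0] * 180 + [1] * 20)

def balanced_subset(y, rng):
    # Sample equally many rows from each class (under-samples class 0).
    n = np.bincount(y).min()
    return np.concatenate([rng.choice(np.flatnonzero(y == c), n, replace=False)
                           for c in (0, 1)])

# Fit one small tree per balanced subsample, then combine by vote.
models = [DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[idx], y[idx])
          for idx in (balanced_subset(y, rng) for _ in range(10))]
votes = np.mean([m.predict(X) for m in models], axis=0)
pred = (votes >= 0.5).astype(int)
print(pred.shape)    # → (200,)
```

Each learner sees all of the minority class but only a slice of the majority class, so the ensemble as a whole wastes little data while every member trains on balanced input.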
Although SMOTE reduces the risk of overfitting to duplicated rows, it does not always outperform random oversampling. While resampling can attempt to address imbalance, it does not guarantee better results. Resampling with SMOTEENN did not work miracles, but some of its metrics show an improvement over undersampling. The balanced random forest model has a precision of 3% for bad loan applications, which indicates a large number of false positives: many good loans are flagged as risky.
The EasyEnsembleClassifier model identifies 92% of risky loans and 94% of good loans. It has a precision of almost 100% for good loans but only 9% for bad loans; that is, there are a lot of false positives, and it wrongly flags a number of good loan applications as risky. Still, it is overall the best model of the group.