This analysis uses data preparation, statistical reasoning, and machine learning to assess credit risk on loans. Credit risk is an inherently unbalanced classification problem, because good loans easily outnumber risky loans. Therefore, I’ll use the imbalanced-learn and scikit-learn libraries to train and evaluate models on resampled data. Using the credit card credit dataset from LendingClub, a peer-to-peer lending services company, I’ll oversample the data with the RandomOverSampler and SMOTE algorithms and undersample it with the ClusterCentroids algorithm. Then, I’ll apply a combined over- and undersampling approach with the SMOTEENN algorithm. Finally, I’ll compare two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, for predicting credit risk.
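To make the oversampling idea concrete, here is a minimal standard-library sketch of what random oversampling does: it duplicates randomly chosen minority-class rows until the classes are the same size. This is an illustration only, not the imbalanced-learn implementation; the function name `random_oversample` and the toy data are assumptions made for the example. In the analysis itself, the equivalent step would be a call along the lines of `RandomOverSampler(random_state=1).fit_resample(X_train, y_train)`.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        # Sample with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made up): 8 low-risk loans and 2 high-risk loans.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

res_rows, res_labels = random_oversample(rows, labels)
resampled_counts = Counter(res_labels)
print(resampled_counts)
```

Only the training split should be resampled; the test split keeps its original class balance so that the evaluation reflects real-world proportions.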
RandomOverSampler:

- Balanced Accuracy Score: 0.6249984891886339
- Precision (avg/total): 0.99
- Recall (avg/total): 0.65

SMOTE:

- Balanced Accuracy Score: 0.6512584051472337
- Precision (avg/total): 0.99
- Recall (avg/total): 0.66

ClusterCentroids:

- Balanced Accuracy Score: 0.5103309281216384
- Precision (avg/total): 0.99
- Recall (avg/total): 0.44

SMOTEENN:

- Balanced Accuracy Score: 0.6400726134353378
- Precision (avg/total): 0.99
- Recall (avg/total): 0.44

BalancedRandomForestClassifier:

- Balanced Accuracy Score: 0.7885466545953005
- Precision (avg/total): 0.99
- Recall (avg/total): 0.87

EasyEnsembleClassifier:

- Balanced Accuracy Score: 0.9316600714093861
- Precision (avg/total): 0.99
- Recall (avg/total): 0.94
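
Balanced accuracy is the mean of the per-class recalls, which is why a score near 0.5 means little better than chance on a two-class problem. The sketch below shows the calculation on made-up confusion counts (not taken from this analysis); the helper names are hypothetical. It also shows why plain accuracy is misleading on unbalanced data: a model that labels every loan low risk scores about 0.99 accuracy but only 0.5 balanced accuracy.

```python
def recall_per_class(confusion):
    """confusion[actual][predicted] holds a count; returns {class: recall}."""
    return {
        actual: row[actual] / sum(row.values())
        for actual, row in confusion.items()
    }


def balanced_accuracy(confusion):
    """Unweighted mean of the per-class recalls."""
    recalls = recall_per_class(confusion)
    return sum(recalls.values()) / len(recalls)


def accuracy(confusion):
    """Fraction of all samples that were classified correctly."""
    correct = sum(row[actual] for actual, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


# Hypothetical model that predicts "low_risk" for every loan.
always_low = {
    "high_risk": {"high_risk": 0, "low_risk": 100},
    "low_risk": {"high_risk": 0, "low_risk": 10000},
}

print(round(accuracy(always_low), 4))   # plain accuracy looks excellent
print(balanced_accuracy(always_low))    # balanced accuracy exposes the failure
```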
To summarize this analysis, none of these machine learning models predicts high-risk credit precisely enough to be used. The Easy Ensemble Classifier had the highest balanced accuracy score and recall of the six machine learning models, but it still had low precision and F1 score for the high-risk class, which would cause many low-risk credits to be falsely flagged as high risk. I would not recommend any of these models, because there is still too much risk and not enough accuracy in predicting which credit would be high risk.
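
The gap between a 0.99 average precision and a low high-risk precision can be shown with a small worked example. The counts below are invented for illustration and are not results from this analysis; the sketch also assumes that the "avg/total" figure is a support-weighted average across classes. Because low-risk loans dominate the support, the weighted average stays near 0.99 even when most loans flagged as high risk are actually low risk.

```python
# Hypothetical confusion counts: confusion[actual][predicted].
confusion = {
    "high_risk": {"high_risk": 90, "low_risk": 10},
    "low_risk": {"high_risk": 900, "low_risk": 9100},
}
classes = list(confusion)


def precision(cls):
    """Of everything predicted as cls, the fraction that truly is cls."""
    predicted = sum(confusion[actual][cls] for actual in classes)
    return confusion[cls][cls] / predicted


def recall(cls):
    """Of everything that truly is cls, the fraction predicted as cls."""
    return confusion[cls][cls] / sum(confusion[cls].values())


def f1(cls):
    """Harmonic mean of precision and recall for one class."""
    p, r = precision(cls), recall(cls)
    return 2 * p * r / (p + r)


# Support = number of actual samples in each class.
support = {cls: sum(confusion[cls].values()) for cls in classes}
weighted_precision = (
    sum(precision(cls) * support[cls] for cls in classes) / sum(support.values())
)

high_precision = precision("high_risk")
high_f1 = f1("high_risk")
print(round(high_precision, 3), round(high_f1, 3), round(weighted_precision, 3))
```

In this made-up case, roughly ten of every eleven loans flagged as high risk would be good loans, even though the high-risk recall is 0.9.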