The purpose of this analysis was to predict credit risk using supervised machine learning models.
The analysis compared six approaches, four resampling techniques and two ensemble classifiers:
- RandomOverSampler: Oversampling
- SMOTE (Synthetic Minority Oversampling Technique): Oversampling
- Cluster Centroids: Undersampling
- SMOTEENN (Synthetic Minority Oversampling Technique + Edited Nearest Neighbors): Combined oversampling and undersampling
- Balanced Random Forest Classifier: Ensemble method for bias reduction
- Easy Ensemble AdaBoost Classifier: Ensemble method for bias reduction
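
To make the oversampling idea concrete, here is a minimal pure-Python sketch of random oversampling, the simplest of the resampling approaches listed above. It is an illustration only, not the imbalanced-learn implementation used in the analysis; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 low-risk loans and 2 high-risk loans.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

resampled_rows, resampled_labels = random_oversample(rows, labels)
print(Counter(resampled_labels))
```

Undersampling works in the opposite direction by shrinking the majority class, and SMOTE generates synthetic minority rows by interpolation rather than duplicating existing ones.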
All precision, recall, and F1 summary statistics are for the high-risk class.
- RandomOverSampler
  - Balanced Accuracy Score: 65%
  - Precision: 1%
  - Recall/Sensitivity: 63%
  - F1: 2%
- SMOTE
  - Balanced Accuracy Score: 65%
  - Precision: 1%
  - Recall/Sensitivity: 64%
  - F1: 2%
- Cluster Centroids
  - Balanced Accuracy Score: 52%
  - Precision: 1%
  - Recall/Sensitivity: 69%
  - F1: 2%
- SMOTEENN
  - Balanced Accuracy Score: 62%
  - Precision: 1%
  - Recall/Sensitivity: 69%
  - F1: 2%
- Balanced Random Forest Classifier
  - Balanced Accuracy Score: 79%
  - Precision: 4%
  - Recall/Sensitivity: 67%
  - F1: 7%
- Easy Ensemble AdaBoost Classifier
  - Balanced Accuracy Score: 93%
  - Precision: 7%
  - Recall/Sensitivity: 91%
  - F1: 14%
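
For reference, the four metrics reported above can all be derived from a binary confusion matrix in which high-risk is the positive class. The sketch below uses hypothetical counts, not the actual confusion matrices from this analysis, and assumes every denominator is non-zero.

```python
def high_risk_metrics(tp, fn, fp, tn):
    """Compute summary metrics with high-risk as the positive class.

    tp: high-risk loans correctly flagged
    fn: high-risk loans missed
    fp: low-risk loans wrongly flagged as high-risk
    tn: low-risk loans correctly passed
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)     # recall of the low-risk class
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts: 100 high-risk loans among 20,100 in total.
metrics = high_risk_metrics(tp=90, fn=10, fp=1000, tn=19000)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because F1 is the harmonic mean of precision and recall, it stays close to the smaller of the two, which is why the F1 scores above track the low precision values rather than the higher recall values.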
The ensemble models (Balanced Random Forest Classifier and Easy Ensemble AdaBoost Classifier) achieved the highest balanced accuracy scores in the analysis (79% and 93%), and the Easy Ensemble classifier also had the highest recall for high-risk credit (91%). Despite this stronger performance, both models still had very low precision (7% at best), so the large majority of loans they flag as high-risk would actually be low-risk, making them potential liabilities in real-life situations. These models are unreliable for their intended task, and I would not recommend them for predicting credit risk.
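
One way to see why precision can stay low even when recall and balanced accuracy are high is to look at how precision depends on the share of high-risk loans in the data. The sketch below takes the 91% recall and 93% balanced accuracy reported for the best model, backs out the implied specificity, and evaluates precision at a few prevalence values. The prevalence values are hypothetical and are not taken from the dataset, and the inputs are rounded percentages, so the outputs are rough.

```python
def precision_from_rates(recall, specificity, prevalence):
    """Precision implied by recall, specificity, and the fraction of
    positives (high-risk loans) in the population."""
    true_pos = recall * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


recall = 0.91             # reported recall of the best model
balanced_accuracy = 0.93  # reported balanced accuracy of the best model

# Balanced accuracy is the mean of recall and specificity for a binary
# problem, so specificity can be recovered from the two reported values.
specificity = 2 * balanced_accuracy - recall

# Hypothetical shares of high-risk loans, from very rare to balanced.
prevalences = (0.005, 0.05, 0.5)
precisions = [precision_from_rates(recall, specificity, p) for p in prevalences]

for p, prec in zip(prevalences, precisions):
    print(f"prevalence {p:.1%} -> precision {prec:.1%}")
```

With the same recall and specificity, precision falls as high-risk loans become rarer, because false positives drawn from the much larger low-risk group outnumber the true positives.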