# Comparing Results: Synthetic data vs Increase Model Penalty

Now that I have tested the same models with two different techniques for handling class imbalance, let's see which one performed better.

- Synthetic data for minority class: `classification_report.csv`
- Increase Penalty for positive class (minority): `classification_report2.csv`
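
The training code is not shown in this notebook, so exactly how the penalty was increased is an assumption here. A common approach is to weight each class inversely to its frequency, and the sketch below computes such weights in plain Python. The helper name `balanced_class_weights` is hypothetical, and the class counts are the `support` values from the reports further down (the test set), used only for illustration; in practice the weights would come from the training labels.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is the heuristic scikit-learn documents for class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Illustrative labels: 17463 non-default (0) and 1776 default (1) clients.
labels = [0] * 17463 + [1] * 1776
weights = balanced_class_weights(labels)

# Ratio of negatives to positives, the usual starting value for
# XGBoost's scale_pos_weight parameter.
neg_pos_ratio = labels.count(0) / labels.count(1)

print({cls: round(w, 2) for cls, w in weights.items()})
print(round(neg_pos_ratio, 2))
```

Either number makes a mistake on a default client cost more than a mistake on a non-default client, which is the idea behind the "IP" variants compared below.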


## Concerns
During EDA, I noticed that the train and test sets aren't similar: their clients, and the features related to risk, behave very differently. For instance, grade A clients have the same interest rate as riskier grades, which doesn't make much sense, especially when the loan value is similar.

I have a couple of hypotheses for why this is happening:
1. We are missing other information that matters for risk analysis: age, location, investment account, occupation.
2. The train set contains suspect data: in my EDA, the test set shows the risk vs. rate relationship we would expect in this sort of scenario, but the **train set** doesn't seem to follow it. One explanation for why it doesn't reflect the real world is that the train set is synthetic, generated by some algorithm like K-means.

Since my dataset is not balanced, accuracy can be an unreliable metric to look at. I'll focus on macro avg, since it gives each class the same weight regardless of how many samples it has.
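
To make the point about accuracy concrete, here is a minimal sketch of a hypothetical baseline that labels every client as non-default. The class sizes are the `support` values from the reports below; the `f1` helper is only for illustration, and precision for a class that is never predicted is taken as 0 by convention.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Class sizes taken from the `support` column of the reports.
n_negative, n_positive = 17463, 1776
total = n_negative + n_positive

# Hypothetical baseline: label every client as non-default (class 0).
accuracy = n_negative / total

# Class 0: every true negative is found, but all positives are also labeled 0.
f1_negative = f1(precision=n_negative / total, recall=1.0)
# Class 1: never predicted, so precision and recall are both 0.
f1_positive = f1(precision=0.0, recall=0.0)

macro_f1 = (f1_negative + f1_positive) / 2

print(f"accuracy: {accuracy:.2f}")
print(f"macro f1: {macro_f1:.2f}")
```

The baseline reaches an accuracy of 0.91 without finding a single default client, while its macro avg f1 stays at 0.48.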

In [1]:
import pandas as pd

In [2]:
# Load results sets
synt = pd.read_csv('classification_report.csv')
ip = pd.read_csv('classification_report2.csv')

# Concat sets into one
results = pd.concat([synt,ip], axis=0)

## Logits

In [3]:
print(results[results['Model'].str.contains('logit')])

     Unnamed: 0  precision  recall  f1-score   support     Model
0             0       0.91    0.57      0.70  17463.00     logit
1             1       0.09    0.44      0.16   1776.00     logit
2      accuracy       0.56    0.56      0.56      0.56     logit
3     macro avg       0.50    0.51      0.43  19239.00     logit
4  weighted avg       0.83    0.56      0.65  19239.00     logit
0             0       0.91    0.52      0.66  17463.00  logit IP
1             1       0.10    0.50      0.16   1776.00  logit IP
2      accuracy       0.51    0.51      0.51      0.51  logit IP
3     macro avg       0.50    0.51      0.41  19239.00  logit IP
4  weighted avg       0.84    0.51      0.61  19239.00  logit IP


Considering the **macro avg** **f1-score**, the models are not so different in terms of performance. While Logistic Regression with synthetic data was 0.02 points better (0.43 vs 0.41), both models are far from identifying default clients (the ideal would be an f1 of 1, or close to it).
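
As a quick check on how the two averaged rows are built, the sketch below recomputes them from the per-class `logit` rows of the report above. The numbers are the rounded values from that output, so this is an approximation of what the reporting function does internally.

```python
# Per-class f1-score and support for the `logit` rows of the report.
per_class = {
    0: {"f1": 0.70, "support": 17463},
    1: {"f1": 0.16, "support": 1776},
}

total_support = sum(row["support"] for row in per_class.values())

# Macro average: plain mean, each class counts the same.
macro_f1 = sum(row["f1"] for row in per_class.values()) / len(per_class)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = (
    sum(row["f1"] * row["support"] for row in per_class.values()) / total_support
)

print(f"macro f1:    {macro_f1:.2f}")
print(f"weighted f1: {weighted_f1:.2f}")
```

The weighted average (0.65) is pulled towards the large non-default class, while the macro average (0.43) exposes the weak f1 on default clients.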

## XGBoosts

In [4]:
print(results[results['Model'].str.contains('xgb')])

     Unnamed: 0  precision  recall  f1-score   support   Model
5             0       0.91    1.00      0.95  17463.00     xgb
6             1       0.11    0.00      0.01   1776.00     xgb
7      accuracy       0.90    0.90      0.90      0.90     xgb
8     macro avg       0.51    0.50      0.48  19239.00     xgb
9  weighted avg       0.83    0.90      0.86  19239.00     xgb
5             0       0.91    0.85      0.88  17463.00  xgb IP
6             1       0.09    0.15      0.11   1776.00  xgb IP
7      accuracy       0.78    0.78      0.78      0.78  xgb IP
8     macro avg       0.50    0.50      0.50  19239.00  xgb IP
9  weighted avg       0.83    0.78      0.81  19239.00  xgb IP


Between the XGBoost models, increase penalty was 0.02 points better than synthetic data (macro avg f1: 0.50 vs 0.48). Compared to **Logit with Synthetic data** (macro avg f1: 0.43), **XGBoost IP** (macro avg f1: 0.50) had the better overall result. The gain comes from the non-default class (f1: 0.88 vs 0.70); on the default class its f1 is actually lower (0.11 vs 0.16).

## Light GBM

In [5]:
print(results[results['Model'].str.lower().str.contains('light')])

      Unnamed: 0  precision  recall  f1-score   support         Model
10             0       0.91    1.00      0.95  17463.00     Light GBM
11             1       0.25    0.00      0.00   1776.00     Light GBM
12      accuracy       0.91    0.91      0.91      0.91     Light GBM
13     macro avg       0.58    0.50      0.48  19239.00     Light GBM
14  weighted avg       0.85    0.91      0.86  19239.00     Light GBM
10             0       0.91    0.75      0.82  17463.00  Light GBM IP
11             1       0.10    0.27      0.15   1776.00  Light GBM IP
12      accuracy       0.71    0.71      0.71      0.71  Light GBM IP
13     macro avg       0.50    0.51      0.48  19239.00  Light GBM IP
14  weighted avg       0.84    0.71      0.76  19239.00  Light GBM IP


While XGBoost and **Light GBM** are both gradient boosting algorithms, they grow their trees differently, and we see the impact on the results. **Light GBM** grows each tree leaf-wise: at every step it splits the leaf that gives the largest reduction in loss. This reaches a lower error per branch, but has a higher chance of overfitting.
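
The difference in growth strategy can be sketched with a toy example. The gains below are invented numbers (real libraries compute them from gradients), the function names are hypothetical, and the fixed split budget is a simplification. Leaf-wise growth is what Light GBM uses; level-wise (depth-wise) growth is XGBoost's default policy.

```python
import heapq
from collections import deque

# Hypothetical loss reduction ("gain") obtained by splitting each node of a
# small binary tree. Node i has children 2*i + 1 and 2*i + 2.
GAINS = {0: 10.0, 1: 8.0, 2: 1.0, 3: 6.0, 4: 0.5, 5: 0.4, 6: 0.3}


def children(node, gains):
    """Return the children of `node` that can still be split."""
    return [c for c in (2 * node + 1, 2 * node + 2) if c in gains]


def leaf_wise(gains, n_splits):
    """Best-first growth: always split the open leaf with the largest gain."""
    heap = [(-gains[0], 0)]  # max-heap via negated gains
    chosen = []
    while heap and len(chosen) < n_splits:
        _, node = heapq.heappop(heap)
        chosen.append(node)
        for child in children(node, gains):
            heapq.heappush(heap, (-gains[child], child))
    return chosen


def level_wise(gains, n_splits):
    """Breadth-first growth: finish one depth level before going deeper."""
    queue = deque([0])
    chosen = []
    while queue and len(chosen) < n_splits:
        node = queue.popleft()
        chosen.append(node)
        queue.extend(children(node, gains))
    return chosen


budget = 3  # same number of splits for both strategies
leaf_nodes = leaf_wise(GAINS, budget)
level_nodes = level_wise(GAINS, budget)

print("leaf-wise :", leaf_nodes, sum(GAINS[n] for n in leaf_nodes))
print("level-wise:", level_nodes, sum(GAINS[n] for n in level_nodes))
```

With the same budget of three splits, the leaf-wise strategy goes deeper down one branch (nodes 0, 1, 3) and collects more total gain than the level-wise one (nodes 0, 1, 2). Chasing the largest gain lets the tree fit the training data closely, which is also why limits such as `num_leaves` or `max_depth` matter for Light GBM.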

Both imbalance techniques gave the same result here (macro avg f1: 0.48), which is very close to **XGBoost IP** (macro avg f1: 0.50).

## SVM

In [6]:
print(results[results['Model'].str.lower().str.contains('svm')])

      Unnamed: 0  precision  recall  f1-score   support   Model
15             0       0.91    0.36      0.51  17463.00     SVM
16             1       0.09    0.64      0.16   1776.00     SVM
17      accuracy       0.38    0.38      0.38      0.38     SVM
18     macro avg       0.50    0.50      0.34  19239.00     SVM
19  weighted avg       0.83    0.38      0.48  19239.00     SVM
15             0       0.91    0.64      0.75  17463.00  SVM IP
16             1       0.10    0.38      0.15   1776.00  SVM IP
17      accuracy       0.61    0.61      0.61      0.61  SVM IP
18     macro avg       0.50    0.51      0.45  19239.00  SVM IP
19  weighted avg       0.83    0.61      0.70  19239.00  SVM IP


I was hoping that non-linearity would be the answer for a better model, but it seems I was wrong. The SVM (macro avg f1: 0.34 with synthetic data, 0.45 with IP) is not clearly better than a simple linear model such as Logit (0.43 and 0.41).

Our winning model is still **XGBoost IP**.

## Artificial Neural Networks

### Stochastic Gradient Descent (SGD)

In [7]:
print(results[results['Model'].str.lower().str.contains('sgd')], '\n')
print(results[results['Model'].str.lower().str.contains('xgb ip')])

      Unnamed: 0  precision  recall  f1-score   support       Model
20             0       0.90    0.02      0.04  17463.00     ANN SGD
21             1       0.09    0.98      0.17   1776.00     ANN SGD
22      accuracy       0.11    0.11      0.11      0.11     ANN SGD
23     macro avg       0.49    0.50      0.11  19239.00     ANN SGD
24  weighted avg       0.82    0.11      0.05  19239.00     ANN SGD
20             0       0.91    0.00      0.01  17463.00  ANN SGD IP
21             1       0.09    1.00      0.17   1776.00  ANN SGD IP
22      accuracy       0.09    0.09      0.09      0.09  ANN SGD IP
23     macro avg       0.50    0.50      0.09  19239.00  ANN SGD IP
24  weighted avg       0.84    0.09      0.02  19239.00  ANN SGD IP 

     Unnamed: 0  precision  recall  f1-score   support   Model
5             0       0.91    0.85      0.88  17463.00  xgb IP
6             1       0.09    0.15      0.11   1776.00  xgb IP
7      accuracy       0.78    0.78      0.78      0.78  xgb IP
8     macro avg       0.50    0.50      0.50  19239.00  xgb IP
9  weighted avg       0.83    0.78      0.81  19239.00  xgb IP

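**ANN SGD** didn't perform well. The macro avg f1-scores are far below 0.50 (0.11 and 0.09): the model labels almost every client as a default (recall of 0.98 to 1.00 on class 1, but 0.02 or less on class 0), so it cannot tell the two classes apart.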
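
The very low accuracy of both SGD networks follows directly from their per-class recall. As a rough check, the sketch below rebuilds accuracy as the support-weighted mean of recall, using the rounded values from the report above.

```python
# Per-class (recall, support) for the two SGD networks, copied from the report.
reports = {
    "ANN SGD": {0: (0.02, 17463), 1: (0.98, 1776)},
    "ANN SGD IP": {0: (0.00, 17463), 1: (1.00, 1776)},
}

accuracy = {}
for model, classes in reports.items():
    total = sum(support for _, support in classes.values())
    # recall * support approximates the correctly classified rows of a class.
    correct = sum(recall * support for recall, support in classes.values())
    accuracy[model] = correct / total
    print(f"{model}: accuracy = {accuracy[model]:.2f}")
```

Almost all of the correct predictions come from the 1776 default clients, so accuracy lands near the share of that class in the data (0.11 and 0.09).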

### ADAM

In [8]:
print(results[results['Model'].str.lower().str.contains('adam')], '\n')
print(results[results['Model'].str.lower().str.contains('xgb ip')])

      Unnamed: 0  precision  recall  f1-score   support        Model
25             0       0.91    1.00      0.95  17463.00     ANN ADAM
26             1       0.00    0.00      0.00   1776.00     ANN ADAM
27      accuracy       0.91    0.91      0.91      0.91     ANN ADAM
28     macro avg       0.45    0.50      0.48  19239.00     ANN ADAM
29  weighted avg       0.82    0.91      0.86  19239.00     ANN ADAM
25             0       0.91    1.00      0.95  17463.00  ANN ADAM IP
26             1       0.00    0.00      0.00   1776.00  ANN ADAM IP
27      accuracy       0.91    0.91      0.91      0.91  ANN ADAM IP
28     macro avg       0.45    0.50      0.48  19239.00  ANN ADAM IP
29  weighted avg       0.82    0.91      0.86  19239.00  ANN ADAM IP 

     Unnamed: 0  precision  recall  f1-score   support   Model
5             0       0.91    0.85      0.88  17463.00  xgb IP
6             1       0.09    0.15      0.11   1776.00  xgb IP
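7      accuracy       0.78    0.78      0.78      0.78  xgb IP
8     macro avg       0.50    0.50      0.50  19239.00  xgb IP
9  weighted avg       0.83    0.78      0.81  19239.00  xgb IP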

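**ANN Adam** performed a little better (macro avg f1-score: 0.48), regardless of the imbalance technique used.

Both models also got close to the XGBoost IP result (macro avg f1-score: 0.50). Note, however, that precision and recall for class 1 are both 0.00: the networks label every client as non-default.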

## Takeaways

Here I tested different models using two different techniques to deal with an imbalanced dataset. In general, SMOTE or an increased model penalty will have different effects depending on the model. For this dataset, increasing the penalty seems to have worked better for the XGBoost model than synthetic data.

Another interesting thing is the results of the models themselves. While it was quite difficult to find a good model, **XGBoost IP** (macro avg f1: 0.50) was the best option for predicting default clients, even with all the differences and odd behavior I found during EDA. It is important to note that even though this model shows the best result among those tested, **default clients** are still hard to predict.
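
To compare every model at once, the macro avg rows can be pulled out of the combined report and sorted. The sketch below assumes the column layout printed above (`Unnamed: 0`, `f1-score`, `Model`) and rebuilds a small frame from the macro avg values shown in this notebook, so it runs without the CSV files; on the real data the same filter would be applied to `results`.

```python
import pandas as pd

# Macro avg f1-scores copied from the reports in this notebook.
macro_rows = pd.DataFrame(
    {
        "Unnamed: 0": ["macro avg"] * 12,
        "f1-score": [0.43, 0.41, 0.48, 0.50, 0.48, 0.48,
                     0.34, 0.45, 0.11, 0.09, 0.48, 0.48],
        "Model": ["logit", "logit IP", "xgb", "xgb IP",
                  "Light GBM", "Light GBM IP", "SVM", "SVM IP",
                  "ANN SGD", "ANN SGD IP", "ANN ADAM", "ANN ADAM IP"],
    }
)

# Keep only the macro avg rows, index by model name, and rank by f1-score.
ranking = (
    macro_rows[macro_rows["Unnamed: 0"] == "macro avg"]
    .set_index("Model")["f1-score"]
    .sort_values(ascending=False)
)

print(ranking)
```

The ranking puts `xgb IP` first, with several models tied at 0.48 just behind it.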