<img style="float: left;" src="https://d3rt91u8ecpt22.cloudfront.net/assets/learn/logo-exts-0e71782f000e506b332ae30887d6a959dd3a13bcc0d6fb6bb7797c4f1100a537.svg" alt="drawing" width="100"/> 

# Capstone Project - Liberty Mutual Group Fire Peril Loss Cost


https://www.kaggle.com/c/liberty-mutual-fire-peril

***

***
## 6. Model Appraisal: Classification <a name="6"></a>

Before we start with the regression models, we first introduce measures that help us understand the underlying problem of the challenge at hand and evaluate the quality and validity of the models.

In this section, we look at classification metrics. Although this challenge is inherently a regression problem, we can also treat it as a classification problem by transforming the target: values `> 0` are classified as loss events and values `== 0` as non-events. Hence, this chapter introduces metrics that measure classification performance.

First, we start with the simple accuracy score, which measures the fraction of predictions that are correct. Second, we introduce the confusion matrix and the measures derived from it. Lastly, we elaborate on Type I and Type II errors and their importance in the context of this project.

### 6.1. Simple Logistic Regression

For illustrative purposes, we use the short and compressed version of the training set, which keeps training fast.

In [121]:
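import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# separate features from target
d = df_train_dict['short_comp']
X = d.drop(['LogTarget', 'isClaim'], axis=1)
# one-hot encode the nominal columns that are present in this dataset
nominal_cols = intersection(nominals, X.columns)
X = pd.get_dummies(X, columns=nominal_cols, prefix=nominal_cols)
y = d['isClaim']

# create stratified train and test sets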
X_tr, X_te, y_tr, y_te = train_test_split(
    X,
    y, 
    train_size=0.8, 
    random_state=0,
    stratify=y
)

logreg = LogisticRegression(max_iter=1000, class_weight='balanced')
logreg.fit(X_tr, y_tr)
y_pred = logreg.predict(X_te)

### 6.2. Baseline Accuracy: Most Frequent Class

The baseline accuracy tells us how accurate a naive model would be if it always predicted the most frequent class. This gives us the minimum performance that any model under scrutiny should beat.

In [122]:
y_te.value_counts()

0    1665
1    166 
Name: isClaim, dtype: int64

Note that we have approximately ten times as many non-events as loss events. In reality, this imbalance is much larger. In this example, however, we work with a heavily resampled dataset in which the majority class `(loss == 0)` has been downsampled to a majority-to-minority-class ratio of about 10.

In [124]:
pos_pct = y_te.mean()
neg_pct = 1 - y_te.mean()
baseline = max(pos_pct, neg_pct)

print('Percentage of positive values:', round(pos_pct,4))
print('Percentage of negative values:', round(neg_pct,4))
print('-------------------------------------')
print('Most Frequent Baseline:       ', round(baseline,4))

Percentage of positive values: 0.0907
Percentage of negative values: 0.9093
-------------------------------------
Most Frequent Baseline:        0.9093


> **Given the imbalance of the dataset, we would be about 91% accurate simply by always predicting the most frequent class (no loss event), without using any of the features.**
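scikit-learn's `DummyClassifier` with `strategy="most_frequent"` reproduces this baseline directly. The sketch below is an illustration under an assumption: it uses hypothetical labels with the same class counts as the test set (1665 non-events, 166 loss events) instead of `y_te`. It also computes the expected accuracy of guessing at random in proportion to the class frequencies, p² + (1 − p)², which comes out noticeably lower than the majority-class baseline.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels with the same class counts as the test set above.
y_true = np.array([0] * 1665 + [1] * 166)

# The most-frequent strategy ignores the features, so a dummy column suffices.
X_dummy = np.zeros((len(y_true), 1))

baseline_clf = DummyClassifier(strategy="most_frequent")
baseline_clf.fit(X_dummy, y_true)
y_base = baseline_clf.predict(X_dummy)
baseline_acc = accuracy_score(y_true, y_base)

# Expected accuracy of random guessing that follows the class distribution.
p = y_true.mean()
random_acc = p ** 2 + (1 - p) ** 2

print("Majority-class baseline:", round(baseline_acc, 4))
print("Random proportional guess:", round(random_acc, 4))
```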

#### Simple Accuracy Score

In [139]:
from sklearn.metrics import balanced_accuracy_score
print('Accuracy score:', round(balanced_accuracy_score(y_te, np.zeros_like(y_te)),4))

Accuracy score: 0.5



In [140]:
from sklearn.metrics import accuracy_score
print('Accuracy score:', round(accuracy_score(y_te, np.zeros_like(y_te)),4))

Accuracy score: 0.9093
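The two scores differ because balanced accuracy averages the recall of each class, while plain accuracy counts all predictions equally. The sketch below reconstructs both numbers for the all-zero "model". As before, it assumes hypothetical labels with the test set's class counts rather than `y_te`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Same class counts as the test set; the "model" always predicts no loss (0).
y_true = np.array([0] * 1665 + [1] * 166)
y_all_zero = np.zeros_like(y_true)

# Recall of each class: the share of that class predicted correctly.
recall_neg = recall_score(y_true, y_all_zero, pos_label=0)  # 1665 / 1665
recall_pos = recall_score(y_true, y_all_zero, pos_label=1)  # 0 / 166

# Balanced accuracy is the unweighted mean of the per-class recalls.
manual_balanced = (recall_neg + recall_pos) / 2
balanced = balanced_accuracy_score(y_true, y_all_zero)
acc = accuracy_score(y_true, y_all_zero)

print("Balanced accuracy:", manual_balanced, balanced)
print("Plain accuracy:   ", round(acc, 4))
```

Because the loss class is never predicted, its recall is 0, and balanced accuracy drops to 0.5 even though plain accuracy looks high.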


Let's compare the first 25 true labels with the logistic regression's predictions:

In [126]:
print('True:', y_te.values[0:25])
print('Pred:', y_pred[0:25])

True: [0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0]
Pred: [0 1 1 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 0 0 0 1 1 0 0]


In the above output, we observe the following:
- We correctly predicted three actual positives as positive (true positives)
- We correctly predicted 13 actual negatives as negative (true negatives)


- **We incorrectly predicted one actual positive as negative** (false negative)
- **We incorrectly predicted eight actual negatives as positive** (false positives)
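These four counts can also be obtained with boolean masks instead of by eye. The sketch below copies the 25 printed labels and predictions into NumPy arrays and counts each outcome type.

```python
import numpy as np

# First 25 true labels and predictions, copied from the output above.
y_true_25 = np.array([0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0])
y_pred_25 = np.array([0,1,1,1,0,0,1,0,1,0,0,0,1,1,0,0,1,1,0,0,0,1,1,0,0])

# Each outcome is an element-wise AND of a condition on truth and on prediction.
tp = int(np.sum((y_true_25 == 1) & (y_pred_25 == 1)))
tn = int(np.sum((y_true_25 == 0) & (y_pred_25 == 0)))
fp = int(np.sum((y_true_25 == 0) & (y_pred_25 == 1)))
fn = int(np.sum((y_true_25 == 1) & (y_pred_25 == 0)))

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
```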

### 6.3. The Confusion Matrix

Now, let's apply the above analysis to the entire set of predictions.

In [127]:
from sklearn.metrics import confusion_matrix
def confusion_matrix_df(y_true, y_pred):
    # rows are the actual classes, columns the predicted classes
    cm = pd.DataFrame(confusion_matrix(y_true, y_pred),
                      index=['Actual Negative', 'Actual Positive'],
                      columns=['Predicted Negative', 'Predicted Positive'])
    return cm
confusion_matrix_df(y_te, y_pred)

|                 | Predicted Negative | Predicted Positive |
|-----------------|-------------------:|-------------------:|
| Actual Negative | 1149               | 516                |
| Actual Positive | 76                 | 90                 |


**Basic terminology**

- <font color='green'>**True Positives (TP):** we correctly predicted the loss event 90 times
- **True Negatives (TN):** we correctly predicted no loss event 1149 times</font>


- <font color='red'>**False Positives (FP):** we incorrectly predicted a loss event when none occurred (a "Type I error") 516 times
- **False Negatives (FN):** we incorrectly predicted no loss event when a loss actually occurred (a "Type II error") **76 times**</font>

**Note that Type II errors are the ones we should minimise in an insurance context, because false negatives can push an insurance company into financial distress.** Incorrectly predicting that no loss will occur leads to solvency reserves that are lower than actually needed. Therefore, in the event of an unpredicted loss, the insurance company may not have adequate reserves to cover it.

### 6.4. Metrics from a confusion matrix

We will now look at several metrics that can be derived from the confusion matrix.

- Classification Accuracy
- Classification Error
- Sensitivity
- Specificity
- Precision

In [128]:
# save confusion matrix and slice into four pieces
confusion = confusion_matrix(y_te, y_pred)
TP = confusion[1, 1]
TN = confusion[0, 0]
FP = confusion[0, 1]
FN = confusion[1, 0]
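Instead of indexing the matrix cell by cell, the four counts can be unpacked in one step. For labels `[0, 1]`, scikit-learn's `confusion_matrix` is laid out as `[[TN, FP], [FN, TP]]`, so `ravel()` returns the counts in that order. The sketch below demonstrates this on the 25 labels and predictions printed earlier. Passing `labels=[0, 1]` is an optional safeguard: it keeps the matrix 2x2 even if one class never appears.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# First 25 true labels and predictions, copied from the earlier comparison.
y_true_25 = np.array([0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0])
y_pred_25 = np.array([0,1,1,1,0,0,1,0,1,0,0,0,1,1,0,0,1,1,0,0,0,1,1,0,0])

# Rows are actual classes, columns predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true_25, y_pred_25, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
```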

#### Classification Accuracy

In [129]:
print('Classification Accuracy: ', (TP + TN) / (TP + TN + FP + FN))
print('Accuracy Score:          ', accuracy_score(y_te, y_pred))

Classification Accuracy:  0.6766794101583834
Accuracy Score:           0.6766794101583834


#### Classification Error (Misclassification Rate)

In [130]:
print('Classification Error:', (FP + FN) / (TP + TN + FP + FN))
print('1 - Accuracy Score:  ', 1 - accuracy_score(y_te, y_pred))

Classification Error: 0.3233205898416166
1 - Accuracy Score:   0.3233205898416166


#### Sensitivity (Recall, True Positive Rate)

> **When the actual value is positive, how often is the prediction correct?**

> It is the ratio of true positives to the total of the actual-positive row, i.e. TP / (TP + FN)

In [131]:
print('Sensitivity: ', TP / (TP + FN))

Sensitivity:  0.5421686746987951


In this case, the model correctly identifies approximately 54% of the actual loss events.
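scikit-learn's `recall_score` computes the same quantity. The check below does not use `y_te`. Instead, it rebuilds label vectors from the reported confusion-matrix counts (TN=1149, FP=516, FN=76, TP=90), an assumption that is valid because these metrics depend only on the counts, not on row order.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Label vectors rebuilt from the confusion-matrix counts reported above.
y_true = np.array([0] * (1149 + 516) + [1] * (76 + 90))
y_pred = np.array([0] * 1149 + [1] * 516 + [0] * 76 + [1] * 90)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual_sensitivity = tp / (tp + fn)
sk_sensitivity = recall_score(y_true, y_pred)  # recall of the positive class (1)

print(round(manual_sensitivity, 4), round(sk_sensitivity, 4))
```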

#### Specificity

> **When the actual value is negative, how often is the prediction correct?**

> It is the ratio of true negatives to the total of the actual-negative row, i.e. TN / (TN + FP)

In [132]:
print('Specificity: ', TN / (TN + FP))

Specificity:  0.69009009009009


This means that the model correctly identifies approximately 69% of the actual non-events.
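Specificity is simply the recall of the negative class, so scikit-learn has no separate function for it. Calling `recall_score` with `pos_label=0` gives TN / (TN + FP). This sketch assumes label vectors rebuilt from the reported counts (TN=1149, FP=516, FN=76, TP=90) in place of `y_te` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import recall_score

# Label vectors rebuilt from the confusion-matrix counts; row order is irrelevant.
y_true = np.array([0] * (1149 + 516) + [1] * (76 + 90))
y_pred = np.array([0] * 1149 + [1] * 516 + [0] * 76 + [1] * 90)

# Treat "no loss" (0) as the positive label: recall then equals TN / (TN + FP).
specificity = recall_score(y_true, y_pred, pos_label=0)
print(round(specificity, 4))
```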

#### Precision 

> **When a positive value is predicted, how often is the prediction correct?**

> It is the ratio of true positives to the total of the predicted-positive column, i.e. TP / (TP + FP)

In [133]:
print('Precision:', TP / (TP + FP))

Precision: 0.1485148514851485


This means that only about 15% of the predicted loss events are actual losses.

### 6.5. Objective in classification

Ideally, we would like to

- Maximize Sensitivity
- Maximize Specificity
- Maximize Precision

However, maximising all three measures at the same time may be difficult, because improving one often comes at the expense of another. Hence, the objective is to balance all three measures. **Also, specificity (true negative rate) may not be as important as sensitivity (true positive rate) in this context.**

><font color='red'>**Hence, in the context of this insurance project, we optimise for sensitivity at the expense of lower specificity.**</font>

### 6.6. Adjusting the classification threshold

In [None]:
def confusion_metrics(y_te, y_pred):
    from sklearn.metrics import confusion_matrix
    # labels=[0, 1] guarantees a 2x2 matrix even if one class is never predicted
    confusion = confusion_matrix(y_te, y_pred, labels=[0, 1])
    TP = confusion[1, 1]
    TN = confusion[0, 0]
    FP = confusion[0, 1]
    FN = confusion[1, 0]
    
    # precision is undefined (nan) when no positives are predicted
    return pd.DataFrame({'Accuracy Score': (TP + TN) / (TP + TN + FP + FN), 
                         'Error Rate': (FP + FN) / (TP + TN + FP + FN), 
                         'Sensitivity': TP / (TP + FN),
                         'Specificity': TN / (TN + FP),
                         'Precision': TP / (TP + FP) if (TP + FP) > 0 else np.nan
                        }, index=[0])

In [None]:
print('Some classifications:              ', logreg.predict(X_te)[5:10])
print('Their probabilities of being true: ', logreg.predict_proba(X_te)[5:10, 1])

In [None]:
# store the predicted probabilities for class 1
y_pred_prob = logreg.predict_proba(X_te)[:, 1]

In [None]:
# allow plots to appear in the notebook
%matplotlib inline
import matplotlib.pyplot as plt

# histogram of predicted probabilities
plt.hist(y_pred_prob, bins=200)
plt.xlim(0, 1)
plt.title('Histogram of predicted probabilities')
plt.xlabel('Predicted probability of a loss event')
plt.ylabel('Frequency');

**Decrease the threshold** for predicting loss events in order to **increase the sensitivity** of the classifier.
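The effect of the threshold can be seen on a small example. The labels and probabilities below are hypothetical and chosen only for illustration. As the threshold is lowered, more observations are classified as loss events, sensitivity rises, and specificity can only stay the same or fall.

```python
import numpy as np
from sklearn.metrics import recall_score

# Hypothetical labels and predicted loss probabilities (illustration only).
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.35, 0.30, 0.60, 0.45, 0.80])

results = {}
for threshold in (0.5, 0.4, 0.25):
    # Predict a loss event when the probability exceeds the threshold.
    y_hat = (y_prob > threshold).astype(int)
    sensitivity = recall_score(y_true, y_hat)               # TP / (TP + FN)
    specificity = recall_score(y_true, y_hat, pos_label=0)  # TN / (TN + FP)
    results[threshold] = (sensitivity, specificity)
    print(f"threshold={threshold:.2f}  sensitivity={sensitivity:.3f}  "
          f"specificity={specificity:.3f}")
```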

In [None]:
from sklearn.preprocessing import binarize

rows = []
for t in np.round(np.arange(0, 1, 0.01), 2):
    # predict a loss event when the probability exceeds the threshold t
    y_pred_adj = binarize(y_pred_prob.reshape(1, -1), threshold=t)[0].astype(int)
    metrics = confusion_metrics(y_te, y_pred_adj)
    metrics.insert(loc=0, column='Probability Threshold', value=[t])
    rows.append(metrics)

# DataFrame.append was removed in pandas 2.0; concatenate the rows instead
res = pd.concat(rows, ignore_index=True)

In [None]:
res

In [None]:
plt.plot(res['Probability Threshold'], res['Sensitivity'], color='b', label='Sensitivity (Recall, True Positive Rate)')
plt.plot(res['Probability Threshold'], res['Specificity'], color='g', label='Specificity')
plt.plot(res['Probability Threshold'], res['Precision'], color='r', label='Precision')
plt.xlabel('Probability Threshold')
plt.legend(loc='upper right')

**Conclusion:**

- Threshold of 0.5 is used by default (for binary problems) to convert predicted probabilities into class predictions
- Threshold can be adjusted to increase sensitivity or specificity
- Sensitivity and specificity have an inverse relationship
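For a binary `LogisticRegression`, `predict` is equivalent to thresholding the positive-class probability from `predict_proba` at 0.5, apart from floating-point ties at exactly 0.5. The sketch below checks this on synthetic, imbalanced data from `make_classification`, which stands in for the competition data. It also shows that a lower threshold can only add predicted positives.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic, imbalanced data purely for illustration (not the competition data).
X, y = make_classification(n_samples=500, n_features=5, weights=[0.9, 0.1],
                           random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

prob_pos = clf.predict_proba(X)[:, 1]
default_pred = clf.predict(X)

# Default behaviour: class 1 when the positive-class probability exceeds 0.5.
manual_pred = (prob_pos > 0.5).astype(int)
# A lower threshold flags at least as many observations as positive.
lower_pred = (prob_pos > 0.3).astype(int)

print("predict() matches the 0.5 threshold:", np.array_equal(default_pred, manual_pred))
print("Predicted positives at 0.5 vs 0.3:", manual_pred.sum(), lower_pred.sum())
```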

### 6.7. Classification Report

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_te, y_pred))
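The printed report is convenient to read, but `output_dict=True` returns the same numbers as a nested dictionary, keyed by the class labels as strings. The per-class figures can then be tracked programmatically. This sketch assumes label vectors rebuilt from the confusion-matrix counts reported earlier (TN=1149, FP=516, FN=76, TP=90) instead of `y_te` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import classification_report

# Label vectors rebuilt from the confusion-matrix counts; row order is irrelevant.
y_true = np.array([0] * (1149 + 516) + [1] * (76 + 90))
y_pred = np.array([0] * 1149 + [1] * 516 + [0] * 76 + [1] * 90)

report = classification_report(y_true, y_pred, output_dict=True)

# Class keys are the labels as strings: '0' (no loss) and '1' (loss).
loss_recall = report['1']['recall']        # sensitivity
loss_precision = report['1']['precision']  # precision of the loss class
no_loss_recall = report['0']['recall']     # specificity
overall_accuracy = report['accuracy']

print(round(loss_recall, 4), round(loss_precision, 4),
      round(no_loss_recall, 4), round(overall_accuracy, 4))
```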

### 6.8. Normalized Weighted Gini Coefficient

In order to compare model performance with the official Kaggle leaderboard, we calculate the normalized, weighted Gini coefficient.

According to the Kaggle website, the normalized, weighted Gini coefficient is calculated as follows:

In [None]:
def weighted_gini(act, pred, weight):
    df = pd.DataFrame({"act": act, "pred": pred, "weight": weight})
    # order observations from highest to lowest predicted value
    df = df.sort_values('pred', ascending=False)
    # cumulative share of the total weight (the "random" baseline)
    df["random"] = (df.weight / df.weight.sum()).cumsum()
    # cumulative share of the total weighted actual losses (Lorenz curve)
    total_pos = (df.act * df.weight).sum()
    df["cumposfound"] = (df.act * df.weight).cumsum()
    df["lorentz"] = df.cumposfound / total_pos

    gini = sum(df.lorentz[1:].values * df.random[:-1].values) - sum(df.lorentz[:-1].values * df.random[1:].values)
    return gini

In [None]:
def normalized_weighted_gini(act,pred,weight):
    return weighted_gini(act,pred,weight) / weighted_gini(act,act,weight)
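A quick sanity check of the normalization uses a toy example with unit weights. Under the normalization above, a prediction that ranks the actual values perfectly scores 1. For this data, a fully reversed ranking scores −1. The sketch restates the two functions so that it runs on its own. It works on positional NumPy arrays and uses `cum_weight` in place of the `random` column; both changes are illustrative choices, not part of the Kaggle definition.

```python
import numpy as np
import pandas as pd

def weighted_gini(act, pred, weight):
    df = pd.DataFrame({"act": act, "pred": pred, "weight": weight})
    df = df.sort_values("pred", ascending=False)  # highest prediction first
    cum_weight = (df.weight / df.weight.sum()).cumsum().to_numpy()
    cum_pos = (df.act * df.weight).cumsum().to_numpy()
    lorentz = cum_pos / (df.act * df.weight).sum()
    return np.sum(lorentz[1:] * cum_weight[:-1]) - np.sum(lorentz[:-1] * cum_weight[1:])

def normalized_weighted_gini(act, pred, weight):
    return weighted_gini(act, pred, weight) / weighted_gini(act, act, weight)

# Toy data (illustration only): four observations with unit weights.
act = np.array([1.0, 2.0, 3.0, 4.0])
w = np.ones_like(act)

perfect = normalized_weighted_gini(act, act, w)    # prediction ranks like the target
reversed_ = normalized_weighted_gini(act, -act, w)  # prediction ranks in reverse
print(perfect, round(reversed_, 6))
```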