# 05 - Establishing baseline performance
____

Before we go any further, we want to make sure that any feature selection or engineering that we do is actually useful!

The best way to do this is to establish the baseline performance of a model that is fed the data as is. We can also compare our results to a 'dummy classifier' that ignores the features and simply guesses the class. Because our target is imbalanced, the simplest guess is to always predict the most common class ('Slight').

In [7]:
import pandas as pd
import numpy as np

from sklearn.metrics import accuracy_score, classification_report
from sklearn.dummy import DummyClassifier


# Baseline Models

## Quick data preparation
* Drop unique indexing columns
* One-hot encode the categorical columns and separate X and y.

In [8]:
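train = pd.read_csv('cleaned_training_data.csv', low_memory=False)

# Encode the target: 'Slight' -> 0, 'Serious' -> 1
severity_dict = {'Slight': 0, 'Serious': 1}
y_train = train['casualty_severity'].map(severity_dict)

# Drop the target itself (to avoid leaking it into X), the accident-level severity
# and the unique reference columns, then one-hot encode the remaining features
X_train = train.drop(['casualty_severity', 'accident_severity', 'accident_reference',
                      'vehicle_reference', 'casualty_reference'], axis=1)
X_train = pd.get_dummies(X_train, prefix_sep='_')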

In [9]:
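# NOTE: this reads the *training* file again, so the "test" scores below are computed
# on the training data; point this at the held-out test file to score unseen data.
test = pd.read_csv('cleaned_training_data.csv', low_memory=False)

y_test = test['casualty_severity'].map(severity_dict)

X_test = test.drop(['casualty_severity', 'accident_severity', 'accident_reference',
                    'vehicle_reference', 'casualty_reference'], axis=1)
X_test = pd.get_dummies(X_test, prefix_sep='_')
# Give the test set exactly the training dummy columns (absent categories filled with 0)
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)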
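Calling `pd.get_dummies` separately on the training and test frames can produce different column sets when a category appears in only one split, and most scikit-learn models will then reject the test matrix. The sketch below uses hypothetical toy frames (the `weather` column is illustrative, not taken from the real data) to show the mismatch and one common fix: reindexing the test columns to the training columns.

```python
import pandas as pd

# Hypothetical toy frames: the test split happens to lack the 'Snow' category
train_df = pd.DataFrame({"weather": ["Fine", "Rain", "Snow"]})
test_df = pd.DataFrame({"weather": ["Fine", "Rain"]})

X_tr = pd.get_dummies(train_df, prefix_sep="_")
X_te = pd.get_dummies(test_df, prefix_sep="_")
print(list(X_te.columns))  # 'weather_Snow' is missing here

# Align to the training columns: missing dummies are added and filled with 0,
# and any test-only categories would be dropped
X_te = X_te.reindex(columns=X_tr.columns, fill_value=0)
print(list(X_te.columns))  # ['weather_Fine', 'weather_Rain', 'weather_Snow']
```

Reindexing against the training columns (rather than the union of both) keeps the feature space fixed to what the model saw during fitting.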

## Dummy Classifier
Let's create a "dummy classifier" that makes predictions without using any input features. This will be the simple baseline against which we compare our more complex classifiers.

Here, we predict every casualty to have the majority label, 0 ('Slight'). Because the classes are imbalanced (most of the data has a true label of 0), this dummy classifier achieves a high accuracy of __72%__, which is simply the share of the majority label in the dataset. However, it is clearly not a useful predictive model: it never identifies a single Serious casualty.
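As a quick check of this arithmetic, the majority-class accuracy can be recomputed by hand from the class counts in the 'support' column of the classification report further down (8173 Slight and 3186 Serious casualties):

```python
# Class counts from the 'support' column of the classification report
n_slight, n_serious = 8173, 3186

# Always predicting 'Slight' is correct on exactly the Slight rows
majority_accuracy = n_slight / (n_slight + n_serious)
print(round(majority_accuracy, 2))  # 0.72
```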

In [10]:
dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_freq_pred = dummy_clf.fit(X_train, y_train).predict(X_test)

print("Accuracy (Most Frequent Class Dummy Classifier):", np.round(accuracy_score(y_test, dummy_freq_pred),2))

Accuracy (Most Frequent Class Dummy Classifier): 0.72


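We can instead use the "stratified" strategy, which generates random predictions that follow the class distribution of the training data: each row is labelled 0 or 1 at random, in proportion to how often each class occurs. The classifier therefore no longer predicts the majority class every time, and the accuracy drops to about __59%__. Since no `random_state` is set, this figure will vary slightly from run to run.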
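The drop in accuracy is what random guessing predicts. Assuming the training and test sets share the same class proportions, with a Slight share of $p$, a stratified guess is correct when both the guess and the truth are 0 (probability $p^2$) or both are 1 (probability $(1-p)^2$). A minimal sketch of that calculation, using the support counts from the classification report:

```python
# Class counts from the 'support' column of the classification report
n_slight, n_serious = 8173, 3186
p = n_slight / (n_slight + n_serious)  # share of 'Slight' (label 0)

# Predictions are drawn independently of the truth, so
# P(correct) = P(both 0) + P(both 1) = p^2 + (1 - p)^2
expected_accuracy = p**2 + (1 - p) ** 2
print(round(expected_accuracy, 3))  # 0.596
```

The observed 0.59 sits right at this expected value, confirming the stratified dummy has no real predictive power.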

In [11]:
dummy_clf = DummyClassifier(strategy="stratified")
dummy_strat_pred = dummy_clf.fit(X_train, y_train).predict(X_test)

print("Accuracy (Stratified Class Dummy Classifier):", np.round(accuracy_score(y_test, dummy_strat_pred),2))



Accuracy (Stratified Class Dummy Classifier): 0.59


This highlights the importance of not evaluating our models on accuracy alone! Recall gives us the proportion of Serious casualties in the dataset that we correctly predicted as Serious. Precision tells us what proportion of our Serious predictions were actually correct.
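Both metrics come straight from the confusion matrix. The sketch below uses hypothetical toy labels (not from our dataset) to compute them by hand and check them against scikit-learn's `precision_score` and `recall_score`:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical toy labels: 1 = Serious, 0 = Slight
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels [0, 1], ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of Serious predictions that were correct: 2/3
recall = tp / (tp + fn)     # share of true Serious cases we found: 2/4

# scikit-learn gives the same values (the positive class is 1 by default)
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```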

Because we want to accurately pick out the factors that contribute to serious collisions, we need to balance precision (how many of our Serious predictions were correct) and recall (how many of the Serious collisions in the dataset we found). We will therefore focus on the F1 score.

The F1 score is the harmonic mean of precision and recall, so it takes both false positives and false negatives into account. This is particularly important given the imbalanced classes in our dataset, where accuracy tells us very little about our predictive power.
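Concretely, $F_1 = \frac{2PR}{P + R}$. A small hand-rolled helper (a sketch for illustration, not a scikit-learn function) reproduces a value from the classification report below and shows why the harmonic mean is a good choice: a model that is strong on one metric but poor on the other is pulled down towards the weaker one.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Class 0 of the most-frequent dummy in the report: precision 0.72, recall 1.00
print(round(f1(0.72, 1.00), 2))  # 0.84

# A lopsided model is penalised: the arithmetic mean here would be 0.5
print(round(f1(0.9, 0.1), 2))    # 0.18
```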

We should also investigate the area under the precision-recall curve.
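A useful reference point for this metric: a classifier that gives every row the same score has a precision-recall curve whose summary (average precision) equals the share of positive cases. The sketch below assumes hypothetical synthetic labels with roughly 28% positives, similar to the Serious share here, and uses scikit-learn's `average_precision_score` as the area-under-the-curve summary; the "prior" strategy is used so that `predict_proba` returns the training class frequencies as a constant score.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import average_precision_score

# Hypothetical synthetic labels: about 28% positives (1 = Serious)
rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.28).astype(int)
X = np.zeros((len(y), 1))  # DummyClassifier ignores the features

# 'prior' returns the training class frequencies as (constant) probabilities
dummy = DummyClassifier(strategy="prior").fit(X, y)
scores = dummy.predict_proba(X)[:, 1]

ap = average_precision_score(y, scores)
print(round(ap, 3), round(y.mean(), 3))  # the two values coincide
```

So on our data, any model worth keeping should have an area under the precision-recall curve clearly above the Serious share of the test set.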


In [12]:
print("\nClassification Report (Most Frequent Class Dummy Classifier):")
print(classification_report(y_test, dummy_freq_pred))

print("\nClassification Report (Stratified Class Dummy Classifier):")
print(classification_report(y_test, dummy_strat_pred))


Classification Report (Most Frequent Class Dummy Classifier):
              precision    recall  f1-score   support

           0       0.72      1.00      0.84      8173
           1       0.00      0.00      0.00      3186

    accuracy                           0.72     11359
   macro avg       0.36      0.50      0.42     11359
weighted avg       0.52      0.72      0.60     11359


Classification Report (Stratified Class Dummy Classifier):
              precision    recall  f1-score   support

           0       0.72      0.72      0.72      8173
           1       0.27      0.27      0.27      3186

    accuracy                           0.59     11359
   macro avg       0.50      0.50      0.50     11359
weighted avg       0.59      0.59      0.59     11359





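From the classification report, the F1 score of the stratified dummy classifier on the Serious class (label 1) is 0.27, and this is the value to try to beat. It is roughly what random guessing should give: when predictions are drawn at random in proportion to the class frequencies, both precision and recall for the Serious class are approximately the Serious share of the data (3186 / 11359 ≈ 0.28), so the expected F1 score is about 0.28.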
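To track this single number directly rather than reading it off the full report, `f1_score` can be restricted to the Serious class with `pos_label=1`. The sketch below uses hypothetical toy labels; passing `zero_division=0` returns 0 instead of raising an `UndefinedMetricWarning` when a model, like the most-frequent dummy, never predicts the Serious class.

```python
from sklearn.metrics import f1_score

# Hypothetical toy labels: 1 = Serious, 0 = Slight
y_true = [0, 0, 0, 1, 1]
y_majority = [0, 0, 0, 0, 0]  # 'most_frequent'-style predictions
y_guess = [0, 1, 0, 1, 0]     # one hit, one false alarm, one miss

# Serious-class F1; zero_division=0 silences the warning for undefined precision
print(f1_score(y_true, y_majority, pos_label=1, zero_division=0))  # 0.0
print(f1_score(y_true, y_guess, pos_label=1, zero_division=0))    # 0.5
```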