# Model evaluation in Python - Performance measures and class imbalance
by María Óskarsdóttir

This notebook demonstrates the basics of measuring performance of binary classifiers in Python.  In addition, it shows balancing strategies for imbalanced data.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## Part 1: Evaluating performance of binary classifiers.
#### We assume that a binary classification model to predict churn has already been built. Our goal is to measure the performance of this model using various measures, such as the confusion matrix, accuracy, and AUC.

First, we read in the true classes and the predicted probabilities of the target variable on the test set.

In [None]:
TrueTarget = pd.read_csv('true_value.csv')
PredictedProb = pd.read_csv('predictions.csv')
print(TrueTarget.Churn.value_counts())
PredictedProb.head(10)

Next we use a cut-off of 0.5 to determine the predicted class of the target variable: observations with a predicted probability above 0.5 are labelled 1 (churner), all others 0. This variable is called `PredictedTarget`.

In [None]:
PredictedTarget = (PredictedProb > 0.5).astype(int)  # 1 = predicted churner, 0 = predicted non-churner
print(PredictedTarget.Churn_prob.value_counts())
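
The 0.5 cut-off is a modelling choice rather than a fixed rule. As a minimal sketch, using made-up probabilities (`probs` and `counts` are hypothetical names, not part of the churn data), the following shows how the number of predicted positives changes with the cut-off:

```python
import numpy as np

# Hypothetical predicted probabilities, made up for illustration only
probs = np.array([0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95])

counts = {}
for cutoff in (0.3, 0.5, 0.7):
    # Same rule as above: strictly greater than the cut-off means class 1
    predicted = (probs > cutoff).astype(int)
    counts[cutoff] = int(predicted.sum())

print(counts)
```

A lower cut-off flags more observations as positive, which typically raises recall at the cost of precision.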

We import the functions we need for the model evaluation. They come from `sklearn.metrics`.

In [None]:
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score

We generate and inspect the confusion matrix.

In [None]:
CM=confusion_matrix(TrueTarget,PredictedTarget)
print(CM)

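The above confusion matrix shows us that 937 non-churners are correctly classified as non-churners and that 204 churners are correctly classified as churners. In scikit-learn's convention, rows are the true classes and columns the predicted classes, so these counts sit on the diagonal at `CM[0,0]` and `CM[1,1]`.  
The accuracy is the share of correctly classified observations, which we can compute manually as follows.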

In [None]:
Accuracy=(CM[0,0]+CM[1,1])/CM.sum()
Accuracy

Or using the built-in function `accuracy_score`:

In [None]:
Acc=accuracy_score(TrueTarget,PredictedTarget)
print('Accuracy: %.3f' % Acc)

Other metrics:

In [None]:

print('Recall: %.3f'% recall_score(TrueTarget,PredictedTarget))
print('Precision: %.3f'% precision_score(TrueTarget,PredictedTarget))
print('F1-score: %.3f' % f1_score(TrueTarget,PredictedTarget))
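
Recall, precision and the F1-score all follow from the four cells of the confusion matrix. The sketch below, on toy labels made up for illustration (`y_true_toy` and `y_pred_toy` are not the churn data), computes them by hand and prints the scikit-learn values alongside for comparison:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score, precision_score, f1_score

# Toy labels, made up for illustration (1 = churner, 0 = non-churner)
y_true_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

# For binary labels, ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()

recall = tp / (tp + fn)        # share of actual positives that are found
precision = tp / (tp + fp)     # share of predicted positives that are correct
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print('Manual : %.3f %.3f %.3f' % (recall, precision, f1))
print('sklearn: %.3f %.3f %.3f' % (recall_score(y_true_toy, y_pred_toy),
                                   precision_score(y_true_toy, y_pred_toy),
                                   f1_score(y_true_toy, y_pred_toy)))
```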



Next we look at the AUC performance measure, which is cut-off independent. This time we use the predicted churn probability vector, `PredictedProb`.

In [None]:
auc = roc_auc_score(TrueTarget,PredictedProb)
print('AUC: %.3f' % auc)
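
One way to read the AUC is as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting one half. The sketch below checks this on toy scores made up for illustration (`y_true_toy`, `scores` and `auc_manual` are hypothetical names):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels and scores, made up for illustration
y_true_toy = np.array([0, 0, 1, 0, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.90])

pos = scores[y_true_toy == 1]
neg = scores[y_true_toy == 0]

# Compare every positive score with every negative score
diff = pos[:, None] - neg[None, :]
auc_manual = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

print('Pairwise AUC: %.3f' % auc_manual)
print('sklearn AUC : %.3f' % roc_auc_score(y_true_toy, scores))
```

Because only the ordering of the scores matters, no cut-off is involved.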

Finally, we define a function to plot the ROC curve.

In [None]:
def plot_roc_curve(fpr, tpr):
    plt.plot(fpr, tpr, color='orange', label='ROC')
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend()
    plt.show()

fpr, tpr, thresholds = roc_curve(TrueTarget,PredictedProb)
plot_roc_curve(fpr, tpr)
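
Each point on the ROC curve is the (false positive rate, true positive rate) pair obtained at one cut-off. As a sketch on toy data made up for illustration, a single point can be computed by hand like this (the names `fpr_point` and `tpr_point` are hypothetical):

```python
import numpy as np

# Toy labels and scores, made up for illustration
y_true_toy = np.array([0, 0, 1, 0, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.90])

cutoff = 0.5
predicted = scores > cutoff

# True positive rate: flagged positives divided by all positives
tpr_point = (predicted & (y_true_toy == 1)).sum() / (y_true_toy == 1).sum()
# False positive rate: flagged negatives divided by all negatives
fpr_point = (predicted & (y_true_toy == 0)).sum() / (y_true_toy == 0).sum()

print('FPR: %.3f, TPR: %.3f' % (fpr_point, tpr_point))
```

Sweeping the cut-off from high to low traces the curve from (0, 0) to (1, 1).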

## Part 2:  Balancing data
In this part we try some techniques to rebalance an imbalanced dataset and investigate the effect they have on the class distribution of the target variable. The function `Counter` lets us see the class distribution.

We start by generating a simple synthetic dataset with three variables and a target with two classes. 

In [None]:
from collections import Counter
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=5000, n_features=3, n_informative=3,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1,
                           weights=[0.03, 0.97],
                           class_sep=0.8, random_state=0)
Counter(y)
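
`Counter` returns raw counts. A small sketch, using hypothetical labels rather than the generated `y`, shows how to turn them into class shares and an imbalance ratio:

```python
from collections import Counter

# Hypothetical labels, made up for illustration: 97 of class 1 and 3 of class 0
y_toy = [1] * 97 + [0] * 3

counts = Counter(y_toy)
total = sum(counts.values())

# Share of each class in the data
shares = {label: n / total for label, n in counts.items()}
# Number of majority samples per minority sample
imbalance_ratio = max(counts.values()) / min(counts.values())

print(shares)
print('Imbalance ratio: %.1f' % imbalance_ratio)
```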

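First we use random oversampling with different balancing strategies. With `sampling_strategy='minority'` the minority class is resampled until it matches the majority class; a float sets the desired ratio of minority to majority samples after resampling.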

In [None]:
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(sampling_strategy='minority')
X_oversampled, y_oversampled = ros.fit_resample(X, y)
Counter(y_oversampled)


In [None]:
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=0,sampling_strategy=0.2)
X_oversampled, y_oversampled = ros.fit_resample(X, y)
Counter(y_oversampled)
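
To make the float `sampling_strategy` concrete, here is a manual sketch of random oversampling in NumPy. It illustrates the idea only and is not imblearn's implementation; the toy data and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data, made up for illustration: 30 minority (0) and 1000 majority (1) samples
y_toy = np.array([0] * 30 + [1] * 1000)
X_toy = rng.normal(size=(y_toy.size, 3))

ratio = 0.2  # desired minority / majority ratio after resampling
minority_idx = np.flatnonzero(y_toy == 0)
majority_idx = np.flatnonzero(y_toy == 1)

# Number of minority samples needed to reach the ratio
n_target = int(ratio * majority_idx.size)

# Draw the missing minority samples with replacement (exact duplicates)
extra = rng.choice(minority_idx, size=n_target - minority_idx.size, replace=True)
keep = np.concatenate([majority_idx, minority_idx, extra])
X_over, y_over = X_toy[keep], y_toy[keep]

print((y_over == 0).sum(), (y_over == 1).sum())
```

The majority class is untouched; only duplicated minority rows are added.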


Then we use random undersampling, which removes samples from the majority class instead of duplicating minority samples.

In [None]:
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler()
X_undersampled, y_undersampled = rus.fit_resample(X, y)
Counter(y_undersampled)


In [None]:
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(sampling_strategy=0.7)
X_undersampled, y_undersampled = rus.fit_resample(X, y)
Counter(y_undersampled)
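
For undersampling, a float `sampling_strategy` is again the minority-to-majority ratio after resampling, but it is reached by dropping majority samples. The following is a manual NumPy sketch of that idea on toy data, not imblearn's implementation; all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data, made up for illustration: 70 minority (0) and 1000 majority (1) samples
y_toy = np.array([0] * 70 + [1] * 1000)
X_toy = rng.normal(size=(y_toy.size, 3))

ratio = 0.7  # desired minority / majority ratio after resampling
minority_idx = np.flatnonzero(y_toy == 0)
majority_idx = np.flatnonzero(y_toy == 1)

# Number of majority samples to keep so that minority / majority equals the ratio
n_keep = int(minority_idx.size / ratio)

# Keep a random subset of the majority class, drawn without replacement
kept_majority = rng.choice(majority_idx, size=n_keep, replace=False)
keep = np.concatenate([minority_idx, kept_majority])
X_under, y_under = X_toy[keep], y_toy[keep]

print((y_under == 0).sum(), (y_under == 1).sum())
```

Unlike oversampling, this discards information, so the resulting dataset is much smaller.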


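And finally SMOTE (Synthetic Minority Over-sampling Technique), which creates new synthetic minority samples instead of duplicating existing ones.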

In [None]:
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=42)
X_smote, y_smote = sm.fit_resample(X, y)
Counter(y_smote)

In [None]:
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=42,sampling_strategy=0.2)
X_smote, y_smote = sm.fit_resample(X, y)
Counter(y_smote)
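
The core step of SMOTE is interpolation: a synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration with made-up points. It uses only the single nearest neighbour, whereas SMOTE picks at random among the k nearest, and it is not imblearn's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical minority-class points in two dimensions
X_min = np.array([[1.0, 2.0],
                  [1.5, 1.8],
                  [3.0, 3.5],
                  [2.8, 3.0]])

i = 0  # index of the minority sample to start from

# Distances from sample i to all minority samples, excluding itself
dist = np.linalg.norm(X_min - X_min[i], axis=1)
dist[i] = np.inf
nn = int(np.argmin(dist))  # simplification: take the single nearest neighbour

# Random position on the segment between the sample and its neighbour
lam = rng.uniform(0, 1)
x_new = X_min[i] + lam * (X_min[nn] - X_min[i])

print('Neighbour index:', nn)
print('Synthetic point:', x_new)
```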

## Part 3: Putting it together. How do balancing techniques affect model performance? 

We use an imbalanced dataset from the imblearn library to demonstrate that by balancing the dataset, performance can improve.

Note: The classification technique is not important in this example. We use out-of-the-box random forests, but any other binary classifier could be used.

We start by fetching the car_eval_4 dataset and recoding the negative class from -1 to 0.


In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
from imblearn.datasets import fetch_datasets
car = fetch_datasets()['car_eval_4']
X, y = car.data, car.target
y[y==-1]=0
Counter(y) 

We split the data into train and test sets, using stratified sampling to ensure that the distribution of classes is the same in both sets. We will apply balancing techniques to adjust the class balance of the training set only and use the untouched test set to evaluate the performance.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)
Counter(y_train)

In [None]:
Counter(y_test)
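
As a quick check of what `stratify` does, the sketch below splits toy labels (made up for illustration, with a 10% minority class) and compares the minority share in both parts. The `random_state` value is an arbitrary choice to make the example repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 900 negatives and 100 positives
y_toy = np.array([0] * 900 + [1] * 100)
X_toy = np.arange(y_toy.size).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, stratify=y_toy, random_state=0)

# With stratification the positive share is (almost) identical in both sets
print('Train share of positives: %.3f' % y_tr.mean())
print('Test share of positives : %.3f' % y_te.mean())
```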

We define a function `Performance` to evaluate the performance of the model.

In [None]:
def Performance(y_test,X_test):
    # Evaluates the most recently fitted global classifier `clf`
    y_pred = clf.predict(X_test)               # predicted classes
    y_prob = clf.predict_proba(X_test)[:,1]    # predicted probability of class 1
    print('Accuracy: ',accuracy_score(y_test,y_pred))
    print('Recall: ',recall_score(y_test,y_pred))
    print('Precision: ',precision_score(y_test,y_pred,zero_division=0))
    print('AUC: ',roc_auc_score(y_test,y_prob))

1. Build a model with the data as-is, without rebalancing. The accuracy is very high, but both recall and precision are 0, which indicates that the model is not capturing the minority class.

In [None]:
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)
Performance(y_test,X_test)
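
High accuracy with zero recall is what a model that always predicts the majority class would produce. A minimal sketch with hypothetical labels (96 negatives and 4 positives, not the car_eval_4 data) makes the arithmetic explicit:

```python
import numpy as np

# Hypothetical test labels, made up for illustration: 96 negatives, 4 positives
y_true_toy = np.array([0] * 96 + [1] * 4)

# A "model" that always predicts the majority class
y_pred_toy = np.zeros_like(y_true_toy)

accuracy = (y_pred_toy == y_true_toy).mean()
true_positives = ((y_pred_toy == 1) & (y_true_toy == 1)).sum()
recall = true_positives / (y_true_toy == 1).sum()

print('Accuracy: %.2f' % accuracy)
print('Recall  : %.2f' % recall)
```

This is why accuracy alone is a poor guide on imbalanced data.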

2. Randomly oversample the minority class in the training set, build another model and evaluate the performance on the same test set.

In [None]:
ros = RandomOverSampler(sampling_strategy=0.4)
X_oversampled, y_oversampled = ros.fit_resample(X_train, y_train)
Counter(y_oversampled)

The new model has the same accuracy, but the recall and precision have improved. The AUC also improves slightly.

In [None]:
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X_oversampled, y_oversampled)
Performance(y_test,X_test)