The following Python code demonstrates how to derive the measures of a confusion matrix. The embedded comments explain the rationale behind each step.

In [2]:
# import necessary libraries
import pandas as pd 
from sklearn.tree import DecisionTreeClassifier 
from sklearn.model_selection import train_test_split 
from sklearn import metrics 
import numpy as np 

# specify dataset source
df = pd.read_csv('ChurnFinal.csv') 

# convert categorical inputs to numeric dummy variables; scikit-learn estimators (and the ROC and AUC calculations) need numeric features
df_inputs = pd.get_dummies(df[['Gender', 'Age', 'PostalCode', 'Cash', 'CreditCard', 
            'Cheque', 'SinceLastTrx', 'SqrtTotal', 'SqrtMax', 'SqrtMin']]) 
df_label = df['Churn']

# initiate modelling object, and split train and test sets
dtree = DecisionTreeClassifier(criterion = 'entropy', splitter="best", max_depth=5, 
            min_samples_leaf=5, min_samples_split=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(df_inputs, df_label, 
            stratify=df_label, test_size=0.3, random_state=1) 

# train model with decision tree algorithm  
dtree.fit(X_train, y_train) 

# apply the model to predict data
y_predict = dtree.predict(X_test)

# use the metrics functions to derive performance measures; pos_label selects the class
# treated as positive, so recall with pos_label='no' gives the specificity
acc = metrics.accuracy_score(y_test, y_predict) 
sens = metrics.recall_score(y_test, y_predict,average='binary', pos_label='yes') 
spec = metrics.recall_score(y_test, y_predict,average='binary', pos_label='no') 
prec = metrics.precision_score(y_test, y_predict,average='binary', pos_label='yes') 
f1 = metrics.f1_score(y_test, y_predict,average='binary', pos_label='yes') 


# display all the measures derived
print("Accuracy : ", round(acc,3)) 
print("Misclassification : ", round(1-acc,3)) 
print("Precision : ", round(prec,3)) 
print("Sensitivity/Recall 1: ", round(sens,3)) 
print("Specificity/Recall 0: ", round(spec,3)) 
print("F1-measure : ", round(f1,3)) 


Running the code above produces the following results:

Accuracy : 0.756 
Misclassification : 0.244 
Precision : 0.729 
Sensitivity/Recall 1: 0.821 
Specificity/Recall 0: 0.689 
F1-measure : 0.773

Based on these results, the performance of the model can be interpreted as follows:

Accuracy: Overall, the model correctly predicts 75.6% of the churn labels (i.e., 'yes' and 'no').

Misclassification: Overall, the model misclassifies 24.4% of the churn labels (i.e., 'yes' and 'no').

Sensitivity: Of all the customers who churned (i.e., 'yes'), 82.1% of them were correctly predicted by the model.

Specificity: Of all the customers who did not churn (i.e., 'no'), 68.9% were correctly predicted by the model.

Precision: Of all the customers predicted as churned (i.e., 'yes') by the model, 72.9% actually churned.

F1 score: The harmonic mean of precision and recall (sensitivity) is 77.3%.
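
The measures interpreted above can all be traced back to the four cells of a confusion matrix. The following is a minimal sketch in plain Python. The label lists `y_true` and `y_pred` are made-up toy data (not taken from ChurnFinal.csv), and the helper name `confusion_counts` is hypothetical, introduced only for illustration.

```python
def confusion_counts(y_true, y_pred, positive="yes"):
    """Count the four confusion-matrix cells for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted churn, actually churned
            else:
                fp += 1   # predicted churn, did not churn
        else:
            if actual == positive:
                fn += 1   # predicted no churn, actually churned
            else:
                tn += 1   # predicted no churn, did not churn
    return tp, fp, tn, fn

# toy labels for illustration only (assumption, not the churn dataset)
y_true = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"]
y_pred = ["yes", "yes", "yes", "no", "yes", "no", "no", "no", "no", "no"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + tn + fn)
misclassification = 1 - accuracy
sensitivity = tp / (tp + fn)      # recall of the 'yes' class
specificity = tn / (tn + fp)      # recall of the 'no' class
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean

print("Accuracy :", round(accuracy, 3))
print("Misclassification :", round(misclassification, 3))
print("Sensitivity :", round(sensitivity, 3))
print("Specificity :", round(specificity, 3))
print("Precision :", round(precision, 3))
print("F1-measure :", round(f1, 3))
```

For these toy labels the counts are TP = 3, FP = 1, TN = 5 and FN = 1, so the accuracy is 0.8, the specificity is about 0.833, and the sensitivity, precision and F1 are all 0.75.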


NOTE: For a detailed explanation of the scikit-learn metrics functions and their parameters, refer to the official documentation: https://scikit-learn.org/stable/modules/model_evaluation.html
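
As a small illustration of the `pos_label` parameter, the sketch below applies `recall_score` and `confusion_matrix` from `sklearn.metrics` to made-up toy labels (an assumption for illustration, not the churn data). It suggests why recall computed with `pos_label='no'` can be read as the specificity: it equals TN / (TN + FP) taken from the confusion matrix.

```python
from sklearn import metrics

# toy labels for illustration only (assumption, not the churn dataset)
y_true = ["yes", "yes", "no", "no", "no", "yes", "no", "yes"]
y_pred = ["yes", "no", "no", "yes", "no", "yes", "no", "yes"]

# fix the label order so that rows/columns are ['no', 'yes'];
# rows are actual classes, columns are predicted classes
cm = metrics.confusion_matrix(y_true, y_pred, labels=["no", "yes"])
tn, fp, fn, tp = cm.ravel()

# recall of the 'yes' class is the sensitivity
sens = metrics.recall_score(y_true, y_pred, average="binary", pos_label="yes")
# recall of the 'no' class is the specificity
spec = metrics.recall_score(y_true, y_pred, average="binary", pos_label="no")

print("Sensitivity from recall_score :", round(sens, 3))
print("Sensitivity from counts       :", round(tp / (tp + fn), 3))
print("Specificity from recall_score :", round(spec, 3))
print("Specificity from counts       :", round(tn / (tn + fp), 3))
```

Passing `labels` explicitly avoids relying on the default sorted label order when unpacking the matrix with `ravel()`.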