# Evaluating our models

Now that we know how to use a decision tree, let's try and evaluate the performance of the different algorithms. To do so, we need a test set. Remember, we use the training set to create our model, and the test set to evaluate how well it works. None of the training examples should be used in the test set.

We can use the trained model to predict the labels in the test set.

In [8]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create the train and test sets. Remember, X contains the features and y the labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
tree_classifier = DecisionTreeClassifier(random_state=42)
tree_classifier.fit(X_train, y_train)

# Make predictions on test data
y_pred = tree_classifier.predict(X_test)

#Visualize the predictions
print ('Pred:', y_pred)

#Visualize the true labels
print ('True:',y_test)

Pred: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]
True: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]


# Exercise

For the CART decision tree and the Random Forests, do the following with the **Wine dataset**:

1.   **Without using the libraries in sklearn**, calculate: the accuracy, error rate, precision, recall, sensitivity, specificity, G-mean, F-measure, F0.5 and F2 metrics. Which algorithm works better?
2.   Calculate the same metrics **using the tools in the sklearn**. Calculate also the ROC AUC. Try different variations of averaging and see how the metrics change. Do the values change with different averaging methods?
3. Modify the CART exercise (using the sklearn modules) to use:
    - a stratified split
    - cross-validation.



Making imports

In [12]:
from sklearn.datasets import load_wine
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

Loading data

In [23]:
wine = load_wine()
df_wine = pd.DataFrame(data=wine.data, columns=wine.feature_names)
df_wine['target'] = wine.target
# df_wine['target'] = "class_" + df_wine['target']

#Get the data
X = df_wine.drop('target', axis=1)
y = df_wine['target']

df_wine.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline,target
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0,0
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0,0
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0,0
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0,0


Splitting the data

In [24]:
X_test, X_train, y_test, y_train = train_test_split(X, y, test_size=0.2, random_state=42)

Training the model

In [25]:
# CART
tree_classifier = DecisionTreeClassifier(random_state=42)
tree_classifier.fit(X_train, y_train)
# Random Forests
rf_classifier = RandomForestClassifier(random_state=42)
rf_classifier.fit(X_train, y_train)

Making predictions

In [26]:
# CART
y_pred_tree = tree_classifier.predict(X_test)
# Random Forests
y_pred_rf = rf_classifier.predict(X_test)

Algorithm evaluation without sklearn:

First we are going to create the confusion matrix for the CART algorithm

In [49]:
num_classes = len(np.unique(y_test))

# Initialize the confusion matrix
conf_matrix = np.zeros((num_classes, num_classes), dtype=int)

# Populate the confusion matrix
for true_label, pred_label in zip(y_test, y_pred_tree):
    conf_matrix[true_label, pred_label] += 1

TP_tree = {}
FP_tree = {}
TN_tree = {}
FN_tree = {}

for i in range(num_classes):
    TP_tree[i] = conf_matrix[i, i]
    FP_tree[i] = conf_matrix[:, i].sum() - TP_tree[i]
    FN_tree[i] = conf_matrix[i, :].sum() - TP_tree[i]
    TN_tree[i] = conf_matrix.sum() - (TP_tree[i] + FP_tree[i] + FN_tree[i])

print("True Positives:", TP_tree)
print("False Positives:", FP_tree)
print("True Negatives:", TN_tree)
print("False Negatives:", FN_tree)


True Positives: {0: 38, 1: 45, 2: 30}
False Positives: {0: 7, 1: 14, 2: 8}
True Negatives: {0: 90, 1: 71, 2: 94}
False Negatives: {0: 7, 1: 12, 2: 10}


In [45]:
num_classes = len(np.unique(y_test))

# Initialize the confusion matrix
conf_matrix = np.zeros((num_classes, num_classes), dtype=int)

# Populate the confusion matrix
for true_label, pred_label in zip(y_test, y_pred_rf):
    conf_matrix[true_label, pred_label] += 1

TP_rf = {}
FP_rf = {}
TN_rf = {}
FN_rf = {}

for i in range(num_classes):
    TP_rf[i] = conf_matrix[i, i]
    FP_rf[i] = conf_matrix[:, i].sum() - TP_rf[i]
    FN_rf[i] = conf_matrix[i, :].sum() - TP_rf[i]
    TN_rf[i] = conf_matrix.sum() - (TP_rf[i] + FP_rf[i] + FN_rf[i])

print("True Positives:", TP_rf)
print("False Positives:", FP_rf)
print("True Negatives:", TN_rf)
print("False Negatives:", FN_rf)

True Positives: {0: 44, 1: 55, 2: 40}
False Positives: {0: 2, 1: 1, 2: 0}
True Negatives: {0: 95, 1: 84, 2: 102}
False Negatives: {0: 1, 1: 2, 2: 0}


We are going to use macro averaging for the metrics

Accuracy

In [62]:
accuracy_per_class_rf = {}

for i in range(num_classes):
    accuracy_per_class_rf[i] = (TP_rf[i] + TN_rf[i]) / (TP_rf[i] + TN_rf[i] + FP_rf[i] + FN_rf[i])

accuracy_macro_rf = sum(accuracy_per_class_rf.values()) / num_classes

print("Accuracy Macro Random Forest:", accuracy_macro_rf)

accuracy_per_class_tree = {}

for i in range(num_classes):
    accuracy_per_class_tree[i] = (TP_tree[i] + TN_tree[i]) / (TP_tree[i] + TN_tree[i] + FP_tree[i] + FN_tree[i])
    
accuracy_macro_tree = sum(accuracy_per_class_tree.values()) / num_classes

print("Accuracy Macro CART:", accuracy_macro_tree)

Accuracy Macro Random Forest: 0.9859154929577465
Accuracy Macro CART: 0.863849765258216


Error rate

In [63]:
error_rate_rf = 1 - accuracy_macro_rf
print("Error rate Random Forest:", error_rate_rf)

error_rate_tree = 1 - accuracy_macro_tree
print("Error rate CART:", error_rate_tree)

Error rate Random Forest: 0.014084507042253502
Error rate CART: 0.136150234741784


Sensitivity

In [64]:
sensitivity_per_class_rf = {}
sensitivity_per_class_tree = {}

# Calculate sensitivity for Random Forest
for i in range(num_classes):
    sensitivity_per_class_rf[i] = TP_rf[i] / (TP_rf[i] + FN_rf[i])

sensitivity_macro_rf = sum(sensitivity_per_class_rf.values()) / num_classes
print("Sensitivity Macro Random Forest:", sensitivity_macro_rf)

# Calculate sensitivity for CART
for i in range(num_classes):
    sensitivity_per_class_tree[i] = TP_tree[i] / (TP_tree[i] + FN_tree[i])

sensitivity_macro_tree = sum(sensitivity_per_class_tree.values()) / num_classes
print("Sensitivity Macro CART:", sensitivity_macro_tree)

Sensitivity Macro Random Forest: 0.980896686159844
Sensitivity Macro CART: 0.7946393762183236


Specifity

In [65]:
specificity_per_class_rf = {}
specificity_per_class_tree = {}

# Calculate specificity for Random Forest
for i in range(num_classes):
    specificity_per_class_rf[i] = TN_rf[i] / (TN_rf[i] + FP_rf[i])

specificity_macro_rf = sum(specificity_per_class_rf.values()) / num_classes
print("Specificity Macro Random Forest:", specificity_macro_rf)

# Calculate specificity for CART
for i in range(num_classes):
    specificity_per_class_tree[i] = TN_tree[i] / (TN_tree[i] + FP_tree[i])

specificity_macro_tree = sum(specificity_per_class_tree.values()) / num_classes
print("Specificity Macro CART:", specificity_macro_tree)

Specificity Macro Random Forest: 0.9892055791388721
Specificity Macro CART: 0.8948992655481436


Precision

In [66]:
precision_per_class_rf = {}
precision_per_class_tree = {}

# Calculate precision for Random Forest
for i in range(num_classes):
    precision_per_class_rf[i] = TP_rf[i] / (TP_rf[i] + FP_rf[i])

precision_macro_rf = sum(precision_per_class_rf.values()) / num_classes
print("Precision Macro Random Forest:", precision_macro_rf)

# Calculate precision for CART
for i in range(num_classes):
    precision_per_class_tree[i] = TP_tree[i] / (TP_tree[i] + FP_tree[i])

precision_macro_tree = sum(precision_per_class_tree.values()) / num_classes
print("Precision Macro CART:", precision_macro_tree)

Precision Macro Random Forest: 0.9795548654244306
Precision Macro CART: 0.7988766643539167


G-mean

In [67]:
gmean_per_class_rf = {}
gmean_per_class_tree = {}

# Calculate G-mean for Random Forest
for i in range(num_classes):
    sensitivity_rf = TP_rf[i] / (TP_rf[i] + FN_rf[i])
    specificity_rf = TN_rf[i] / (TN_rf[i] + FP_rf[i])
    gmean_per_class_rf[i] = (sensitivity_rf * specificity_rf) ** 0.5

gmean_macro_rf = sum(gmean_per_class_rf.values()) / num_classes
print("G-mean Macro Random Forest:", gmean_macro_rf)

# Calculate G-mean for CART
for i in range(num_classes):
    sensitivity_tree = TP_tree[i] / (TP_tree[i] + FN_tree[i])
    specificity_tree = TN_tree[i] / (TN_tree[i] + FP_tree[i])
    gmean_per_class_tree[i] = (sensitivity_tree * specificity_tree) ** 0.5

gmean_macro_tree = sum(gmean_per_class_tree.values()) / num_classes
print("G-mean Macro CART:", gmean_macro_tree)

G-mean Macro Random Forest: 0.9850278135025802
G-mean Macro CART: 0.8428630969078913


F-measure

In [68]:
fmeasure_per_class_rf = {}
fmeasure_per_class_tree = {}

# Calculate F-measure for Random Forest
for i in range(num_classes):
    precision_rf = TP_rf[i] / (TP_rf[i] + FP_rf[i])
    recall_rf = TP_rf[i] / (TP_rf[i] + FN_rf[i])
    fmeasure_per_class_rf[i] = 2 * (precision_rf * recall_rf) / (precision_rf * recall_rf)

fmeasure_macro_rf = sum(fmeasure_per_class_rf.values()) / num_classes
print("F-measure Macro Random Forest:", fmeasure_macro_rf)

# Calculate F-measure for CART
for i in range(num_classes):
    precision_tree = TP_tree[i] / (TP_tree[i] + FP_tree[i])
    recall_tree = TP_tree[i] / (TP_tree[i] + FN_tree[i])
    fmeasure_per_class_tree[i] = 2 * (precision_tree * recall_tree) / (precision_tree * recall_tree)

fmeasure_macro_tree = sum(fmeasure_per_class_tree.values()) / num_classes
print("F-measure Macro CART:", fmeasure_macro_tree)

F-measure Macro Random Forest: 2.0
F-measure Macro CART: 2.0


F0.5

In [69]:
f0_5_per_class_rf = {}
f0_5_per_class_tree = {}

# Calculate F0.5 score for Random Forest
for i in range(num_classes):
    precision_rf = TP_rf[i] / (TP_rf[i] + FP_rf[i])
    recall_rf = TP_rf[i] / (TP_rf[i] + FN_rf[i])
    f0_5_per_class_rf[i] = (1 + 0.5**2) * (precision_rf * recall_rf) / (0.5**2 * precision_rf + recall_rf)

f0_5_macro_rf = sum(f0_5_per_class_rf.values()) / num_classes
print("F0.5 Macro Random Forest:", f0_5_macro_rf)

# Calculate F0.5 score for CART
for i in range(num_classes):
    precision_tree = TP_tree[i] / (TP_tree[i] + FP_tree[i])
    recall_tree = TP_tree[i] / (TP_tree[i] + FN_tree[i])
    f0_5_per_class_tree[i] = (1 + 0.5**2) * (precision_tree * recall_tree) / (0.5**2 * precision_tree + recall_tree)

f0_5_macro_tree = sum(f0_5_per_class_tree.values()) / num_classes
print("F0.5 Macro CART:", f0_5_macro_tree)

F0.5 Macro Random Forest: 0.9797821255963574
F0.5 Macro CART: 0.7978708443938819


F2

In [70]:
f2_per_class_rf = {}
f2_per_class_tree = {}

# Calculate F2 score for Random Forest
for i in range(num_classes):
    precision_rf = TP_rf[i] / (TP_rf[i] + FP_rf[i])
    recall_rf = TP_rf[i] / (TP_rf[i] + FN_rf[i])
    f2_per_class_rf[i] = (1 + 2**2) * (precision_rf * recall_rf) / (2**2 * precision_rf + recall_rf)

f2_macro_rf = sum(f2_per_class_rf.values()) / num_classes
print("F2 Macro Random Forest:", f2_macro_rf)

# Calculate F2 score for CART
for i in range(num_classes):
    precision_tree = TP_tree[i] / (TP_tree[i] + FP_tree[i])
    recall_tree = TP_tree[i] / (TP_tree[i] + FN_tree[i])
    f2_per_class_tree[i] = (1 + 2**2) * (precision_tree * recall_tree) / (2**2 * precision_tree + recall_tree)

f2_macro_tree = sum(f2_per_class_tree.values()) / num_classes
print("F2 Macro CART:", f2_macro_tree)

F2 Macro Random Forest: 0.9805870621961859
F2 Macro CART: 0.7953307758185807


We can observe that the Random Forest algorithm has better performance than the CART algorithm. Now we are going to use the sklearn tools to calculate the metrics.

In [72]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix
from sklearn.metrics import make_scorer, roc_curve, auc

# Calculate metrics using sklearn
# Accuracy
accuracy_rf = accuracy_score(y_test, y_pred_rf)
accuracy_tree = accuracy_score(y_test, y_pred_tree)

# Error rate
error_rate_rf = 1 - accuracy_rf
error_rate_tree = 1 - accuracy_tree

# Precision
precision_rf = precision_score(y_test, y_pred_rf, average='macro')
precision_tree = precision_score(y_test, y_pred_tree, average='macro')

# Recall (Sensitivity)
recall_rf = recall_score(y_test, y_pred_rf, average='macro')
recall_tree = recall_score(y_test, y_pred_tree, average='macro')

# Specificity
def specificity_score(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    tn = cm[0, 0]
    fp = cm[0, 1]
    specificity = tn / (tn + fp)
    return specificity

specificity_rf = specificity_score(y_test, y_pred_rf)
specificity_tree = specificity_score(y_test, y_pred_tree)

# G-mean
def gmean_score(y_true, y_pred):
    sensitivity = recall_score(y_true, y_pred, average='macro')
    specificity = specificity_score(y_true, y_pred)
    return (sensitivity * specificity) ** 0.5

gmean_rf = gmean_score(y_test, y_pred_rf)
gmean_tree = gmean_score(y_test, y_pred_tree)

# F-measure (F1-score)
fmeasure_rf = f1_score(y_test, y_pred_rf, average='macro')
fmeasure_tree = f1_score(y_test, y_pred_tree, average='macro')



# Print the results
print("Random Forest Metrics:")
print(f"Accuracy: {accuracy_rf}")
print(f"Error Rate: {error_rate_rf}")
print(f"Precision: {precision_rf}")
print(f"Recall (Sensitivity): {recall_rf}")
print(f"Specificity: {specificity_rf}")
print(f"G-mean: {gmean_rf}")
print(f"F-measure (F1): {fmeasure_rf}")

print("\nCART Metrics:")
print(f"Accuracy: {accuracy_tree}")
print(f"Error Rate: {error_rate_tree}")
print(f"Precision: {precision_tree}")
print(f"Recall (Sensitivity): {recall_tree}")
print(f"Specificity: {specificity_tree}")
print(f"G-mean: {gmean_tree}")
print(f"F-measure (F1): {fmeasure_tree}")


Random Forest Metrics:
Accuracy: 0.9788732394366197
Error Rate: 0.021126760563380254
Precision: 0.9795548654244306
Recall (Sensitivity): 0.980896686159844
Specificity: 0.9777777777777777
G-mean: 0.9793359903643686
F-measure (F1): 0.980161431488865

CART Metrics:
Accuracy: 0.795774647887324
Error Rate: 0.204225352112676
Precision: 0.7988766643539167
Recall (Sensitivity): 0.7946393762183236
Specificity: 0.8444444444444444
G-mean: 0.8191634797672331
F-measure (F1): 0.7965124275469103
