# Multiclass Classification - Measurement metrics

Selecting the best metrics for evaluating the performance of a given classifier on a dataset is guided by a number of considerations, including the class balance and the expected outcomes. A particular performance measure may evaluate a classifier from a single perspective and often fails to capture others. Consequently, there is no unified metric for measuring the generalized performance of a classifier.

Two methods, micro-averaging and macro-averaging, are used to extract a single number for each of precision, recall and other metrics across multiple classes. A macro-average calculates the metric independently for each class and then takes the unweighted mean, so every class counts equally. In contrast, the micro-average calculates the metric from the aggregate contributions of all classes. The micro-average is often used on unbalanced datasets because it takes the frequency of each class into consideration. For single-label multiclass problems, the micro-averaged precision, recall and accuracy scores are mathematically equivalent.
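
The difference between the two averages can be sketched on a small, made-up label set (the labels below are hypothetical and not taken from the flow-pattern data). The sketch counts true positives, false positives and false negatives per class, then averages precision both ways. In scikit-learn the same quantities are available through `precision_score` and `recall_score` with `average='macro'` or `average='micro'`.

```python
from collections import Counter

# Hypothetical toy labels: class 0 is the majority class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 2, 0]

labels = sorted(set(y_true) | set(y_pred))

# Per-class true positives, false positives and false negatives.
tp, fp, fn = Counter(), Counter(), Counter()
for t, p in zip(y_true, y_pred):
    if t == p:
        tp[t] += 1
    else:
        fp[p] += 1  # p was predicted but is wrong
        fn[t] += 1  # t was the actual class but was missed

# Macro: compute precision per class, then take the unweighted mean.
per_class_precision = [
    tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in labels
]
macro_precision = sum(per_class_precision) / len(labels)

# Micro: pool the counts of all classes first, then compute the metric once.
micro_precision = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
micro_recall = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"macro precision: {macro_precision:.3f}")  # 0.738
print(f"micro precision: {micro_precision:.3f}")  # 0.700
print(f"micro recall:    {micro_recall:.3f}")     # 0.700
print(f"accuracy:        {accuracy:.3f}")         # 0.700
```

On this toy set the micro-averaged precision, micro-averaged recall and accuracy coincide, while the macro average is pulled upwards by the small class 2, which happens to be predicted with perfect precision.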

Classification report: 
The classification report provides the main classification metrics on a per-class basis.
a) Precision (tp / (tp + fp)) measures the ability of a classifier to identify only the correct instances for each class.
b) Recall (tp / (tp + fn)) is the ability of a classifier to find all correct instances per class.
c) F1 score is the harmonic mean of precision and recall and lies between 0 and 1. An F1 score of 1 indicates perfect precision and recall. Because precision and recall often trade off against each other, a high F1 score is useful where both high recall and high precision are important.
d) Support is the number of actual occurrences of the class in the test data set. Imbalanced support in the training data may indicate the need for stratified sampling or rebalancing.
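
As a minimal sketch of how these four per-class values follow from the raw labels, the helper below (`classification_report_rows`, a hypothetical name introduced only for this illustration) applies the formulas for precision, recall, F1 and support directly. The labels are made up. scikit-learn's `classification_report` and `precision_recall_fscore_support` report the same quantities.

```python
def classification_report_rows(y_true, y_pred):
    """Return {class: (precision, recall, f1, support)} from raw labels."""
    rows = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        # Harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        support = tp + fn  # number of actual occurrences of class c
        rows[c] = (precision, recall, f1, support)
    return rows


# Hypothetical toy labels for three classes.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]

report = classification_report_rows(y_true, y_pred)
for c, (p, r, f, s) in report.items():
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; libraries may instead warn or let the caller choose the fallback value.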

Confusion Matrix: 
A confusion matrix shows the combination of the actual and predicted classes. In the scikit-learn and Yellowbrick convention used in this notebook, each row of the matrix represents the instances in an actual class, while each column represents the instances in a predicted class (some references use the transposed layout). It is a good measure of whether models can account for the overlap in class properties, and it shows which classes are most easily confused.
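
A small sketch of how such a matrix is built from raw labels is given here. It follows scikit-learn's orientation (rows for actual classes, columns for predicted classes); transposing the result gives the other layout. The helper name `confusion_matrix_counts` and the toy labels are assumptions made for this illustration only.

```python
def confusion_matrix_counts(y_true, y_pred, labels):
    """Count matrix with rows = actual class, columns = predicted class."""
    index = {c: i for i, c in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical toy labels for three classes.
labels = [0, 1, 2]
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]

cm = confusion_matrix_counts(y_true, y_pred, labels)
for row in cm:
    print(row)
# [3, 1, 0]
# [0, 2, 1]
# [1, 0, 2]

# Row-normalised percentages: the share of each actual class that was
# assigned to each predicted class.
row_percent = [[100 * v / sum(row) for v in row] for row in cm]

# Off-diagonal cells are the confusions, listed as (count, actual, predicted).
confusions = [
    (cm[i][j], labels[i], labels[j])
    for i in range(len(labels))
    for j in range(len(labels))
    if i != j and cm[i][j] > 0
]
print(confusions)
```

The diagonal holds the correctly classified instances, so the off-diagonal entries show directly which pairs of classes the model mixes up.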

Class Prediction Error: 
This is a useful extension of the confusion matrix that visualizes the misclassifications as a stacked bar chart. Each bar is segmented to show the proportion of predictions for each class.

Aggregate metrics:
These provide a score for the overall performance of the classifier across the class spectrum.

Cohen’s Kappa: 
This is one of the best metrics for evaluating multi-class classifiers on imbalanced datasets. The traditional metrics from the classification report are biased towards the majority class and assume an identical distribution of the actual and predicted classes. In contrast, Cohen’s Kappa statistic measures how closely the predicted classes agree with the actual classes relative to the agreement expected from a random classification. The score ranges from −1 to 1, where 0 corresponds to chance-level agreement, so the scores of different classifiers can be compared directly on the same classification task. Generally, the closer the score is to 1, the better the classifier.
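
The correction for chance agreement can be sketched with the usual definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the class frequencies. The function name `cohen_kappa` and the labels are hypothetical; scikit-learn's `cohen_kappa_score`, used later in this notebook, computes the same statistic.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa from raw labels: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: the fraction of matching labels (the accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Assumes p_e < 1, i.e. the labels are not all one identical class.
    return (p_o - p_e) / (1 - p_e)


# Hypothetical imbalanced labels: 8 instances of class 0, 2 of class 1.
y_true = [0] * 8 + [1] * 2

# A classifier that always predicts the majority class: 80% accuracy.
always_majority = [0] * 10
# A classifier with the same accuracy that also finds one minority instance.
mixed = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

kappa_majority = cohen_kappa(y_true, always_majority)
kappa_mixed = cohen_kappa(y_true, mixed)
print(f"always-majority kappa: {kappa_majority:.3f}")  # 0.000
print(f"mixed kappa:           {kappa_mixed:.3f}")     # about 0.375
```

Both toy classifiers reach 80% accuracy, yet only the second one scores above zero, which illustrates why kappa is informative on imbalanced data.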

Cross-Entropy: 
Cross entropy measures the extent to which the predicted probabilities match the given data, and is useful for probabilistic classifiers such as Naïve Bayes. It generalizes the binary logarithmic loss (log loss) to multiple classes, is widely used as a training loss for neural networks, and quantifies the cost of inaccurate predictions. The classifier with the lowest log loss is preferred.
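
A minimal sketch of the computation: the log loss is the average negative logarithm of the probability that the classifier assigned to the true class. The helper `log_loss_manual`, the clipping constant `eps` and the probability tables are assumptions for illustration. The sketch further assumes that labels are integer indices into the probability columns. scikit-learn's `log_loss` is the library counterpart.

```python
import math


def log_loss_manual(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for t, probs in zip(y_true, y_prob):
        # Clip to avoid log(0) when a probability is exactly 0 or 1.
        p = min(max(probs[t], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Hypothetical predictions for three instances of classes 0, 1 and 2.
y_true = [0, 1, 2]
confident = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.05, 0.05, 0.90]]
hesitant = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.30, 0.30, 0.40]]

loss_confident = log_loss_manual(y_true, confident)
loss_hesitant = log_loss_manual(y_true, hesitant)
print(f"confident: {loss_confident:.4f}")
print(f"hesitant:  {loss_hesitant:.4f}")
```

Both probability tables lead to the same predicted classes and therefore the same accuracy, but the hesitant one has the higher loss because it places less probability on the true class.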

Matthews Correlation Coefficient (MCC):
MCC, originally devised for binary classification on unbalanced classes, has been extended to evaluate multiclass classifiers by computing the correlation coefficient between the observed and predicted classifications. A coefficient of +1 represents a perfect prediction, 0 is similar to a random prediction and −1 indicates an inverse prediction. In the multiclass case the minimum lies between −1 and 0, depending on the number and distribution of the true labels.
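
The multiclass form can be sketched from four counts: the number of samples s, the number of correct predictions c, how often each class truly occurs (t_k) and how often it is predicted (p_k). The coefficient is then (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2)). The function name `matthews_corrcoef_manual` and the toy labels are hypothetical, and returning 0 for a zero denominator is a convention chosen for this sketch. The notebook itself relies on scikit-learn's `matthews_corrcoef`.

```python
import math
from collections import Counter


def matthews_corrcoef_manual(y_true, y_pred):
    """Multiclass MCC from sample, correct, true-class and predicted-class counts."""
    s = len(y_true)                                     # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))     # correct predictions
    t_k = Counter(y_true)                               # true occurrences per class
    p_k = Counter(y_pred)                               # predictions per class
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    denom = math.sqrt(
        (s * s - sum(p_k[k] ** 2 for k in labels))
        * (s * s - sum(t_k[k] ** 2 for k in labels))
    )
    # Convention for this sketch: undefined cases are reported as 0.
    return numerator / denom if denom else 0.0


# Hypothetical examples covering the three reference values.
perfect = matthews_corrcoef_manual([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])
majority = matthews_corrcoef_manual([0] * 8 + [1] * 2, [0] * 10)
inverse = matthews_corrcoef_manual([0, 0, 1, 1], [1, 1, 0, 0])

print(perfect, majority, inverse)  # 1.0 0.0 -1.0
```

The always-majority predictor reaches 80% accuracy on its toy labels but an MCC of 0, while the fully inverted binary prediction reaches the lower bound of −1.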

In [None]:
#!pip install PySpice

In [None]:
# libraries 
import os
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.base import BaseEstimator
import time

#Visualizers
from yellowbrick.classifier import ClassificationReport
from yellowbrick.classifier import ClassPredictionError
from yellowbrick.classifier import ConfusionMatrix
from yellowbrick.classifier import ROCAUC
from yellowbrick.classifier import PrecisionRecallCurve
import matplotlib.pyplot as plt

#Metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics import hamming_loss
from sklearn.metrics import log_loss
from sklearn.metrics import zero_one_loss
from sklearn.metrics import matthews_corrcoef

#Classifiers
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

#Neural Network
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Conv1D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

import warnings
warnings.filterwarnings('ignore')

In [None]:
data_path6 = '../Databases/12DB_6FP.csv' 
data_path5 = '../Databases/12DB_5FP.csv' 
data_path3 = '../Databases/12DB_3FP.csv' 
figures_path = './figures'

In [None]:
if not os.path.exists(figures_path):
    os.makedirs(figures_path) 
if not os.path.exists(figures_path+"/6FP"):
    os.makedirs(figures_path+"/6FP") 
if not os.path.exists(figures_path+"/5FP"):
    os.makedirs(figures_path+"/5FP") 
if not os.path.exists(figures_path+"/3FP"):
    os.makedirs(figures_path+"/3FP") 

In [None]:
#Write function for class-centric metrics
# Classification report
def CR_viz():
    def Class_report(model,classes):
        visualizer = ClassificationReport(model, classes=classes, support=True)
        train_start_time = time.time()
        visualizer.fit(X_train, y_train)  # Fit the visualizer and the model
        print(f'Train runtime: {time.time()-train_start_time}')
        test_start_time = time.time()
        visualizer.score(X_test, y_test)  # Evaluate the model on the test data
        print(f'Test runtime: {time.time()-test_start_time}')
        return visualizer.poof()
    for name, classifier in zip(names, classifiers):
        fig, ax = plt.subplots(nrows=1, ncols=1 )
        Class_report(classifier,classes)
        fig.savefig(figures_path+"/"+str(len(classes))+"FP/"+name+"_CR.pdf")

#Class Prediction Error
def CPE_viz():    
    def CPE(model,classes):
        visualizer = ClassPredictionError(model, classes=classes)
        visualizer.fit(X_train, y_train)  # Fit the visualizer and the model
        visualizer.score(X_test, y_test)  # Evaluate the model on the test data 
        return visualizer.poof()  
    for name, classifier in zip(names, classifiers):
        fig, ax = plt.subplots(nrows=1, ncols=1 )
        CPE(classifier,classes)
        fig.savefig(figures_path+"/"+str(len(classes))+"FP/"+name+"_CPE.pdf")
        
#Confusion matrix
def CM_viz():    
    def CM(model,classes):
        visualizer = ConfusionMatrix(model, classes=classes, percent=True)
        visualizer.fit(X_train, y_train)  # Fit the visualizer and the model
        visualizer.score(X_test, y_test)  # Evaluate the model on the test data 
        return visualizer.poof()  
    for name, classifier in zip(names, classifiers):
        fig, ax = plt.subplots(nrows=1, ncols=1 )
        CM(classifier,classes)
        fig.savefig(figures_path+"/"+str(len(classes))+"FP/"+name+"_CM.pdf")
        
#ROC-AUC
def ROC_viz():    
    def ROC(model,classes):
        visualizer = ROCAUC(model, classes=classes)
        visualizer.fit(X_train, y_train)  # Fit the visualizer and the model
        visualizer.score(X_test, y_test)  # Evaluate the model on the test data 
        return visualizer.poof()  
    for name, classifier in zip(names, classifiers):
        fig, ax = plt.subplots(nrows=1, ncols=1 )
        ROC(classifier,classes)
        fig.savefig(figures_path+"/"+str(len(classes))+"FP/"+name+"_ROC.pdf")

#Precision Recall Curve
def PRC_viz():  
    def PRC(model,classes):
        visualizer = PrecisionRecallCurve(model,classes=classes, per_class=True, iso_f1_curves=False,
    fill_area=False, micro=False)
        visualizer.fit(X_train, y_train)  # Fit the visualizer and the model
        visualizer.score(X_test, y_test)  # Evaluate the model on the test data 
        return visualizer.poof()  
    for name, classifier in zip(names, classifiers):
        fig, ax = plt.subplots(nrows=1, ncols=1 )
        PRC(classifier,classes)
        fig.savefig(figures_path+"/"+str(len(classes))+"FP/"+name+"_PRC.pdf")


In [None]:
# Write function for aggregate metrics
def classifier_metrics():
    def metrics(model):
        model.fit(X_train, y_train)  # Fit the model on the training data
        y_pred = model.predict(X_test)
        try:
            # Log loss needs class probabilities, which not every model provides
            y_prob = model.predict_proba(X_test)
            log_metric = log_loss(y_test, y_prob)
        except Exception:
            # Not probabilistic (e.g. SVC without probability=True):
            # report NaN rather than 0, which would read as a perfect score
            log_metric = np.nan

        acc_score = accuracy_score(y_test, y_pred)
        c_k_s = cohen_kappa_score(y_test, y_pred)
        zero_met = zero_one_loss(y_test, y_pred)
        hl = hamming_loss(y_test, y_pred)
        mc = matthews_corrcoef(y_test, y_pred)
        print('accuracy_score: {0:.4f}'.format(acc_score))
        print('cohen_kappa_score: {0:.4f}'.format(c_k_s))
        print('log_loss: {0:.4f}'.format(log_metric))
        print('zero_one_loss: {0:.4f}'.format(zero_met))
        print('hamming_loss: {0:.4f}'.format(hl))
        print('matthews_corrcoef: {0:.4f}'.format(mc))
    for classifier in classifiers:
        print(str(classifier))
        metrics(classifier)
        print()
        print("---------------------------------------------------------------------------------")

In [None]:
class KerasBatchClassifier(KerasClassifier, BaseEstimator):
    def __init__(self, model, **kwargs):
        super().__init__(model)
        self.fit_kwargs = kwargs
        self._estimator_type = 'classifier'

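    def fit(self, x, y, *args, **kwargs):
        y = np.array(y)
        if len(y.shape) == 2 and y.shape[1] > 1:
            # One-hot encoded labels: one class per column
            self.classes_ = np.arange(y.shape[1])
        elif (len(y.shape) == 2 and y.shape[1] == 1) or len(y.shape) == 1:
            # Plain labels: map them to indices 0..n_classes-1
            self.classes_ = np.unique(y)
            y = np.searchsorted(self.classes_, y)
        else:
            raise ValueError('Invalid shape for y: ' + str(y.shape))
        self.n_classes_ = len(self.classes_)
        # Skip KerasClassifier.fit (its label handling is replicated above) and
        # call the base wrapper's fit with the stored keyword arguments
        return super(KerasClassifier, self).fit(x, y, **self.fit_kwargs)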

def FullyConnected():
  inputs = Input(shape=(X_train.shape[1],), name="input_1")
  layers = Dense(512, activation="selu")(inputs)
  layers = Dense(256, activation="selu")(layers)
  layers = Dense(128, activation="selu")(layers)
  layers = Dense(64, activation="selu")(layers)
  predictions = Dense(len(classes), activation="softmax", name="output_1")(layers)
  model = Model(inputs = inputs, outputs=predictions)
  optimizer=RMSprop() 
  model.compile(optimizer=optimizer,
                loss='categorical_crossentropy', 
                metrics=['accuracy'])
  return model 

# 6FP

In [None]:
# Load Dataset

Data = pd.read_csv(data_path6, names=['Vsl', 'Vsg', 'VisL', 'VisG', 'DenL', 'DenG', 'ST', 'Ang', 'ID', 'Flow Pattern'], header=0)
print('Data shape:', Data.shape)
Data.head()

#Data = Data.drop(['VisG', 'VisL','DenG', 'ST', 'VisG', 'DenL'], axis=1)

In [None]:
Data.describe()

In [None]:
# Obtain the class distribution
Data['Flow Pattern'].value_counts()

In [None]:
# Train, test split
features_list = ['Vsl', 'Vsg', 'VisL', 'VisG', 'DenL', 'DenG', 'ST', 'Ang', 'ID']
Features = Data[features_list]
Labels = Data['Flow Pattern']

X_train, X_test, y_train, y_test = train_test_split(Features, Labels, test_size=0.2, stratify=Labels, random_state=42)

print('Train data shape:', X_train.shape)
print('Train labels shape:', y_train.shape)
print('Test data shape:', X_test.shape)
print('Test labels shape:', y_test.shape)

In [None]:
scaler = StandardScaler().fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
#classes 
classes = [0, 1, 2, 3, 4, 5] 

In [None]:
# select classifiers
classifiers=[
ExtraTreesClassifier(n_estimators=220, max_depth=None, min_samples_split=3, random_state=18), # Final
svm.SVC(C=120, gamma=8.0), # Final
RandomForestClassifier(n_estimators=600,criterion='entropy',random_state=18), # Final
GradientBoostingClassifier(n_estimators=2048, learning_rate=0.03, max_depth=10, random_state=10) # Final
]

names=['ET', 'SVM', 'RF','GB'] 

In [None]:
#deploy visualization A 
visualization =[CR_viz(),CPE_viz(),CM_viz(),ROC_viz(),PRC_viz()] 

In [None]:
#Deploy aggregate metrics 
classifier_metrics() 

In [None]:
classifiers=[KerasBatchClassifier(FullyConnected, epochs=400, batch_size=64, verbose=0)]
names = ['FNN']

visualization =[CR_viz(),CPE_viz(),CM_viz(),ROC_viz()]

classifier_metrics() 

# 5FP

In [None]:
# Load Dataset

Data = pd.read_csv(data_path5, names=['Vsl', 'Vsg', 'VisL', 'VisG', 'DenL', 'DenG', 'ST', 'Ang', 'ID', 'Flow Pattern'], header=0)
print('Data shape:', Data.shape)
Data.head()

#Data = Data.drop(['VisG', 'VisL','DenG', 'ST', 'VisG', 'DenL'], axis=1)

In [None]:
Data.describe()

In [None]:
# Obtain the class distribution
Data['Flow Pattern'].value_counts()

In [None]:
# Train, test split
features_list = ['Vsl', 'Vsg', 'VisL', 'VisG', 'DenL', 'DenG', 'ST', 'Ang', 'ID']
Features = Data[features_list]
Labels = Data['Flow Pattern']

X_train, X_test, y_train, y_test = train_test_split(Features, Labels, test_size=0.2, stratify=Labels, random_state=42)

print('Train data shape:', X_train.shape)
print('Train labels shape:', y_train.shape)
print('Test data shape:', X_test.shape)
print('Test labels shape:', y_test.shape)

In [None]:
scaler = StandardScaler().fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
#classes
classes = [0, 1, 2, 3, 4]

In [None]:
# select classifiers
classifiers=[
ExtraTreesClassifier(n_estimators=100, max_depth=None, min_samples_split=3, random_state=17), # Final
svm.SVC(C=3000000, gamma=0.1), # Final
RandomForestClassifier(n_estimators=10000,criterion='entropy',random_state=4), # Final
GradientBoostingClassifier(n_estimators=2048, learning_rate=0.03, max_depth=7, random_state=5) # Final
]

names=['ET', 'SVM', 'RF','GB'] 

In [None]:
#deploy visualization
visualization =[CR_viz(),CPE_viz(),CM_viz(),ROC_viz(),PRC_viz()]

In [None]:
#Deploy aggregate metrics  
classifier_metrics()

In [None]:
classifiers=[KerasBatchClassifier(FullyConnected, epochs=400, batch_size=64, verbose=0)]
names = ['FNN']

visualization =[CR_viz(),CPE_viz(),CM_viz(),ROC_viz()]

classifier_metrics() 

# 3FP

In [None]:
# Load Dataset

Data = pd.read_csv(data_path3, names=['Vsl', 'Vsg', 'VisL', 'VisG', 'DenL', 'DenG', 'ST', 'Ang', 'ID', 'Flow Pattern'], header=0)
print('Data shape:', Data.shape)
Data.head()


In [None]:
Data.describe()

In [None]:
# Obtain the class distribution
Data['Flow Pattern'].value_counts()

In [None]:
# Train, test split
features_list = ['Vsl', 'Vsg', 'VisL', 'VisG', 'DenL', 'DenG', 'ST', 'Ang', 'ID']
Features = Data[features_list]
Labels = Data['Flow Pattern']

X_train, X_test, y_train, y_test = train_test_split(Features, Labels, test_size=0.2, stratify=Labels, random_state=42)

print('Train data shape:', X_train.shape)
print('Train labels shape:', y_train.shape)
print('Test data shape:', X_test.shape)
print('Test labels shape:', y_test.shape)

In [None]:
scaler = StandardScaler().fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
#classes
classes = [0, 1, 2]

In [None]:
# select classifiers
classifiers=[
ExtraTreesClassifier(n_estimators=100, max_depth=None, min_samples_split=2, random_state=38), # Final
svm.SVC(C=3000000, gamma=0.5), # Final
RandomForestClassifier(n_estimators=140,criterion='entropy',random_state=14), # Final
GradientBoostingClassifier(n_estimators=2048, learning_rate=0.5, max_depth=11, random_state=150), # Final
]

names=['ET', 'SVM', 'RF','GB'] 

In [None]:
#deploy visualization 
visualization =[CR_viz(),CPE_viz(),CM_viz(),ROC_viz(),PRC_viz()] 

In [None]:
#Deploy aggregate metrics  
classifier_metrics()

In [None]:
classifiers=[KerasBatchClassifier(FullyConnected, epochs=400, batch_size=64, verbose=0)]
names = ['FNN']

visualization =[CR_viz(),CPE_viz(),CM_viz(),ROC_viz()]

classifier_metrics() 