# 3. Multi-class Classification

In this notebook, we will explore various classifier algorithms on the dataset from https://www.dropbox.com/s/z9ebwa49koaqs7i/Medical_MNIST.zip?dl=0

**Due to the dataset being extremely tedious, we refrained from using weaker algorithms like Linear Discriminant Analysis. Moreover, due to this being a non-binary classification we skipped the generalized linear models like Logistic Regression and Perceptron as well**

**Owing to the size of the initial dataset that was over 200MB, GitHub did not allow the uploading of such a huge file. For the code that follows we took the dataset after having separate;y downloaded it into our working folder on the local device.**

In [1]:
#importing all libraries

import numpy as np
import pandas as pd
import scipy.stats
import math
import os, sys, itertools
from csv import reader
from random import seed
from random import randrange
import AuxUtils as au
import PriorUtils as pu
import CorrectnessMetricUtils as cmu
import ErrorMetricsUtils as emu
import matplotlib.pyplot as plt
import cv2
from PIL import Image

**Dataset Preparation:** We will approaxh this issue by the following steps: **(1)** Reading and resizing the image **(2)** Reading the image into a numpy array **(3)** Appending its class type as numbers with index starting from 0 to the arrays

In [2]:
#preparing dataset
path = 'Medical_MNIST'
dataset = list()

i = 0
for item in os.listdir(path):
    subpath = path + "\\" + item
    for subitem in os.listdir(subpath):
        imgpath = subpath + "\\" + subitem
        img = Image.open(imgpath)                  #reading image
        img = img.resize((32,32))                  #resizing to 32x32
        np_img = np.asarray(img)                   #converting to np array
        np_img = np_img.flatten()                  #converting to 1D array
        np_img = list(np_img)
        np_img.append(i)                           #adding class type as last element in the list
        dataset.append(np_img)
    i += 1

In [3]:
#visualizing dataset
df = pd.DataFrame(dataset)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1015,1016,1017,1018,1019,1020,1021,1022,1023,1024
0,101,101,101,101,101,101,101,101,101,101,...,102,101,102,101,99,101,101,101,101,0
1,101,101,101,101,101,101,101,101,101,101,...,101,101,101,102,100,101,101,101,101,0
2,101,101,101,101,101,101,101,101,101,101,...,102,101,101,101,100,101,101,101,101,0
3,101,101,101,101,101,101,101,101,101,101,...,102,100,101,101,101,101,101,101,101,0
4,101,101,101,101,101,101,101,101,101,101,...,102,101,101,101,100,101,101,101,101,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58949,25,25,25,25,25,25,25,25,25,25,...,25,24,22,96,44,23,25,27,27,5
58950,25,25,25,25,25,25,25,25,25,25,...,25,23,23,100,38,23,24,25,27,5
58951,25,25,25,25,25,25,25,25,25,25,...,24,25,32,101,31,23,26,27,24,5
58952,25,25,25,25,25,25,25,25,25,25,...,26,23,24,95,41,23,24,24,25,5


We can observe that the featureset has 1024 feature columns and we have 1 target column.

### 1.1 Naive Bayes Classifier

Here, we have Gaussian prior probability distribution for sampling the data from the featureset of the first 3 columns. The Naive Bayes Classifier works with MAP estimation serving as it basis. We _assume_ that all the features are **independent** from one another. 

#### 1.1.1 **MLE** for Naive Bayes Classifier, _without_ considering the relative class frequencies

In [6]:
# Split dataset into n folds
def crossval_split(dataset, n_folds):
    split = list()
    dataset_copy = list(dataset)
    fold_dim = int(len(dataset) / n_folds)
    for _ in range(n_folds):
        fold = list()
        while len(fold) < fold_dim:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        split.append(fold)
    return split

# Divide dataset by class
def class_divider(dataset):
    divided = dict()
    for i in range(len(dataset)):
        row = dataset[i]
        class_type = row[-1]
        if (class_type not in divided):
            divided[class_type] = list()
        divided[class_type].append(row)
    return divided

# Mean, std and count columnwise
def dataset_info(dataset):
    info = [(np.mean(col), np.std(col), len(col)) for col in zip(*dataset)]
    del(info[-1]) #not reqd for class labels
    return info

# Classwise column stats
def class_info(dataset):
    divided = class_divider(dataset)
    info = dict()
    for class_type, rows in divided.items():
        info[class_type] = dataset_info(rows)
    return info

# Calculate probabilities of predicting each class for given row
def calc_class_probs_nbmle(info, row, prior):
    total_rows = sum([info[label][0][2] for label in info])
    probs = dict()
    for class_type, class_info in info.items():
        probs[class_type] = info[class_type][0][2]/float(total_rows)
        for i in range(len(class_info)):
            mean, std, _ = class_info[i]
            probs[class_type] *= prior(row[i], mean, std)
    aux = 0
    for class_type, class_info in info.items():
        aux += probs[class_type]
    for class_type, class_info in info.items():
        probs[class_type] = probs[class_type]/aux
    return probs

# Predict class type for given row
def predict_nbmle(info, row, prior):
    probs = calc_class_probs_nbmle(info, row, prior)
    best_label, best_prob = None, -1
    for class_type, prob in probs.items():
        if best_label is None or prob > best_prob:
            best_prob = prob
            best_label = class_type
    return best_label

# Algo evaluation by cross validation split
def eval_algo(dataset, algo, n_folds, obs_label, *args):
    folds = crossval_split(dataset, n_folds)
    TestScores = list()
    TrainScores = list()
    Pscores = list()
    Rscores = list()
    Fscores = list()
    Sscores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None
        test_pred = algo(train_set, test_set, *args)
        train_pred = algo(train_set, train_set, *args)
        test_actual = [row[-1] for row in fold]
        train_actual = [row[-1] for row in train_set]
        test_accuracy = cmu.accuracy_calc(test_actual, test_pred)
        train_accuracy = cmu.accuracy_calc(train_actual, train_pred)
        precision = cmu.precision_calc(obs_label, test_actual, test_pred)
        recall = cmu.recall_calc(obs_label, test_actual, test_pred)
        f1 = cmu.f1_calc(obs_label, test_actual, test_pred)
        spec = cmu.specificity_calc(obs_label, test_actual, test_pred)
        TestScores.append(test_accuracy)
        TrainScores.append(train_accuracy)
        Pscores.append(precision)
        Rscores.append(recall)
        Fscores.append(f1)
        Sscores.append(spec)
    return np.mean(TestScores), np.mean(TrainScores), np.mean(Pscores), np.mean(Rscores), np.mean(Fscores), np.mean(Sscores)

# Naive Bayes Algorithm simple gaussian
def naive_bayes_Gaussian_mle(train, test):
    info = class_info(train)
    predictions = list()
    for row in test:
        output = predict_nbmle(info, row, pu.Gaussian)
        predictions.append(output)
    return(predictions)

# evaluate naive bayes (gaussian) algorithm
n_folds = 5
TrainAcc = list()
TestAcc = list()
Prec = list()
Recall = list()
Spec = list()
for i in range(6):
    TestScores, TrainScores, Pscores, Rscores, Fscores, Sscores = eval_algo(dataset, naive_bayes_Gaussian_mle, n_folds, i)
    print("For Class Type", i)
    print('Train Accuracy: %s' % TrainScores)
    print('Test Accuracy: %s' % TestScores)
    print('Mean Precision: %s' % Pscores)
    print('Mean Recall: %s' % Rscores)
    print('Mean F1: %s' % Fscores)
    print('Mean Specificity: %s' % Sscores)
    TrainAcc.append(TrainScores)
    TestAcc.append(TestScores)
    Prec.append(Pscores)
    Recall.append(Rscores)
    Spec.append(Sscores)
    
Macro_prec = np.mean(Prec)
Macro_recall = np.mean(Recall)
Macro_F1 = 2*Macro_prec*Macro_recall/(Macro_prec + Macro_recall)
print('Macro F1-score: %s' % Macro_F1)

For Class Type 0
Train Accuracy: 0.8571428571428571
Test Accuracy: 0.8571428571428571
Mean Precision: 0.8311456607620746
Mean Recall: 0.834114774114774
Mean F1: 0.8308888576676352
Mean Specificity: 0.8766988914975185
For Class Type 1
Train Accuracy: 0.8571428571428571
Test Accuracy: 0.8571428571428571
Mean Precision: 0.8327907110956818
Mean Recall: 0.8319971139551594
Mean F1: 0.831905126234998
Mean Specificity: 0.8730873678354698
For Class Type 2
Train Accuracy: 0.8585714285714285
Test Accuracy: 0.8514285714285714
Mean Precision: 0.8284048365330199
Mean Recall: 0.8278262130815512
Mean F1: 0.8271113489481763
Mean Specificity: 0.8693468676009051
For Class Type 3
Train Accuracy: 0.8574999999999999
Test Accuracy: 0.8557142857142856
Mean Precision: 0.8298896221338279
Mean Recall: 0.8318066209829584
Mean F1: 0.8297940576736078
Mean Specificity: 0.874209814144792
For Class Type 4
Train Accuracy: 0.8582142857142857
Test Accuracy: 0.8528571428571429
Mean Precision: 0.8316970997812134
Mean Recal

#### 1.1.2 Naive Bayes Classifier with **MAP**.

In [7]:
# relative class frequencies in the dataset
data = np.asarray(dataset)
target = data.T[1024]
x = class_info(dataset)
class_types = list()
total = 0
for i in range(len(x)):
    class_types.append(i)
    class_types[i] = list()
    class_types[i].append(x[i][0][-1])
    total += class_types[i][0]
class_probs = list()
for i in range(len(x)):
    class_probs.append(class_types[i][0]/float(total))

# Calculate probabilities of predicting each class for given row
def calc_class_probs_nbmap(info, row, prior):
    total_rows = sum([info[label][0][2] for label in info])
    probs = dict()
    for class_type, class_info in info.items():
        probs[class_type] = info[class_type][0][2]/float(total_rows)
        for i in range(len(class_info)):
            mean, std, _ = class_info[i]
            probs[class_type] *= prior(row[i], mean, std)
    aux = 0
    for class_type, class_info in info.items():
        aux += probs[class_type]
    for class_type, class_info in info.items():
        probs[class_type] = probs[class_type]/aux
    return probs

# Calculate probabilities of predicting each class for given row
def calc_class_probs_nbmap(info, row, prior):
    total_rows = sum([info[label][0][2] for label in info])
    probs = dict()
    for class_type, class_info in info.items():
        probs[class_type] = info[class_type][0][2]/float(total_rows)
        for i in range(len(class_info)):
            mean, std, _ = class_info[i]
            probs[class_type] *= prior(row[i], mean, std)
        if class_type == 0.0:                                                      #incorporating relative class frequencies
            probs[class_type] *= class_probs[0] 
        elif class_type == 1.0:
            probs[class_type] *= class_probs[1]
        elif class_type == 2.0:
            probs[class_type] *= class_probs[2]
        elif class_type == 3.0:
            probs[class_type] *= class_probs[3]
        elif class_type == 4.0:
            probs[class_type] *= class_probs[4]
        elif class_type == 5.0:
            probs[class_type] *= class_probs[5]
    aux = 0
    for class_type, class_info in info.items():
        aux += probs[class_type]
    for class_type, class_info in info.items():
        probs[class_type] = probs[class_type]/aux
    return probs

# Naive Bayes Algorithm simple gaussian
def naive_bayes_Gaussian_map(train, test):
    info = class_info(train)
    predictions = list()
    for row in test:
        output = predict_nbmap(info, row, pu.Gaussian)
        predictions.append(output)
    return(predictions)

# evaluate naive bayes (gaussian) algorithm
n_folds = 5
TrainAcc = list()
TestAcc = list()
Prec = list()
Recall = list()
Spec = list()
for i in range(6):
    TestScores, TrainScores, Pscores, Rscores, Fscores, Sscores = eval_algo(dataset, naive_bayes_Gaussian_map, n_folds, i)
    print("For Class Type", i)
    print('Train Accuracy: %s' % TrainScores)
    print('Test Accuracy: %s' % TestScores)
    print('Mean Precision: %s' % Pscores)
    print('Mean Recall: %s' % Rscores)
    print('Mean F1: %s' % Fscores)
    print('Mean Specificity: %s' % Sscores)
    TrainAcc.append(TrainScores)
    TestAcc.append(TestScores)
    Prec.append(Pscores)
    Recall.append(Rscores)
    Spec.append(Sscores)
    
Macro_prec = np.mean(Prec)
Macro_recall = np.mean(Recall)
Macro_F1 = 2*Macro_prec*Macro_recall/(Macro_prec + Macro_recall)
print('Macro F1-score: %s' % Macro_F1)

For Class Type 0
Train Accuracy: 0.8567857142857143
Test Accuracy: 0.8585714285714285
Mean Precision: 0.8455963225768766
Mean Recall: 0.8197324574850476
Mean F1: 0.8298173923921768
Mean Specificity: 0.8878279905649226
For Class Type 1
Train Accuracy: 0.8582142857142857
Test Accuracy: 0.8528571428571429
Mean Precision: 0.8332396204332854
Mean Recall: 0.820987871698201
Mean F1: 0.8261208873333921
Mean Specificity: 0.8788205360115361
For Class Type 2
Train Accuracy: 0.8578571428571429
Test Accuracy: 0.8542857142857143
Mean Precision: 0.8379753136245922
Mean Recall: 0.8174671412569146
Mean F1: 0.8260512127304768
Mean Specificity: 0.8832505190173082
For Class Type 3
Train Accuracy: 0.8564285714285713
Test Accuracy: 0.86
Mean Precision: 0.8456294214668981
Mean Recall: 0.8281205399768832
Mean F1: 0.8350420633426223
Mean Specificity: 0.885447784636901
For Class Type 4
Train Accuracy: 0.8571428571428573
Test Accuracy: 0.8571428571428571
Mean Precision: 0.8445664048707376
Mean Recall: 0.81657867

### 1.2 Bayes Classifier

The Bayes Classifier works with MLE estimation serving as it basis. We assume that all the features have some dependency on one another as shown in the usage of covariance matrices.

#### 1.2.1 WIth Gaussian Distribution as Class Conditional Density

In [8]:
# Mean rv, cov_mat and count
def dataset_info(dataset):
    info_aux = [(np.mean(col), np.std(col), len(col)) for col in zip(*dataset)]
    del(info_aux[-1]) #not reqd for class labels
    ovr_mean_rv = list()
    size = info_aux[0][-1]
    cov_mat = np.zeros((len(info_aux), len(info_aux)))
    for i in range(len(info_aux)):
        ovr_mean_rv.append(info_aux[i][0])
    np_ds = np.asarray(dataset)
    np_ds = np_ds[:, :-1]
    np_ovr_mean = np.asarray(ovr_mean_rv)
    for i in range(len(np_ds)):
        xn = np.array([np_ds[i]])
        cov_mat += np.matmul((xn - np_ovr_mean).T,(xn - np_ovr_mean))
    cov_mat = cov_mat/float(size) 
    info = list()
    info.append(ovr_mean_rv)
    info.append(cov_mat)
    info.append(size)
    return info

# Calculate probabilities of predicting each class for given row
def calc_class_probs_bayes_g(info, row):
    probs = dict()
    for class_type, class_info in info.items():
        mean_rv = np.array(class_info[0])
        cov_matrix = class_info[1]
        probs[class_type] = pu.multi_normal(np.asarray(row)[:-1], mean_rv, cov_matrix)
    aux = 0
    for class_type, class_info in info.items():
        aux += probs[class_type]
    for class_type, class_info in info.items():
        probs[class_type] = probs[class_type]/aux
    return probs

# Predict class type for given row
def predict_bayes_g(info, row):
    probs = calc_class_probs_bayes_g(info, row)
    best_label, best_prob = None, -1
    for class_type, prob in probs.items():
        if best_label is None or prob > best_prob:
            best_prob = prob
            best_label = class_type
    return best_label

# Bayes Algorithm simple gaussian
def bayes_Gaussian(train, test):
    info = class_info(train)
    predictions = list()
    for row in test:
        output = predict_bayes_g(info, row)
        predictions.append(output)
    return(predictions)

# evaluate bayes (gaussian) algorithm
n_folds = 5
TrainAcc = list()
TestAcc = list()
Prec = list()
Recall = list()
Spec = list()
for i in range(6):
    TestScores, TrainScores, Pscores, Rscores, Fscores, Sscores = eval_algo(dataset, bayes_Gaussian, n_folds, 1, 0.5)
    print("For Class Type", i)
    print('Train Accuracy: %s' % TrainScores)
    print('Test Accuracy: %s' % TestScores)
    print('Mean Precision: %s' % Pscores)
    print('Mean Recall: %s' % Rscores)
    print('Mean F1: %s' % Fscores)
    print('Mean Specificity: %s' % Sscores)
    TrainAcc.append(TrainScores)
    TestAcc.append(TestScores)
    Prec.append(Pscores)
    Recall.append(Rscores)
    Spec.append(Sscores)
    
Macro_prec = np.mean(Prec)
Macro_recall = np.mean(Recall)
Macro_F1 = 2*Macro_prec*Macro_recall/(Macro_prec + Macro_recall)
print('Macro F1-score: %s' % Macro_F1)

For Class Type 0
Train Accuracy: 0.8596428571428572
Test Accuracy: 0.8471428571428572
Mean Precision: 0.8018489918489917
Mean Recall: 0.858154055457708
Mean F1: 0.8256951895602309
Mean Specificity: 0.8409347019866995
For Class Type 1
Train Accuracy: 0.8564285714285713
Test Accuracy: 0.86
Mean Precision: 0.8143965883691913
Mean Recall: 0.8614184782608696
Mean F1: 0.8367149881748421
Mean Specificity: 0.8543773940205405
For Class Type 2
Train Accuracy: 0.8582142857142857
Test Accuracy: 0.8528571428571429
Mean Precision: 0.8166757853518417
Mean Recall: 0.8425637429424245
Mean F1: 0.8287511213826647
Mean Specificity: 0.8563049557457179
For Class Type 3
Train Accuracy: 0.8589285714285715
Test Accuracy: 0.85
Mean Precision: 0.8064716234278674
Mean Recall: 0.8528923825626311
Mean F1: 0.8278641194693274
Mean Specificity: 0.848188606461133
For Class Type 4
Train Accuracy: 0.8578571428571428
Test Accuracy: 0.8542857142857143
Mean Precision: 0.8060627045348292
Mean Recall: 0.8712171961748233
Mean 

#### 1.2.2 WIth Gaussian Mixture Models as Class Conditional Density

In [9]:
# Preparing featureset and targets from dataset
data = np.asarray(dataset)
featureset = np.delete(data, 1024, axis=1)
target = data.T[1024]

x = featureset
y = target

train_accuracy = list()
test_accuracy = list()
Pscores = list()
Rscores = list()
Fscores = list()
Sscores = list()

TrainAcc = list()
TestAcc = list()
Prec = list()
Recall = list()
Spec = list()

for obs_label in range(6):
    for i in range(5):
        x_train, x_test, y_train, y_test = au.cross_val_split(x, y, 5)[i]
        model_gmm = pu.GMM()
        model_gmm.fit(x_train)
        y_test_hat = model_gmm.predict(x_test)
        y_train_hat = model_gmm.predict(x_train)
        train_accuracy.append(cmu.accuracy_calc(y_train, y_train_hat))
        test_accuracy.append(cmu.accuracy_calc(y_test, y_test_hat))
        Pscores.append(cmu.precision_calc(obs_label, y_test, y_test_hat))
        Rscores.append(cmu.recall_calc(obs_label, y_test, y_test_hat))
        Fscores.append(cmu.f1_calc(obs_label, y_test, y_test_hat))
        Sscores.append(cmu.specificity_calc(obs_label,y_test, y_test_hat))
    print("For Class Type", obs_label)
    print("Mean Train Accuracy:", np.mean(train_accuracy))
    print("Mean Test Accuracy:", np.mean(test_accuracy))
    print("Mean Precision:", np.mean(Pscores))
    print("Mean Recall:", np.mean(Rscores))
    print("Mean F1 score:", np.mean(Fscores))
    print("Mean Specificity:", np.mean(Sscores))
    TrainAcc.append(np.mean(train_accuracy))
    TestAcc.append(np.mean(test_accuracy))
    Prec.append(np.mean(Pscores))
    Recall.append(np.mean(Rscores))
    Spec.append(np.mean(Sscores))
    
Macro_prec = np.mean(Prec)
Macro_recall = np.mean(Recall)
Macro_F1 = 2*Macro_prec*Macro_recall/(Macro_prec + Macro_recall)
print('Macro F1-score: %s' % Macro_F1)

For Class Type 0
Train Accuracy: 0.8578571428571429
Test Accuracy: 0.8542857142857143
Mean Precision: 0.8099119955902758
Mean Recall: 0.8589668396120009
Mean F1: 0.8329288328571506
Mean Specificity: 0.8504243054243055
For Class Type 1
Train Accuracy: 0.8589285714285715
Test Accuracy: 0.85
Mean Precision: 0.8001486518636478
Mean Recall: 0.8614379442318206
Mean F1: 0.8293056505478047
Mean Specificity: 0.8397677631010965
For Class Type 2
Train Accuracy: 0.8589285714285714
Test Accuracy: 0.85
Mean Precision: 0.8042958870579838
Mean Recall: 0.8618044338770522
Mean F1: 0.8311050521157023
Mean Specificity: 0.8386757970434882
For Class Type 3
Train Accuracy: 0.8578571428571429
Test Accuracy: 0.8542857142857143
Mean Precision: 0.808132183908046
Mean Recall: 0.8685706117781591
Mean F1: 0.8350851410370657
Mean Specificity: 0.8461523034190377
For Class Type 4
Train Accuracy: 0.8578571428571429
Test Accuracy: 0.8542857142857143
Mean Precision: 0.8065827228327229
Mean Recall: 0.862317218960127
Mean 

### 1.3 K-Nearest Numbers

Here, we classify the data sample into one of the target classes by evaluating which class has more data points in its neighbourhood (Euclidean Distance).

In [11]:
# Preparing featureset and targets from dataset
data = np.asarray(dataset)
featureset = np.delete(data, 1024, axis=1)
target = data.T[1024]

# testing the KNN algorithm with k=800
x = featureset
y = target

train_accuracy = list()
test_accuracy = list()
Pscores = list()
Rscores = list()
Fscores = list()
Sscores = list()

TrainAcc = list()
TestAcc = list()
Prec = list()
Recall = list()
Spec = list()

for obs_label in range(6):
    for i in range(5):
        x_train, x_test, y_train, y_test = au.cross_val_split(x, y, 5)[i]
        model_knn = pu.KNN(k=10)
        y_test_hat = model_knn.predict(x_test, x_train, y_train)
        y_train_hat = model_knn.predict(x_train, x_train, y_train)
        train_accuracy.append(cmu.accuracy_calc(y_train, y_train_hat))
        test_accuracy.append(cmu.accuracy_calc(y_test, y_test_hat))
        Pscores.append(cmu.precision_calc(obs_label, y_test, y_test_hat))
        Rscores.append(cmu.recall_calc(obs_label, y_test, y_test_hat))
        Fscores.append(cmu.f1_calc(obs_label, y_test, y_test_hat))
        Sscores.append(cmu.specificity_calc(obs_label,y_test, y_test_hat))
    print("For Class Type", j)
    print("Mean Train Accuracy:", np.mean(train_accuracy))
    print("Mean Test Accuracy:", np.mean(test_accuracy))
    print("Mean Precision:", np.mean(Pscores))
    print("Mean Recall:", np.mean(Rscores))
    print("Mean F1 score:", np.mean(Fscores))
    print("Mean Specificity:", np.mean(Sscores))
    TrainAcc.append(np.mean(train_accuracy))
    TestAcc.append(np.mean(test_accuracy))
    Prec.append(np.mean(Pscores))
    Recall.append(np.mean(Rscores))
    Spec.append(np.mean(Sscores))
    
Macro_prec = np.mean(Prec)
Macro_recall = np.mean(Recall)
Macro_F1 = 2*Macro_prec*Macro_recall/(Macro_prec + Macro_recall)
print('Macro F1-score: %s' % Macro_F1)

For Class Type 0
Mean Train Accuracy: 0.8617857142857142
Mean Test Accuracy: 0.832857142857143
Mean Precision: 0.8279309968134412
Mean Recall: 0.7497721184288348
Mean F1 score: 0.7855974386523604
Mean Specificity: 0.8902497985495568
For Class Type 1
Mean Train Accuracy: 0.8598214285714286
Mean Test Accuracy: 0.8407142857142856
Mean Precision: 0.8462172004700284
Mean Recall: 0.7648084787301407
Mean F1 score: 0.8014707184232959
Mean Specificity: 0.8956741589280464
For Class Type 2
Mean Train Accuracy: 0.861904761904762
Mean Test Accuracy: 0.8323809523809523
Mean Precision: 0.8372225527495952
Mean Recall: 0.7642764987954268
Mean F1 score: 0.7965081400458401
Mean Specificity: 0.8828788689145715
For Class Type 3
Mean Train Accuracy: 0.8620535714285713
Mean Test Accuracy: 0.8325000000000001
Mean Precision: 0.8289414455867273
Mean Recall: 0.7720686463604793
Mean F1 score: 0.796929080985145
Mean Specificity: 0.8765978468710612
For Class Type 4
Mean Train Accuracy: 0.8642142857142858
Mean Test 

### 1.4 Parzen Window density estimates

In [12]:
# Preparing featureset and targets from dataset
data = np.asarray(dataset)
featureset = np.delete(data, 3, axis=1)
target = data.T[3]

# testing the PW algorithm with dist=1
x = au.normalize(au.standardize(featureset))
y = target

train_accuracy = list()
test_accuracy = list()
Pscores = list()
Rscores = list()
Fscores = list()
Sscores = list()

TrainAcc = list()
TestAcc = list()
Prec = list()
Recall = list()
Spec = list()

for obs_label in range(6):
    for i in range(5):
        x_train, x_test, y_train, y_test = au.cross_val_split(x, y, 5)[i]
        model_pw = pu.PW(dist=1)
        y_test_hat = model_pw.predict(x_test, x_train, y_train)
        y_train_hat = model_pw.predict(x_train, x_train, y_train)
        train_accuracy.append(cmu.accuracy_calc(y_train, y_train_hat))
        test_accuracy.append(cmu.accuracy_calc(y_test, y_test_hat))
        Pscores.append(cmu.precision_calc(obs_label, y_test, y_test_hat))
        Rscores.append(cmu.recall_calc(obs_label, y_test, y_test_hat))
        Fscores.append(cmu.f1_calc(obs_label, y_test, y_test_hat))
        Sscores.append(cmu.specificity_calc(obs_label,y_test, y_test_hat))
    print("For Class Type", j)
    print("Mean Train Accuracy:", np.mean(train_accuracy))
    print("Mean Test Accuracy:", np.mean(test_accuracy))
    print("Mean Precision:", np.mean(Pscores))
    print("Mean Recall:", np.mean(Rscores))
    print("Mean F1 score:", np.mean(Fscores))
    print("Mean Specificity:", np.mean(Sscores))
    TrainAcc.append(np.mean(train_accuracy))
    TestAcc.append(np.mean(test_accuracy))
    Prec.append(np.mean(Pscores))
    Recall.append(np.mean(Rscores))
    Spec.append(np.mean(Sscores))
    
Macro_prec = np.mean(Prec)
Macro_recall = np.mean(Recall)
Macro_F1 = 2*Macro_prec*Macro_recall/(Macro_prec + Macro_recall)
print('Macro F1-score: %s' % Macro_F1)

For Class Type 0
Mean Train Accuracy: 0.8578571428571429
Mean Test Accuracy: 0.8542857142857143
Mean Precision: 0.8353541251874075
Mean Recall: 0.8359336004497294
Mean F1 score: 0.8346152615305842
Mean Specificity: 0.8702176255117433
For Class Type 1
Mean Train Accuracy: 0.8567857142857142
Mean Test Accuracy: 0.8585714285714285
Mean Precision: 0.8285446777693066
Mean Recall: 0.842036083602801
Mean F1 score: 0.8345629221569231
Mean Specificity: 0.8703856204014567
For Class Type 2
Mean Train Accuracy: 0.8572619047619047
Mean Test Accuracy: 0.8566666666666667
Mean Precision: 0.8304936738841115
Mean Recall: 0.8409525507127203
Mean F1 score: 0.835075536641009
Mean Specificity: 0.868522949419583
For Class Type 3
Mean Train Accuracy: 0.8563392857142856
Mean Test Accuracy: 0.8603571428571429
Mean Precision: 0.8379047981459173
Mean Recall: 0.8434299464207207
Mean F1 score: 0.8399151900352138
Mean Specificity: 0.8729218713042066
For Class Type 4
Mean Train Accuracy: 0.8561428571428571
Mean Test 