###### ### The University of Melbourne, School of Computing and Information Systems
# COMP30027 Machine Learning, 2021 Semester 1

## Assignment 1: Pose classification with naive Bayes

###### Submission deadline: 7 pm, Monday 12 Apr 2021

**Student Name(s):**  'Jeremy Huang', 'Keyi Xiao'

**Student ID(s):**     `1073721`
                       `1046432`


# Imports and default values

In [1]:
#import all the important libaray for the project
import pandas as pd; import numpy as np; import math
import matplotlib.pyplot as plt; import copy
from scipy.stats import norm

# Preprocess

In [2]:
def preprocess(the_data, missing_value):
    """
       This function Reads file and converts it into a useful format for training and testing and return a dictionary
    """
    training = pd.read_csv(the_data, header=None); means = []
    
    # Delete the rows with average value equals to 9999
    trimmed = training[round(training.iloc[:,1:].mean(axis=1))!= missing_value].replace(9999, np.nan).reset_index(drop=True)
    labels = list(trimmed[0]); labels = sorted(set(labels), key=labels.index); indexs = trimmed[0]
    
    # Make each pose a independent key in a dictionary and the data is put in them accordingly
    classes = {}; new_class = {}
    for name in labels:
        classes[name] = [trimmed.iloc[i, 1:] for i in range(trimmed.shape[0]) if trimmed[0][i] == name]
    
    # To replace all np.nan to the mean from each point in each pose
    for name in labels:
        each = pd.DataFrame(classes[name]); means = list(each.describe().loc['mean'])
        for i in range(22):
            each.iloc[:, i] = each.iloc[:, i].replace(np.nan, means[i])
        new_class[name] = each
    
    # To reshape the data and make it easy to train
    result = new_class[list(new_class.keys())[0]]
    for name in range(1, len(new_class.keys())):
        result = result.append(new_class[list(new_class.keys())[name]])
    result[0] = indexs; result = result.set_index(0)
    
    return labels, result

# Train

In [3]:
def train(labels, cleaned_data):
    """
        This function calculates prior probabilities and return a trained model(a dictionary) with mean, standard deviation
    """
    # Get labels and get teh prior for each pose(class)
    prob_class =  [cleaned_data.loc[name].shape[0] for name in labels]
    total = sum(prob_class)
    for i in range(len(prob_class)):
        prob_class[i] = prob_class[i]/total

    # Make a dictionary contains all the features(prior, mean and std) needed to predict 
    model = {}; check = dict(zip(labels, prob_class))
    for nam in labels:
        small = cleaned_data.loc[nam].describe(); model[nam] = {'prior': check[nam]}
        model[nam]['mean'] = list(small.loc['mean']); model[nam]['std'] = list(small.loc['std'])

    return model

# Predict

In [4]:
def predict(labels, cleaned_data, model):
    '''
        This function predicts classes for new items in a test dataset (re-use the training data as a test set)
    '''
    # Loop through the training data to collect the result after predicting
    prediction = []; label_name = labels.copy()
        
    for idx in range(cleaned_data.shape[0]):
        name = list(cleaned_data.index.tolist())[idx]
        instance = cleaned_data.iloc[idx, :]
        score = []

        for pose in model.keys():
            # Prior probability from each class            
            probs = math.log2(model[pose]['prior'])

            # Likelihoods plus the prior from class using log
            for i, value in enumerate(instance):
                mu = model[pose]['mean'][i]; sigma = model[pose]['std'][i]
                if value == 0:
                    probs += 10**(-256)
                else:
                    probs += (-(value - mu)**2/ (2*sigma**2) - math.log2(math.sqrt(2*math.pi*sigma**2)))
            score.append(probs)
        # Collect index ,true name labels and the predicted name    
        prediction.append([idx, name, label_name[np.argmax(score)]])

    return prediction

# Evaluate

In [5]:
def evaluate(labels, predictions, beta = 1):
    '''
        This function evaluate the prediction performance by comparing the model’s class outputs and truth labels
    '''
    # Make a confusion matrix for better evaluating
    predicts = pd.DataFrame(predictions, columns=['index', 'actual','predict']).set_index('index')
    total = predicts.shape[0]; true_label = predicts['actual']; predict_label = predicts['predict']
    confusion_dict = {}; confusion = pd.crosstab(true_label, predict_label); total_TP = []; total_precision = []
    total_recall = []; total_F_score = []

    # Get the performance data from confusion matrix
    for name in labels:
        confusion_dict[name] = {'TFPN': [], 'precision': 0.0, 'recall': 0.0, 'F-score': 0.0}
        TP = confusion[name][name]; FP = sum(confusion.loc[name]) - TP
        FN = sum(confusion[name]) - TP; TN = total - (TP + FP + FN)
        precision = TP / (TP + FP); recall = TP / (TP + FN)
        F_score = ((1 + beta * beta) * precision * recall) / ((beta * beta * precision) + recall)
        confusion_dict[name]['TFPN'] = [TP, FP, FN, TN]
        confusion_dict[name]['precision'] = precision; confusion_dict[name]['recall'] = recall
        confusion_dict[name]['F_score'] = F_score
        total_TP.append(TP); total_precision.append(precision)
        total_recall.append(recall); total_F_score.append(F_score)
    

    # Print performance analysis for all data and each class
    print("Correct Prection: {}\nTotal_Accuracy: {}\nTotal_Precision: {}\nTotal_Recall: {}\nTotal_F_score: {}\n".format(sum(total_TP), (sum(total_TP) / total), (sum(total_precision) / len(labels)), (sum(total_recall) / len(labels)), (sum(total_F_score) / len(labels)) ))
    print("-------------------------")
    for name in labels:
        print("Pose : {}\n*********************\nTP, FP, FN, TN: {}\nPrecision: {}\nRecall: {}\nF_score: {}\n".format(name,\
                confusion_dict[name]['TFPN'], confusion_dict[name]['precision'], 
                confusion_dict[name]['recall'], confusion_dict[name]['F_score']))
    
    return confusion_dict

# Implementation

# 1. Prepocess 
      |
      |
     \|/
# 2. Train
      |
      |
     \|/
# 3. Predict
      |
      |
     \|/
# 4. Evaluate

In [6]:
# Get information from "README.txt" from Canvas
train_data = 'train.csv'; test_data = 'test.csv'; not_detected = 9999; index_num = 22

In [7]:
labels, cleaned = preprocess(train_data, not_detected) # Use the train.csv as the training data

trained = train(labels, cleaned) # Train a model from preprocessed train.csv

predictions = predict(labels, cleaned, trained) # To collect the prediction from the trained model

evaluations = evaluate(labels, predictions) # Evaluate the overall performance according to each pose(class) and print them out

Correct Prection: 650
Total_Accuracy: 0.8879781420765027
Total_Precision: 0.8903779417864419
Total_Recall: 0.9026112839276113
Total_F_score: 0.8870969959345972

-------------------------
Pose : bridge
*********************
TP, FP, FN, TN: [38, 40, 5, 649]
Precision: 0.48717948717948717
Recall: 0.8837209302325582
F_score: 0.628099173553719

Pose : childs
*********************
TP, FP, FN, TN: [59, 4, 1, 668]
Precision: 0.9365079365079365
Recall: 0.9833333333333333
F_score: 0.9593495934959351

Pose : downwarddog
*********************
TP, FP, FN, TN: [94, 4, 42, 592]
Precision: 0.9591836734693877
Recall: 0.6911764705882353
F_score: 0.8034188034188035

Pose : mountain
*********************
TP, FP, FN, TN: [148, 12, 0, 572]
Precision: 0.925
Recall: 1.0
F_score: 0.961038961038961

Pose : plank
*********************
TP, FP, FN, TN: [53, 4, 5, 670]
Precision: 0.9298245614035088
Recall: 0.9137931034482759
F_score: 0.9217391304347825

Pose : seatedforwardbend
*********************
TP, FP, FN, TN:

## Questions 

If you are in a group of 2, you will respond to **four** questions of your choosing.

A response to a question should take about 100–250 words, and make reference to the data wherever possible.

#### NOTE: you may develope codes or functions to help respond to the question here, but your formal answer should be submitted separately as a PDF.

### Q1
Since this is a multiclass classification problem, there are multiple ways to compute precision, recall, and F-score for this classifier. Implement at least two of the methods from the "Model Evaluation" lecture and discuss any differences between them. (The implementation should be your own and should not just call a pre-existing function.)

In [8]:
True_Positive = []; False_Positive = []; False_negative = []; Total_Precision = []; Total_Recall = []; Total_F_Score = []
 
for name in labels:
    the_info = evaluations[name]
    # Collect all the performance data
    True_Positive.append(the_info['TFPN'][0]); False_Positive.append(the_info['TFPN'][1]); False_negative.append(the_info['TFPN'][2])
    Total_Precision.append(the_info['precision']); Total_Recall.append(the_info['recall']); Total_F_Score.append(the_info['F_score'])

TP_Sum = sum(True_Positive); FP_Sum = sum(False_Positive); FN_Sum = sum(False_negative)

# Micro Average Values
micro_average_precision = TP_Sum / (TP_Sum + FP_Sum)
micro_average_recall = TP_Sum / (TP_Sum + FN_Sum)
micro_average_F_score = (2 * micro_average_precision * micro_average_recall) / ((1 * micro_average_precision) + micro_average_recall)

# Macro Average Values
macro_average_precision = sum(Total_Precision) / len(labels)
macro_average_recall = sum(Total_Recall) / len(labels)
macro_average_F_score = sum(Total_F_Score) / len(labels)

print("Micro_average_precision: {}\nMicro_average_recall:    {}\nMicro_average_F_score:   {}\n-------------------------------------------\nMacro_average_precision: {}\nMacro_average_recall:    {}\nMacro_average_F_score:   {}\n".format(micro_average_precision, micro_average_recall, micro_average_F_score, macro_average_precision, \
    macro_average_recall, macro_average_F_score))

Micro_average_precision: 0.8879781420765027
Micro_average_recall:    0.8879781420765027
Micro_average_F_score:   0.8879781420765027
-------------------------------------------
Macro_average_precision: 0.8903779417864419
Macro_average_recall:    0.9026112839276113
Macro_average_F_score:   0.8870969959345972



### Q2
The Gaussian naıve Bayes classifier assumes that numeric attributes come from a Gaussian distribution. Is this assumption always true for the numeric attributes in this dataset? Identify some cases where the Gaussian assumption is violated and describe any evidence (or lack thereof) that this has some effect on the classifier’s predictions.

In [9]:
for name in labels:
    test_model = trained[name]
    space = []
    for feature in range(len(test_model['mean'])):
        # Get mean and std, then make a table to fit the data
        mu = test_model['mean'][feature]; sigma = test_model['std'][feature]
        each_feature = cleaned.loc[name].iloc[feature]
        matrix = np.linspace(min(each_feature), max(each_feature))

        # Plotting
        plt.hist(each_feature, bins=30, density=True)
        plt.plot(matrix, norm.pdf(matrix, mu, sigma))
        plt.show()

### Q3
Implement a kernel density estimate (KDE) naive Bayes classifier and compare its performance to the Gaussian naive Bayes classifier. Recall that KDE has kernel bandwidth as a free parameter -- you can choose an arbitrary value for this, but a value in the range 5-25 is recommended. Discuss any differences you observe between the Gaussian and KDE naive Bayes classifiers. (As with the Gaussian naive Bayes, this KDE naive Bayes implementation should be your own and should not just call a pre-existing function.)

In [9]:
def kdeTrain(labels, cleaned_data):
    """
        This function calculates prior probabilities and return a trained model(a dictionary) with points
    """
    # Get labels and get teh prior for each pose(class)
    prob_class =  [cleaned_data.loc[name].shape[0] for name in labels]
    total = sum(prob_class)
    for i in range(len(prob_class)):
        prob_class[i] = prob_class[i]/total

    # Make a dictionary contains all the features(prior, mean and std) needed to predict 
    model = {}; check = dict(zip(labels, prob_class))
    for nam in labels:
        small = cleaned_data.loc[nam].reset_index(drop=True)
        model[nam] = {'prior': check[nam]}; model[nam]['point'] = small

    return model

In [10]:
def kdePredict(labels, cleaned_data, model):
    '''
        This function predicts classes for new items in a test dataset (re-use the training data as a test set)
    '''
    # Loop through the training data to collect the result after predicting
    prediction = []; label_name = labels.copy()
        
    for idx in range(cleaned_data.shape[0]):
        name = list(cleaned_data.index.tolist())[idx]
        score = []; k = 5
        for pose in model.keys():
            # Prior probability from each class            
            probs = math.log2(model[name]['prior'])
            # Likelihoods plus the prior from class using log
            for i in range(1, model[pose]['point'].shape[0]):
                each_diff = []; each_pdf = []
                
                # Get all the difference from test point to each point
                for x_test in model[pose]['point'].loc[i]:
                    x_test = float(x_test)
                    for x_train in model[pose]['point'].loc[i]:
                        x_train = float(x_train)
                        each_diff.append(x_test - x_train)

                # Get all the pdf from each difference 
                for diff in each_diff:
                    each_pdf.append(norm.pdf(diff, 0, k))
                probs += math.log2(sum(each_pdf) / model[pose]['point'].shape[0])
            score.append(probs)
        # Collect index ,true name labels and the predicted name    
        prediction.append([idx, name, label_name[np.argsort(score)[-2]]])

    return prediction

In [45]:
kde_label, kde_test = preprocess(test_data, not_detected)
kde_trained = kdeTrain(kde_label, kde_test)

In [56]:
kde_trained['bridge']['point'].loc[0]

1     126.835800
2      99.927500
3     -55.201525
4     -40.689275
5      47.555100
6      -9.784800
7       7.377900
8     -65.100400
9     -62.378800
10    -80.844800
11    -63.587400
12      2.479700
13    -14.561300
14    -57.158950
15    -90.310600
16    -30.339200
17    -41.216300
18     23.314600
19     57.562500
20    -24.693000
21     61.509400
22    -34.056500
Name: 0, dtype: float64

In [28]:
kdePredict = kdePredict(labels, kde_test, kde_trained)

['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.0]
['bridge', -3.

In [22]:
evaluate(labels, kdePredict)

ValueError: DataFrame constructor not properly called!

### Q4
Instead of using an arbitrary kernel bandwidth for the KDE naive Bayes classifier, use random hold-out or cross-validation to choose the kernel bandwidth. Discuss how this changes the model performance compared to using an arbitrary kernel bandwidth.

In [None]:
from sklearn.grid_search import GridSearchCV


### Q5
Naive Bayes ignores missing values, but in pose recognition tasks the missing values can be informative. Missing values indicate that some part of the body was obscured and sometimes this is relevant to the pose (e.g., holding one hand behind the back). Are missing values useful for this task? Implement a method that incorporates information about missing values and demonstrate whether it changes the classification results.

### Q6
Engineer your own pose features from the provided keypoints. Instead of using the (x,y) positions of keypoints, you might consider the angles of the limbs or body, or the distances between pairs of keypoints. How does a naive Bayes classifier based on your engineered features compare to the classifier using (x,y) values? Please note that we are interested in explainable features for pose recognition, so simply putting the (x,y) values in a neural network or similar to get an arbitrary embedding will not receive full credit for this question. You should be able to explain the rationale behind your proposed features. Also, don't forget the conditional independence assumption of naive Bayes when proposing new features -- a large set of highly-correlated features may not work well.