# Homework - Part 3 - Gaussian Naive Bayes - Implement

1. Implement Gaussian NB for the data below. This means: calculate prior probabilities and conditional pdfs 
2. Compare test results to those obtained by sklearn Gaussian NB
3. Explain why the second to last (9th) test sample has label 1 and not 0
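
For reference, the quantities asked for in task 1 can be written as follows. This is the standard single-feature Gaussian Naive Bayes formulation; the symbols $N_k$, $\mu_k$, $\sigma_k$ and $\hat{y}$ are notation chosen here and do not come from the assignment text.

$$
P(y = k) = \frac{N_k}{N}, \qquad
p(x \mid y = k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{(x - \mu_k)^2}{2\sigma_k^2}\right), \qquad
\hat{y} = \arg\max_k \; P(y = k)\, p(x \mid y = k)
$$

Here $N_k$ is the number of training samples in class $k$, $N$ is the total number of training samples, and $\mu_k$, $\sigma_k$ are the mean and standard deviation of the feature within class $k$.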


In [1]:
# Data
import numpy as np
X_train = np.asarray([25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8,   # class 0
                      27.3, 30.1, 17.4, 29.5, 15.1]).reshape(-1,1)            # class 1
                     
y_train = np.asarray([0,0,0,0,0,0,0,0,0,1,1,1,1,1])

X_test = np.asarray([17.1, 21.8, 18.1, 31.7, 39.2, 20.4, 27.1, 30.2, 7.1, 25.4]).reshape(-1,1)       
y_test = np.asarray([  0,   0,    0,    1,    1,    0,    1,     1,    1,   0])        

In [2]:
#Calculate the priors
class_0_count = np.count_nonzero(y_train == 0)
class_1_count = np.count_nonzero(y_train == 1)

total_count = len(y_train)

class_0_prior = class_0_count / total_count
class_1_prior = class_1_count / total_count

#Print the priors
print("Prior of class 0: {:.3f}".format(class_0_prior))
print("Prior of class 1: {:.3f}".format(class_1_prior))

Prior of class 0: 0.643
Prior of class 1: 0.357


In [3]:
#Subset of X_train to class 0 and class 1 
X_train_class0 = X_train[y_train == 0]  
X_train_class1 = X_train[y_train == 1]    

mu_class0 = np.mean(X_train_class0)
mu_class1 = np.mean(X_train_class1)

sd_class0 = np.std(X_train_class0)
sd_class1 = np.std(X_train_class1)

print("Mean of class 0: {:.3f}".format(mu_class0))
print("Mean of class 1: {:.3f}".format(mu_class1))
print("Standard deviation of class 0: {:.3f}".format(sd_class0))
print("Standard deviation of class 1: {:.3f}".format(sd_class1))

Mean of class 0: 21.644
Mean of class 1: 23.880
Standard deviation of class 0: 2.219
Standard deviation of class 1: 6.341


In [4]:
import math

#Building the model
y_pred = []
for x in X_test.ravel():  # ravel() yields plain scalars instead of 1-element arrays
    #Calculate the Gaussian class-conditional pdf for class 0
    conti_pdf_class0 = (1 / (math.sqrt(2 * math.pi * sd_class0**2))) * math.exp(-0.5*((x - mu_class0) / sd_class0)**2)
    
    #Calculate the Gaussian class-conditional pdf for class 1
    conti_pdf_class1 = (1 / (math.sqrt(2 * math.pi * sd_class1**2))) * math.exp(-0.5*((x - mu_class1) / sd_class1)**2)
    
    #Unnormalized posterior for class 0 (likelihood * prior; the evidence term is the same for both classes)
    prob_class0 = conti_pdf_class0 * class_0_prior
    
    #Unnormalized posterior for class 1
    prob_class1 = conti_pdf_class1 * class_1_prior
    
    #MAP classifier: pick the class with the larger unnormalized posterior
    if prob_class0 > prob_class1:
        y_pred.append(0)
    else:
        y_pred.append(1)
        
print("Predicted classes: ", y_pred)

#Accuracy
accuracy = sum(y_test == y_pred) / len(y_test) * 100
print("Accuracy: {:.1f}%".format(accuracy))

Predicted classes:  [0, 0, 0, 1, 1, 0, 1, 1, 1, 0]
Accuracy: 100.0%
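
The per-sample loop above can also be written in vectorized form. The block below is a minimal sketch under two assumptions: it works in log space (a common choice to avoid underflow for very small densities), and the helper names `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical, introduced only for illustration. It uses the population variance (`ddof=0`), the same default as the `np.std` call used earlier.

```python
import numpy as np

# Same data as in the notebook cells above.
X_train = np.asarray([25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8,   # class 0
                      27.3, 30.1, 17.4, 29.5, 15.1]).reshape(-1, 1)           # class 1
y_train = np.asarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
X_test = np.asarray([17.1, 21.8, 18.1, 31.7, 39.2, 20.4, 27.1, 30.2, 7.1, 25.4]).reshape(-1, 1)


def fit_gaussian_nb(X, y):
    """Hypothetical helper: estimate priors, means and variances per class."""
    x = X.ravel()
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([x[y == c].mean() for c in classes])
    variances = np.array([x[y == c].var() for c in classes])  # ddof=0
    return classes, priors, means, variances


def predict_gaussian_nb(X, classes, priors, means, variances):
    """Hypothetical helper: MAP prediction from log prior + log Gaussian pdf."""
    x = X.reshape(-1, 1)  # (n_samples, 1) broadcasts against (n_classes,)
    log_pdf = -0.5 * np.log(2 * np.pi * variances) - 0.5 * (x - means) ** 2 / variances
    log_joint = np.log(priors) + log_pdf  # shape (n_samples, n_classes)
    return classes[np.argmax(log_joint, axis=1)]


params = fit_gaussian_nb(X_train, y_train)
y_pred_vec = predict_gaussian_nb(X_test, *params)
print("Predicted classes (vectorized):", y_pred_vec.tolist())
```

Because the logarithm is monotonic, taking the argmax of the log of prior times pdf selects the same class as comparing the products directly.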


In [5]:
from sklearn.naive_bayes import GaussianNB

#Gaussian classifier from sklearn
model = GaussianNB()
model.fit(X_train, y_train)
y_pred_model = model.predict(X_test)

model_accuracy = sum(y_pred_model == y_test) / len(y_pred_model) * 100
print("Accuracy: {:.1f}%".format(model_accuracy))

Accuracy: 100.0%


In [6]:
#Equal check
if np.array_equal(y_pred, y_pred_model):
    print("The two classifiers are equivalent and predict the same classes.")
else:
    print("The two classifiers are different.")
    
print(y_pred == y_pred_model)

The two classifiers are equivalent and predict the same classes.
[ True  True  True  True  True  True  True  True  True  True]


### 3.
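The second to last (9th) test sample, 7.1, is labeled 1 because the two class-conditional Gaussians have very different spreads. Class 0 is tightly concentrated around its mean (mean 21.644, standard deviation 2.219), while class 1 is much more spread out (mean 23.880, standard deviation 6.341). The value 7.1 lies about 6.6 standard deviations below the class 0 mean but only about 2.6 standard deviations below the class 1 mean, so the class 0 density at 7.1 is practically zero, while the class 1 density is small but many orders of magnitude larger. Note that 7.1 is actually farther from the class 1 mean than from the class 0 mean in absolute terms; what matters is the distance measured in units of each class's standard deviation. The larger prior of class 0 (0.643 vs. 0.357) is far too small an advantage to offset this difference in likelihood, so the MAP rule picks class 1. More generally, with these parameters, values close to the class 0 mean are assigned to class 0, and values sufficiently far out in either tail (such as 7.1, 31.7 or 39.2) are assigned to class 1.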
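
One way to check the label of the 9th test sample numerically is to evaluate prior times pdf for both classes at x = 7.1. The sketch below assumes the rounded parameter values printed by the cells above; the helper `gaussian_pdf` and the dictionary names are hypothetical and used only for illustration.

```python
import math

# Parameters as printed by the notebook cells above (rounded to 3 decimals).
prior = {0: 0.643, 1: 0.357}
mu = {0: 21.644, 1: 23.880}
sd = {0: 2.219, 1: 6.341}


def gaussian_pdf(x, mean, std):
    """Hypothetical helper: density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


x = 7.1  # the 9th test sample

# Distance from each class mean in units of that class's standard deviation.
z = {k: abs(x - mu[k]) / sd[k] for k in (0, 1)}

# Unnormalized posterior (prior * likelihood) for each class.
score = {k: prior[k] * gaussian_pdf(x, mu[k], sd[k]) for k in (0, 1)}

for k in (0, 1):
    print(f"class {k}: |z| = {z[k]:.2f}, prior * pdf = {score[k]:.3e}")

predicted = max(score, key=score.get)
print("MAP prediction:", predicted)
```

The z-values show how much more extreme 7.1 is under the narrow class 0 Gaussian than under the wide class 1 Gaussian, which is what drives the prediction.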