In this exercise, you will use the same data as in Exercise 1 (the 2×1 PCA feature vectors) to build ML and MAP classifiers.


Q3-1 

Find the sample mean and covariance of each of the two classes in the MNIST training set, and use them to model each class-conditional distribution as a Gaussian. Based on these models, develop an ML
classifier and report its classification accuracy on the test set of the two classes.
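For reference, the estimates and the ML decision rule can be written as below, using standard definitions: $N_k$ is the number of training samples in class $C_k$ and $\mathbf{x}_n$ are the 2-D PCA feature vectors. The $\ln 2\pi$ term is shared by both classes and can be dropped, but the log-determinant term generally cannot, because the two classes have different covariance matrices.

```latex
\begin{aligned}
\hat{\boldsymbol{\mu}}_k &= \frac{1}{N_k}\sum_{n \in C_k} \mathbf{x}_n \\
\hat{\boldsymbol{\Sigma}}_k &= \frac{1}{N_k - 1}\sum_{n \in C_k} (\mathbf{x}_n - \hat{\boldsymbol{\mu}}_k)(\mathbf{x}_n - \hat{\boldsymbol{\mu}}_k)^{\top} \\
\ln p(\mathbf{x} \mid C_k) &= -\tfrac{1}{2}(\mathbf{x} - \hat{\boldsymbol{\mu}}_k)^{\top}\hat{\boldsymbol{\Sigma}}_k^{-1}(\mathbf{x} - \hat{\boldsymbol{\mu}}_k) - \tfrac{1}{2}\ln\lvert\hat{\boldsymbol{\Sigma}}_k\rvert - \ln 2\pi \\
\hat{C}_{\text{ML}}(\mathbf{x}) &= \arg\max_{k \in \{3,4\}} \ln p(\mathbf{x} \mid C_k)
\end{aligned}
```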

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import fetch_openml

In [2]:
# Load the MNIST dataset; as_frame=False returns NumPy arrays instead of a pandas DataFrame
mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='liac-arff')

# Each 28x28 image is already flattened into a row of 784 pixel values
X = mnist.data
y = mnist.target
# X.shape == (70000, 784)

# Split the data into a training set and a test set
X_train, y_train = X[:60000], y[:60000]
X_test, y_test = X[60000:], y[60000:]

# Keep only digits 3 and 4 in both the training and test sets (labels are strings)
mask_train = np.isin(y_train, ['3', '4'])
X_train, y_train = X_train[mask_train], y_train[mask_train]
mask_test = np.isin(y_test, ['3', '4'])
X_test, y_test = X_test[mask_test], y_test[mask_test]

# Convert labels to integers
y_train = y_train.astype(int)
y_test = y_test.astype(int)

In [3]:
# Use PCA to reduce the 784-dimensional vectors to 2 components
pca_train = PCA(n_components=2)
X_pca_train = pca_train.fit_transform(X_train)  # shape (11973, 2); y_train.shape == (11973,)
# Note: this fits a second, independent PCA on the test set, so its two axes need not
# match the training axes (pca_train.transform(X_test) would reuse the training basis)
pca_test = PCA(n_components=2)
X_pca_test = pca_test.fit_transform(X_test)  # shape (1992, 2); y_test.shape == (1992,)
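The cell above fits a separate PCA on the test set, so the test features are expressed in a basis (axis directions and signs) that may differ from the one the Gaussian models are trained in. A minimal sketch of the usual alternative, assuming scikit-learn's `PCA`: fit the projection once on the training data and reuse it for the test data. Random arrays (hypothetical `X_train_demo`, `X_test_demo`) stand in for the MNIST images so the snippet runs on its own.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption): random 784-dimensional vectors instead of MNIST images
rng = np.random.default_rng(0)
X_train_demo = rng.normal(size=(500, 784))
X_test_demo = rng.normal(size=(100, 784))

# Fit the projection on the training set only
pca = PCA(n_components=2)
X_pca_train_demo = pca.fit_transform(X_train_demo)

# Project the test set with the same components and the training mean
X_pca_test_demo = pca.transform(X_test_demo)

# Equivalent manual projection: centre with the training mean, multiply by the components
manual = (X_test_demo - pca.mean_) @ pca.components_.T
print(X_pca_train_demo.shape, X_pca_test_demo.shape)  # (500, 2) (100, 2)
print(np.allclose(manual, X_pca_test_demo))  # True
```

Applied to this notebook, the change would be `X_pca_test = pca_train.transform(X_test)`; the accuracies reported below were obtained with the separately fitted test PCA.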


In [4]:
# Separate the training features by class
X_train_3 = X_pca_train[y_train == 3] # (6131, 2)
X_train_4 = X_pca_train[y_train == 4] # (5842, 2)

# Compute the sample mean of each class
mean_train_3 = np.mean(X_train_3, axis=0) # (2,)
mean_train_4 = np.mean(X_train_4, axis=0)

# Compute the sample covariance of each class (np.cov divides by N - 1 by default)
cov_train_3 = np.cov(X_train_3, rowvar=False) # (2, 2)
cov_train_4 = np.cov(X_train_4, rowvar=False)
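The same estimates can be computed directly from their definitions, which makes the normalisation explicit: `np.cov` divides by N − 1 (the unbiased estimate), whereas the maximum-likelihood estimate divides by N (`bias=True`); with several thousand samples per class the difference is negligible. A small sketch on synthetic 2-D data (an assumption standing in for `X_train_3` or `X_train_4`):

```python
import numpy as np

# Synthetic stand-in for one class's 2-D PCA features
rng = np.random.default_rng(1)
X_class = rng.multivariate_normal(mean=[1.0, -2.0], cov=[[2.0, 0.5], [0.5, 1.0]], size=1000)

n = X_class.shape[0]
mean_manual = X_class.sum(axis=0) / n        # sample mean, shape (2,)
centred = X_class - mean_manual              # subtract the mean from every row
cov_manual = centred.T @ centred / (n - 1)   # unbiased sample covariance, shape (2, 2)

print(np.allclose(mean_manual, np.mean(X_class, axis=0)))      # True
print(np.allclose(cov_manual, np.cov(X_class, rowvar=False)))  # True
```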

In [5]:
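def matrix_inverse(matrix):
    """Invert a square matrix with the adjugate formula A^-1 = adj(A) / det(A)."""
    # Calculate the determinant of the matrix
    determinant = np.linalg.det(matrix)
    
    # Check if the matrix is (numerically) invertible; an exact == 0 test rarely triggers for floats
    if np.isclose(determinant, 0.0):
        raise ValueError("The matrix is singular (or nearly so) and cannot be inverted.")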
    
    # Calculate the adjugate of the matrix
    adjugate = np.zeros(matrix.shape)
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            minor = np.delete(np.delete(matrix, i, axis=0), j, axis=1)
            adjugate[j, i] = ((-1) ** (i + j)) * np.linalg.det(minor)
    
    # Calculate the inverse of the matrix
    inverse_matrix = adjugate / determinant
    
    return inverse_matrix

# Compute the inverse of the covariance matrices
inv_cov_train_3 = matrix_inverse(cov_train_3)
inv_cov_train_4 = matrix_inverse(cov_train_4)

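# Log of the Gaussian pdf, keeping only the Mahalanobis term: the constant -ln(2*pi)
# and the class-dependent -0.5*ln|cov| are omitted, so classes are compared by Mahalanobis distance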
def log_gaussian_pdf(x, mean, inv_cov):
    diff = x - mean
    return -0.5 * np.dot(diff, np.dot(inv_cov, diff))

# Function to classify a sample
def classify(sample):
    # Compute the log-likelihood under each class model
    log_likelihood_3 = log_gaussian_pdf(sample, mean_train_3 , inv_cov_train_3)
    log_likelihood_4 = log_gaussian_pdf(sample, mean_train_4 , inv_cov_train_4)
    
    # Return the class that gives the highest log-likelihood
    return 3 if log_likelihood_3 > log_likelihood_4 else 4

# Classify the test samples
y_pred_test = np.array([classify(x) for x in X_pca_test])

# Compute the classification accuracy
accuracy = np.mean(y_pred_test == y_test)
print("ML classification accuracy:", accuracy)


ML classification accuracy: 0.981425702811245
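`log_gaussian_pdf` above keeps only the Mahalanobis term. The dropped constant is shared by both classes, but −½ ln|Σ_k| is not, because `cov_train_3` and `cov_train_4` differ. A minimal vectorised sketch of the full Gaussian log-likelihood follows; the helper names `log_gaussian_full` and `ml_classify` are hypothetical, `np.linalg.slogdet` and `np.linalg.solve` are swapped in for the hand-written `matrix_inverse`, and synthetic data stands in for the PCA features. To apply it here one would pass `X_pca_test` and `{3: (mean_train_3, cov_train_3), 4: (mean_train_4, cov_train_4)}`; because of the extra term, the accuracy may differ from the value reported above.

```python
import numpy as np


def log_gaussian_full(X, mean, cov):
    """Log of N(x | mean, cov) for every row of X, including the
    -0.5*ln|cov| and -(d/2)*ln(2*pi) terms."""
    d = mean.shape[0]
    diff = X - mean                          # (n, d)
    sign, logdet = np.linalg.slogdet(cov)    # numerically stable ln|cov|
    if sign <= 0:
        raise ValueError("Covariance matrix must have a positive determinant.")
    # Solve cov @ z = diff.T rather than forming cov^-1 explicitly
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))


def ml_classify(X, params):
    """params maps each label to (mean, cov); returns the label with the
    largest log-likelihood for each row of X."""
    labels = list(params)
    scores = np.column_stack([log_gaussian_full(X, *params[c]) for c in labels])
    return np.array(labels)[np.argmax(scores, axis=1)]


# Usage on synthetic 2-D data (assumption: stands in for X_pca_test and the fitted class parameters)
rng = np.random.default_rng(2)
mean_a, cov_a = np.array([0.0, 0.0]), np.eye(2)
mean_b, cov_b = np.array([4.0, 0.0]), 3.0 * np.eye(2)
X_demo = np.vstack([rng.multivariate_normal(mean_a, cov_a, 200),
                    rng.multivariate_normal(mean_b, cov_b, 200)])
y_demo = np.array([3] * 200 + [4] * 200)
y_hat = ml_classify(X_demo, {3: (mean_a, cov_a), 4: (mean_b, cov_b)})
print("ML accuracy on synthetic data:", np.mean(y_hat == y_demo))
```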


Q3-2

Now, let’s assume that the prior probabilities for the two classes are p(C1) = 0.58 and p(C2) = 0.42.
Using these prior probabilities, and the means and covariances of the two classes, develop an MAP
classifier and report the classification accuracy on the test set.
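By Bayes' rule, the MAP classifier adds the log-prior to each class log-likelihood; the evidence $p(\mathbf{x})$ is common to both classes and drops out. Taking class 3 as $C_1$ and class 4 as $C_2$ (as in the code below), the rule amounts to choosing class 3 unless its log-likelihood falls more than $\ln(0.58/0.42) \approx 0.323$ below that of class 4:

```latex
\begin{aligned}
\hat{C}_{\text{MAP}}(\mathbf{x}) &= \arg\max_{k \in \{3,4\}} \bigl[\ln p(\mathbf{x} \mid C_k) + \ln P(C_k)\bigr] \\
\text{choose class 3} &\iff \ln p(\mathbf{x} \mid C_3) - \ln p(\mathbf{x} \mid C_4) > \ln\frac{P(C_4)}{P(C_3)} = \ln\frac{0.42}{0.58} \approx -0.323
\end{aligned}
```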

In [6]:
# Prior probabilities for the two classes (C1 = digit 3, C2 = digit 4)
prior_3 = 0.58
prior_4 = 0.42

# Function to classify a sample with the MAP rule (redefines the ML classify above)
def classify(sample):
    # Unnormalized log-posterior: log-likelihood plus log-prior for each class
    log_posterior_3 = log_gaussian_pdf(sample, mean_train_3, inv_cov_train_3) + np.log(prior_3)
    log_posterior_4 = log_gaussian_pdf(sample, mean_train_4, inv_cov_train_4) + np.log(prior_4)
    
    # Return the class with the highest log-posterior
    return 3 if log_posterior_3 > log_posterior_4 else 4

# Classify the test samples
y_pred_test = np.array([classify(x) for x in X_pca_test])

# Compute the classification accuracy
accuracy = np.mean(y_pred_test == y_test)
print("MAP classification accuracy:", accuracy)

MAP classification accuracy: 0.9774096385542169
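The MAP `classify` adds `np.log(prior)` to each class score, so it compares log-posteriors up to the shared evidence term. A vectorised sketch (hypothetical helpers `log_gaussian_full` and `map_classify`, synthetic data with identity covariances) illustrates two properties of the rule: with equal priors it gives exactly the ML decisions, and raising the prior of class 3 moves the decision boundary toward class 4, so more samples are labelled 3.

```python
import numpy as np


def log_gaussian_full(X, mean, cov):
    """Full Gaussian log-density for every row of X."""
    d = mean.shape[0]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))


def map_classify(X, params, priors):
    """params: label -> (mean, cov); priors: label -> P(C).
    Picks the label maximising ln p(x | C) + ln P(C)."""
    labels = list(params)
    scores = np.column_stack(
        [log_gaussian_full(X, *params[c]) + np.log(priors[c]) for c in labels]
    )
    return np.array(labels)[np.argmax(scores, axis=1)]


# Synthetic example: two unit-covariance classes with means at x = 0 and x = 2
rng = np.random.default_rng(3)
params = {3: (np.array([0.0, 0.0]), np.eye(2)), 4: (np.array([2.0, 0.0]), np.eye(2))}
X_demo = rng.normal(size=(1000, 2)) * 1.5 + np.array([1.0, 0.0])

pred_equal = map_classify(X_demo, params, {3: 0.5, 4: 0.5})     # equal priors: same as ML
pred_skewed = map_classify(X_demo, params, {3: 0.58, 4: 0.42})  # priors from Q3-2
print("share labelled 3, equal priors:", np.mean(pred_equal == 3))
print("share labelled 3, priors 0.58/0.42:", np.mean(pred_skewed == 3))
```

In this setup the ML boundary is the line x = 1, and the 0.58/0.42 priors shift it to x = 1 + ln(0.58/0.42)/2 ≈ 1.16.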


Q3-3

Based on the results, do you think that assuming the probability distributions of the two classes
as Gaussian was correct? Explain.

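In both the MAP and ML classifiers, I’ve used the Gaussian probability density function (pdf) as a modeling choice rather than something derived from the data. The high test accuracy of both classifiers (about 98%) shows that the Gaussian model is good enough to separate classes 3 and 4 in the 2-D PCA space, and it is consistent with the features being approximately Gaussian. However, high accuracy alone does not confirm the assumption: a classifier can perform well even when its density model is only roughly correct, as long as the resulting decision boundary falls in the right place. Confirming it would require checking the fitted distributions directly, for example with Q-Q plots of each PCA component or a multivariate normality test.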
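One quick diagnostic, sketched here under the assumption that it is applied separately to `X_train_3` and `X_train_4`: if a 2-D sample is Gaussian, its squared Mahalanobis distances from the mean approximately follow a chi-square distribution with 2 degrees of freedom, so about 95% of the points should fall inside the ellipse $d^2 \le -2\ln 0.05 \approx 5.99$. A fraction far from 0.95 points to tails that are heavier or lighter than Gaussian. The function name `mahalanobis_coverage` is hypothetical, and synthetic Gaussian and uniform samples stand in for the PCA features; this is a rough check, not a formal test such as Mardia's or the Henze–Zirkler test.

```python
import numpy as np


def mahalanobis_coverage(X, level=0.95):
    """Fraction of rows of X whose squared Mahalanobis distance from the sample mean
    is at most the chi-square(2) quantile for `level` (2-D data only)."""
    if X.shape[1] != 2:
        raise ValueError("The closed-form quantile below assumes 2 degrees of freedom.")
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    diff = X - mean
    d2 = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    # For 2 degrees of freedom, P(chi2 <= q) = 1 - exp(-q / 2), so q = -2 ln(1 - level)
    threshold = -2.0 * np.log(1.0 - level)
    return np.mean(d2 <= threshold)


rng = np.random.default_rng(4)
gaussian = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 2.0]], size=5000)
uniform = rng.uniform(-1.0, 1.0, size=(5000, 2))  # lighter tails than a Gaussian
print("Gaussian sample:", mahalanobis_coverage(gaussian))  # close to 0.95
print("Uniform sample:", mahalanobis_coverage(uniform))    # close to 1.0
```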

Q3-4

Compare the ML, MAP, MED, MMD, and kNN classifiers based on the classification accuracy.
Which classifier is the best? Could the inferior classifiers be better for different datasets? Explain.

ML accuracy: 0.9814
MAP accuracy: 0.9774
MED accuracy: 0.9764
MMD accuracy: 0.9819
kNN (k=1) accuracy: 0.9694
kNN (k=2) accuracy: 0.9649
kNN (k=3) accuracy: 0.9769
kNN (k=4) accuracy: 0.9739
kNN (k=5) accuracy: 0.9769

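In terms of accuracy, the MMD (0.9819) and ML (0.9814) classifiers outperform the others, and their results are nearly identical. MMD is marginally ahead, which makes it the best classifier for this particular case, but on 1,992 test images the gap corresponds to a single sample and is well within the binomial standard error of about 0.003, so the two are effectively tied. MAP scores slightly below ML; a likely reason is that the assumed priors (0.58/0.42) do not match the test set, which contains roughly equal numbers of 3s and 4s (1,010 and 982), so the prior pushes borderline samples toward class 3.

However, a classifier that performs well on one dataset may not perform as well on another. For instance, kNN makes no assumption about the form of the class distributions and can follow complex, nonlinear decision boundaries, so it may outperform the parametric classifiers when the classes are multimodal or far from Gaussian. The ML and MAP classifiers, in contrast, are more suitable when the class distributions are known and follow the assumed form, and MAP in particular helps when the class priors are known and genuinely unequal.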
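To judge whether the differences in the comparison matter, it helps to convert the rounded accuracies back into numbers of misclassified test images. A small sketch, assuming every classifier was evaluated on the same 1,992 test images of digits 3 and 4 used above; the binomial standard error gives a rough scale for sampling noise.

```python
import math

n_test = 1992  # test images of digits 3 and 4 (from the cells above)
accuracies = {
    "ML": 0.9814, "MAP": 0.9774, "MED": 0.9764, "MMD": 0.9819,
    "kNN (k=1)": 0.9694, "kNN (k=2)": 0.9649, "kNN (k=3)": 0.9769,
    "kNN (k=4)": 0.9739, "kNN (k=5)": 0.9769,
}

# Convert each rounded accuracy back to a count of misclassified test images
errors = {name: n_test - round(acc * n_test) for name, acc in accuracies.items()}
for name, e in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {e:3d} errors")

# Binomial standard error of an accuracy near 0.98 on 1992 samples
p = accuracies["ML"]
se = math.sqrt(p * (1 - p) / n_test)
print(f"standard error ~ {se:.4f}")  # about 0.003
```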