# Assignment - 1 Bayesian Decision Theory

In this assignment we apply discriminant analysis to recognize the digits in the MNIST data set. We use 60000 digit images as the training set and 10000 digit images as the test set. Each image is 28 x 28 pixels, which we flatten into a 784-element vector and use as the feature vector of that image.

In [726]:
from mlxtend.data import loadlocal_mnist
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import cv2 as cv
import math
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from numpy.linalg import inv

## Part 1 - Mean Digits and Standard Deviation Digits

First, we calculate the mean digit for each class (0-9). We do this by first separating the training set by digit label, then adding the feature vectors of each class together and dividing by the number of images in that class.

\begin{align}
\mu_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij}
\end{align}

Second, we calculate the standard deviation. We subtract the class mean from each image of that class and square the differences, then sum the squared differences per feature and divide by the number of images in the class. This gives the variance of the digit class; its square root is the standard deviation.

\begin{align}
    \sigma_i = \sqrt{\frac{ \sum_{j=1}^{N_i}( x_{ij} - \mu_i)^{2}}{N_i}}
\end{align}
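
The two formulas above can be sketched on a tiny made-up data set. This is only an illustration: the function name `class_mean_std` and the toy arrays are assumptions, not part of the assignment code, and the standard deviation divides by N_i (the population form), as in the formula.

```python
import numpy as np

def class_mean_std(images, labels, num_classes):
    """Return per-class feature means and standard deviations.

    images: (N, d) array, labels: (N,) integer array.
    """
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    means = np.zeros((num_classes, images.shape[1]))
    stds = np.zeros_like(means)
    for c in range(num_classes):
        members = images[labels == c]        # all samples of class c
        means[c] = members.mean(axis=0)      # mu_c
        # population standard deviation: divide by N_c, not N_c - 1
        stds[c] = np.sqrt(((members - means[c]) ** 2).mean(axis=0))
    return means, stds

# Toy data (hypothetical values): two classes, two "pixels" per sample.
toy_images = np.array([[0, 2], [2, 4], [10, 10], [10, 14]])
toy_labels = np.array([0, 0, 1, 1])
toy_means, toy_stds = class_mean_std(toy_images, toy_labels, num_classes=2)
```

For this toy input the class means are (1, 3) and (10, 12), and the standard deviations are (1, 1) and (0, 2). The zero entry shows why a small constant is often added before a variance is inverted.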


In [727]:
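# Load the raw MNIST idx files; each image comes back as a flat 784-element row.
trainImgs, trainLabels = loadlocal_mnist(
        images_path='train-images-idx3-ubyte',
        labels_path='train-labels-idx1-ubyte')
testImgs, testLabels = loadlocal_mnist(
        images_path='t10k-images-idx3-ubyte',
        labels_path='t10k-labels-idx1-ubyte')

# Index the training rows by digit label so that imgDF.loc[i] selects every image of class i.
imgDF = pd.DataFrame(trainImgs, index=trainLabels)

imgTestDF = pd.DataFrame(testImgs)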


FileNotFoundError: [Errno 2] No such file or directory: 'train-labels-idx1-ubyte'

In [None]:
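def calVarianceStd(digitMean, imgClass):
    # Per-pixel standard deviation of one digit class, returned as a 28 x 28 image.
    digitMean = digitMean.reshape(1,784)
    imgClass = imgClass.values
    diff_sq = np.square(imgClass - digitMean)
    # Population variance: divide by the number of images in the class.
    variance = diff_sq.sum(axis=0)/len(imgClass)
    std_deviation = np.sqrt(variance)
    return std_deviation.reshape(28,28)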
    

In [None]:
def printMeanStd():
    for i in range(10):
        fileNameMean = "Class " + str(i)+" Mean.png"
        cv.imwrite(fileNameMean, imgDigitMean[i])
        fileNameStd = "Class " + str(i)+" Std_Dev.png"
        cv.imwrite(fileNameStd, imgDigitStd[i])

In [None]:
imgDigitMean = []
imgDigitStd = []

for i in range(10):
    imgClass = imgDF.loc[i]
    imgDigit =  imgClass.sum()/len(imgClass)
    imgDigitMean.append(imgDigit.values.reshape(28,28))
    imgDigitStd.append(calVarianceStd(imgDigitMean[i], imgClass))

printMeanStd()

These are the Mean Digit Images

![Class%200%20Mean.png](attachment:Class%200%20Mean.png) 
![Class%201%20Mean.png](attachment:Class%201%20Mean.png)
![Class%202%20Mean.png](attachment:Class%202%20Mean.png)
![Class%203%20Mean.png](attachment:Class%203%20Mean.png)
![Class%204%20Mean.png](attachment:Class%204%20Mean.png)
![Class%205%20Mean.png](attachment:Class%205%20Mean.png)
![Class%206%20Mean.png](attachment:Class%206%20Mean.png)
![Class%207%20Mean.png](attachment:Class%207%20Mean.png)
![Class%208%20Mean.png](attachment:Class%208%20Mean.png)
![Class%209%20Mean.png](attachment:Class%209%20Mean.png)

These are the Standard Deviation Images

![Class%200%20Std_Dev.png](attachment:Class%200%20Std_Dev.png)
![Class%201%20Std_Dev.png](attachment:Class%201%20Std_Dev.png)
![Class%202%20Std_Dev.png](attachment:Class%202%20Std_Dev.png)
![Class%203%20Std_Dev.png](attachment:Class%203%20Std_Dev.png)
![Class%204%20Std_Dev.png](attachment:Class%204%20Std_Dev.png)
![Class%205%20Std_Dev.png](attachment:Class%205%20Std_Dev.png)
![Class%206%20Std_Dev.png](attachment:Class%206%20Std_Dev.png)
![Class%207%20Std_Dev.png](attachment:Class%207%20Std_Dev.png)
![Class%208%20Std_Dev.png](attachment:Class%208%20Std_Dev.png)
![Class%209%20Std_Dev.png](attachment:Class%209%20Std_Dev.png)

## Part 2 - Classify using Discriminant Analysis

According to the Bayesian decision rule, we choose the class that minimizes the conditional risk for each image. Under the 0-1 loss, minimizing the conditional risk is equivalent to maximizing the posterior probability. Hence we can use discriminant analysis to classify the test images: each class gets a discriminant function, and an image is assigned to the class whose discriminant is largest.

Assuming the features of each class follow a multivariate normal distribution, we derive the discriminant function by taking the log of the Gaussian class-conditional density and adding the log prior.

### Gaussian Distribution

\begin{align}
p(x)=\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^{t}\Sigma^{-1}(x-\mu)\right)
\end{align}

### Log Likelihood

\begin{align}
g_{i}(x) = \log p(x|\omega_i) + \log P(\omega_i)
\end{align}
\begin{align}
= -\frac{1}{2}(x - \mu_{i})^{t} \Sigma_{i}^{-1}(x - \mu_{i}) - \frac{d}{2} \log (2 \pi) -\frac{1}{2} \log |\Sigma_i| + \log P(\omega_i)
\end{align}

When we calculate the mean and covariance, we see that the covariance matrices are different for each class, so we apply the Case 3 equation (an arbitrary covariance matrix for each class). This is the quadratic discriminant function, which is expressed as follows:

### Quadratic Discriminant Functions
\begin{align}
g_i(x) = x^{t} W_i x + N_{i}^{t} x + B_{i0}
\end{align}

\begin{align}
\text{where } W_i = - \frac{1}{2} \Sigma_{i}^{-1}, \quad N_{i} = \Sigma_{i}^{-1} \mu_{i} \quad \text{and} \quad B_{i0} = - \frac{1}{2} \mu_{i}^{t} \Sigma_{i}^{-1} \mu_{i} + \ln P(\omega_i) - \frac{1}{2} \ln |\Sigma_{i}|
\end{align}

After evaluating g_i(x) for every class, we assign the test image to the class with the largest value.
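
As a sanity check on the algebra, the sketch below evaluates the quadratic form and the direct log-likelihood form on a made-up 2-D example and compares them. The function names, means, covariances and the test point are all assumptions chosen for illustration. The two forms should differ only by the class-independent constant (d/2) log(2 pi), which does not affect the argmax.

```python
import numpy as np

def log_gaussian(x, mu, cov):
    """log p(x | class) for a multivariate normal, written out directly."""
    d = len(mu)
    diff = x - mu
    _, log_det = np.linalg.slogdet(cov)
    return (-0.5 * diff @ np.linalg.inv(cov) @ diff
            - 0.5 * d * np.log(2 * np.pi) - 0.5 * log_det)

def quadratic_discriminant(x, mu, cov, prior):
    """g_i(x) = x^t W_i x + N_i^t x + B_i0, without the constant (d/2) log(2 pi)."""
    cov_inv = np.linalg.inv(cov)
    _, log_det = np.linalg.slogdet(cov)
    W = -0.5 * cov_inv
    N = cov_inv @ mu
    B0 = -0.5 * mu @ cov_inv @ mu + np.log(prior) - 0.5 * log_det
    return x @ W @ x + N @ x + B0

# Two hypothetical 2-D classes with different covariances and equal priors.
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.array([[1.0, 0.2], [0.2, 2.0]]), np.array([[0.5, 0.0], [0.0, 0.5]])]
priors = [0.5, 0.5]
x = np.array([2.5, 2.0])

g = [quadratic_discriminant(x, m, c, p) for m, c, p in zip(mus, covs, priors)]
full = [log_gaussian(x, m, c) + np.log(p) for m, c, p in zip(mus, covs, priors)]
offset = 0.5 * len(x) * np.log(2 * np.pi)   # the class-independent constant
predicted_class = int(np.argmax(g))
```

Here `g[i]` equals `full[i] + offset` for both classes, and the point is assigned to the second class (index 1), whose mean it lies closest to.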


In [None]:
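def getCovarianceMat():
    # Build a diagonal covariance matrix per class from the per-pixel standard deviations.
    # epsilon is added to every standard deviation so that pixels with zero variance
    # do not make the matrix singular.
    epsilon = 0.1
    covarianceMat = []
    invCovarianceMat = []
    
    for i in range(len(imgDigitStd)):
        std = np.asarray(imgDigitStd[i]).reshape(28*28) + epsilon
        coVar = np.diag(np.square(std))
        covarianceMat.append(coVar)
        
        invCovarianceMat.append(inv(coVar))
    return covarianceMat, invCovarianceMat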

In [None]:
def quadDiscrFunc(x):
    # Evaluate g_i(x) for every class and return the index of the largest value.
    x = np.asarray(x, dtype=float).reshape(784, 1)
    g = []
    for i in range(10):
        mu = imgDigitMean[i].reshape(784, 1)
        W = -1/2*invCovarianceMat[i]
        N = np.matmul(invCovarianceMat[i], mu)
        # slogdet returns (sign, log|det|); working with the log avoids overflow.
        sign, logDet = np.linalg.slogdet(covarianceMat[i])
        prior = np.log(1/10)  # equal priors assumed for the ten classes
        # B_i0 includes the -1/2 ln|Sigma_i| term, which was computed but left out before.
        B0 = -1/2*np.matmul(mu.T, N) + prior - 1/2*logDet
        tempG = np.matmul(np.matmul(x.T, W), x) + np.matmul(N.T, x) + B0
        g.append(tempG.item())
    return np.argmax(g)

### 0-1 Error Function

The 0-1 loss function assigns no cost to a correct classification and unit cost to every error. Averaged over the test images, it gives the error rate, and the accuracy is its complement. It is given by:


\begin{align}
  \lambda =
\begin{cases}
    0, & i = j\\
    1, & i \neq j
\end{cases}
\end{align}
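
A minimal sketch of how the 0-1 loss turns into an error rate, using made-up labels. The helper `zero_one_loss` and the two label lists are assumptions for illustration only.

```python
import numpy as np

def zero_one_loss(predicted, actual):
    """Per-sample 0-1 loss: 0 where the labels agree, 1 where they differ."""
    return (np.asarray(predicted) != np.asarray(actual)).astype(int)

# Hypothetical predictions for five test images.
predicted_toy = [3, 7, 1, 0, 4]
actual_toy = [3, 7, 2, 0, 9]
losses = zero_one_loss(predicted_toy, actual_toy)
error_rate = losses.mean() * 100      # percentage of misclassified samples
accuracy = 100 - error_rate
```

Two of the five toy predictions are wrong, so the error rate is 40% and the accuracy is 60%.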

In [None]:
def calAccuracy():
    predicted = np.asarray(pred)
    accuracy = (predicted == testLabels).sum()/len(testLabels)*100
    return accuracy, 100-accuracy


In [None]:
covarianceMat, invCovarianceMat = getCovarianceMat()

In [None]:
pred = []
for i in range(imgTestDF.shape[0]):
    pred.append(quadDiscrFunc(imgTestDF.loc[i].values))
#pred

In [None]:
acc , err = calAccuracy()
print ("Accuracy : " + str(acc))
print ("Error Rate : " + str(err))

## Why doesn't it perform as well as many other methods on LeCun's web page?

In this process, we assume that the features of each class follow a Gaussian distribution, which may not actually be the case for the images. Because the discriminant functions are derived from that assumption, a poor fit to the data lowers the accuracy. In addition, the covariance matrices built above are diagonal, so correlations between pixels are ignored. Other methods such as SVMs and neural networks learn the boundary that separates the classes directly from the training data instead of assuming a distribution, which is not possible with this method.


## Part 3 - Fisher Digits


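The scatter matrices summarize, in matrix form, how the feature vectors are spread.

The within-class scatter (Sw) measures the spread of the images around their own class mean, while the between-class scatter (Sb) measures the spread of the class means around the overall mean (Mu):

\begin{align}
S_W = \sum_{i} \sum_{j=1}^{N_i} (x_{ij} - \mu_i)(x_{ij} - \mu_i)^{t}, \quad S_B = \sum_{i} N_i (\mu_i - \mu)(\mu_i - \mu)^{t}
\end{align}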
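
The within-class and between-class scatter can be sketched on a small synthetic data set. The function `scatter_matrices` and the random toy data are assumptions for illustration. A useful check is that the total scatter about the overall mean equals the sum of the two.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_W, S_B) for data X of shape (N, d) and integer labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        members = X[y == c]
        class_mean = members.mean(axis=0)
        centered = members - class_mean
        S_W += centered.T @ centered                   # spread inside class c
        shift = class_mean - overall_mean
        S_B += len(members) * np.outer(shift, shift)   # spread of the class means
    return S_W, S_B

# Synthetic data (hypothetical): two 3-D classes of different sizes.
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(4, 1, (30, 3))])
y_toy = np.array([0] * 20 + [1] * 30)
S_W, S_B = scatter_matrices(X_toy, y_toy)

# Total scatter about the overall mean, for comparison with S_W + S_B.
centered_all = X_toy - X_toy.mean(axis=0)
S_T = centered_all.T @ centered_all
```

Note that both accumulators start from `np.zeros`, and the between-class term uses an outer product so that each class contributes a full d x d matrix.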

In [None]:
def scatterW():
    # Within-class scatter: sum over classes of (X_i - mu_i)^t (X_i - mu_i).
    Sw = np.zeros([784,784])  # start from zeros; np.empty leaves arbitrary values
    for i in range(len(imgDigitMean)):
        centered = imgDF.loc[i].values - imgDigitMean[i].reshape(784)
        s = np.matmul(centered.T, centered)
        Sw = Sw + s
    return Sw

In [None]:
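def totalMean():
    # Overall mean of the training set: class means weighted by class size.
    Mu = np.zeros([784])  # start from zeros; np.empty leaves arbitrary values
    for i in range(len(imgDigitMean)):
        Mu = Mu + np.asarray(imgDigitMean[i]).reshape(784) * imgDF.loc[i].shape[0]
    
    return Mu/imgDF.shape[0]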

In [None]:
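def scatterB():
    # Between-class scatter: sum over classes of N_i (mu_i - Mu)(mu_i - Mu)^t.
    Sb = np.zeros([784,784])  # start from zeros; np.empty leaves arbitrary values
    for i in range(len(imgDigitMean)):
        diff = imgDigitMean[i].reshape(784) - Mu
        # np.outer gives the 784 x 784 outer product; matmul of two 1-D vectors returns a scalar.
        Sb = Sb + imgDF.loc[i].shape[0] * np.outer(diff, diff)
    return Sb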
        

In [None]:
Sw = scatterW()
Mu = totalMean()
Sb = scatterB()

Here we find the eigenvalues and eigenvectors of the product of the (pseudo-)inverse of Sw and Sb.
We then keep the eigenvectors that correspond to the largest eigenvalues, as many as the desired dimensionality. Projecting an image onto these eigenvectors gives its new feature vector.
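
The selection step can be sketched on synthetic data as follows. The function `fisher_directions`, the class centers and the sample sizes are assumptions for illustration. The sketch relies on two properties of `np.linalg.eig`: eigenvectors are returned as columns, and the eigenvalues come back in no particular order, so they are sorted explicitly.

```python
import numpy as np

def fisher_directions(S_W, S_B, num_components):
    """Leading eigenvalues/eigenvectors of pinv(S_W) @ S_B, one direction per row."""
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    # The result may be complex with negligible imaginary parts;
    # keep the real parts and sort by eigenvalue, largest first.
    order = np.argsort(eigvals.real)[::-1][:num_components]
    return eigvals.real[order], eigvecs[:, order].real.T

# Synthetic data (hypothetical): three 4-D classes around distinct centers.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0, 0], [5, 0, 0, 0], [0, 5, 0, 0]], dtype=float)
X_toy = np.vstack([c + rng.normal(0, 1, (40, 4)) for c in centers])
y_toy = np.repeat(np.arange(3), 40)

overall_mean = X_toy.mean(axis=0)
S_W = np.zeros((4, 4))
S_B = np.zeros((4, 4))
for c in range(3):
    members = X_toy[y_toy == c]
    mu_c = members.mean(axis=0)
    S_W += (members - mu_c).T @ (members - mu_c)
    S_B += len(members) * np.outer(mu_c - overall_mean, mu_c - overall_mean)

top_vals, W_toy = fisher_directions(S_W, S_B, num_components=2)
projected = X_toy @ W_toy.T   # each sample reduced from 4 features to 2
```

Because Sb is a sum of one rank-one term per class, and those terms are tied together by the overall mean, it has at most (number of classes - 1) non-zero eigenvalues. With three toy classes only two directions carry discriminative information, and for ten digit classes the count would be at most nine.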

In [None]:
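mat = np.dot(np.linalg.pinv(Sw), Sb)
eigvals, eigvecs = np.linalg.eig(mat)

# np.linalg.eig returns the eigenvectors as columns, in no particular order,
# and possibly as complex numbers with negligible imaginary parts.
order = np.argsort(eigvals.real)[::-1]
# Keep the 3 eigenvectors with the largest eigenvalues, one per row of W.
W = eigvecs[:, order[:3]].real.T
W.shape
# Project the test images onto the selected directions.
y = np.matmul(W, testImgs.T)

y.T.shape
z = y.T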