# IRIS classification with the multivariate Gaussian

In this notebook, we return to IRIS classification, using the full set of 4 features.

**Note:** You can use built-in code for mean, variance, covariance, determinant, etc.

## 1. Load in the data 

As in the bivariate case, we start by loading in the IRIS data set.
Recall that there are 150 data points, each with 4 features and a label (0,1,2). As before, we will divide this into a training set of 105 points and a test set of 45 points.

In [1]:
# Standard includes
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Useful module for dealing with the Gaussian density
from scipy.stats import norm, multivariate_normal  # in case you use the built-in library
from sklearn import datasets

In [2]:
# Load data set.
iris = datasets.load_iris()
X = iris.data
Y = iris.target
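# Note: scikit-learn's iris.feature_names lists the columns as sepal length, sepal width,
# petal length, petal width; the names below are the labels used in this notebook.
featurenames = ['petal_length', 'petal_width', 'sepal_length', 'sepal_width']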

# Split 150 instances into training set (trainx, trainy) of size 105 and test set (testx, testy) of size 45
np.random.seed(0)
perm = np.random.permutation(150)
trainx = X[perm[0:105],:]
trainy = Y[perm[0:105]]
testx = X[perm[105:150],:]
testy = Y[perm[105:150]]

## 2. Fit a Gaussian generative model

We now define a function that fits a Gaussian generative model to the data.
For each class (`j=0,1,2`), we have:
* `pi[j]`: the class weight
* `mu[j,:]`: the mean, a 4-dimensional vector
* `sigma[j,:,:]`: the 4x4 covariance matrix

This means that `pi` is a length-3 array, `mu` is a 3x4 array and `sigma` is a 3x4x4 array.
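
One common way to estimate these parameters from the training points of class $j$ is sketched below. The symbols $n$ (number of training points), $n_j$ (number of training points with label $j$), $x_i$ and $y_i$ are notation introduced here for illustration. The $n_j - 1$ normalization is an assumption that matches the default behavior of `np.cov`; dividing by $n_j$ instead gives the maximum-likelihood estimate.

$$
\pi_j = \frac{n_j}{n}, \qquad
\mu_j = \frac{1}{n_j} \sum_{i : y_i = j} x_i, \qquad
\Sigma_j = \frac{1}{n_j - 1} \sum_{i : y_i = j} (x_i - \mu_j)(x_i - \mu_j)^T
$$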

In [3]:
def fit_generative_model(x,y):
    k = 3  # labels 0,1,...,k-1
    d = (x.shape)[1]  # number of features
    mu = np.zeros((k,d))
    sigma = np.zeros((k,d,d))
    pi = np.zeros(k)
    for label in range(0,k):
        indices = (y == label)
        ### START CODE HERE ###
        mu[label] = x[indices,:].mean(axis = 0)
        sigma[label] = np.cov(x[indices, :], rowvar = False)
        pi[label] = len(x[indices, :]) / len(y)
        ### END CODE HERE ###
    return mu, sigma, pi
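
The covariance estimate above relies on `np.cov` with `rowvar=False`, which treats each row as one data point and by default divides by `n - 1`. The short sketch below, on synthetic data rather than the iris set, compares that default with the `bias=True` variant, which divides by `n`. The names `pts`, `centered`, `unbiased` and `mle` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(10, 4))        # 10 synthetic points, 4 features
centered = pts - pts.mean(axis=0)     # subtract the per-feature mean

# Default normalization: divide the scatter matrix by n - 1
unbiased = np.cov(pts, rowvar=False)
# bias=True: divide by n instead (the maximum-likelihood estimate)
mle = np.cov(pts, rowvar=False, bias=True)

print(np.allclose(unbiased, centered.T @ centered / 9))
print(np.allclose(mle, centered.T @ centered / 10))
```

Both checks should print `True`. With 105 training points the two conventions differ only slightly, but the choice can still shift a borderline prediction.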

In [4]:
# Fit a Gaussian generative model to the training data
mu, sigma, pi = fit_generative_model(trainx,trainy)

In [5]:
print(mu.shape)
print("________________________________________________________________")
print(sigma.shape)
print("________________________________________________________________")
print(pi.shape)

(3, 4)
________________________________________________________________
(3, 4, 4)
________________________________________________________________
(3,)


In [6]:
sigma

array([[[0.14314394, 0.12413826, 0.03385417, 0.01444129],
        [0.12413826, 0.17030303, 0.02354167, 0.01370265],
        [0.03385417, 0.02354167, 0.03729167, 0.00645833],
        [0.01444129, 0.01370265, 0.00645833, 0.01132576]],

       [[0.26969697, 0.09757576, 0.1869697 , 0.05181818],
        [0.09757576, 0.11304813, 0.09286988, 0.0426738 ],
        [0.1869697 , 0.09286988, 0.22174688, 0.06320856],
        [0.05181818, 0.0426738 , 0.06320856, 0.03213012]],

       [[0.32970128, 0.08479374, 0.22034139, 0.04783784],
        [0.08479374, 0.08509246, 0.06652916, 0.05216216],
        [0.22034139, 0.06652916, 0.22130868, 0.04783784],
        [0.04783784, 0.05216216, 0.04783784, 0.08108108]]])

## 3. Use the model to make predictions on the test set

<font color="magenta">**For you to do**</font>: Define a general purpose testing routine that takes as input:
* the arrays `pi`, `mu`, `sigma` defining the generative model, as above
* the test set (points `tx` and labels `ty`)
* a list of features `features` (chosen from 0-3)

It should return the number of mistakes made by the generative model on the test data, *when restricted to the specified features*. For instance, using just the two features 0 (`'petal_length'`) and 1 (`'petal_width'`) results in 7 mistakes (out of 45 test points), so 

        `test_model(mu, sigma, pi, [0,1], testx, testy)` 

should print 7/45.

**Hint:** The way you restrict attention to a subset of features is by choosing the corresponding coordinates of the full 4-dimensional mean and the appropriate submatrix of the full 4x4 covariance matrix.
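
As a minimal sketch of the hint, the snippet below restricts a mean vector and a covariance matrix to a chosen feature subset. The parameter values are made up for illustration, and `mu_j`, `sigma_j`, `mu_sub` and `sigma_sub` are hypothetical names. `np.ix_` selects the rows and columns in one step, which is equivalent to indexing the rows first and then the columns.

```python
import numpy as np

# Hypothetical parameters for one class with 4 features (illustrative values only)
mu_j = np.array([5.0, 3.4, 1.5, 0.2])
A = np.arange(16, dtype=float).reshape(4, 4)
sigma_j = A @ A.T + 4 * np.eye(4)   # symmetric positive definite by construction

features = [0, 2]

# Mean restricted to the chosen coordinates
mu_sub = mu_j[features]

# Covariance restricted to the chosen rows and columns
sigma_sub = sigma_j[np.ix_(features, features)]

print(mu_sub.shape, sigma_sub.shape)
```

Here `sigma_sub[0, 1]` equals `sigma_j[0, 2]`, the covariance between the two selected features.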

In [7]:
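def NormalPDF(x, mu, covar, pi):
    # Returns log(pi * N(x; mu, covar)), the log of the class-weighted Gaussian density
    d = len(mu)
    diff = x - mu
    # Quadratic form in the exponent: -0.5 * (x - mu)^T covar^{-1} (x - mu)
    expPart = -0.5 * diff @ np.linalg.solve(covar, diff)
    # Log of the normalizing constant 1 / ((2*pi)^(d/2) * sqrt(det(covar)))
    logConsPart = -0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(covar)[1])
    # Working in log space avoids underflow in np.exp for points far from the mean
    return np.log(pi) + logConsPart + expPart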
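
One way to sanity-check a hand-written log-density like the one above is to compare it with `scipy.stats.multivariate_normal.logpdf`. The sketch below does this for a single made-up point. The helper name `log_weighted_density` and the input values are assumptions for illustration, not part of the exercise.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_weighted_density(x, mu, covar, pi):
    # log(pi) + log N(x; mu, covar), computed directly from the formula
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(covar, diff)   # (x - mu)^T covar^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(covar)         # log of det(covar)
    return np.log(pi) - 0.5 * (d * np.log(2 * np.pi) + logdet + quad)

# Illustrative (made-up) inputs
x = np.array([5.1, 3.5])
mu0 = np.array([5.0, 3.4])
cov0 = np.array([[0.14, 0.12], [0.12, 0.17]])
pi0 = 1 / 3

ours = log_weighted_density(x, mu0, cov0, pi0)
reference = np.log(pi0) + multivariate_normal.logpdf(x, mean=mu0, cov=cov0)
print(np.isclose(ours, reference))
```

The two values should agree to floating-point precision.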

In [16]:
# Now test the performance of a predictor based on a subset of features
def test_model(mu, sigma, pi, features, tx, ty):
    nt = len(ty)
    k = 3 
    score = np.zeros((nt,k))
    for i in range(0,nt):
        for label in range(0,k):
            ### START CODE HERE ###
            # Implement the formula for the normal pdf.
            # If you can't, use the built-in multivariate_normal.logpdf, but to get full marks you should implement your own.
            score[i,label] = NormalPDF(tx[i][features], mu[label][features], sigma[label][features][:,features], pi[label])
    predictions = np.argmax(score, axis = 1)
    ### END CODE HERE ###
    # Finally, tally up score
    errors = np.sum(predictions != ty)
    print(str(errors) + '/' + str(nt))
    print("test Error = %.2f%%"%(100*errors / nt))
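
The double loop above scores one test point at a time. As a hedged alternative, the sketch below computes the scores for all points of a class at once with NumPy. The helper `log_scores` is a hypothetical name, and the data is a synthetic stand-in rather than the iris set, so the error count it prints says nothing about the exercises.

```python
import numpy as np

def log_scores(tx, mu, sigma, pi, features):
    # Hypothetical helper: returns an (n, k) array whose entry (i, j) is
    # log(pi_j) + log N(x_i; mu_j, Sigma_j), restricted to the given features.
    xs = tx[:, features]
    n, d = xs.shape
    k = len(pi)
    scores = np.zeros((n, k))
    for j in range(k):
        diff = xs - mu[j, features]                   # (n, d)
        cov = sigma[j][np.ix_(features, features)]    # (d, d)
        # Row-wise quadratic form diff_i^T cov^{-1} diff_i
        quad = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
        _, logdet = np.linalg.slogdet(cov)
        scores[:, j] = np.log(pi[j]) - 0.5 * (d * np.log(2 * np.pi) + logdet + quad)
    return scores

# Synthetic stand-in data (not the iris set): 3 classes, 4 features
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [3, 3, 3, 3], [6, 0, 6, 0]], dtype=float)
tx = np.vstack([rng.normal(m, 1.0, size=(20, 4)) for m in means])
ty = np.repeat([0, 1, 2], 20)

# For brevity the model is fit and evaluated on the same synthetic points
mu = np.array([tx[ty == j].mean(axis=0) for j in range(3)])
sigma = np.array([np.cov(tx[ty == j], rowvar=False) for j in range(3)])
pi = np.array([np.mean(ty == j) for j in range(3)])

predictions = np.argmax(log_scores(tx, mu, sigma, pi, [0, 1]), axis=1)
errors = int(np.sum(predictions != ty))
print(f"{errors}/{len(ty)}")
```

Returning the scores instead of printing them makes it easier to reuse the routine, for example to compare several feature subsets in a loop.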

### Question

Exercise 1. How many errors are made on the test set when using the single feature 'petal_length'?

In [17]:
test_model(mu, sigma, pi, [0], testx, testy)

12/45
test Error = 26.67%


Exercise 2. How many errors when using 'petal_length' and 'petal_width'?

In [18]:
test_model(mu, sigma, pi, [0, 1], testx, testy)

10/45
test Error = 22.22%


Exercise 3. How many errors when using all 4 features?

In [19]:
test_model(mu, sigma, pi, [0, 1, 2, 3], testx, testy)

2/45
test Error = 4.44%
