# Classifying IRIS species using a univariate Gaussian classifier

**Note:** You can use built-in code for mean, variance, covariance, determinant, etc.

In [1]:
# Standard includes
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Useful module for dealing with the Gaussian density
from scipy.stats import norm, multivariate_normal # in case you use the built-in library
# Widgets for interactive plots (requires the ipywidgets package: pip install ipywidgets)
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interact, interactive, fixed, interact_manual, IntSlider
from sklearn import datasets


### Loading the IRIS dataset

In [2]:
iris = datasets.load_iris()
X = iris.data
Y = iris.target
featurenames = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'] # column order of iris.data (see iris.feature_names)

Confirm the dimensions:

In [3]:
X.shape, Y.shape

((150, 4), (150,))

In [4]:
# Split 150 instances into training set (trainx, trainy) of size 105 and test set (testx, testy) of size 45
np.random.seed(0)
perm = np.random.permutation(150)
trainx = X[perm[0:105],:]
trainy = Y[perm[0:105]]
testx = X[perm[105:150],:]
testy = Y[perm[105:150]]

Let's see how many training points there are from each class.

In [5]:
sum(trainy==0), sum(trainy==1), sum(trainy==2)

(33, 34, 38)

### Q1. Can you figure out how many test points there are from each class? 

In [6]:
# TODO: add your code to find how many test points there are from each class
sum(testy==0), sum(testy==1), sum(testy==2)

(17, 16, 12)
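A vectorized alternative to the three `sum(...)` calls is `np.bincount`, which counts every class in one pass. This is a sketch: the array `labels` below is a made-up stand-in, and passing the notebook's `testy` instead should give the same counts as the cell above.

```python
import numpy as np

# Hypothetical label vector standing in for testy; any 1-D array of
# non-negative integer class labels works the same way.
labels = np.array([0, 2, 1, 2, 0, 2, 1, 2])

# bincount returns one count per label value 0..max(labels);
# minlength=3 guarantees an entry for every class even if one is absent.
counts = np.bincount(labels, minlength=3)
print(counts)  # [2 2 4]
```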

### Look at the distribution of a single feature from one of the species

Let's pick just one feature: feature 0, the first column of `X` (sepal length, in the column order used by `load_iris`). Here is a *histogram* of this feature's values under species 1, along with the *Gaussian fit* to this distribution.

<img src="density.png">

In [11]:
@interact_manual( feature=IntSlider(0,0,3), label=IntSlider(0,0,2))
def density_plot(feature, label):
    plt.hist(trainx[trainy==label,feature], density=True)
    #
    mu = np.mean(trainx[trainy==label,feature]) # mean
    var = np.var(trainx[trainy==label,feature]) # variance
    std = np.sqrt(var) # standard deviation
    x_axis = np.linspace(mu - 3*std, mu + 3*std, 1000)
    plt.plot(x_axis, norm.pdf(x_axis,mu,std), 'r', lw=2)
    plt.title("Species "+str(label) )
    plt.xlabel(featurenames[feature], fontsize=14, color='red')
    plt.ylabel('Density', fontsize=14, color='red')
    plt.show()


### Q2. In the function **density_plot**, the code for plotting the Gaussian density focuses on the region within 3 standard deviations of the mean. Do you see where this happens? Why do you think we make this choice?
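Yes. It happens where `x_axis` is built: `np.linspace(mu - 3*std, mu + 3*std, 1000)`. About 99.7% of a Gaussian's probability mass lies within 3 standard deviations of the mean, so this range shows essentially the whole curve, with the density already close to zero at both ends, without wasting plot space on empty tails.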
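The 3-standard-deviation figure can be checked numerically. The sketch below uses only the standard library; `gaussian_coverage` is a hypothetical helper name, and `norm.cdf(k) - norm.cdf(-k)` from `scipy.stats` gives the same numbers.

```python
import math

def gaussian_coverage(k):
    """Probability that a normal variable lies within k standard deviations
    of its mean: P(|X - mu| < k*sigma) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(gaussian_coverage(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```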

### Q3. Here's something for you to figure out: for which feature (0-3) does the distribution of (training set) values for species-2 have the *smallest* standard deviation? What is the value?
Feature 3. For species 2 it has the smallest standard deviation, approximately 0.28 (0.281, narrowly below feature 1's 0.288; see the output below).

In [33]:
# modify this cell
### START CODE HERE ###
feature=0
label=2
var = np.var(trainx[trainy==label,feature]) # variance
std = np.sqrt(var) # standard deviation
print('with feature 0',std)

feature=1
label=2
var = np.var(trainx[trainy==label,feature]) # variance
std = np.sqrt(var) # standard deviation
print('with feature 1',std)


feature=2
label=2
var = np.var(trainx[trainy==label,feature]) # variance
std = np.sqrt(var) # standard deviation
print('with feature 2',std)


feature=3
label=2
var = np.var(trainx[trainy==label,feature]) # variance
std = np.sqrt(var) # standard deviation
print('with feature 3',std)




with feature 0 0.5665906200670131
with feature 1 0.28784229292369096
with feature 2 0.46420336550151836
with feature 3 0.2809757434745082
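To also pick out the minimizing feature automatically, the per-feature spread can be computed in one vectorized call and passed to `np.argmin`. A sketch, with `per_feature_std` as a hypothetical helper name and toy data in place of `trainx`/`trainy`; `per_feature_std(trainx, trainy, 2)` would return all four values at once.

```python
import numpy as np

def per_feature_std(x, y, label):
    """Standard deviation of every feature (column of x) over the rows whose
    label equals `label`. np.std defaults to ddof=0, i.e. the same value as
    np.sqrt(np.var(...)) in the cell above."""
    return x[y == label].std(axis=0)

# Small made-up dataset: 4 rows, 2 features, with labels in y.
x = np.array([[1.0, 10.0],
              [3.0, 10.5],
              [5.0, 11.0],
              [0.0, 0.0]])
y = np.array([2, 2, 2, 0])

stds = per_feature_std(x, y, label=2)
best = int(np.argmin(stds))  # index of the feature with the smallest spread
print(stds.round(3), best)
```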


### 3. Fit a Gaussian to each class
Let's define a function that will fit a Gaussian generative model to the three classes, restricted to just a single feature.

In [58]:
# Assumes y takes on values 0,1,2
def fit_generative_model(x,y,feature):
    k = 3 # number of classes
    mu = np.zeros(k) # list of means
    var = np.zeros(k) # list of variances
    pi = np.zeros(k) # list of class weights
    for label in range(0,k):
        indices = (y==label)
        ### START CODE HERE ###
        mu[label] = np.mean(x[indices,feature]) # mean
        var[label] = np.var(x[indices,feature]) # variance (ddof=0, the maximum-likelihood estimate)
        pi[label] = np.mean(indices) # fraction of points in this class (uses the argument y, not the global trainy)
        ### END CODE HERE ###
    return mu, var, pi
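For reference, the quantities the function computes are the maximum-likelihood estimates of a one-dimensional Gaussian per class, where $n_j$ is the number of training points with label $j$ and $n$ is the total number of training points:

$$
\mu_j = \frac{1}{n_j}\sum_{i:\,y_i=j} x_i, \qquad
\sigma_j^2 = \frac{1}{n_j}\sum_{i:\,y_i=j} \left(x_i-\mu_j\right)^2, \qquad
\pi_j = \frac{n_j}{n}
$$

These correspond to `np.mean`, `np.var` (which divides by $n_j$ by default) and the class-count ratio in the cell above.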

Call this function on feature 0. What are the class weights? (They do not depend on the feature chosen: each weight is simply the fraction of training points in that class.)

In [59]:
feature = 0 # first column of X (sepal length in load_iris column order)
### START CODE HERE ###
m,v,p = fit_generative_model(trainx,trainy,feature)
print(p)
### END CODE HERE ###

[0.31428571 0.32380952 0.36190476]


Next, display the fitted Gaussian density for each of the three classes.

In [60]:
@interact_manual( feature=IntSlider(0,0,3) )
def show_densities(feature):
    mu, var, pi = fit_generative_model(trainx, trainy, feature)
    colors = ['r', 'k', 'g']
    for label in range(0,3):
        m = mu[label]
        s = np.sqrt(var[label])
        x_axis = np.linspace(m - 3*s, m+3*s, 1000)
        plt.plot(x_axis, norm.pdf(x_axis,m,s), colors[label], label="species-" + str(label))
    plt.xlabel(featurenames[feature], fontsize=14, color='red')
    plt.ylabel('Density', fontsize=14, color='red')
    plt.legend()
    plt.show()


### Questions:

Use the widget above to look at the three class densities for each of the 4 features. Here are some questions for you:
1. For which feature (0-3) do the densities for classes 0 and 2 *overlap* the most?
2. For which feature (0-3) is class 2 the most spread out relative to the other two classes?
3. For which feature (0-3) do the three classes seem the most *separated* (this is somewhat subjective at present)?
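Question 1 can be made less subjective by measuring overlap numerically. The sketch below computes the *overlap coefficient* (the area under the smaller of two densities), a measure chosen here for illustration rather than one used elsewhere in this notebook; `gaussian_overlap` is a hypothetical helper name.

```python
import numpy as np

def gaussian_overlap(mu1, var1, mu2, var2, num=20001):
    """Approximate overlap coefficient of two 1-D Gaussians: the area under
    min(pdf1, pdf2). It is 1 for identical densities and approaches 0 as they
    separate. Integrated numerically on a grid spanning +/- 6 std devs."""
    s1, s2 = np.sqrt(var1), np.sqrt(var2)
    lo = min(mu1 - 6 * s1, mu2 - 6 * s2)
    hi = max(mu1 + 6 * s1, mu2 + 6 * s2)
    x = np.linspace(lo, hi, num)
    pdf1 = np.exp(-(x - mu1) ** 2 / (2 * var1)) / np.sqrt(2 * np.pi * var1)
    pdf2 = np.exp(-(x - mu2) ** 2 / (2 * var2)) / np.sqrt(2 * np.pi * var2)
    overlap = np.minimum(pdf1, pdf2)
    # Trapezoid rule written out explicitly on the uniform grid.
    dx = x[1] - x[0]
    return float(dx * (overlap.sum() - 0.5 * (overlap[0] + overlap[-1])))

print(round(gaussian_overlap(0.0, 1.0, 2.0, 1.0), 4))
```

With the notebook's parameters, `mu, var, pi = fit_generative_model(trainx, trainy, f)` followed by `gaussian_overlap(mu[0], var[0], mu[2], var[2])` for each `f` in 0..3 turns question 1 into a comparison of four numbers.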

How well can we predict the class (0, 1, 2) based just on one feature? The code below lets us find this out.

In [62]:
@interact( feature=IntSlider(0,0,3) )
def test_model(feature):
    mu, var, pi = fit_generative_model(trainx, trainy, feature)

    k = 3 # Labels 0,1,...,k-1
    n_test = len(testy) # Number of test points
    score = np.zeros((n_test,k))
    for i in range(0,n_test):
        for label in range(0,k):
            # Log posterior up to a constant: log(pi_j) + log N(x; mu_j, var_j).
            # norm.logpdf takes the standard deviation, not the variance.
            score[i,label] = np.log(pi[label]) + norm.logpdf(testx[i,feature], mu[label], np.sqrt(var[label]))

    # Predict the class with the highest score for each test point
    predictions = np.argmax(score, axis=1)

    # Finally, tally up score
    errors = np.sum(predictions != testy)
    print("Test error using feature " + featurenames[feature] + ": " + str(errors) + "/" + str(n_test))
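The per-point loop can also be replaced with NumPy broadcasting. This sketch (helper names `gaussian_log_scores` and `predict` are hypothetical) computes $\log \pi_j - \tfrac12 \log(2\pi\sigma_j^2) - (x-\mu_j)^2/(2\sigma_j^2)$ for all points and classes at once; applied to `testx[:, feature]` with the output of `fit_generative_model`, it yields one predicted label per test point.

```python
import numpy as np

def gaussian_log_scores(x, mu, var, pi):
    """Return an (n, k) array whose [i, j] entry is
    log(pi_j) + log N(x_i; mu_j, var_j) for a 1-D Gaussian per class."""
    x = np.asarray(x, dtype=float)[:, None]      # shape (n, 1)
    mu = np.asarray(mu, dtype=float)[None, :]    # shape (1, k)
    var = np.asarray(var, dtype=float)[None, :]
    pi = np.asarray(pi, dtype=float)[None, :]
    # Broadcasting (n, 1) against (1, k) evaluates every point under every class.
    return np.log(pi) - 0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(x, mu, var, pi):
    # Pick the class with the largest log score for each point.
    return np.argmax(gaussian_log_scores(x, mu, var, pi), axis=1)

# Made-up parameters: three classes centred at 0, 5 and 10, unit variance.
mu, var, pi = [0.0, 5.0, 10.0], [1.0, 1.0, 1.0], [1/3, 1/3, 1/3]
print(predict([0.2, 4.8, 9.9, 7.6], mu, var, pi))  # [0 1 2 2]
```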


### Questions:
In this notebook, we are looking at classifiers that use just one out of a possible 4 features. Choosing a subset of features is called **feature selection**. In general, this is something we would need to do based solely on the *training set*--that is, without peeking at the *test set*.

For the IRIS data, compute the training error and test error associated with each choice of feature.

In [None]:
#done in the report
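One way to tabulate both error rates is sketched below. It is self-contained, so it re-implements the fitting step rather than calling `fit_generative_model`; the names `fit_1d`, `error_count`, `errors_per_feature` and `make_toy` are hypothetical, and the demo uses synthetic data instead of IRIS.

```python
import numpy as np

def fit_1d(x, y, k):
    """Per-class mean, variance (ddof=0) and class weight for a 1-D feature."""
    mu = np.array([x[y == j].mean() for j in range(k)])
    var = np.array([x[y == j].var() for j in range(k)])
    pi = np.array([np.mean(y == j) for j in range(k)])
    return mu, var, pi

def error_count(x, y, mu, var, pi):
    """Number of misclassified points under the 1-D Gaussian generative rule."""
    scores = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
              - (x[:, None] - mu) ** 2 / (2 * var))
    return int(np.sum(np.argmax(scores, axis=1) != y))

def errors_per_feature(trainx, trainy, testx, testy, k=3):
    """(training errors, test errors) for each single-feature classifier.
    Parameters are always fit on the training set only."""
    results = []
    for f in range(trainx.shape[1]):
        mu, var, pi = fit_1d(trainx[:, f], trainy, k)
        results.append((error_count(trainx[:, f], trainy, mu, var, pi),
                        error_count(testx[:, f], testy, mu, var, pi)))
    return results

rng = np.random.default_rng(0)

def make_toy(n):
    # Synthetic data: feature 0 separates the classes, feature 1 is pure noise.
    y = rng.integers(0, 3, size=n)
    x = np.column_stack([5.0 * y + rng.normal(size=n), rng.normal(size=n)])
    return x, y

train_x, train_y = make_toy(90)
test_x, test_y = make_toy(45)
results = errors_per_feature(train_x, train_y, test_x, test_y)
for f, (tr, te) in enumerate(results):
    print(f"feature {f}: train errors {tr}/{len(train_y)}, test errors {te}/{len(test_y)}")
```

With the notebook's arrays, `errors_per_feature(trainx, trainy, testx, testy)` returns four (train, test) pairs. Ranking features by the training column alone respects the rule of choosing features without peeking at the test set.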

Based on your findings, answer the following questions:
* Which two features have the lowest training error? List them in order (best first).
* Which two features have the lowest test error? List them in order (best first).