# Classification of Fruits and Veggies
I used the dataset of fruits and vegetables that was collected in our class. Instead of operating on the raw pixel values, I operate on HSV color-space histogram features extracted from each image. An HSV histogram summarizes the color distribution of an image, so I expect these features to work well for distinguishing produce such as bananas from apples.

# Part 1: Dimensionality Reduction
The input is x ∈ R^729, an HSV histogram generated from an RGB image with a fruit centered in it. Each data point has a class label identifying its produce type. With 25 classes, the label is y ∈ {0, ..., 24}.
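To make the feature vector concrete, here is a minimal sketch of how a 729-dimensional HSV histogram could be computed. The bin layout is an assumption: the dataset description does not say how the 729 entries are arranged, and 9 bins per channel (9 × 9 × 9 = 729) is simply one layout that matches the dimension. The function name `hsv_histogram` and the random placeholder image are hypothetical.

```python
import colorsys
import numpy as np

def hsv_histogram(rgb_image, bins_per_channel=9):
    """Flattened, normalized HSV histogram of an RGB image with values in [0, 1].

    Assumption: 9 bins per channel, giving 9**3 = 729 features.
    """
    pixels = np.asarray(rgb_image, dtype=float).reshape(-1, 3)
    # Convert every pixel from RGB to HSV (all three channels lie in [0, 1])
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels])
    # Joint 3D histogram over (hue, saturation, value)
    hist, _ = np.histogramdd(hsv, bins=bins_per_channel, range=[(0.0, 1.0)] * 3)
    # Normalize so the features do not depend on image size
    return hist.ravel() / hist.sum()

rng = np.random.default_rng(0)
image = rng.uniform(size=(32, 32, 3))  # placeholder image, not real data
x = hsv_histogram(image)
print(x.shape)
```

Normalizing by the pixel count is a design choice made for this sketch; the class dataset may store raw counts instead.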


Classification is still a hard problem here because the dimension of the state space is large relative to the amount of data we collected in class: I am trying to classify in a 729-dimensional space with only a few hundred data points from each of the 25 classes. To obtain higher accuracy, I will examine hyperparameter optimization and dimensionality reduction.

I will first build out each component and test it on a smaller dataset of just 3 categories: apple, banana, and eggplant. Then I will combine the components to perform a search over the entire dataset.


Before classifying the data, I will study how to reduce its dimensionality. I will project part of the dataset into 2D to visualize how effective different dimensionality reduction procedures are.

* 1st method: random projection, where a matrix is randomly created and the data is linearly projected along it. A random projection uses a matrix A ∈ R^{2×729} where each element A_ij is sampled independently from a standard normal distribution (i.e., A_ij ∼ N(0, 1)).
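As a quick illustration of the random projection described above, the sketch below draws A with i.i.d. N(0, 1) entries and maps a batch of feature vectors to 2D. The data matrix `X` is a random placeholder standing in for the real histogram features, and the sample count of 300 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, dim_x = 300, 729

# Placeholder features; in the notebook these would be the HSV histograms
X = rng.uniform(size=(n_samples, dim_x))

# Random projection matrix with A_ij ~ N(0, 1)
A = rng.standard_normal((2, dim_x))

# Each row of X is mapped to a 2D point: x_2d = A x
X_2d = X @ A.T
print(X_2d.shape)
```

Because A ignores both the data and the labels, this projection serves mainly as a baseline for the data-dependent methods.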

In [2]:
from numpy.random import uniform
from numpy.random import randn
import random
import time
import matplotlib.pyplot as plt
from scipy.linalg import eig
from scipy.linalg import sqrtm
from numpy.linalg import inv
from numpy.linalg import svd
from utils import create_one_hot_label
from utils import subtract_mean_from_data
from utils import compute_covariance_matrix
import numpy as np
import numpy.linalg as LA
import sys
import IPython

In [4]:
class Project2D():

    ''' Class to draw projection on 2D scatter space'''

    def __init__(self,projection, clss_labels):

        self.proj = projection
        self.clss_labels = clss_labels


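    def project_data(self,X,Y,white=None):

        '''
        Project X with self.proj (after optional whitening) and scatter-plot it.
        X: 2D array-like of states, one row per data point
        Y: list of int class labels
        white: optional whitening matrix; no whitening is applied if None'''

        p_a = []
        p_b = []
        p_c = []

        ###PROJECT ALL DATA###
        # Fall back to the bare projection when no whitening matrix is given
        proj = self.proj if white is None else np.matmul(self.proj,white)
        X_P = np.matmul(proj,np.array(X).T)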

        for i in range(len(Y)):
            if Y[i] == 0:
                p_a.append(X_P[:,i])
            elif Y[i] == 1:
                p_b.append(X_P[:,i])
            else:
                p_c.append(X_P[:,i])

        p_a = np.array(p_a)
        p_b = np.array(p_b)
        p_c = np.array(p_c)

        plt.scatter(p_a[:,0],p_a[:,1],label = 'apple')
        plt.scatter(p_b[:,0],p_b[:,1],label = 'banana')
        plt.scatter(p_c[:,0],p_c[:,1],label = 'eggplant')
        plt.legend()
        plt.show()


In [None]:
class Projections():

    def __init__(self,dim_x,classes):
        '''
        dim_x: the dimension of the state space x
        classes: The list of class labels'''

        self.d_x = dim_x
        self.NUM_CLASSES = len(classes)


    def get_random_proj(self):

        '''
        Return A of size 2 by d_x (2 by 729 here), entries i.i.d. N(0, 1)'''

        return randn(2,self.d_x)


    def pca_projection(self,X,Y):

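        '''
        Return U_2^T, the top two principal directions of X as rows (2 by d_x)'''

        # Center the data, then take the leading left singular vectors of the covariance
        X,Y= subtract_mean_from_data(X,Y)
        C_XX = compute_covariance_matrix(X,X)
        u,s,d = svd(C_XX)
        return u[:,0:2].T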

    def cca_projection(self,X,Y,k=2):

        r'''
        Return U_K^T, \Sigma_{XX}^{-1/2} '''

        Y = create_one_hot_label(Y,self.NUM_CLASSES)
        X,Y = subtract_mean_from_data(X,Y)


        C_XY = compute_covariance_matrix(X,Y)
        C_XX = compute_covariance_matrix(X,X)
        C_YY = compute_covariance_matrix(Y,Y)

        dim_x = C_XX.shape[0]
        dim_y = C_YY.shape[0]

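        # Regularized whitening matrices \Sigma_{XX}^{-1/2} and \Sigma_{YY}^{-1/2}
        A = inv(sqrtm(C_XX+1e-5*np.eye(dim_x)))
        B = inv(sqrtm(C_YY+1e-5*np.eye(dim_y)))
        # Whitened cross-covariance; its top-k left singular vectors are the CCA directions
        C = np.matmul(A,np.matmul(C_XY,B))

        u,s,d = svd(C)
        return u[:,0:k].T, A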

    def project(self,proj,white,X):
        '''
        proj, numpy matrix to perform projection
        white, numpy matrix to perform whitening
        X, list of states
        '''

        proj = np.matmul(proj,white)
        X_P = np.matmul(proj,np.array(X).T)
        return list(X_P.T)
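The `utils` helpers used by `Projections` are not shown here, so the following is a minimal self-contained sketch of the same PCA and CCA steps on synthetic data. It rests on several assumptions: `center`, `covariance`, and `inv_sqrt` are hypothetical stand-ins, the covariance is taken to be the cross-product of centered data divided by the sample count, and the inverse square root is computed by eigendecomposition rather than with `sqrtm` followed by `inv`. The synthetic three-class data is a placeholder, not the fruit dataset.

```python
import numpy as np

def center(M):
    """Subtract the column means (hypothetical stand-in for the utils helper)."""
    return M - M.mean(axis=0)

def covariance(A, B):
    """Cross-covariance A^T B / n of two already-centered matrices (assumed convention)."""
    return A.T @ B / A.shape[0]

def inv_sqrt(C, reg=1e-5):
    """Inverse square root of a regularized symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C + reg * np.eye(C.shape[0]))
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def pca_projection(X, k=2):
    """Rows are the top-k principal directions of X."""
    Xc = center(X)
    u, _, _ = np.linalg.svd(covariance(Xc, Xc))
    return u[:, :k].T

def cca_projection(X, y, num_classes, k=2):
    """Top-k CCA directions between X and one-hot labels, plus the X-whitening matrix."""
    Y = np.eye(num_classes)[y]  # one-hot encode the integer labels
    Xc, Yc = center(X), center(Y)
    W_x = inv_sqrt(covariance(Xc, Xc))
    W_y = inv_sqrt(covariance(Yc, Yc))
    u, _, _ = np.linalg.svd(W_x @ covariance(Xc, Yc) @ W_y)
    return u[:, :k].T, W_x

# Synthetic 3-class data: Gaussian blobs around random class means
rng = np.random.default_rng(0)
num_classes, per_class, dim_x = 3, 50, 20
y = np.repeat(np.arange(num_classes), per_class)
means = rng.normal(scale=3.0, size=(num_classes, dim_x))
X = means[y] + rng.normal(size=(y.size, dim_x))

P_pca = pca_projection(X)
P_cca, W = cca_projection(X, y, num_classes)

X_pca = center(X) @ P_pca.T        # PCA needs no whitening
X_cca = center(X) @ (P_cca @ W).T  # CCA projects the whitened data
print(X_pca.shape, X_cca.shape)
```

The last two lines mirror `Projections.project`: the projection is composed with the whitening matrix before being applied, and for PCA the whitening matrix is effectively the identity.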