# Assignment 01

Note: this is the 2017 version of this assignment.

In this assignment you will practice putting together a simple image classification pipeline, based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

* understand the basic Image Classification pipeline and the data-driven approach (train/predict stages)
* understand the train/val/test splits and the use of validation data for hyperparameter tuning
* develop proficiency in writing efficient vectorized code with numpy
* implement and apply a k-Nearest Neighbor (kNN) classifier
* implement and apply a Multiclass Support Vector Machine (SVM) classifier
* implement and apply a Softmax classifier
* implement and apply a two-layer neural network classifier
* understand the differences and tradeoffs between these classifiers
* get a basic understanding of performance improvements from using higher-level representations than raw pixels (e.g. color histograms, Histogram of Oriented Gradients (HOG) features)
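
The train/val/test idea from the goals above can be sketched with index arrays. This is a minimal illustration, not the assignment's own splitting code: the function name `split_indices`, the split sizes, and the seed are all assumptions chosen for the example.

```python
import numpy as np

def split_indices(num_examples, num_val, num_test, seed=0):
    """Shuffle example indices and partition them into train/val/test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_examples)
    test_idx = perm[:num_test]                     # held out until the very end
    val_idx = perm[num_test:num_test + num_val]    # used to tune hyperparameters
    train_idx = perm[num_test + num_val:]          # used to fit the classifier
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(100, num_val=20, num_test=10)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 20 10
```

Hyperparameters such as k are then compared on the validation indices only, and the test indices are evaluated once at the end.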

To access the data files, follow the instructions below: the `unpickle` routine opens a pickled batch file and returns its contents as a dictionary.

In [3]:
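def unpickle(file):
    """Load one pickled batch file and return its contents as a dictionary."""
    import pickle
    with open(file, 'rb') as fo:
        # encoding='bytes' means string keys come back as bytes, e.g. b'data'
        batch = pickle.load(fo, encoding='bytes')
    return batch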

Loaded in this way, each of the batch files contains a dictionary with the following elements:

* data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.

* labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
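
The layout described above (three 1024-entry channel blocks, each in row-major order) means a row can be turned back into an image with one reshape and one transpose. The sketch below uses a synthetic row rather than real data, and the helper name `row_to_image` is an assumption made for illustration.

```python
import numpy as np

def row_to_image(row):
    """Convert one 3072-element row into a 32x32x3 (height, width, channel) array."""
    # The row is stored channel by channel (R, G, B); each channel is row-major.
    return row.reshape(3, 32, 32).transpose(1, 2, 0)

# Synthetic row standing in for one row of the `data` array.
row = (np.arange(3072) % 256).astype(np.uint8)
image = row_to_image(row)
print(image.shape)  # (32, 32, 3)
```

After the transpose, `image[r, c, 0]` is the red value at row `r`, column `c`, which is the layout most plotting libraries expect.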

The dataset contains another file, called batches.meta. It too contains a Python dictionary object.

It has the following entry:

* label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.
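
As a rough sketch of how the label names might be used: when a file is loaded with `encoding='bytes'`, keys and string values typically come back as `bytes`, so they need decoding before printing. The `meta` dictionary below is a hypothetical stand-in for the loaded batches.meta contents, and `decode_label_names` is a helper invented for this example.

```python
def decode_label_names(meta):
    """Return label names as str, whether they were stored as bytes or str."""
    # Look up the key in both its bytes and str forms.
    names = meta.get(b'label_names', meta.get('label_names'))
    return [n.decode('utf-8') if isinstance(n, bytes) else n for n in names]

# Hypothetical stand-in for the dictionary loaded from batches.meta.
meta = {b'label_names': [b'airplane', b'automobile']}
label_names = decode_label_names(meta)

labels = [1, 0, 1]  # numeric labels, as in the `labels` list of a batch
print([label_names[i] for i in labels])  # ['automobile', 'airplane', 'automobile']
```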

## Q1: k-Nearest Neighbor classifier (20 points)
The IPython Notebook knn.ipynb will walk you through implementing the kNN classifier.

In [2]:
import numpy as np

class NearestNeighbor:
    def __init__(self):
        pass
    
    def train(self, X, y):
        """ X is N x D where each row is an example. y is 1-dimensional, of size N """
        # the nearest neighbor classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y
        
    def predict(self, X):
        """ X is N x D where each row is an example we wish to predict label for """
        num_test = X.shape[0]
        # let's make sure that the output type matches the input type
        Ypred = np.zeros(num_test, dtype = self.ytr.dtype)
        
        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the L1 distance (sum of absolute value differences)
            distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
            min_index = np.argmin(distances) # get the index with smallest distance
            Ypred[i] = self.ytr[min_index] # predict the label of the nearest example
        
        return Ypred
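
The nearest-neighbor idea above extends to k neighbors with a majority vote, and the per-row loop can be replaced by broadcasting. The following is one possible sketch, not the reference solution in knn.ipynb; the function name `knn_predict` and the toy data are assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=1):
    """Predict labels by majority vote among the k nearest training rows (L1 distance)."""
    # Cast to a signed type so differences of uint8 pixels cannot wrap around.
    X_train = X_train.astype(np.int64)
    X_test = X_test.astype(np.int64)
    # Broadcasting: (num_test, 1, D) - (1, num_train, D) -> (num_test, num_train, D)
    distances = np.abs(X_test[:, None, :] - X_train[None, :, :]).sum(axis=2)
    # Indices of the k smallest distances for each test row.
    nearest = np.argsort(distances, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (num_test, k)
    # np.bincount(...).argmax() breaks ties in favour of the smallest label.
    return np.array([np.bincount(row).argmax() for row in votes], dtype=y_train.dtype)

X_train = np.array([[0, 0], [1, 1], [10, 10], [11, 11]], dtype=np.uint8)
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[2, 1], [9, 12]], dtype=np.uint8)
print(knn_predict(X_train, y_train, X_test, k=3))  # [0 1]
```

The full broadcast builds a `num_test x num_train x D` array, which is fine for toy data but can exhaust memory on the whole dataset; processing the test rows in chunks is one way around that.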

## Q2: Training a Support Vector Machine (25 points)
The IPython Notebook svm.ipynb will walk you through implementing the SVM classifier.
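
For orientation, a common formulation of the multiclass SVM (hinge) loss can be sketched in vectorized form as below. This is an assumption about the general technique, not the notebook's own code: the function name `svm_loss`, the margin `delta=1.0`, and the omission of regularization are all choices made for the example.

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    """Average multiclass hinge loss. scores is N x C; y holds N correct class indices."""
    num_examples = scores.shape[0]
    rows = np.arange(num_examples)
    correct = scores[rows, y][:, None]              # N x 1 scores of the true classes
    margins = np.maximum(0.0, scores - correct + delta)
    margins[rows, y] = 0.0                          # the true class contributes no loss
    return margins.sum() / num_examples

scores = np.array([[3.0, 1.0, 2.5],
                   [0.0, 4.0, 1.0]])
y = np.array([0, 1])
print(svm_loss(scores, y))  # 0.25
```

In the first row only class 2 violates the margin (2.5 - 3.0 + 1.0 = 0.5), and the second row has no violations, so the average is 0.5 / 2.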

## Q3: Implement a Softmax classifier (20 points)
The IPython Notebook softmax.ipynb will walk you through implementing the Softmax classifier.
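
As a point of reference, the softmax (cross-entropy) loss is commonly computed with the maximum score subtracted first for numerical stability. The sketch below follows that standard recipe under stated assumptions: the function name `softmax_loss` is hypothetical and regularization is left out.

```python
import numpy as np

def softmax_loss(scores, y):
    """Average cross-entropy loss. scores is N x C; y holds N correct class indices."""
    # Subtracting the row maximum leaves the probabilities unchanged
    # but keeps np.exp from overflowing.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(scores.shape[0]), y].mean()

# With all-zero scores every class has probability 1/10,
# so the loss should equal log(10).
scores = np.zeros((2, 10))
y = np.array([3, 7])
loss = softmax_loss(scores, y)
print(np.isclose(loss, np.log(10)))  # True
```

Checking that an untrained classifier produces a loss near log(number of classes) is a quick sanity test for an implementation.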

## Q4: Two-Layer Neural Network (25 points)
The IPython Notebook two_layer_net.ipynb will walk you through the implementation of a two-layer neural network classifier.

## Q5: Higher Level Representations: Image Features (10 points)
The IPython Notebook features.ipynb will walk you through this exercise, in which you will examine the improvements gained by using higher-level representations as opposed to using raw pixel values.
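
To give a flavour of what a higher-level representation can look like, here is a simplified per-channel intensity histogram. It is only an illustration and may differ from the features computed in features.ipynb; the function name `channel_histogram` and the choice of 8 bins are assumptions.

```python
import numpy as np

def channel_histogram(image, bins=8):
    """Concatenate normalized per-channel histograms of an H x W x 3 uint8 image."""
    features = []
    for c in range(3):
        # Count how many pixels of channel c fall into each intensity bin.
        hist, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        features.append(hist)
    feat = np.concatenate(features).astype(np.float64)
    return feat / feat.sum()  # normalize so the feature vector sums to 1

# Synthetic image: red and blue channels all 0, green channel all 255.
image = np.zeros((32, 32, 3), dtype=np.uint8)
image[:, :, 1] = 255
feat = channel_histogram(image)
print(feat.shape)  # (24,)
```

The 3072 raw pixel values shrink to 24 numbers that ignore where in the image a color appears, which is the kind of invariance such features trade spatial detail for.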

## Q6: Cool Bonus: Do something extra! (+10 points)
Implement, investigate or analyze something extra surrounding the topics in this assignment, and using the code you developed. For example, is there some other interesting question we could have asked? Is there any insightful visualization you can plot? Or anything fun to look at? Or maybe you can experiment with a spin on the loss function? If you try out something cool we’ll give you up to 10 extra points and may feature your results in the lecture.