In [1]:
import numpy as np
import data_utils

## Nearest Neighbor Classifier

First of all, download and unpack the dataset:

``` bash 
$ wget http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz -P /tmp
$ tar -xzvf /tmp/cifar-10-python.tar.gz -C /tmp
```

Load the dataset:

In [2]:
Xtr, Ytr, Xte, Yte = data_utils.load_CIFAR10('/tmp/cifar-10-batches-py/')

Where:
* $Xtr$ is the training set (50000 images, each a 32x32x3 array).
* $Ytr$ holds the training labels (50000 integers).
* $Xte$ is the test set (10000 images, each a 32x32x3 array).
* $Yte$ holds the test labels (10000 integers).
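
The `data_utils` module is not shown here, so the following is only a sketch of what a loader for one batch file might look like. It relies on the documented layout of the Python version of CIFAR-10 (a pickled dictionary whose `data` entry is an N x 3072 `uint8` array storing the red, green and blue planes one after another). The function name `load_cifar_batch_sketch` is hypothetical, and the conversion to floating point is an assumption rather than a description of `data_utils.load_CIFAR10`.

```python
import pickle

import numpy as np


def load_cifar_batch_sketch(path):
    """Hypothetical helper: read one CIFAR-10 batch file into (images, labels)."""
    with open(path, 'rb') as f:
        # the batches were pickled with Python 2, so the keys come back as bytes
        batch = pickle.load(f, encoding='bytes')
    data = np.asarray(batch[b'data'])      # N x 3072, uint8
    labels = np.asarray(batch[b'labels'])  # N integer class ids
    # each row holds 1024 red, 1024 green and 1024 blue values (row-major),
    # so reshape to N x 3 x 32 x 32 and move the channel axis last
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    # assumption: work in floating point so that differences cannot wrap around
    return images.astype('float64'), labels
```

With the archive unpacked as above, such a helper would be called on files like `/tmp/cifar-10-batches-py/data_batch_1` and the per-batch results concatenated.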

In [3]:
print('Training set shape: ', Xtr.shape)
print('Training labels shape: ', Ytr.shape)
print('Test set shape: ', Xte.shape)
print('Test labels shape: ', Yte.shape)

Training set shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test set shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)


Flatten each image into a one-dimensional row vector, so that every dataset becomes a 2-D array with one image per row:

In [4]:
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)
print('Training set shape: ', Xtr_rows.shape)
print('Test set shape: ', Xte_rows.shape)

Training set shape:  (50000, 3072)
Test set shape:  (10000, 3072)
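
As a small illustration of the flattening step, the sketch below uses a synthetic array in place of the real images. Passing `-1` lets NumPy infer the row length (32 * 32 * 3 = 3072), and reshaping a row back recovers the original image, so no information is lost. The variable names are made up for this example.

```python
import numpy as np

# two synthetic "images" standing in for CIFAR-10 data
X = np.arange(2 * 32 * 32 * 3, dtype='float64').reshape(2, 32, 32, 3)

# keep the first axis (one row per image) and let NumPy infer the rest
X_rows = X.reshape(X.shape[0], -1)
print('Flattened shape: ', X_rows.shape)

# flattening only changes the view of the data: a row reshapes back to its image
restored = X_rows[1].reshape(32, 32, 3)
print('Round trip ok: ', np.array_equal(restored, X[1]))
```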


Define our classifier:

In [5]:
class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. y is 1-dimensional of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X, T):
    """ X is N x D where each row is an example we wish to predict label for, T is 'L1' or 'L2' distance"""
    num_test = X.shape[0]
    # let's make sure that the output type matches the input type
    Ypred = np.zeros(num_test, dtype = self.ytr.dtype)

    if T=='L1':
        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the L1 distance (sum of absolute value differences)
            distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
            min_index = np.argmin(distances) # get the index with smallest distance
            Ypred[i] = self.ytr[min_index] # predict the label of the nearest example
    elif T=='L2':
        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the squared L2 distance (sum of squared differences);
            # the square root is omitted because it does not change the argmin
            distances = np.sum(np.square(self.Xtr - X[i,:]), axis = 1)
            min_index = np.argmin(distances) # get the index with smallest distance
            Ypred[i] = self.ytr[min_index] # predict the label of the nearest example
    else:
        print('Unknown type of distance ', T)
        for i in range(num_test):
            Ypred[i] = -1

    return Ypred
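
The two distances used in `predict` are $d_1(a, b) = \sum_p |a_p - b_p|$ and $d_2(a, b) = \sqrt{\sum_p (a_p - b_p)^2}$. The toy example below, with made-up two-dimensional points, is a sketch of why the choice matters: the same query point gets a different nearest neighbor under each distance. It mirrors the expressions in the class but does not depend on it.

```python
import numpy as np

# two training points with labels 0 and 1, and one query point (all hypothetical)
train = np.array([[3.0, 3.0],
                  [5.0, 0.0]])
labels = np.array([0, 1])
query = np.array([0.0, 0.0])

# L1: sum of absolute differences -> [6., 5.]
l1 = np.sum(np.abs(train - query), axis=1)
# squared L2: sum of squared differences -> [18., 25.]
l2_squared = np.sum(np.square(train - query), axis=1)

pred_l1 = labels[np.argmin(l1)]          # second point is closer under L1
pred_l2 = labels[np.argmin(l2_squared)]  # first point is closer under L2
print('L1 prediction: ', pred_l1)
print('L2 prediction: ', pred_l2)
```

Because the square root is monotonic, taking `np.argmin` over the squared distances selects the same neighbor as taking it over the true L2 distances.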

Create the classifier object:

In [6]:
nn = NearestNeighbor()
nn.train(Xtr_rows, Ytr)

Predict the first N samples of the test dataset with the **L1 distance**, and check the **accuracy** (the fraction of examples that are predicted correctly).

This takes a while, because each test image is compared against all 50000 training images.

In [13]:
N = 100
Yte_predict_L1 = nn.predict(Xte_rows[0:N], 'L1')
print('accuracy: %f' % ( np.mean(Yte_predict_L1 == Yte[0:N]) ) )

accuracy: 0.340000
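
To make the accuracy computation concrete, here is a minimal sketch with made-up labels. Comparing two arrays elementwise gives a boolean array, and its mean is the fraction of `True` entries, that is, the fraction of correct predictions.

```python
import numpy as np

# hypothetical ground-truth labels and predictions for five examples
y_true = np.array([3, 8, 8, 0, 6])
y_pred = np.array([3, 8, 1, 0, 4])

correct = (y_pred == y_true)  # [True, True, False, True, False]
accuracy = np.mean(correct)   # 3 correct out of 5
print('accuracy: %f' % accuracy)
```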


Now, predict the same N samples with the **L2 distance**.

In [14]:
N = 100
Yte_predict_L2 = nn.predict(Xte_rows[0:N], 'L2')
print('accuracy: %f' % ( np.mean(Yte_predict_L2 == Yte[0:N]) ) )
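
Much of the running time comes from the Python loop over test rows in `predict`. One possible way to avoid it for the L2 case, sketched below on random data, is to expand $\|x - t\|^2 = \|x\|^2 - 2\,x \cdot t + \|t\|^2$ and compute all pairwise squared distances with a single matrix product. The function name `pairwise_sq_l2` and the data are hypothetical, and this is not part of the classifier defined above.

```python
import numpy as np


def pairwise_sq_l2(X_test, X_train):
    """Squared L2 distances, shape (num_test, num_train), without a Python loop."""
    test_sq = np.sum(np.square(X_test), axis=1)[:, np.newaxis]     # (num_test, 1)
    train_sq = np.sum(np.square(X_train), axis=1)[np.newaxis, :]   # (1, num_train)
    cross = X_test @ X_train.T                                     # (num_test, num_train)
    # broadcasting combines the three terms into the full distance matrix
    return test_sq - 2.0 * cross + train_sq


# small random stand-ins for the flattened training and test rows
rng = np.random.default_rng(0)
X_train = rng.random((50, 8))
y_train = rng.integers(0, 10, size=50)
X_test = rng.random((5, 8))

dists = pairwise_sq_l2(X_test, X_train)
y_pred = y_train[np.argmin(dists, axis=1)]  # label of the nearest training row
print('Distance matrix shape: ', dists.shape)
```

The trade-off is memory: the full distance matrix has one entry per (test, training) pair, so large test sets would need to be processed in chunks. Rounding can also make entries that should be zero come out as tiny negative numbers, which does not affect `np.argmin` in practice.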
