# K-nearest neighbors
***

In this notebook I implement a nearest-neighbor classifier on the [CIFAR-10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html), as an exercise for the Stanford course [CS231n: Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/2017/syllabus).


In [1]:
# Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pickle

## Loading the CIFAR-10 dataset
***

In [2]:
def load_dataset(files):
    # Batch format is described at https://www.cs.toronto.edu/~kriz/cifar.html
    data = []
    data_labels = []
    for path in files:
        with open(path, 'rb') as fo:
            # encoding='bytes' leaves the dictionary keys as bytes, e.g. b'data'
            batch = pickle.load(fo, encoding='bytes')
            data.append(batch[b'data'])
            data_labels.append(batch[b'labels'])

    # Stack the per-batch arrays into (num_images, 3072) and (num_images,)
    data = np.concatenate(data)
    data_labels = np.concatenate(data_labels)

    return data, data_labels

In [3]:
b1 = "../cifar-10-batches-py/data_batch_1"
b2 = "../cifar-10-batches-py/data_batch_2"
b3 = "../cifar-10-batches-py/data_batch_3"
b4 = "../cifar-10-batches-py/data_batch_4"
b5 = "../cifar-10-batches-py/data_batch_5"
batches = [b1]  # only the first batch (10,000 images) is used for training

In [4]:
train_x, train_y = load_dataset(batches)

In [5]:
test_batch = ["../cifar-10-batches-py/test_batch"]
test_x, test_y = load_dataset(test_batch)

In [6]:
print(f"train_x shape: {train_x.shape}")
print(f"train_y shape: {train_y.shape}")
print(f"test_x shape: {test_x.shape}")
print(f"test_y shape: {test_y.shape}")

train_x shape: (10000, 3072)
train_y shape: (10000,)
test_x shape: (10000, 3072)
test_y shape: (10000,)
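
Each row of `train_x` is one flattened image. According to the CIFAR-10 dataset page, the 3072 values are stored as 1024 red, then 1024 green, then 1024 blue values, each channel in row-major order. Below is a minimal sketch of turning a row back into a 32x32x3 array for plotting. The helper name `row_to_image` is my own and is not part of the notebook, and the demo row is synthetic so the cell runs without the dataset:

```python
import numpy as np

def row_to_image(row):
    """Reshape one flattened CIFAR-10 row (3072 values) to height x width x channel."""
    # Stored layout is (channel, row, column); move the channel axis last.
    return np.asarray(row).reshape(3, 32, 32).transpose(1, 2, 0)

# Synthetic row standing in for train_x[0] (an assumption for illustration).
demo_row = (np.arange(3072) % 256).astype(np.uint8)
demo_image = row_to_image(demo_row)
print(demo_image.shape)
```

With the dataset loaded, `plt.imshow(row_to_image(train_x[0]))` should display the first training image.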


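Creating the KNearestNeighbor class. As written, it predicts the label of the single nearest training example (k = 1) under L1 (Manhattan) distance: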

In [16]:
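class KNearestNeighbor:
    """Nearest-neighbor classifier using L1 (Manhattan) distance."""

    def __init__(self):
        pass

    def train(self, X, y):
        # "Training" only memorizes the data: X is (N, D), y is (N,)
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        # X is (M, D); returns an (M,) array of predicted labels
        num_test = X.shape[0]
        y_pred = np.zeros(num_test, dtype=self.ytr.dtype)

        for i in range(num_test):
            # L1 distance from test example i to every training example.
            # NOTE: with uint8 inputs this subtraction wraps modulo 256;
            # cast to a signed or float dtype for a true L1 distance.
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)  # index of the smallest distance
            y_pred[i] = self.ytr[min_index]  # label of the nearest example
        return y_pred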
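
The class above always uses the single nearest example. A rough sketch of how the same idea might extend to k > 1 with a majority vote is shown below. This is not the notebook's implementation: the function name `knn_predict`, the `k` parameter, the cast to `int32`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=1):
    """Majority vote among the k nearest training examples (L1 distance)."""
    # Cast to a signed type so the subtraction cannot wrap for uint8 pixels.
    X_train = np.asarray(X_train, dtype=np.int32)
    X_test = np.asarray(X_test, dtype=np.int32)
    y_train = np.asarray(y_train)
    y_pred = np.zeros(X_test.shape[0], dtype=y_train.dtype)
    for i in range(X_test.shape[0]):
        distances = np.sum(np.abs(X_train - X_test[i]), axis=1)
        # Indices of the k smallest distances (their relative order is not needed).
        nearest = np.argpartition(distances, k - 1)[:k]
        # Count votes per label; argmax breaks ties toward the smaller label.
        y_pred[i] = np.bincount(y_train[nearest]).argmax()
    return y_pred

# Toy data: three points near the origin (label 0), two near (10, 10) (label 1).
toy_x = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10]], dtype=np.uint8)
toy_y = np.array([0, 0, 0, 1, 1])
toy_queries = np.array([[1, 1], [9, 9]], dtype=np.uint8)
toy_pred = knn_predict(toy_x, toy_y, toy_queries, k=3)
print(toy_pred)
```

The vote assumes labels are non-negative integers, which holds for CIFAR-10's labels 0 to 9.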

Testing the class's methods and measuring accuracy on the test set.

In [17]:
knn = KNearestNeighbor()

In [18]:
knn.train(train_x, train_y)

In [19]:
dist_predict = knn.predict(test_x)

In [20]:
print(f"Accuracy: {np.mean(dist_predict == test_y)}")

Accuracy: 0.2176
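
One thing worth checking about the accuracy above: the CIFAR-10 dataset page describes `data` as an array of `uint8` values, and `load_dataset` does not change that dtype. NumPy's unsigned subtraction wraps around modulo 256 and `np.abs` cannot undo that, so the distances computed in `predict` may not be true L1 distances. Here is a small sketch of the effect and one possible fix, casting to a signed type. The helper names and the tiny arrays are hypothetical:

```python
import numpy as np

def l1_distances_raw(X_train, x):
    # Same expression as in predict(); wraps around when the inputs are uint8.
    return np.sum(np.abs(X_train - x), axis=1)

def l1_distances_signed(X_train, x):
    # Cast to int32 first so negative differences are represented correctly.
    return np.sum(np.abs(X_train.astype(np.int32) - x.astype(np.int32)), axis=1)

# Two training rows that each differ from x by 1 in both features.
demo_train = np.array([[10, 200], [12, 198]], dtype=np.uint8)
demo_x = np.array([11, 199], dtype=np.uint8)

raw = l1_distances_raw(demo_train, demo_x)
signed = l1_distances_signed(demo_train, demo_x)
print(raw, signed)
```

If `train_x.dtype` prints `uint8`, casting before the subtraction is worth trying; the resulting accuracy may differ from the value reported above.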


Testing with validation.