# Nearest neighbor for handwritten digit recognition
In this notebook we build a classifier that takes an image of a handwritten digit and outputs a label from 0 to 9, using a **nearest neighbor classifier**.

## 1. The MNIST dataset
`MNIST` is a dataset of 28x28 gray-scale images of handwritten digits.

In [23]:
import numpy as np
from numpy import genfromtxt
from sklearn.neighbors import KDTree
import pandas as pd

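# genfromtxt parses every field as a float, so the header line of each CSV
# is read as a row of NaN. That row is dropped during pre-processing.
train = genfromtxt('train.csv', delimiter=',')
test_data = genfromtxt('test.csv', delimiter=',')
submssion = genfromtxt('sample_submission.csv', delimiter=',')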


## 2. Understanding and pre-processing data
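The loaded **training array** has **42001** rows and the **test array** has **28001** rows. The first row of each is the CSV header (read as NaN by `genfromtxt`), which leaves **42000** training images and **28000** test images. Each image is flattened into a row of **784 pixels** (28x28). The training array has one extra leading column that holds the label.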
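
To illustrate why the arrays have one row more than there are images, here is a minimal sketch using a tiny in-memory CSV (the `csv_text` string is made up for this example, not taken from the real files). It also shows `skip_header=1`, an alternative to slicing the header off afterwards.

```python
import io
import numpy as np

# A made-up stand-in for train.csv: one header line, then two data rows.
csv_text = "label,pixel0,pixel1\n1,0,255\n7,128,0\n"

# Default parsing: the header cannot be converted to float, so row 0 is all NaN.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(with_header.shape)                 # one more row than there are samples
print(np.isnan(with_header[0]).all())    # the header row holds only NaN

# skip_header=1 drops the header line at load time instead.
without_header = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
print(without_header.shape)
```

The notebook keeps the default parsing and removes the header row by slicing; either approach yields the same data.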

In [24]:
print('Shape of train: ', np.shape(train))
print('Shape of test_data: ', np.shape(test_data))
print('Shape of submission: ', np.shape(submssion))

Shape of train:  (42001, 785)
Shape of test_data:  (28001, 784)
Shape of submission:  (28001, 2)


In [25]:
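# Row 0 is the NaN header row, so every slice starts at row 1.
train_data = train[1:, 1:]      # pixel columns only, shape (42000, 784)
train_labels = train[1:, 0:1]   # label column kept 2-D, shape (42000, 1)
test_data = test_data[1:, :]    # test set has no label column, shape (28000, 784)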
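
Each row is a flattened 28x28 image. As a small sketch of how one row maps back to the image grid, assuming the pixels are stored in row-major order (the `flat_image` array below is a made-up stand-in for one row of `train_data`):

```python
import numpy as np

# Hypothetical stand-in for one row of train_data: 784 values in row-major order.
flat_image = np.arange(784, dtype=float)

# Recover the 28x28 grid from the flat row.
image = flat_image.reshape(28, 28)
print(image.shape)

# Under the row-major assumption, pixel (row r, column c) of the grid
# sits at index 28 * r + c of the flat row.
r, c = 3, 5
print(image[r, c] == flat_image[28 * r + c])
```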

## 3. Building a classifier
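Here we use a k-d tree, a space-partitioning data structure, to speed up the nearest neighbor search. With `k=1`, each test image receives the label of its single closest training image (scikit-learn's `KDTree` uses Euclidean distance by default).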
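
For reference, the rule that the tree accelerates can be written as an exhaustive search. The following is a minimal sketch on toy 2-D points; the function name `nearest_neighbor_predict` and the toy arrays are made up for illustration. It materializes all pairwise differences, so it is only suitable for small inputs, not the full MNIST arrays.

```python
import numpy as np

def nearest_neighbor_predict(train_x, train_y, query_x):
    """Label each query point with the label of its closest training point."""
    # Squared Euclidean distance between every query and every training point,
    # computed by broadcasting to shape (n_query, n_train, n_features).
    diffs = query_x[:, None, :] - train_x[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Index of the closest training point for each query.
    nearest = np.argmin(sq_dists, axis=1)
    return train_y[nearest]

train_x = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
train_y = np.array([0, 1, 2])
query_x = np.array([[1.0, 1.0], [9.0, 8.0], [1.0, 9.0]])
print(nearest_neighbor_predict(train_x, train_y, query_x))  # [0 1 2]
```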

In [26]:
kd_tree = KDTree(train_data)

In [27]:
test_neighbors = np.squeeze(kd_tree.query(test_data, k=1, return_distance=False))

In [28]:
kd_tree_predictions = train_labels[test_neighbors]
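
The query, squeeze, and label lookup above can be traced on toy data to see the array shapes at each step. This is a sketch with randomly generated points (all `toy_*` names are made up); it also cross-checks the tree against an exhaustive search.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
toy_train = rng.random((50, 4))                            # 50 points, 4 features
toy_labels = rng.integers(0, 10, (50, 1)).astype(float)    # column vector, like train[1:, 0:1]
toy_test = rng.random((8, 4))

tree = KDTree(toy_train)
idx = tree.query(toy_test, k=1, return_distance=False)
print(idx.shape)        # (8, 1): one column per requested neighbor
idx = np.squeeze(idx)   # drop the length-1 axis
preds = toy_labels[idx]
print(preds.shape)      # (8, 1): still a column, because toy_labels is 2-D

# Cross-check against an exhaustive search over all training points.
sq_dists = ((toy_test[:, None, :] - toy_train[None, :, :]) ** 2).sum(axis=2)
brute = np.argmin(sq_dists, axis=1)
print(np.array_equal(idx, brute))
```

Because the labels were sliced as a column (`0:1`), the predictions come out as an `(n, 1)` array rather than a flat vector, which matches the 2-D output shown further down.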

In [29]:
np.unique(kd_tree_predictions)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [66]:
kd_tree_predictions = kd_tree_predictions.astype(int)

In [67]:
kd_tree_predictions

array([[2],
       [0],
       [9],
       ...,
       [3],
       [9],
       [2]])

In [69]:
file = pd.read_csv('sample_submission.csv')
data = pd.read_csv('train.csv')

In [70]:
data.head()

   label  pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  ...  pixel774  pixel775  pixel776  pixel777  pixel778  pixel779  pixel780  pixel781  pixel782  pixel783
0      1       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
1      0       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
2      1       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
3      4       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
4      0       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0


In [71]:
file.head()

   ImageId  Label
0        1      0
1        2      0
2        3      0
3        4      0
4        5      0


In [72]:
file.columns

Index(['ImageId', 'Label'], dtype='object')

## 4. Writing the output file

In [73]:
del file['Label']

In [74]:
# Flatten the (28000, 1) prediction column to 1-D before inserting it.
file.insert(1, 'Label', kd_tree_predictions.ravel())

In [75]:
file.head()

   ImageId  Label
0        1      2
1        2      0
2        3      9
3        4      0
4        5      3


In [76]:
file.to_csv('Kaggle.csv', index=False)
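
As an alternative to editing the sample submission in place, the output table could be built directly from the predictions. This is a sketch under two assumptions: `ImageId` values run consecutively from 1 (as the first rows of `sample_submission.csv` suggest), and `predictions` is a made-up stand-in for `kd_tree_predictions`. It writes to an in-memory buffer rather than a file so the result can be inspected.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical predictions in the same (n, 1) column shape as kd_tree_predictions.
predictions = np.array([[2], [0], [9]])

submission = pd.DataFrame({
    'ImageId': np.arange(1, len(predictions) + 1),  # assumed: ids start at 1 and are consecutive
    'Label': predictions.ravel(),                   # flatten the column to 1-D
})

# index=False keeps the DataFrame index out of the CSV, as in the cell above.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
print(buffer.getvalue())
```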