# K-Nearest Neighbours

K-Nearest Neighbours (KNN) is a way of classifying data based on its proximity to other data in the 'training' dataset. Here *proximity* is the Euclidean distance in parameter space between two data points.
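
For two points $\mathbf{p}$ and $\mathbf{q}$ with $n$ features, the Euclidean distance is $d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$. The minimal pure-Python sketch below implements that formula. The function name `euclidean` is illustrative and not part of this notebook. The result can be cross-checked against the standard library's `math.dist` (Python 3.8+).

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length sequences (illustrative helper)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square each per-feature difference, sum them, then take the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(math.dist((0.0, 0.0), (3.0, 4.0)))  # 5.0, same result from the stdlib
```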

This implementation classifies a test point by averaging the class labels of its *K* nearest training points in parameter space (*K* is the `neighbours` argument below) and rounding the result to the nearest class label.
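
Averaging labels relies on the iris classes being encoded as the integers 0, 1 and 2. It does not give the same result as the more common majority vote. For example, neighbours labelled 0, 0, 2, 2, 2 average to 1.2, which rounds to class 1 even though no neighbour belongs to class 1. The small comparison below illustrates this. Both helper names are illustrative.

```python
from collections import Counter


def average_and_round(labels):
    """The rule used in this notebook: mean of integer labels, rounded to the nearest integer."""
    return round(sum(labels) / len(labels))


def majority_vote(labels):
    """The conventional KNN rule: the most common label among the neighbours."""
    return Counter(labels).most_common(1)[0][0]


neighbour_labels = [0, 0, 2, 2, 2]
print(average_and_round(neighbour_labels))  # 1
print(majority_vote(neighbour_labels))      # 2
```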

In [1]:
import sklearn
import numpy as np

## Load in dataset

In [2]:
from sklearn.datasets import load_iris
iris = load_iris()

In [3]:
iris.keys()

['target_names', 'data', 'target', 'DESCR', 'feature_names']

In [4]:
X = iris.data[:, [2, 3]]  # petal length and petal width (cm)
y = iris.target           # class labels 0, 1, 2

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0)  # shuffled split; 25% of samples held out by default
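
`train_test_split` shuffles the rows and, by default, holds out 25% of them for testing. Setting `random_state=0` makes the shuffle reproducible. With the 150 iris samples, this gives 112 training points and 38 test points. The stdlib sketch below mimics the idea. `simple_split` is a hypothetical helper and will not reproduce scikit-learn's exact shuffle.

```python
import math
import random


def simple_split(X, y, test_size=0.25, seed=0):
    """Shuffle indices with a seeded RNG and hold out ceil(test_size * n) rows (illustrative)."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle for a fixed seed
    n_test = math.ceil(test_size * len(X))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X_demo = [[float(i), float(i)] for i in range(150)]
y_demo = [i % 3 for i in range(150)]
X_tr, X_te, y_tr, y_te = simple_split(X_demo, y_demo)
print(len(X_tr), len(X_te))  # 112 38
```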

Define an unweighted K-Nearest Neighbours classifier:

In [6]:
class KNN:
    """An unweighted K-Nearest Neighbours classifier."""

    def __init__(self, neighbours):
        """neighbours: the number of nearest training points used to
        classify each test point."""
        self.neighbours = neighbours

    def metric(self, point1, point2):
        """Euclidean distance between two points."""
        if len(point1) != len(point2):
            raise ValueError("points are of different lengths")
        return np.sqrt(np.sum((point1 - point2) ** 2))

    def classify(self, xdata, ydata, to_classify):
        """Predict a label for each row of to_classify using the
        training data (xdata, ydata)."""
        prediction = np.zeros(len(to_classify))
        for j in range(len(to_classify)):
            # Distance from the j-th test point to every training point
            distance = np.zeros(len(xdata))
            for i in range(len(xdata)):
                distance[i] = self.metric(xdata[i], to_classify[j])

            # Indices of the `neighbours` closest training points
            nearest = np.argsort(distance)[:self.neighbours]
            nearest_prediction = ydata[nearest]

            # Average the neighbours' labels and round to the nearest class
            prediction[j] = np.around(np.average(nearest_prediction))

        return prediction

In [7]:
knn = KNN(5)  # lowercase name so the KNN class is not overwritten

In [8]:
y_pred = knn.classify(X_train, y_train, X_test)

from sklearn.metrics import accuracy_score
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))

Accuracy: 0.97
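
`accuracy_score` reports the fraction of test points whose predicted label equals the true label. Assuming the default 25% split of the 150 samples (38 test points), an accuracy of 0.97 corresponds to 37 correct predictions (37/38 ≈ 0.974). The sketch below computes the same quantity by hand. `accuracy` is an illustrative helper, not the scikit-learn function.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label (illustrative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# classify() returns floats such as 2.0; in Python 2 == 2.0, so they still count as matches.
print(accuracy([0, 1, 2, 2], [0.0, 1.0, 2.0, 1.0]))  # 0.75
```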


## TODO: Build weighted KNN algorithm
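
One common weighting scheme is offered here only as a possible starting point for this TODO. Each of the *K* nearest points votes for its class with weight 1/d, so closer points count more. scikit-learn's `KNeighborsClassifier(weights='distance')` uses the same inverse-distance idea. The sketch assumes inverse-distance weights and adds a small `eps` to avoid division by zero when a test point coincides with a training point. `weighted_knn_predict` and `eps` are hypothetical names, not part of this notebook.

```python
import math
from collections import defaultdict


def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-12):
    """Predict the label of x by an inverse-distance-weighted vote of its k nearest neighbours."""
    if k < 1 or k > len(X_train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to x and keep the k closest
    nearest = sorted(
        (math.dist(point, x), label) for point, label in zip(X_train, y_train)
    )[:k]
    # Each neighbour adds 1/distance to its class's score; eps guards against d == 0
    scores = defaultdict(float)
    for d, label in nearest:
        scores[label] += 1.0 / (d + eps)
    return max(scores, key=scores.get)


X_toy = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
y_toy = [0, 0, 1, 1, 1]
print(weighted_knn_predict(X_toy, y_toy, (0.05, 0.0), k=5))  # 0
```

With `k=5` all five toy points are neighbours. The notebook's averaging rule gives (0+0+1+1+1)/5 = 0.6, which rounds to class 1. The weighted vote instead favours the two very close class-0 points.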