## K Nearest Neighbor

KNN can be used for both classification and regression.

The intuition is that we use the K nearest neighbors (hence the name) of an input to classify it or to regress a value for it. How do we select the nearest neighbors? We compute the Minkowski distance between the input data point and every point in the training data, then select the k nearest instances.
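To make the neighbor selection concrete, here is a minimal sketch of the Minkowski distance of order p (p=1 gives the Manhattan distance, p=2 the Euclidean distance) followed by a top-k selection. The helper names `minkowski_distance` and `k_nearest_indices` and the toy points are assumptions made for illustration; they are not part of the notebook's class.

```python
import numpy as np

def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def k_nearest_indices(query, data, k, p=2):
    """Indices of the k rows of `data` closest to `query`, nearest first."""
    query = np.asarray(query, dtype=float)
    data = np.asarray(data, dtype=float)
    # One distance per training row, computed by broadcasting query over data
    distances = np.sum(np.abs(data - query) ** p, axis=1) ** (1.0 / p)
    return np.argsort(distances, kind="stable")[:k]

# Hypothetical toy data: four 2-D points and a query at the origin
points = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [6.0, 8.0]])
query = np.array([0.0, 0.0])

print(minkowski_distance(query, points[1], p=1))  # Manhattan: 7.0
print(minkowski_distance(query, points[1], p=2))  # Euclidean: 5.0
print(k_nearest_indices(query, points, k=2))      # [0 2]
```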

For classification, we aggregate the classes of the k nearest instances. The mode class will be the predicted class.
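The voting step can be sketched on its own as follows. The function name `majority_vote` is an assumption for illustration, and the tie-breaking rule (the smallest label wins) is simply what `np.unique` combined with `np.argmax` happens to produce, not something KNN itself prescribes.

```python
import numpy as np

def majority_vote(neighbor_labels):
    """Return the most frequent label among the neighbors.

    np.unique returns the labels in sorted order and np.argmax returns the
    first maximum, so ties are resolved in favor of the smallest label.
    """
    values, counts = np.unique(neighbor_labels, return_counts=True)
    return values[np.argmax(counts)]

print(majority_vote(np.array([2, 1, 2, 0, 2])))  # 2
print(majority_vote(np.array([1, 0, 1, 0])))     # tie between 0 and 1 -> 0
```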

For regression, we take the mean of the attribute of interest over the k nearest instances, and that mean is our prediction.
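A minimal sketch of the regression case for a single query point, assuming Euclidean distance. The function `knn_regress` and the one-feature toy data are hypothetical and only illustrate the averaging step.

```python
import numpy as np

def knn_regress(query, data, targets, k):
    """Predict the mean target of the k training points nearest to `query`."""
    distances = np.linalg.norm(data - query, axis=1)  # Euclidean distance to each row
    nearest = np.argsort(distances)[:k]               # indices of the k closest rows
    return targets[nearest].mean()

# Hypothetical toy data: one feature, target is 10 times the feature
data = np.array([[1.0], [2.0], [3.0], [10.0]])
targets = np.array([10.0, 20.0, 30.0, 100.0])

# The 3 nearest points to 2.1 are 2.0, 3.0 and 1.0, so the mean is 20.0
print(knn_regress(np.array([2.1]), data, targets, k=3))
```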

Despite its simplicity, KNN faces difficulties at inference time when the training set is large, because every prediction requires computing the distance from the query point to every training point.
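The distance array for n queries against m training points has n × m entries, so one way to keep memory bounded is to process the queries in batches. The sketch below is only one possible mitigation, not something the notebook implements: the function name `nearest_indices_batched` and the `batch_size` default are assumptions, and it uses `np.argpartition` to pick the k smallest distances without fully sorting each row. It still computes every query-to-training distance, so the total work does not shrink.

```python
import numpy as np

def nearest_indices_batched(queries, data, k, batch_size=256):
    """Indices of the k nearest training rows for each query, nearest first."""
    out = np.empty((queries.shape[0], k), dtype=np.intp)
    for start in range(0, queries.shape[0], batch_size):
        batch = queries[start:start + batch_size]
        # (batch, m) Euclidean distances via broadcasting
        d = np.linalg.norm(batch[:, None, :] - data[None, :, :], axis=-1)
        # Select the k smallest distances per row without a full sort
        part = np.argpartition(d, k - 1, axis=1)[:, :k]
        # Order just those k candidates by distance
        order = np.argsort(np.take_along_axis(d, part, axis=1), axis=1)
        out[start:start + batch.shape[0]] = np.take_along_axis(part, order, axis=1)
    return out

# Hypothetical random data, only to exercise the function
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 3))
test = rng.normal(size=(7, 3))
print(nearest_indices_batched(test, train, k=4, batch_size=3).shape)  # (7, 4)
```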




In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

In [72]:
# Load Dataset
data = load_iris()
x = data['data']
y = data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, train_size=0.8)

As a whole:
1. Calculate 

In [259]:
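class KNNClassifier():
    def __init__(self, k, regression=False):
        self.k = k                    # number of neighbors to consult
        self.regression = regression  # True: predict the mean target; False: predict the mode class

    def fit(self, x, y):
        # KNN is a lazy learner: fitting only stores the training data
        assert x.shape[0] == y.shape[0], 'Number of data points in x is not the same as that in y'
        self.data = x
        self.labels = y

    def MinkowskiDistance(self, x, z, p):
        """
        Compute the distance matrix of shape (len(x), len(z)), where entry
        (i, j) is the Minkowski distance of order p between x[i] and z[j].
        """
        # Broadcast (n, 1, d) against (1, m, d) to get all pairwise differences
        diff = np.abs(x[:, None, :] - z[None, :, :])
        return np.power(np.sum(np.power(diff, p), axis=-1), 1/p)

    def predict(self, x):
        if len(x.shape) != 2:
            raise ValueError('x must be a 2D array of shape (n_samples, n_features)')
        distances = self.MinkowskiDistance(x, self.data, 2)
        # Sort each row by distance and keep the k nearest training points
        indices = np.argsort(distances, axis=1)[:, :self.k]
        labels = self.labels[indices]
        if self.regression:
            # One prediction per query: the mean target of its k neighbors
            return np.mean(labels, axis=1)
        res = []
        for li in labels:
            # Mode of the neighbor classes (ties go to the smallest class value)
            unique, counts = np.unique(li, return_counts=True)
            res.append(unique[np.argmax(counts)])
        return np.array(res)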


In [260]:
model = KNNClassifier(k=10)
model.fit(x_train, y_train)

In [265]:
out = model.predict(x_test)

In [266]:
# Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, out))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         9
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        12

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



In [267]:
# Using sklearn example
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X=x_train, y=y_train)
out = knn.predict(x_test)
print(classification_report(y_test, out))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         9
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        12

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

