# K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is a simple, versatile, and widely used machine learning algorithm. It can be used for both classification and regression problems. KNN is based on feature similarity: how closely the features of a new, out-of-sample data point resemble those of the training examples determines how that point is classified.

![KNN](https://miro.medium.com/max/753/1*2zYNhLc522h0zftD1zDh2g.png)

Image source: [Medium](https://medium.com/@adi.bronshtein/a-quick-introduction-to-k-nearest-neighbors-algorithm-62214cea29c7)

## How does KNN work?

Given a new observation, the KNN algorithm scans the whole dataset to find the k observations that are closest to it. These k observations are called the k-nearest neighbors.

If KNN is used for classification, the output is the class that occurs most frequently among the K most similar instances. In essence, each instance votes for its class, and the class with the most votes is taken as the prediction.

If KNN is used for regression, the output is the average of the numerical targets of the K most similar instances.
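
As a minimal sketch of these two aggregation rules, assuming the K nearest neighbors have already been found, the snippet below takes a majority vote for classification and an average for regression. The helper names `vote` and `average` and the toy labels are illustrative only and are not part of any library.

```python
from collections import Counter
from statistics import mean

def vote(neighbor_labels):
    """Return the most frequent class among the neighbors' labels."""
    # most_common(1) yields [(label, count)]; on a tie, the label seen first wins.
    return Counter(neighbor_labels).most_common(1)[0][0]

def average(neighbor_targets):
    """Return the mean of the neighbors' numeric targets."""
    return mean(neighbor_targets)

# Hypothetical labels and targets of the K = 5 nearest neighbors
class_prediction = vote(["diabetic", "healthy", "diabetic", "healthy", "diabetic"])
value_prediction = average([2.0, 3.0, 4.0, 5.0, 6.0])

print(class_prediction)  # diabetic
print(value_prediction)  # 4.0
```

Choosing an odd K for a two-class problem avoids tied votes, so the tie-breaking rule in the comment rarely matters in that setting.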

The distance between two points can be measured with the Euclidean, Manhattan, Minkowski, or Hamming distance. The first three are used for continuous variables and the fourth (Hamming) for categorical variables. Because these distances are sensitive to the scale of each feature, it is advisable to standardize the features before applying KNN, particularly when there are many features measured in different units.
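
To make the four distance functions concrete, here is a small pure-Python sketch. The function names are chosen for illustration. Manhattan and Euclidean are written as the Minkowski distance with `p = 1` and `p = 2`, and `hamming` here counts mismatched positions (some libraries report the fraction of mismatches instead).

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def manhattan(a, b):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(a, b, 2)

def hamming(a, b):
    """Number of positions at which two categorical vectors differ."""
    return sum(x != y for x, y in zip(a, b))

u, v = (0, 0), (3, 4)
print(manhattan(u, v))  # 7.0
print(euclidean(u, v))  # 5.0
print(hamming(("red", "small", "round"), ("red", "large", "round")))  # 1
```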

## KNN Algorithm

1. Load the data
2. Initialize K to your chosen number of neighbors
3. For each example in the data:
   1. Calculate the distance between the query example and the current example from the data.
   2. Add the distance and the index of the example to a collection.
4. Sort the collection of distances and indices by distance, in ascending order
5. Pick the first K entries from the sorted collection
6. Get the labels of the selected K entries
7. If regression, return the mean of the K labels
8. If classification, return the mode of the K labels
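
The steps above can be sketched in plain Python as follows. This is an illustrative implementation under simple assumptions (Euclidean distance, a brute-force scan, a tiny hand-made dataset), not how scikit-learn implements it. The function name `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+
from statistics import mean

def knn_predict(data, labels, query, k, task="classification"):
    """Predict the label of `query` from its k nearest examples in `data`."""
    # Step 3: compute the distance from the query to every example
    distances = []
    for index, example in enumerate(data):
        distances.append((dist(example, query), index))
    # Step 4: sort by distance, smallest first
    distances.sort()
    # Steps 5-6: take the first k entries and look up their labels
    k_labels = [labels[index] for _, index in distances[:k]]
    # Steps 7-8: mean for regression, mode for classification
    if task == "regression":
        return mean(k_labels)
    return Counter(k_labels).most_common(1)[0][0]

# Step 1: a toy dataset with two well-separated groups
data = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = [0, 0, 0, 1, 1, 1]
targets = [10.0, 12.0, 14.0, 50.0, 52.0, 54.0]
# Step 2: choose the number of neighbors
k = 3

print(knn_predict(data, labels, (1.2, 1.4), k))                # 0
print(knn_predict(data, labels, (6.4, 7.0), k))                # 1
print(knn_predict(data, targets, (1.2, 1.4), k, "regression"))  # 12.0
```

Every prediction scans and sorts the whole dataset, which is why KNN does no work at training time but becomes slow at prediction time on large datasets.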

```python
# Importing required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
```

```python
# Load the data
dataset = pd.read_csv('diabetes.csv')
print(len(dataset))
print(dataset.head())
```

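```python
# Split the dataset: columns 0-7 are the features, column 8 is the target
X = dataset.iloc[:, 0:8]
y = dataset.iloc[:, 8]
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)
```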

```python
# Feature scaling: fit the scaler on the training set only,
# then apply the same transformation to the test set
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
```
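
To illustrate what the scaling step does, the sketch below standardizes a single feature by hand: it learns the mean and standard deviation from the training values and reuses them for the test values, which mirrors the `fit_transform` / `transform` split above. The helper names and the toy glucose values are made up for illustration. The population standard deviation (`pstdev`) is used because `StandardScaler` also divides by N rather than N - 1.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature."""
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    """Standardize values using previously learned statistics."""
    return [(x - mu) / sigma for x in column]

# Hypothetical values of a single feature
train_glucose = [80.0, 120.0, 80.0, 120.0]
test_glucose = [110.0]

# Statistics come from the training data only
mu, sigma = fit_scaler(train_glucose)
train_scaled = transform(train_glucose, mu, sigma)
test_scaled = transform(test_glucose, mu, sigma)

print(mu, sigma)     # 100.0 20.0
print(train_scaled)  # [-1.0, 1.0, -1.0, 1.0]
print(test_scaled)   # [0.5]
```

Reusing the training statistics for the test set keeps information about the test data from leaking into the model.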

```python
# Define the model: Init K-NN
classifier = KNeighborsClassifier(n_neighbors=11, p=2, metric='euclidean')
classifier.fit(X_train, y_train)
```

```python
# Predict the test set results
y_pred = classifier.predict(X_test)
print(y_pred)
```

```python
# Evaluate Model
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(f1_score(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
```
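
To show how the printed metrics relate to each other, here is a small sketch that derives accuracy and the F1 score from a binary confusion matrix. It assumes the layout that `confusion_matrix` returns for two classes, `[[TN, FP], [FN, TP]]`, and that there is at least one predicted and one actual positive. The counts are hypothetical and are not results from the diabetes dataset.

```python
def metrics_from_confusion_matrix(cm):
    """Compute accuracy, precision, recall and F1 from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only
example_cm = [[90, 10], [20, 30]]
accuracy, precision, recall, f1 = metrics_from_confusion_matrix(example_cm)

print(round(accuracy, 3))   # 0.8
print(round(precision, 3))  # 0.75
print(round(recall, 3))     # 0.6
print(round(f1, 3))         # 0.667
```

In this made-up example the accuracy (0.8) is noticeably higher than the F1 score (about 0.667), which is typical when the negative class is the majority. This is one reason to report both.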

In the example above, we used the KNN algorithm to build a classifier on the Pima Indians diabetes dataset, using the Euclidean distance and 11 neighbors. We then evaluated its predictions on the test set with a confusion matrix, the F1 score, and accuracy.
