# Canonical Clustering

## $k$ Nearest-Neighbor
### Distance Measure
The $k$ nearest-neighbor method is based on the simple idea that when a sample is close to other samples, chances are that they share the same label. To know how close one sample is to another, we need a notion of distance. For this application we will use the Euclidean distance, i.e. the $L^2$ norm of the difference between two samples, $d(x, y) = \left\| x - y \right\|_2$, where the $L^2$ norm is defined as

$$ \left\| x \right\|_2 \overset{\textrm{def}}= \left( \sum_{i=1}^n x_i^2 \right)^{1/2}$$
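As a quick illustration of the formula, the distance between two samples is the $L^2$ norm of their difference. The following is a minimal sketch using NumPy; the function name `euclidean_distance` and the points `p` and `q` are hypothetical and only serve as an example.

```python
import numpy as np

def euclidean_distance(a, b):
    """Return the L2 norm of the difference a - b (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # square the coordinate differences, sum them, take the square root
    return np.sqrt(np.sum((a - b) ** 2))

# two example points chosen so the result is easy to check by hand
p = np.array([0.0, 0.0])
q = np.array([3.0, 4.0])

print(euclidean_distance(p, q))   # 5.0
print(np.linalg.norm(p - q))      # same value via NumPy's built-in norm
```

`np.linalg.norm` computes the same quantity for a 1-D array by default, so the hand-written version is mainly useful for seeing the formula spelled out.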

### Classification
In classification we want to assign each sample to a class. Here the algorithm should assign the class based on the classes of the $k$ nearest neighbors, typically by majority vote. Try to implement this in `code/knn.py`.
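If you want something to compare your attempt against, one possible shape for such a module is sketched below. It is only a sketch under assumptions: the signatures `fit(X_train, y_train)` and `predict(X_test, model)` are inferred from how the module is called in this notebook, while the `k=3` default, the tuple used as the "model", and the tie-breaking rule are choices made here for illustration.

```python
import numpy as np

def fit(X_train, y_train):
    # kNN has no real training step: the "model" is just the stored training data
    # (returning a tuple is an assumption, not a requirement of the exercise)
    return np.asarray(X_train, dtype=float), np.asarray(y_train)

def predict(X_test, model, k=3):
    # k=3 is an arbitrary default chosen for this sketch
    X_train, y_train = model
    X_test = np.asarray(X_test, dtype=float)
    predictions = []
    for x in X_test:
        # Euclidean distance from x to every training sample
        distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
        # indices of the k closest training samples
        nearest = np.argsort(distances)[:k]
        # majority vote; on a tie, np.argmax picks the smallest label
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# tiny hand-made data set: two well-separated groups
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
toy_model = fit(X_toy, y_toy)
print(predict([[0.2, 0.2], [5.5, 5.5]], toy_model))  # [0 1]
```

The loop over test samples keeps the logic easy to follow; a vectorized distance matrix would be faster on larger data but is harder to read at first.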

In [2]:
import sys
sys.path.append('code/')
import knn

from sklearn import datasets, model_selection

# import some data to play with
X, y = datasets.load_iris(return_X_y=True)

# inspect the shape of the data
print(X.shape)
print(y.shape)

# split data into train and test
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y)

# fit model
model = knn.fit(X_train, y_train)

# and predict
y_predicted = knn.predict(X_test, model)

# evaluate predictions by comparing y_predicted with y_test

(150, 4)
(150,)
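The evaluation step left open in the last comment of the cell could, for instance, report the accuracy: the fraction of test samples whose predicted label matches the true label. The sketch below uses a hypothetical helper `accuracy` and made-up label arrays standing in for `y_test` and `y_predicted`; `sklearn.metrics.accuracy_score` computes the same quantity.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# made-up labels standing in for y_test and y_predicted
y_test_example = np.array([0, 1, 2, 2, 1])
y_predicted_example = np.array([0, 1, 2, 1, 1])

print(accuracy(y_test_example, y_predicted_example))  # 0.8
```

Because `train_test_split` shuffles the data randomly unless `random_state` is set, the accuracy on the iris split will vary slightly from run to run.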
