<h1>K Nearest Neighbors</h1>

<h3>Basic info:</h3>
<li> The algorithm classifies a new object by majority vote: the object is assigned the class that is most common among its k closest neighbors.
<li> It is an example of supervised learning.
<li> It is a very simple method, but a brute-force implementation compares every query against all training points, so it may not be the best choice for large datasets.
<li> It works on multidimensional datasets.
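
The classification rule can be sketched in a few lines. This is only an illustration, not the implementation used later in this notebook; the function name `majority_vote` and the sample labels are hypothetical.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the most common label among the k nearest neighbors.

    Ties go to the label that was encountered first, which is an
    assumption of this sketch rather than part of the algorithm.
    """
    counts = Counter(neighbor_labels)
    # most_common(1) returns a list holding one (label, count) pair
    return counts.most_common(1)[0][0]

# Labels of the 5 nearest neighbors of a hypothetical query point
print(majority_vote(["cat", "dog", "cat", "cat", "dog"]))  # cat
```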

<h3>How to choose k value</h3>
<li> As a rule of thumb, k should be an odd number if the number of classes is even (with two classes, an odd k rules out tied votes). Otherwise k should be even.
<li> k should be greater than the number of classes.
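
One common reason to pick k with the number of classes in mind is to avoid tied votes. The sketch below shows a tie for an even k in a two-class problem. The helper `vote_counts` and the label sequence are made up for illustration.

```python
from collections import Counter

def vote_counts(sorted_labels, k):
    """Count class votes among the first k labels (assumed sorted nearest first)."""
    return Counter(sorted_labels[:k])

# Hypothetical neighbor labels for a two-class problem, nearest first
sorted_labels = [0, 1, 1, 0, 1, 0]

even_k = vote_counts(sorted_labels, 4)  # two votes per class: a tie
odd_k = vote_counts(sorted_labels, 5)   # three votes for class 1, two for class 0

print(even_k[0] == even_k[1])  # True
print(odd_k[1] > odd_k[0])     # True
```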

<h3>Alternative metrics</h3>
<li> Euclidean distance
<li> Manhattan distance




<h2>Import stuff</h2>

In [1]:
import numpy as np

<h2>Implementing metrics</h2>

Euclidean metric:
$$
  d = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \ldots + (a_n - b_n)^2 }
$$

In [2]:
def EuclideanMetrics(p1, p2) -> float:
    d = 0
    for a, b in zip(p1,p2):
        d += (a - b) ** 2
    return np.sqrt(d)
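
The same distance can also be computed without an explicit loop. The version below is a sketch that assumes both points can be converted to NumPy arrays of equal length; the name `euclidean_vectorized` is not part of the original notebook.

```python
import numpy as np

def euclidean_vectorized(p1, p2) -> float:
    """Euclidean distance computed with array operations instead of a loop."""
    a = np.asarray(p1, dtype=float)
    b = np.asarray(p2, dtype=float)
    # Square the coordinate differences, sum them, take the square root
    return float(np.sqrt(np.sum((a - b) ** 2)))

print(euclidean_vectorized([0, 0], [3, 4]))  # 5.0
```

`np.linalg.norm(a - b)` gives the same value for one-dimensional arrays.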

Manhattan metric:
$$
  d = \sum_{i=1}^{n} |a_i - b_i|
$$

In [3]:
def ManhattanMetrics(p1, p2) -> float:
    d = 0
    for a, b in zip(p1, p2):
        d += np.abs(a - b)
    return d
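
A loop-free variant of the Manhattan distance, under the same assumption that both points convert to equal-length NumPy arrays. The name `manhattan_vectorized` is illustrative only.

```python
import numpy as np

def manhattan_vectorized(p1, p2) -> float:
    """Manhattan (taxicab) distance: sum of absolute coordinate differences."""
    a = np.asarray(p1, dtype=float)
    b = np.asarray(p2, dtype=float)
    return float(np.sum(np.abs(a - b)))

# |1 - 4| + |2 - 0| + |3 - 3| = 3 + 2 + 0
print(manhattan_vectorized([1, 2, 3], [4, 0, 3]))  # 5.0
```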


<h1>My KNN class</h1>

In [30]:
class KNN():
    def __init__(self, k=5, metric='Euclidean'):
        self.k_ = k
        self.metric_ = metric
    
    def fit(self, X, y):
        self.X_ = X.copy()
        self.y_ = y.copy()

    def predict(self, X) -> np.ndarray:
        predicted_values = []
        for x in X:
            # Calculate distance between x and trained points 
            distances = self.dist(x)
            # Sort all objects in X_ by distance to x 
            # and select k of them that are the closest to x
            closest_idx = np.argsort(distances)[:self.k_]
            # Majority vote: pick the most frequent class among the k neighbors
            labels, counts = np.unique(self.y_[closest_idx], return_counts=True)
            predicted_values.append(labels[np.argmax(counts)])
        return np.array(predicted_values)

    def dist(self, x) -> list:
        distances = []
        for x_trained in self.X_:
            if self.metric_ == "Euclidean":
                distances.append(EuclideanMetrics(x, x_trained))
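            elif self.metric_ == "Manhattan":
                distances.append(ManhattanMetrics(x, x_trained))
            else:
                # Fail loudly instead of silently returning an empty list
                raise ValueError(f"Unknown metric: {self.metric_}")
        return distances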
    
    def calc_error(self, X, y):
        # Error rate: fraction of samples whose predicted class differs from y
        y_predicted = self.predict(X)
        return np.mean(y_predicted != np.asarray(y))




In [33]:
knn = KNN(k=3)
X = [[0], [1], [2], [1], [7], [8], [7], [8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = np.array(X)
y = np.array(y)

[0.]
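
For comparison, the whole prediction step can be sketched as one standalone function that uses NumPy broadcasting for the distances. It reuses the toy data from the cell above. The function `knn_predict` and the query points `1.1` and `7.5` are assumptions made for this example, and only the Euclidean metric is covered.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict labels by majority vote among the k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise differences, shape (n_query, n_train, n_features)
    diff = X_query[:, None, :] - X_train[None, :, :]
    # Euclidean distances, shape (n_query, n_train)
    distances = np.sqrt((diff ** 2).sum(axis=2))
    # Indices of the k closest training points for each query row
    closest_idx = np.argsort(distances, axis=1)[:, :k]
    predictions = []
    for row in closest_idx:
        labels, counts = np.unique(y_train[row], return_counts=True)
        # On a tie, argmax picks the smallest label (a choice made for this sketch)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

X = np.array([[0], [1], [2], [1], [7], [8], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(knn_predict(X, y, [[1.1], [7.5]], k=3))  # [0 1]
```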

