## KNN
The k-nearest neighbors (KNN) algorithm can be used for both classification and regression problems. In industry, however, it is more widely used for classification. To evaluate any technique, we generally look at three important aspects:

1. Ease of interpreting the output

2. Calculation time

3. Predictive power


### Distance measure:
Suppose the feature space $\chi$ is an $n$-dimensional real vector space, and let $x_i, x_j \in \chi$ with $x_i = (x^{(1)}_i, x^{(2)}_i, ..., x^{(n)}_i)^T$ and $x_j = (x^{(1)}_j, x^{(2)}_j, ..., x^{(n)}_j)^T$. Then, for $p \geq 1$, the $L_p$ distance between $x_i$ and $x_j$ is defined as:
$$L_p(x_i, x_j)= (\sum_{l=1}^{n}|x^{(l)}_i-x^{(l)}_j|^p)^{\frac{1}{p}}$$

   - p = 1, Manhattan distance
   $$L_1 (x_i, x_j) = \sum_{l=1}^{n}\left | x^{(l)}_i-x^{(l)}_j \right |$$
   - p = 2, Euclidean distance
   $$L_2(x_i, x_j) = (\sum_{l=1}^{n}\left | x^{(l)}_i-x^{(l)}_j \right |^2)^\frac{1}{2}$$
   - p = ∞, Chebyshev distance
   $$L_\infty (x_i, x_j) = \max_{l}\left | x^{(l)}_i-x^{(l)}_j \right |$$
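
As a quick numerical check of the three special cases above, here is a minimal sketch in plain Python. The function names `manhattan`, `euclidean` and `chebyshev` and the two sample points are illustrative choices, not part of the original notebook.

```python
def manhattan(x, y):
    # p = 1: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    # p = 2: square root of the sum of squared differences
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def chebyshev(x, y):
    # p = infinity: largest absolute coordinate difference
    return max(abs(a - b) for a, b in zip(x, y))


# Illustrative points (assumed for this sketch)
x_i = [1, 1]
x_j = [4, 4]

print(manhattan(x_i, x_j))  # 6
print(euclidean(x_i, x_j))  # sqrt(18), roughly 4.2426
print(chebyshev(x_i, x_j))  # 3
```

For the same pair of points the distance shrinks as p grows, from 6 at p = 1 down to 3 in the limit p = ∞.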

We can implement a KNN model with the following steps:

1. Load the data
2. Initialise the value of k
3. To get the predicted class, iterate over all of the training data points
4. Calculate the distance between the test point and each row of the training data. Here we will use Euclidean distance as our distance metric, since it is the most commonly used one. Other metrics that can be used include Chebyshev, cosine, etc.
5. Sort the calculated distances in ascending order
6. Take the top k rows from the sorted array
7. Get the most frequent class of these rows
8. Return the predicted class
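
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a tuned implementation: the names `knn_predict`, `train_X`, `train_y` and the toy data are assumptions made for this example, and ties in the vote are broken by whichever label appears first among the k nearest rows.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Euclidean (L_2) distance between two equal-length vectors
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    # Steps 3-4: distance from the query point to every training row
    distances = [(euclidean(row, query), label)
                 for row, label in zip(train_X, train_y)]
    # Step 5: sort by distance, ascending
    distances.sort(key=lambda pair: pair[0])
    # Step 6: keep the k nearest rows
    top_k = distances[:k]
    # Steps 7-8: majority vote over the labels of those rows
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]


# Step 1: toy data, two well-separated clusters (hypothetical)
train_X = [[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]]
train_y = ["a", "a", "a", "b", "b", "b"]

# Step 2: choose k, then predict
print(knn_predict(train_X, train_y, [2, 2], k=3))  # a
print(knn_predict(train_X, train_y, [6, 5], k=3))  # b
```

Sorting every distance costs O(N log N) per query, which is fine for small data sets. Larger ones usually call for a partial sort or a spatial index.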

In [2]:
import math
from itertools import combinations

In [7]:
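def L(x, y, p=2):
    """Return the L_p (Minkowski) distance between equal-length vectors x and y."""
    if len(x) != len(y) or len(x) == 0:
        # Returning 0 here would look like a perfect match, so fail loudly instead
        raise ValueError("x and y must be non-empty and of equal length")
    # Sum |x_l - y_l|^p over all coordinates, then take the p-th root
    sum_res = 0
    for i in range(len(x)):
        sum_res += math.pow(abs(x[i] - y[i]), p)
    return math.pow(sum_res, 1 / p)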

### 3.1

In [4]:
x1 = [1, 1]
x2 = [5, 1]
x3 = [4, 4]

In [8]:
# For p = 1..4, report which of x2 and x3 is nearest to x1 under the L_p distance
for i in range(1, 5):
    r = {'1-{}'.format(c): L(x1, c, p=i) for c in [x2, x3]}
    print(min(zip(r.values(), r.keys())))

(4.0, '1-[5, 1]')
(4.0, '1-[5, 1]')
(3.7797631496846193, '1-[4, 4]')
(3.5676213450081633, '1-[4, 4]')
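
The change of nearest neighbour between p = 2 and p = 3 in this output can be checked by hand. Since $x_1$ and $x_2$ differ only in the first coordinate, their distance does not depend on $p$, while the distance from $x_1$ to $x_3$ shrinks as $p$ grows:

$$L_p(x_1, x_2) = (|1-5|^p + |1-1|^p)^{\frac{1}{p}} = 4$$

$$L_p(x_1, x_3) = (|1-4|^p + |1-4|^p)^{\frac{1}{p}} = 3 \cdot 2^{\frac{1}{p}}$$

So $x_3$ is the nearer point exactly when $3 \cdot 2^{\frac{1}{p}} < 4$, that is, when $p > \frac{\ln 2}{\ln (4/3)} \approx 2.41$. This matches the output: $x_2$ is nearest for $p = 1, 2$ and $x_3$ is nearest for $p = 3, 4$.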
