# K-nearest neighbors implementation from scratch


### Algorithm procedure

k-nearest neighbors (k-NN) classification consists of three major steps:

1. Calculate the Euclidean distance between the test point and each training point:

$d(x, y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}$ 

2. Identify k nearest neighbors.
3. Assign class label by majority vote.
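
As a quick sanity check of the distance formula in step 1, the sketch below computes it by hand for two made-up 2-D points and compares the result with the standard library's `math.dist` (Python 3.8+). The points `p` and `q` and the helper name `euclidean` are illustrative assumptions, not part of the data set or code used later.

```python
import math

def euclidean(x, y):
    # Square root of the summed squared coordinate differences
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

p = [0.0, 0.0]  # hypothetical point
q = [3.0, 4.0]  # hypothetical point

print(euclidean(p, q))  # 5.0
print(math.dist(p, q))  # 5.0
```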

### Example train and test data

The last element of each row train_data[i] is the true class label; the rows of test_data contain only the four feature values.

In [6]:
train_data = [[-2.6, 1.9, 2.0, 1.0, 1.0],
 [-2.8, 1.7, -1.2, 1.5, 2.0],
 [2.8, -9.9, 0.3, 3.3, 1.0],
 [-1.5, 3.8, -1.6, -1.1, 0.0],
 [0.8, -1.2, 6.4, 2.7, 1.0],
 [-1.8, 3.2, -2.6, 2.0, 0.0],
 [-3.3, 1.1, 2.6, 0.2, 1.0],
 [2.1, 0.2, 0.9, -0.5, 0.0],
 [1.2, -0.9, 3.6, -1.9, 1.0],
 [1.3, -0.2, 1.3, -0.9, 1.0],
 [1.1, -0.8, -0.3, -1.3, 1.0],
 [-1.3, 2.4, 2.6, -1.2, 1.0],
 [1.6, -0.1, 1.6, -1.0, 1.0],
 [0.3, 3.0, -1.1, -0.7, 0.0],
 [2.8, -8.6, 3.8, -7.9, 2.0]]

test_data = [[-0.8, 1.2, 8.4, -1.0],
 [1.3, 0.2, -1.3, -0.8],
 [-1.1, 2.6, 3.9, -1.9],
 [2.0, -0.7, 8.3, -1.9],
 [0.3, 1.4, -6.3, -1.4]]
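
Since each training row stores its label in the last position, it can help to see the split between features and label spelled out. The following is a minimal sketch on the first three training rows only; the names `sample_rows`, `features`, and `labels` are illustrative and do not appear in the rest of the notebook.

```python
from collections import Counter

# First three rows of train_data, copied from above
sample_rows = [[-2.6, 1.9, 2.0, 1.0, 1.0],
               [-2.8, 1.7, -1.2, 1.5, 2.0],
               [2.8, -9.9, 0.3, 3.3, 1.0]]

features = [row[:-1] for row in sample_rows]  # first four values of each row
labels = [row[-1] for row in sample_rows]     # last value is the class label

print(len(features[0]))  # 4, the same length as each test_data row
print(Counter(labels))   # Counter({1.0: 2, 2.0: 1})
```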


Skeleton code:

In [7]:
# necessary packages
import numpy as np

In [3]:
# define distance metric
def euc_dist(value1: list[float], value2: list[float]) -> float:
    # Euclidean distance: square root of the summed squared differences
    # between corresponding elements of the two lists
    distance = np.sqrt(sum((a - b) ** 2 for a, b in zip(value1, value2)))
    return distance

# identify k nearest neighbors
def k_neighbors(train_data: list[list[float]], test_case: list[float], k: int) -> list[list[float]]:
    # calculate distances from test_case to every point in train_data
    distances = []
    for row in train_data:
        dist = euc_dist(row[:-1], test_case)  # row[:-1] drops the class label
        distances.append((row, dist))  # keep each training row together with its distance
    # sort by the distance element of each tuple, in ascending order
    distances.sort(key=lambda x: x[1])

    # keep the training rows of the k closest points
    # (slicing avoids an IndexError when k exceeds len(train_data))
    neighbors = [row for row, _ in distances[:k]]

    return neighbors

# assign class labels
def get_label(train_data: list[list[float]], test_case: list[float], k: int) -> float:
    neighbors = k_neighbors(train_data, test_case, k)
    labels = [row[-1] for row in neighbors]
    # majority vote; on a tie, the winner depends on set iteration order
    max_label = max(set(labels), key=labels.count)
    return max_label

# pull it together
def solution(train_data, test_data, k):
    final_labels = list()
    for row in test_data:
        label = get_label(train_data, row, k)
        final_labels.append(label)
    return final_labels
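
The `k_neighbors` function sorts every distance before taking the first k. For larger training sets, one possible alternative is `heapq.nsmallest`, which tracks only the k best candidates. The sketch below is an assumption about how that might look, not part of the original solution; it uses four rows copied from train_data and the second row of test_data, and the names `rows`, `query`, and `k_nearest` are hypothetical.

```python
import heapq
import math

# Four training rows copied from train_data (four features plus label)
rows = [[1.1, -0.8, -0.3, -1.3, 1.0],
        [2.1, 0.2, 0.9, -0.5, 0.0],
        [1.3, -0.2, 1.3, -0.9, 1.0],
        [2.8, -9.9, 0.3, 3.3, 1.0]]
query = [1.3, 0.2, -1.3, -0.8]  # second row of test_data

def k_nearest(rows, query, k):
    # nsmallest returns the k rows with the smallest distance, nearest first
    return heapq.nsmallest(k, rows, key=lambda row: math.dist(row[:-1], query))

print([row[-1] for row in k_nearest(rows, query, 2)])  # [1.0, 0.0]
```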

     

In [None]:
print(solution(train_data, test_data, 3))


[1.0, 1.0, 1.0, 1.0, 0.0]
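
The vote in `get_label` uses `max(set(labels), key=labels.count)`, so when two labels are tied the result depends on set iteration order. One way to make ties predictable, sketched here as an assumption rather than as the notebook's method, is `collections.Counter`: its `most_common` lists equal counts in order of first occurrence, so if the labels are passed nearest first, a tie goes to the label of the closer neighbor. The function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(labels_nearest_first):
    # most_common keeps equal counts in first-seen order, so a tie is
    # resolved in favor of the label that appears earliest in the list
    return Counter(labels_nearest_first).most_common(1)[0][0]

print(majority_vote([0.0, 1.0, 1.0]))  # 1.0 (clear majority)
print(majority_vote([2.0, 1.0]))       # 2.0 (tie, nearest label wins)
```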
