## K-Nearest Neighbors

The label of a query point is determined by the labels of its $k$ closest data points, where closeness is measured by the Euclidean distance between two points $(x_1, y_1)$ and $(x_2, y_2)$:

$$ \text{Distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} $$

#### Implementing Euclidean distance in Python

In [13]:
import math


def euclidean_distance(point1, point2):
    squares = [(p - q) ** 2 for p, q in zip(point1, point2)]
    return math.sqrt(sum(squares))
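
As a quick sanity check, the sketch below restates the function and compares its result with the standard library's `math.dist` (assumed available, i.e. Python 3.8 or later). The two points are chosen only for illustration.

```python
import math


def euclidean_distance(point1, point2):
    # Same definition as above, repeated so this snippet runs on its own
    squares = [(p - q) ** 2 for p, q in zip(point1, point2)]
    return math.sqrt(sum(squares))


a, b = (5, 3), (2, 3)
d = euclidean_distance(a, b)
print(d)  # 3.0
print(math.isclose(d, math.dist(a, b)))  # True
```

Because the function iterates over `zip(point1, point2)`, it is not limited to two coordinates; it works for points of any matching dimension.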

#### Implementing K-NN Classification

In [22]:
from collections import Counter

def k_nearest_neighbors(data, query, k, distance_fn):
    neighbor_distances_and_indices = []

    for idx, (point, cls) in enumerate(data):
        distance = distance_fn(point, query)
        neighbor_distances_and_indices.append((distance, idx))

    # Sort by distance, nearest first
    sorted_neighbor_distances_and_indices = sorted(neighbor_distances_and_indices)

    # Select the K closest data points
    k_nearest_distances_and_indices = sorted_neighbor_distances_and_indices[:k]

    # Obtain class labels for those K data points
    k_nearest_labels = [data[i][1] for distance, i in k_nearest_distances_and_indices]

    # Majority Vote
    most_common = Counter(k_nearest_labels).most_common(1)
    return most_common[0][0]
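
One detail of the majority vote is worth noting: when two labels receive the same number of votes, `Counter.most_common` returns tied elements in the order they were first encountered. Since `k_nearest_labels` is ordered from nearest to farthest, a tie goes to the label that appears first, i.e. the label of the closer neighbor. A minimal sketch, using made-up label lists rather than real data:

```python
from collections import Counter

# Hypothetical label lists, ordered nearest neighbor first
tied_votes = [1, 0]       # k = 2, one vote each
clear_votes = [0, 1, 0]   # k = 3, label 0 wins outright

tie_winner = Counter(tied_votes).most_common(1)[0][0]
clear_winner = Counter(clear_votes).most_common(1)[0][0]

print(tie_winner)    # 1
print(clear_winner)  # 0
```

With two classes, choosing an odd `k` avoids such ties altogether.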

#### Using K-NN

In [23]:
data = [
    ((2, 3), 0),
    ((5, 4), 0),
    ((9, 6), 1),
    ((4, 7), 0),
    ((8, 1), 1),
    ((7, 2), 1)
]
query = (5, 3)  # test point

# Perform the classification
predicted_label = k_nearest_neighbors(data, query, k=3, distance_fn=euclidean_distance)
print(predicted_label)  # Expected class label is 0

0
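
To see why the prediction is 0, it can help to list every point with its distance to the query. The sketch below is an illustration, not part of the classifier: it uses `math.dist` (assumed available, Python 3.8 or later) in place of `euclidean_distance` so that it runs on its own.

```python
import math

data = [
    ((2, 3), 0),
    ((5, 4), 0),
    ((9, 6), 1),
    ((4, 7), 0),
    ((8, 1), 1),
    ((7, 2), 1)
]
query = (5, 3)

# Pair each distance with its point and label, then sort nearest first
ranked = sorted((math.dist(point, query), point, label) for point, label in data)

for distance, point, label in ranked:
    print(f"{point} label={label} distance={distance:.2f}")

# Labels of the 3 nearest points
nearest_labels = [label for _, _, label in ranked[:3]]
print(nearest_labels)  # [0, 1, 0]
```

The three nearest points are (5, 4), (7, 2), and (2, 3), with labels 0, 1, and 0, so the majority vote yields 0.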


In [31]:
data = [
    ((2, 3), 0),
    ((5, 4), 0),
    ((9, 6), 1),
    ((4, 7), 0),
    ((8, 1), 1),
    ((7, 2), 1)
]


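k = 3
print(sorted(data))      # tuples sort by point first, then by label
print(sorted(data)[k:])  # [k:] skips the first k items; [:k] would keep them
range(0, 5)  # covers 0, 1, 2, 3, 4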

[((2, 3), 0), ((4, 7), 0), ((5, 4), 0), ((7, 2), 1), ((8, 1), 1), ((9, 6), 1)]
[((7, 2), 1), ((8, 1), 1), ((9, 6), 1)]


range(0, 5)