### K Nearest Neighbors

Use KNN to find the k nearest neighbors of the provided features.

These features are the result of applying PCA dimensionality reduction to operating-system data about processes and their intrusiveness in a network. You have access to an `EXAMPLES` dictionary that maps each process identifier (`"p_id"`) to a dictionary containing its features and a label indicating whether the process was intrusive to the network. A label of `0` means the process was not intrusive; a label of `1` means it was.
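To make the structure concrete, here is a minimal sketch of what such a dictionary might look like. The process identifiers and feature values are invented for illustration; only the key names `features` and `is_intrusive` are taken from the code in this section.

```python
# Hypothetical EXAMPLES dictionary: the p_ids and feature values below are
# made up for illustration and are not taken from the real data set.
EXAMPLES = {
    "pid_001": {
        "features": [4.31, 4.29, 4.30, 4.34, 4.29],
        "is_intrusive": 0,  # 0 = not intrusive
    },
    "pid_002": {
        "features": [1.02, 0.98, 1.10, 0.95, 1.05],
        "is_intrusive": 1,  # 1 = intrusive
    },
}

# Each entry holds a 5-dimensional feature vector and a binary label.
for p_id, record in EXAMPLES.items():
    print(p_id, len(record["features"]), record["is_intrusive"])
```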

In [1]:
# Sample Input
features = [4.30936122, 4.28739283, 4.29680938, 4.33571647, 4.28774593]
# The features have 5 dimensions after PCA reduction.
k = 1
'''
# Pseudocode
1. For any new point:
    1.1 Calculate the Euclidean distance to every other point
    1.2 Sort the distances in ascending order
    1.3 Select the K points with the smallest distances
2. Return the most frequent label among those K points

# Algorithm
Input: the given features
1. For each pid:
    2. distance = compute_distance(features, examples[pid]['features'])
3. Sort the distances
4. Return most_frequent_label(labels of the k nearest pids)
'''

In [3]:
import math

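def predict_label(features, examples, k, label_key="is_intrusive"):
    # Look up the labels of the k closest examples.
    k_nearest_neighbor = find_k_nearest_neigbors(examples, features, k)
    k_nearest_neighbor_labels = [examples[pid][label_key] for pid in k_nearest_neighbor]
    # Labels are 0 or 1, so the rounded mean is the majority label
    # (an exact tie, mean 0.5, rounds to 0 in Python 3).
    return round(sum(k_nearest_neighbor_labels) / k)

def find_k_nearest_neigbors(examples, features, k):
    # Map each pid to its distance from the query features.
    distances = {}
    for pid, features_label_map in examples.items():
        distance = get_euclidean_distance(features, features_label_map['features'])
        distances[pid] = distance
    # Sort pids by ascending distance and keep the first k.
    return sorted(distances, key=distances.get)[:k]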

def get_euclidean_distance(features, other_features):
    squared_distance = []
    for i in range(len(other_features)):
        squared_distance.append((other_features[i] - features[i]) ** 2)
    return math.sqrt(sum(squared_distance))
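
For comparison, the same idea can be sketched more compactly with the standard library. This is an alternative illustration, not the implementation above: it assumes Python 3.8+ (for `math.dist`), takes the majority label with `collections.Counter` instead of rounding the mean, and uses a hypothetical function name `knn_predict` and an invented `toy_examples` data set.

```python
import heapq
import math
from collections import Counter


def knn_predict(features, examples, k, label_key="is_intrusive"):
    """Return the majority label among the k examples closest to `features`."""
    # heapq.nsmallest picks the k pids with the smallest distance
    # without fully sorting every distance.
    nearest_pids = heapq.nsmallest(
        k,
        examples,
        key=lambda pid: math.dist(features, examples[pid]["features"]),
    )
    labels = [examples[pid][label_key] for pid in nearest_pids]
    # most_common(1) returns [(label, count)]; on a tie, the label that
    # appears first among the nearest neighbors wins.
    return Counter(labels).most_common(1)[0][0]


# Hypothetical 2-dimensional toy data, invented for this example.
toy_examples = {
    "a": {"features": [0.0, 0.0], "is_intrusive": 0},
    "b": {"features": [0.1, 0.0], "is_intrusive": 0},
    "c": {"features": [5.0, 5.0], "is_intrusive": 1},
    "d": {"features": [5.1, 5.0], "is_intrusive": 1},
    "e": {"features": [4.9, 5.2], "is_intrusive": 1},
}

print(knn_predict([0.02, 0.0], toy_examples, k=1))  # closest point is "a"
print(knn_predict([5.0, 5.1], toy_examples, k=3))   # closest points are c, d, e
```

Using `Counter` also works when labels are not restricted to 0 and 1, whereas rounding the mean relies on the labels being binary.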
