# K-Nearest Neighbors (KNN) Classifier - From Scratch

## Problem Statement

You are provided with a small dataset of fruits, each described by its weight, size, and color.
The goal is to implement the K-Nearest Neighbors (KNN) algorithm from scratch using only Python and NumPy (no other libraries such as scikit-learn or pandas) to classify new fruit samples into one of three types: **Apple**, **Banana**, or **Orange**.

## Load and Pre-process the Data

Let's begin by loading the given dataset into a NumPy array and applying some pre-processing (namely, ordinal encoding of the labels).

In [14]:
import numpy as np

data = [
    [150, 7.0, 1, 'Apple'],
    [120, 6.5, 0, 'Banana'],
    [180, 7.5, 2, 'Orange'],
    [155, 7.2, 1, 'Apple'],
    [110, 6.0, 0, 'Banana'],
    [190, 7.8, 2, 'Orange'],
    [145, 7.1, 1, 'Apple'],
    [115, 6.3, 0, 'Banana']
]

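def preprocess_data(rows):
    # Ordinal encoding: map each fruit name to an integer class label.
    encoding = {'Apple': 0, 'Banana': 1, 'Orange': 2}
    X = []
    y = []

    for row in rows:
        X.append(row[:-1])            # features: weight, size, color
        y.append(encoding[row[-1]])   # label is the last column

    return np.array(X), np.array(y)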

X,y = preprocess_data(data)

X

array([[150. ,   7. ,   1. ],
       [120. ,   6.5,   0. ],
       [180. ,   7.5,   2. ],
       [155. ,   7.2,   1. ],
       [110. ,   6. ,   0. ],
       [190. ,   7.8,   2. ],
       [145. ,   7.1,   1. ],
       [115. ,   6.3,   0. ]])
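
The output above shows only the feature matrix. As a quick check of the ordinal encoding, the sketch below encodes the label column and then decodes it again. The `decoding` dictionary and the `labels` list are hypothetical helpers introduced here for illustration; they are not part of the notebook's own code.

```python
import numpy as np

# Same mapping as in preprocess_data above.
encoding = {'Apple': 0, 'Banana': 1, 'Orange': 2}
# Hypothetical inverse mapping (an assumption, not in the original notebook).
decoding = {code: name for name, code in encoding.items()}

# Label column of the dataset, in row order.
labels = ['Apple', 'Banana', 'Orange', 'Apple',
          'Banana', 'Orange', 'Apple', 'Banana']

# Encode the names to integers, as preprocess_data does for y.
y = np.array([encoding[name] for name in labels])
print(y)  # [0 1 2 0 1 2 0 1]

# Decode the first three labels back to fruit names.
print([decoding[int(code)] for code in y[:3]])  # ['Apple', 'Banana', 'Orange']
```

An inverted dictionary like this gives a direct lookup from an encoded label back to its name, instead of scanning the encoding for a matching value.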

## Implementing the KNN Classifier

Next, we'll move on to creating our KNN Classifier class!

In [None]:
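def calc_dist(p1, p2):
    # Squared Euclidean distance; the square root is omitted because it
    # does not change which neighbors are nearest.
    return np.sum((p1 - p2) ** 2)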

class KNN_Classifier:

    def __init__(self,k=3):
        self.k = k
        self.X = None
        self.y = None
        self.encoding = {'Apple':0,'Banana':1,'Orange':2}

    def fit(self,X,y):
        self.X = X
        self.y = y

    def predict_one(self, x):
        # Pair each training point's distance to x with its encoded label,
        # then sort so the nearest points come first.
        dists = [(calc_dist(x, x_i), y_i) for x_i, y_i in zip(self.X, self.y)]
        dists.sort()

        # Count how often each label occurs among the k nearest neighbors.
        neighbor_occurrences = {}
        for _, y_i in dists[:self.k]:
            neighbor_occurrences[y_i] = neighbor_occurrences.get(y_i, 0) + 1

        # Pick the most frequent label; on a tie, the label that was
        # counted first (the one with the nearer neighbor) wins.
        max_freq_label = max(neighbor_occurrences, key=neighbor_occurrences.get)

        # Map the encoded label back to its fruit name.
        for label, enc in self.encoding.items():
            if max_freq_label == enc:
                return label

            

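    def predict(self, X_test):
        # Classify each test sample independently with predict_one.
        return [self.predict_one(x) for x in X_test]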
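
For comparison, the neighbor search and majority vote can also be sketched as one vectorized NumPy function. This is a minimal illustration, not the notebook's class. The function name `knn_predict`, the `class_names` list, and the three query samples are assumptions made for this example. Its tie-breaking also differs from `predict_one`: `np.argmax` favors the lowest class index.

```python
import numpy as np

# Same fruit samples as in the dataset above: weight, size, color code.
X_train = np.array([
    [150, 7.0, 1], [120, 6.5, 0], [180, 7.5, 2], [155, 7.2, 1],
    [110, 6.0, 0], [190, 7.8, 2], [145, 7.1, 1], [115, 6.3, 0],
], dtype=float)
# Encoded labels: 0 = Apple, 1 = Banana, 2 = Orange.
y_train = np.array([0, 1, 2, 0, 1, 2, 0, 1])
class_names = ['Apple', 'Banana', 'Orange']

def knn_predict(X_train, y_train, x, class_names, k=3):
    # Squared Euclidean distance from x to every training row (broadcasting).
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(sq_dists)[:k]
    # Count votes per encoded class; argmax breaks ties toward the lowest index.
    votes = np.bincount(y_train[nearest], minlength=len(class_names))
    return class_names[int(np.argmax(votes))]

# Hypothetical query samples, chosen for illustration only.
queries = [[160, 7.3, 1], [118, 6.4, 0], [185, 7.6, 2]]
predictions = [knn_predict(X_train, y_train, np.array(q), class_names)
               for q in queries]
print(predictions)  # ['Apple', 'Banana', 'Orange']
```

Note that weight spans a much larger numeric range than size or color, so it dominates the distance in both this sketch and the class above.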