## Theoretical Concept
K Nearest Neighbors (KNN), as the name suggests, classifies a new data point according to its nearest neighbors: the training points that are most similar to it, i.e. closest to it under some distance measure. The labels of the k closest points are put to a vote, and the new point gets the majority label.
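
The sketch below makes the voting step concrete in pure Python. It uses a hypothetical 2-D toy dataset: the points, the colour labels and the helper `knn_predict` are made up for illustration, and `math.dist` assumes Python 3.8 or newer. It shows that the predicted label can depend on k:

```python
import math
from collections import Counter

# Hypothetical toy training set: ((x, y) coordinates, class label)
train = [((1.0, 0.0), "red"), ((0.0, 1.5), "blue"), ((-1.6, 0.0), "blue"),
         ((5.0, 5.0), "red"), ((6.0, 5.0), "red")]

def knn_predict(query, k):
    # Sort the training points by Euclidean distance to the query point
    nearest = sorted(train, key=lambda point: math.dist(point[0], query))[:k]
    # Majority vote among the labels of the k nearest points
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

for k in (1, 3, 5):
    print(k, knn_predict((0.0, 0.0), k))
# 1 red
# 3 blue
# 5 red
```

With k = 1 only the single closest point (red) decides; with k = 3 the two nearby blue points outvote it; with k = 5 the three red points, two of them far away, tip the vote back. This sensitivity is why k is usually tuned rather than fixed.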

To measure the distance between two points, we can use any of the following metrics:
 - <a href="https://www.wikiwand.com/en/Euclidean_distance">Euclidean Distance</a>
 - <a href="https://www.wikiwand.com/en/Taxicab_geometry">Manhattan Distance</a>
 - <a href="https://www.youtube.com/watch?v=ieMjGVYw9ag">Cosine Distance</a>
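
As a quick check of these three metrics, the sketch below computes each one by hand with NumPy and compares it with the matching function in `scipy.spatial.distance` (`euclidean`, `cityblock` for Manhattan distance, and `cosine`). The vectors `a` and `b` are arbitrary examples:

```python
import numpy as np
from scipy.spatial import distance

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean: square root of the sum of squared differences
euclid = np.sqrt(np.sum((a - b) ** 2))   # sqrt(9 + 16 + 0) = 5.0
# Manhattan (taxicab): sum of absolute differences
manhattan = np.sum(np.abs(a - b))        # 3 + 4 + 0 = 7.0
# Cosine distance: 1 minus the cosine of the angle between the vectors
cosine = 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# SciPy's implementations agree with the hand-written formulas
assert np.isclose(euclid, distance.euclidean(a, b))
assert np.isclose(manhattan, distance.cityblock(a, b))
assert np.isclose(cosine, distance.cosine(a, b))
print(euclid, manhattan, cosine)
```

Note that cosine distance ignores vector length and only compares direction, so two points can be far apart in Euclidean terms yet have a cosine distance of 0.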
 
 

## Dataset

We will use the well-known Iris dataset from `sklearn.datasets`. It consists of 50 samples from each of three species of Iris, each sample described by four features (sepal length, sepal width, petal length and petal width, in cm):
 - Iris setosa
 - Iris virginica
 - Iris versicolor
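
A minimal sketch for inspecting the dataset before modelling. One detail worth checking is the label encoding: `iris.target` stores integers, and `iris.target_names` maps them to species in the order setosa (0), versicolor (1), virginica (2), which differs from the order of the list above:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)           # (150, 4): 150 samples, 4 features
print(iris.feature_names)        # the four measurement names, in cm
print(iris.target_names)         # ['setosa' 'versicolor' 'virginica']
print(np.bincount(iris.target))  # [50 50 50]: 50 samples per class
```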

## Importing the modules

```python
import numpy as np
from sklearn.metrics import accuracy_score  # To calculate the accuracy of the model
from sklearn.datasets import load_iris  # The Iris dataset
from numpy.random import randint  # Random integer generator
from scipy.stats import mode  # To calculate the mode (most frequent value) of the data
from scipy.spatial import distance as dis  # To calculate the Euclidean distance
```

## Function to predict the label using KNN

```python
def fx_knn(train_data, train_labels, test_data, k):
    labels_pred = []
    for item in test_data:  # Loop over the points to be classified

        # Euclidean distance from this point to every point in the training data
        point_dist = []
        for row_index in range(len(train_data)):
            distance = dis.euclidean(train_data[row_index, :], item)
            point_dist.append(distance)
        point_dist = np.array(point_dist)

        near_indices = np.argsort(point_dist)[:k]  # Indices of the k closest points

        labels = train_labels[near_indices]  # Labels of the k closest points

        # Majority vote: the most frequent label wins (ties go to the smallest label).
        # keepdims=False (SciPy >= 1.9) returns a scalar instead of a length-1 array,
        # so the result no longer needs to be indexed with [0].
        lab = mode(labels, keepdims=False).mode
        labels_pred.append(int(lab))

    return labels_pred  # Predicted labels, one per test point
```
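
The nested loop above computes one distance at a time. A possible vectorized alternative is sketched below; `knn_predict_vectorized` is a hypothetical name, not part of any library. It uses NumPy broadcasting to compute all test-to-train Euclidean distances at once and votes with `np.bincount(...).argmax()`, which, like `scipy.stats.mode`, resolves ties in favour of the smallest label. It assumes integer labels starting at 0, as in the Iris data:

```python
import numpy as np

def knn_predict_vectorized(train_data, train_labels, test_data, k):
    # Broadcasting: (n_test, 1, n_features) - (1, n_train, n_features)
    # gives all pairwise differences, shape (n_test, n_train, n_features)
    diffs = test_data[:, np.newaxis, :] - train_data[np.newaxis, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))   # shape (n_test, n_train)
    # Indices of the k closest training points for each test point
    near = np.argsort(dists, axis=1)[:, :k]     # shape (n_test, k)
    neighbor_labels = train_labels[near]        # shape (n_test, k)
    # Majority vote per row; argmax picks the smallest label on ties
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])

# Hypothetical toy data: two well-separated clusters labelled 0 and 1
toy_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
toy_labels = np.array([0, 0, 1, 1])
toy_test = np.array([[0.2, 0.3], [5.1, 5.4]])
print(knn_predict_vectorized(toy_train, toy_labels, toy_test, k=3))  # [0 1]
```

The trade-off is memory: the intermediate `diffs` array holds n_test × n_train × n_features values. For larger datasets, `scipy.spatial.distance.cdist(test_data, train_data)` computes the same distance matrix without that 3-D intermediate.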
            
    

## Call the function with iris data

```python
iris = load_iris()
X = iris.data    # Feature matrix, shape (150, 4)
y = iris.target  # Class labels 0, 1, 2

# Shuffle the 150 indices once and split them, so that the training and test
# sets do not overlap (randint samples with replacement, so it could put the
# same flower in both sets and inflate the accuracy)
shuffled = np.random.permutation(len(X))

# Training data: 100 samples
train_data = X[shuffled[:100]]
train_labels = y[shuffled[:100]]

# Test data: the remaining 50 samples
test_data = X[shuffled[100:]]
test_labels = y[shuffled[100:]]

# Predicted labels for test_data, using the 5 nearest neighbors
labels_pred = fx_knn(train_data, train_labels, test_data, 5)
print(labels_pred)
```



```
[2, 2, 0, 2, 2, 2, 2, 0, 2, 0, 0, 2, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 2, 2, 1, 0, 1, 1, 0, 2, 0, 1, 2, 1, 0, 0, 0, 2, 1, 0, 2, 2, 1]
```


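We now have the labels that our KNN function predicts for the randomly chosen `test_data`. Because the split is random and unseeded, the exact predictions (and the accuracy below) change from run to run.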

Next, let's measure the model's accuracy, i.e. the fraction of test points whose predicted label matches the true label:

```python
# accuracy_score expects the true labels first, then the predictions
print(f'The accuracy of the model generated with 5 nearest neighbors is {accuracy_score(test_labels, labels_pred)}')
```

```
The accuracy of the model generated with 5 nearest neighbors is 0.96
```
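
As a cross-check, the sketch below repeats the experiment with scikit-learn's own tools: `train_test_split` with `stratify=y` gives disjoint, class-balanced training and test sets, and `KNeighborsClassifier`, whose default metric (Minkowski with p = 2) is Euclidean distance, serves as a reference implementation. The seed `random_state=42` is an arbitrary choice, and the accuracy is also computed by hand to show that it is just the fraction of correct predictions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Disjoint, class-balanced split: 100 training and 50 test samples
train_data, test_data, train_labels, test_labels = train_test_split(
    X, y, test_size=50, stratify=y, random_state=42)

# Reference KNN with 5 neighbors and uniform (unweighted) voting
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(train_data, train_labels)
sk_pred = clf.predict(test_data)

# Accuracy = number of correct predictions / number of test points
manual_acc = np.mean(sk_pred == test_labels)
print(manual_acc, accuracy_score(test_labels, sk_pred))
```

Passing the same `train_data`, `train_labels` and `test_data` to `fx_knn(..., 5)` should give predictions that agree with `sk_pred` except possibly where the vote is tied, since the two implementations may break ties differently.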
