# **KNN Algorithm**

K-nearest neighbors (KNN) is a simple algorithm that keeps the entire dataset as its model: the training phase only stores the data. Whenever a prediction is required for an unseen data instance, it searches the whole training dataset for the k most similar instances and returns the most common label among them (for classification) or the mean of their values (for regression).

## KNN Algorithm


1. Load the data
2. Initialize K to your chosen number of neighbors
3. For each example in the data

>> 3.1 Calculate the distance between the query example and the current example from the data.

>> 3.2 Add the distance and the index of the example to an ordered collection

4. Sort the ordered collection of distances and indices from smallest to largest (in ascending order) by the distances
5. Pick the first K entries from the sorted collection
6. Get the labels of the selected K entries
7. If regression, return the mean of the K labels
8. If classification, return the mode of the K labels
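
The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the class used later in this notebook: the function name `knn_predict`, its `task` parameter, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(data, labels, query, k, task="classification"):
    """Hypothetical helper that follows steps 3-8 of the list above."""
    # Step 3: distance from the query to every example, stored with its index
    distances = [(math.dist(row, query), i) for i, row in enumerate(data)]
    # Step 4: sort ascending by distance
    distances.sort(key=lambda pair: pair[0])
    # Steps 5-6: labels of the k nearest examples
    k_labels = [labels[i] for _, i in distances[:k]]
    # Step 7: regression returns the mean of the k labels
    if task == "regression":
        return sum(k_labels) / len(k_labels)
    # Step 8: classification returns the mode of the k labels
    return Counter(k_labels).most_common(1)[0][0]

# Toy data, made up for illustration
data = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0)]
class_labels = ["a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

print(knn_predict(data, class_labels, (1.2, 1.5), k=3))                # a
print(knn_predict(data, values, (1.2, 1.5), k=3, task="regression"))  # 2.0
```

The three nearest points to the query are the first three rows, so the vote is two "a" against one "b", and the regression answer is the mean of 1.0, 2.0 and 3.0.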








## Code


In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

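class KNN:
    def __init__(self, path, k):
        self.k = k  # number of neighbors that vote on a prediction
        self.dataset = pd.read_csv(path)

    def fit(self):
        # Use columns 1-4 as the features and the last column as the label
        self.x = self.dataset.iloc[:, [1, 2, 3, 4]].values
        self.y = self.dataset.iloc[:, -1].values
        # Hold out 40% of the rows for testing, keeping class proportions (stratify)
        self.x_train, self.x_test, self.y_train, self.y_test = train_test_split(
            self.x, self.y, test_size=0.4, random_state=4, stratify=self.y)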

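    def predict(self, x):
        # Euclidean distance from x to every training row, paired with its label
        distances = []
        for row, out in zip(self.x_train, self.y_train):
            distance = np.sqrt(np.sum((x - row) ** 2))
            distances.append((distance, out))
        # Keep the k nearest neighbors
        distances.sort(key=lambda pair: pair[0])
        distances = distances[:self.k]
        # Count the votes for each label
        count = dict()
        for _, label in distances:
            count[label] = count.get(label, 0) + 1
        # Return the label with the most votes; max(count) alone would compare the keys
        return max(count, key=count.get)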

    def evaluate(self):
        count=0
        for i in range(len(self.x_test)):
            if self.predict(self.x_test[i])==self.y_test[i]:
                count+=1
        return count/len(self.y_test)



### Constructor: inputs path and k

*   path: path to the data (a CSV file)
*   k: the number of nearest neighbors that vote on the class of a new test point

### fit(self) method

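Randomly split the dataset into a training set (used for making predictions) and a test set (used for evaluating the model's accuracy). The split holds out 40% of the rows for testing and is stratified on the labels, so both sets keep roughly the same class proportions.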
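
The `fit` method delegates the split to scikit-learn's `train_test_split` with `stratify`. To show what stratification means, here is a rough standard-library sketch. The helper `stratified_split` is hypothetical, and it does not reproduce scikit-learn's exact shuffling or rounding; it only illustrates the idea of splitting each class separately.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.4, seed=4):
    """Hypothetical helper: return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    # Group row indices by class label
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Send the same fraction of every class to the test set
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Toy labels, made up for illustration: 10 rows of class "a", 5 of class "b"
labels = ["a"] * 10 + ["b"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 9 6
```

With these toy labels the test set receives 4 of the 10 "a" rows and 2 of the 5 "b" rows, so the 2:1 class ratio is preserved on both sides of the split.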


### predict(self,x) method

1.   Calculate the distance from the point x to every point in the training dataset.
     - Here we use the Euclidean distance, defined as the square root of the sum of the squared differences between two arrays of numbers.
2.   Find the k points in the training dataset with the shortest distance to x.
3.   Predict the response based on those neighbors.
     - Each neighbor votes for its own class, and the majority vote is taken as the prediction.
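
As a small worked example of the distance and the vote, consider the sketch below. The points and the five neighbor labels are made up for illustration, and the label strings are only assumed to resemble those in the dataset.

```python
import math
from collections import Counter

# Euclidean distance: sqrt((5 - 1)^2 + (3 - 0)^2) = sqrt(16 + 9) = 5.0
x = (5.0, 3.0)
neighbor = (1.0, 0.0)
print(math.dist(x, neighbor))  # 5.0

# Labels of k = 5 hypothetical nearest neighbors
votes = ["Iris-versicolor", "Iris-versicolor", "Iris-versicolor",
         "Iris-virginica", "Iris-versicolor"]
count = Counter(votes)

# Majority vote: pick the label with the highest count
print(max(count, key=count.get))  # Iris-versicolor

# Pitfall: max on the mapping alone compares the keys alphabetically
print(max(count))  # Iris-virginica
```

The last line shows why the vote must be taken over the counts: calling `max` on the dictionary itself returns the alphabetically last label, regardless of how many neighbors voted for it.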




### evaluate(self) method

- Now that all the pieces of the kNN algorithm are in place, let's check how accurate the predictions are!

- An easy way to evaluate the model is to compute the ratio of correct predictions to all predictions made on the test dataset.
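
The accuracy ratio can be illustrated on its own, separately from the class. The function name `accuracy` and the label lists below are assumptions made for this sketch.

```python
def accuracy(y_true, y_pred):
    """Hypothetical helper: fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy labels, made up for illustration: 4 of the 5 predictions are correct
y_true = ["a", "b", "b", "a", "b"]
y_pred = ["a", "b", "a", "a", "b"]
print(accuracy(y_true, y_pred))  # 0.8
```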

### Run 

In [None]:
data=KNN("/content/drive/MyDrive/AI_SGROUP/Machine_Learning/KNN/Iris.csv",5)
data.fit()
print(data.evaluate())

0.9333333333333333
