## **Implementation of K Nearest Neighbours from Scratch**

In this project we implement KNN from scratch, without using a built-in classifier.

**KNN**

The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.


Advantages

- The algorithm is simple and easy to implement.
- There’s no need to build a model, tune several parameters, or make additional assumptions.
- The algorithm is versatile. It can be used for classification, regression, and search.

Disadvantages

- The algorithm gets significantly slower as the number of examples and/or predictors (independent variables) increases, because every prediction compares the query against all training points.

**Importing Libraries**

In [0]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from collections import Counter
from sklearn.metrics import accuracy_score

**Loading data using pandas**

In [0]:
data = pd.read_csv('/content/heart.csv')

In [3]:
# Display the first 5 rows of the data
data.head()

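| Unnamed: 0 | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |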


**Splitting data into train and test set**

In [0]:
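# Features: every column except "target" (the "Unnamed: 0" index column stays in as a feature)
x = data.drop("target", axis=1).to_numpy()
# Labels: the "target" column
y = data["target"].to_numpy()
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.20, random_state=0)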

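**Function that returns the Euclidean distance between two points**

According to the Euclidean distance formula, the distance between two points in the plane with coordinates (x, y) and (a, b) is given by

$\mathrm{dist}((x, y), (a, b)) = \sqrt{(x - a)^2 + (y - b)^2}$

The same formula extends to any number of dimensions by summing the squared differences over all coordinates, which is what the function below does for our feature vectors.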

In [0]:
def euclidean_distance(x1, x2):
    # Square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((x1 - x2) ** 2))
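
As a quick check of the formula, here is a small worked example. The points `p` and `q` are made up for illustration: the distance between (1, 2) and (4, 6) should be $\sqrt{3^2 + 4^2} = 5$.

```python
import numpy as np

def euclidean_distance(x1, x2):
    # Square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((x1 - x2) ** 2))

# Hypothetical points, chosen so the answer is easy to verify by hand
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

d = euclidean_distance(p, q)  # sqrt(3^2 + 4^2) = 5.0
print(d)
```

Because the function sums over all coordinates, it works unchanged for feature vectors of any length.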

**Predict function**

KNN has no separate training step: to predict, we compute the Euclidean distance from the test point to every training point and take a majority vote among the k nearest ones.

In [0]:
def predict_one(x_train, y_train, x_test, k):
    # distances holds [distance, index] pairs: the distance of the test
    # point from each training point, plus that training point's index
    distances = []
    for i in range(len(x_train)):
        distance = euclidean_distance(x_train[i], x_test)
        distances.append([distance, i])

    # Sort once, after the loop, so the nearest training points come first
    distances = sorted(distances)

    # Collect the classes of the k nearest training points
    targets = []
    for i in range(k):
        index_of_training_data = distances[i][1]
        targets.append(y_train[index_of_training_data])

    # most_common(1) returns [(class, count)]; keep only the class
    return Counter(targets).most_common(1)[0][0]
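
The per-point loop can also be written with NumPy broadcasting, which computes all distances in one array operation. The following is a minimal sketch, not part of the original notebook: the name `predict_one_vectorized` and the toy arrays are assumptions made for illustration.

```python
import numpy as np
from collections import Counter

def predict_one_vectorized(x_train, y_train, x_test, k):
    # Distances from x_test to every training row at once (broadcasting)
    distances = np.sqrt(np.sum((x_train - x_test) ** 2, axis=1))
    # Indices of the k smallest distances; a stable sort keeps the
    # lower index first when two distances are equal
    nearest = np.argsort(distances, kind="stable")[:k]
    # Majority vote among the labels of the k nearest points
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Hypothetical toy data: class 0 near the origin, class 1 near (5, 5)
toy_x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

near_origin = predict_one_vectorized(toy_x, toy_y, np.array([0.5, 0.5]), 3)
near_five = predict_one_vectorized(toy_x, toy_y, np.array([5.5, 5.5]), 3)
print(near_origin, near_five)  # 0 1
```

This sketch still compares the query against every training point, so the cost per prediction continues to grow with the size of the training set; it only removes the Python-level loop.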

In [0]:
def predict(x_train, y_train, x_test_data, k):
    '''Predict the class of every row in x_test_data using x_train and y_train.
       k is the number of nearest neighbors to consider.'''
    predictions = []
    for x_test in x_test_data:
        # Store the prediction for this test point
        predictions.append(predict_one(x_train, y_train, x_test, k))
    return predictions
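
To see how the choice of k affects the result, here is a small self-contained sketch. The toy data is hypothetical, and `predict_one` and `predict` are restated in compact form only so that the block runs on its own. A single class-1 outlier sits among class-0 points, so k = 1 follows the outlier while k = 5 follows the majority.

```python
import numpy as np
from collections import Counter

def predict_one(x_train, y_train, x_test, k):
    # Compact restatement of the loop-based version: (distance, index) pairs, sorted
    distances = sorted(
        (np.sqrt(np.sum((x_train[i] - x_test) ** 2)), i)
        for i in range(len(x_train))
    )
    targets = [y_train[i] for _, i in distances[:k]]
    return Counter(targets).most_common(1)[0][0]

def predict(x_train, y_train, x_test_data, k):
    return [predict_one(x_train, y_train, x_test, k) for x_test in x_test_data]

# Hypothetical toy data: four class-0 points around the origin, three
# class-1 points far away, and one class-1 outlier close to the origin
toy_x = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [5, 5], [5, 6], [6, 5],
                  [0.4, 0.4]])
toy_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
queries = np.array([[0.5, 0.5], [5.5, 5.5]])

pred_k1 = [int(p) for p in predict(toy_x, toy_y, queries, 1)]
pred_k5 = [int(p) for p in predict(toy_x, toy_y, queries, 5)]
print(pred_k1)  # [1, 1]  (the first query follows the nearby outlier)
print(pred_k5)  # [0, 1]  (four class-0 neighbors outvote the outlier)
```

A larger k smooths over isolated points like this one, at the cost of letting more distant points influence the vote.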

In [0]:
# Predict the classes of the test set using the 7 nearest neighbors
y_predict = predict(X_train, Y_train, X_test, 7)

**Accuracy Score**

In [9]:
# Fraction of test points whose predicted class matches the true class
accuracy_score(Y_test, y_predict)

0.6721311475409836
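
In keeping with the from-scratch theme, the accuracy itself can be computed by hand as the fraction of predictions that match the true labels. This is a minimal sketch: the helper name `accuracy` and the example labels are assumptions for illustration, not the notebook's data.

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of positions where the prediction equals the true label
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Hypothetical labels, only to illustrate the calculation
example_true = [1, 0, 1, 1, 0]
example_pred = [1, 0, 0, 1, 1]

print(accuracy(example_true, example_pred))  # 3 of 5 correct -> 0.6
```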

**Conclusion**

We have successfully implemented KNN from scratch. With k = 7, it reaches an accuracy of about 67% on the held-out test set.