# KNN Mathematics

K-Nearest Neighbours (KNN) is a classification algorithm that finds the K training points most similar to a new data point, with similarity measured here by **Euclidean distance**, and assigns the new point to the most common category among them.

#### K-NN can be described by the following algorithm:

1. Select the number K of neighbors.
2. Calculate the Euclidean distance from the new data point to each existing data point (**Euclidean distance** = $\sqrt{|x_{o1} - x_{A1}|^2 + |x_{o2} - x_{A2}|^2 + \dots + |x_{on} - x_{An}|^2}$ ).
3. Take the K nearest neighbors according to the calculated Euclidean distance.
4. Among these K neighbors, count the number of data points in each category.
5. Assign the new data point to the category with the largest number of neighbors.

![Image](https://static.javatpoint.com/tutorial/machine-learning/images/k-nearest-neighbor-algorithm-for-machine-learning2.png)
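
The five steps above can be sketched as a single NumPy function. This is a minimal illustration rather than the notebook's own code: the name `knn_predict` and the toy arrays `X_toy` and `y_toy` are made up for the example, and a tied vote is resolved in favour of the smallest label, which is an assumption the algorithm itself does not prescribe.

```python
import numpy as np


def knn_predict(X, y, x_new, k):
    """Classify x_new by majority vote among its k nearest rows of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    x_new = np.asarray(x_new, dtype=float)

    # Step 2: Euclidean distance from x_new to every row of X.
    distances = np.sqrt(((X - x_new) ** 2).sum(axis=1))

    # Step 3: indices of the k smallest distances
    # (a stable sort keeps tied distances in row order).
    nearest = np.argsort(distances, kind="stable")[:k]

    # Steps 4 and 5: count the labels among the neighbours and
    # return the most frequent one (ties go to the smallest label).
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Toy data, invented for illustration: two features, binary label.
X_toy = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8]]
y_toy = [0, 0, 0, 1, 1]

prediction = knn_predict(X_toy, y_toy, x_new=[2, 2], k=3)
print(prediction)
```

The three points closest to `[2, 2]` all carry label 0, so the sketch predicts 0. The rest of this notebook performs the same steps one at a time with pandas.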

## Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Importing Dataset

In [2]:
df = pd.read_csv("./Social_Network_Ads.csv")
df.head()

Unnamed: 0,Age,EstimatedSalary,Purchased
0,19,19000,0
1,35,20000,0
2,26,43000,0
3,27,57000,0
4,19,76000,0


## Solving KNN

### 1. Choose K

In [3]:
K = 5

In [4]:
x_new = [35, 20000]

### 2. Calculate the Euclidean distance to each point

**Euclidean distance** = $\sqrt{|x_{o1} - x_{A1}|^2 + |x_{o2} - x_{A2}|^2 + \dots + |x_{on} - x_{An}|^2}$

In [5]:
df["euclidean_distance"] = np.sqrt(np.square(abs(x_new[0] - df["Age"])) + np.square(abs(x_new[1] - df["EstimatedSalary"])))

In [7]:
df.head()

Unnamed: 0,Age,EstimatedSalary,Purchased,euclidean_distance
0,19,19000,0,1000.127992
1,35,20000,0,0.0
2,26,43000,0,23000.001761
3,27,57000,0,37000.000865
4,19,76000,0,56000.002286
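
As a quick check, the distance for row 0 (Age 19, EstimatedSalary 19000) can be worked out by hand from the new point `x_new = [35, 20000]`:

$$
d = \sqrt{(35 - 19)^2 + (20000 - 19000)^2} = \sqrt{256 + 1000000} = \sqrt{1000256} \approx 1000.128
$$

This agrees with the `euclidean_distance` value shown for row 0. The salary term contributes 1000000 and the age term only 256, so with these unscaled features the distance is driven almost entirely by `EstimatedSalary`.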


### 3. Select the K nearest neighbours

Sort the dataframe by Euclidean distance in ascending order and select the first K rows, which are the K nearest neighbours.

In [13]:
K_neighbours = df.sort_values("euclidean_distance")[:K]
K_neighbours

Unnamed: 0,Age,EstimatedSalary,Purchased,euclidean_distance
1,35,20000,0,0.0
60,27,20000,0,8.0
45,23,20000,0,12.0
25,47,20000,1,12.0
397,50,20000,1,15.0
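
The same selection can likely also be written with `DataFrame.nsmallest`, which returns the K rows with the smallest values in a column without sorting the whole frame. The sketch below uses a small made-up frame (`toy`) rather than the real dataset, and compares it with a stable sort so that tied distances, such as the two rows at 12.0, are ordered the same way in both results.

```python
import pandas as pd

# Toy frame, invented for illustration; column names mirror the notebook.
toy = pd.DataFrame({
    "Purchased": [0, 1, 0, 1, 0, 1],
    "euclidean_distance": [0.0, 12.0, 8.0, 15.0, 12.0, 40.0],
})
K = 3

# Approach used in the notebook, with a stable sort so ties keep row order.
by_sort = toy.sort_values("euclidean_distance", kind="stable").head(K)

# Alternative: ask directly for the K smallest distances
# (keep="first" is the default, so the earlier tied row wins).
by_nsmallest = toy.nsmallest(K, "euclidean_distance")

print(by_nsmallest)
```

Both approaches pick the rows with index 0, 2 and 1 in this toy example.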


### 4. Count data points in each category.

In [37]:
K_neighbours["Purchased"].value_counts()

0    3
1    2
Name: Purchased, dtype: int64

Among the 5 nearest neighbours, 3 data points have `Purchased` = 0 and 2 data points have `Purchased` = 1.

### 5. Assign the new data point to the category with the most neighbours

Since category 0 has the most neighbours (3 out of 5), the new data point is assigned to category 0.

In [41]:
K_neighbours["Purchased"].value_counts()

0    3
1    2
Name: Purchased, dtype: int64
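
The majority vote can be made explicit in code by taking the label with the highest count. A minimal sketch follows, with the five neighbour labels copied from the table in step 3 into a standalone Series; the names `neighbour_labels` and `predicted_class` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Labels of the five nearest neighbours, copied from the table in step 3.
neighbour_labels = pd.Series([0, 0, 0, 1, 1], name="Purchased")

# Count how many neighbours fall in each category.
counts = neighbour_labels.value_counts()

# idxmax returns the index label (here, the category) with the largest count.
predicted_class = counts.idxmax()
print(predicted_class)
```

In the notebook itself, the equivalent expression would be `K_neighbours["Purchased"].value_counts().idxmax()`, which gives 0 for the counts shown above.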