<a href="https://www.kaggle.com/code/fareselmenshawii/knn-from-scratch?scriptVersionId=117154300" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div class="table-of-contents" style="background-color:#000000; padding: 20px; margin: 10px; font-size: 110%; border-radius: 25px; box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);">
  <h1 style="color:#FFD700;">TOC</h1>
  <ol>
    <li><a href="#1" style="color: #FFD700;">1. Overview</a></li>
      <li><a href="#2" style="color: #FFD700;">2. Imports</a></li>
    <li><a href="#3" style="color: #FFD700;">3. Data Analysis</a></li>
    <li><a href="#4" style="color: #FFD700;">4. Model Implementation From Scratch</a></li>
    <li><a href="#5" style="color: #FFD700;">5. Sklearn Implementation</a></li>
    <li><a href="#6" style="color: #FFD700;">6. Evaluation</a></li>
    <li><a href="#7" style="color: #FFD700;">7. Thank You</a></li>  
  </ol>
</div>

<a id="1"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #FFD700;'>Overview</center></h1>

# Overview
  
**In this notebook we'll implement the KNN (k-nearest neighbors) algorithm from scratch and compare it with the sklearn implementation.**

**Let's get started!**

<a id="2"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #FFD700;'>Imports</center></h1>

# Imports
    

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

from sklearn.preprocessing import MinMaxScaler

<a id="3"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #FFD700;'>Data Analysis</center></h1>

# Data Analysis
  

In [2]:
iris = pd.read_csv("../input/iris/Iris.csv") #Load Data
iris.drop('Id',inplace=True,axis=1) #Drop Id column

In [3]:
iris.head().style.background_gradient(sns.color_palette("YlOrBr", as_cmap=True))

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [4]:
X = iris.iloc[:,:-1] # Features: every column except the last (Species)

y = iris.iloc[:,-1] # Labels: the Species column

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X.values, y.values, test_size=.2, random_state=41)

In [6]:
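# Pass y='Species' so the y-axis really shows the flower name (without y, Plotly Express plots the row index)
fig = px.scatter(data_frame=iris, x='SepalLengthCm', y='Species', color='Species', template='plotly_dark')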
fig.update_layout(width=800, height=600,
                  xaxis=dict(title='SepalLengthCm',color="#FFD700"),
                 yaxis=dict(title="Flower Name",color="#FFD700"))
fig.show()

In [7]:
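# Pass y='Species' so the y-axis really shows the flower name (without y, Plotly Express plots the row index)
fig = px.scatter(data_frame=iris, x='SepalWidthCm', y='Species', color='Species', template='plotly_dark')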
fig.update_layout(width=800, height=600,
                  xaxis=dict(title='SepalWidthCm',color="#FFD700"),
                 yaxis=dict(title="Flower Name",color="#FFD700"))
fig.show()

In [8]:
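# Pass y='Species' so the y-axis really shows the flower name (without y, Plotly Express plots the row index)
fig = px.scatter(data_frame=iris, x='PetalLengthCm', y='Species', color='Species', template='plotly_dark')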
fig.update_layout(width=800, height=600,
                  xaxis=dict(title='PetalLengthCm',color="#FFD700"),
                 yaxis=dict(title="Flower Name",color="#FFD700"))
fig.show()

In [9]:
fig = px.scatter(data_frame=iris, x='SepalLengthCm',y='SepalWidthCm',
           size='PetalLengthCm',color='Species',template='plotly_dark')

fig.update_layout(width=800, height=600,
                  xaxis=dict(title='SepalLengthCm',color="#FFD700"),
                 yaxis=dict(title="SepalWidthCm",color="#FFD700"))
fig.show()

<a id="4"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #FFD700;'>Model Implementation From Scratch</center></h1>

# Model Implementation From Scratch
  



## How the algorithm works

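**We calculate the Euclidean distance between a new sample and every point in the training set**

**We select the `n_neighbors` training points closest to the sample**

**We assign the sample the label that wins the majority vote among those neighbors**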
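
The idea can be tried on a handful of points before touching the Iris data. The sketch below is only an illustration: `knn_vote`, `train_points` and `train_labels` are hypothetical names, and the coordinates are made up. It measures the distance to every training point, keeps the `k` closest ones, and returns the label that occurs most often among them.

```python
import math
from collections import Counter

# Toy 2-D training set (hypothetical values, not taken from the Iris data)
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0)]
train_labels = ["a", "a", "a", "b", "b"]

def knn_vote(sample, points, labels, k=3):
    """Return the majority label among the k training points closest to sample."""
    # Distance from the sample to every training point
    distances = [math.dist(sample, p) for p in points]
    # Indices of the k smallest distances
    nearest = sorted(range(len(points)), key=lambda i: distances[i])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_vote((1.2, 1.1), train_points, train_labels))  # a
print(knn_vote((6.5, 6.4), train_points, train_labels))  # b
```

In the second call two of the three nearest neighbours are labelled `b`, so `b` wins the vote even though the third neighbour is labelled `a`.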

## Key Points:

### Euclidean Distance
**Euclidean distance is the straight-line distance between two points**

**For two points $x$ and $y$ with $m$ features each, it is given by this equation:**
$$d(x, y) = \sqrt{\sum\limits_{i = 0}^{m-1} (x_i - y_i)^2}$$
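
As a quick check of the formula, here is a small worked example with made-up vectors (the values of `x` and `y` are assumptions chosen so the result is a round number). It also compares the result with `np.linalg.norm`, which computes the same quantity for the difference vector.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

# Differences are (-3, -4, 0); their squares sum to 25; the square root is 5
distance = np.sqrt(np.sum((x - y) ** 2))
print(distance)  # 5.0

# np.linalg.norm returns the L2 norm of x - y, which is the same distance
print(np.isclose(distance, np.linalg.norm(x - y)))  # True
```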


**Now let's implement the Euclidean distance together with the rest of the KNN class**

In [10]:
class KNN:
    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def euclidean_distance(self, x1, x2):
        # Square root of the sum of squared feature differences
        return np.sqrt(np.sum(np.square(x1 - x2)))

    def fit(self, X_train, y_train):
        # KNN has no real training step: fit just stores X_train and y_train
        self.X_train = X_train
        self.y_train = y_train

    def predict(self, X):
        # Create an empty list to store the predictions
        predictions = []
        # Loop over the examples in X
        for x in X:
            # Get the prediction using the _predict helper function
            prediction = self._predict(x)
            # Append the prediction to the predictions list
            predictions.append(prediction)
        return np.array(predictions)

    def _predict(self, x):
        # Create an empty list to store the distances
        distances = []
        # Compute the distance between x and every training example
        for x_train in self.X_train:
            distance = self.euclidean_distance(x, x_train)
            distances.append(distance)
        distances = np.array(distances)

        # Sort the distances in ascending order and keep the indices of the n_neighbors closest points
        n_neighbors_idxs = np.argsort(distances)[: self.n_neighbors]

        # Get the labels of those neighbours from the stored training labels
        labels = self.y_train[n_neighbors_idxs]
        labels = list(labels)
        # Return the most frequent class among the neighbours
        most_occurring_value = max(labels, key=labels.count)
        return most_occurring_value
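
The Python loop over training rows in `_predict` can also be written with NumPy broadcasting. The sketch below is one possible variant and not part of the original notebook: `predict_one`, `X_toy` and `y_toy` are hypothetical names, and the toy values are made up. It uses the standard Euclidean distance and `collections.Counter` for the vote.

```python
import numpy as np
from collections import Counter

def predict_one(x, X_train, y_train, n_neighbors=5):
    """Hypothetical helper: classify one sample with vectorised distances."""
    # Broadcasting subtracts x from every row of X_train at once
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    # Indices of the n_neighbors smallest distances
    nearest = np.argsort(distances)[:n_neighbors]
    # Counter tallies the neighbour labels; most_common(1) gives the winner
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy data (made-up values for illustration only)
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]])
y_toy = np.array(["near", "near", "near", "far", "far"])

print(predict_one(np.array([0.2, 0.2]), X_toy, y_toy, n_neighbors=3))  # near
```

Computing all distances in one array operation avoids the per-row Python call, which tends to matter more as the training set grows.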



In [11]:
model = KNN(7)
model.fit(X_train, y_train)


<a id="5"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #FFD700;'>Sklearn Implementation</center></h1>

# Sklearn Implementation
  

In [12]:
from sklearn.neighbors import KNeighborsClassifier
skmodel = KNeighborsClassifier(n_neighbors=7)
skmodel.fit(X_train, y_train)

KNeighborsClassifier(n_neighbors=7)

<a id="6"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #FFD700;'>Evaluation</center></h1>

# Evaluation
  

In [13]:
from sklearn.metrics import accuracy_score

predictions = model.predict(X_test)
sk_predictions = skmodel.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
sk_accuracy = accuracy_score(y_test, sk_predictions)

print(f" our model got accuracy score of : {accuracy}")    
print(f" sklearn-model got accuracy score of : {sk_accuracy}")


 our model got accuracy score of : 0.9666666666666667
 sklearn-model got accuracy score of : 0.9666666666666667
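
Accuracy is simply the fraction of predictions that match the true labels. Assuming the usual 150-row Iris file, `test_size=.2` leaves 30 test samples, so a score of 0.9666666666666667 would correspond to 29 correct predictions out of 30. The sketch below reproduces that number by hand with made-up label arrays (`y_true` and `y_pred` here are illustrative, not the notebook's actual predictions).

```python
import numpy as np

# Made-up labels: 30 test samples with exactly one wrong prediction
y_true = np.array(["Iris-setosa"] * 10 + ["Iris-versicolor"] * 10 + ["Iris-virginica"] * 10)
y_pred = y_true.copy()
y_pred[15] = "Iris-virginica"  # one versicolor sample misclassified

# Accuracy is the mean of the element-wise "prediction equals label" mask
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 0.9666666666666667
```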


<a id="7"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #FFD700;'>Thank You</center></h1>

# Thank You


**Thank you for taking the time to go through this notebook.**

**If you have any feedback, please let me know.**