### K-Nearest Neighbour (KNN) Algorithm
- KNN is a supervised machine learning algorithm that is generally used for classification but can also be used for regression tasks.
- It works by finding the "k" closest data points (neighbours) to a given input and making a prediction based on the majority class (for classification) or the average value (for regression).
- It is also called a lazy learner algorithm because it does not learn from the training set up front; instead, it stores the entire dataset and performs computations only at prediction time.
- Ex - Imagine you are deciding which fruit something is based on its shape and size. You compare it to fruits you already know.
  - If k=3, the algorithm looks at the 3 closest fruits to the new one.
  - If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an apple because most of its neighbours are apples.
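
The voting step in the fruit example can be sketched in a few lines. The neighbour labels and the weights below are made-up values for illustration; finding the neighbours themselves is left out here.

```python
from collections import Counter

# Hypothetical labels of the 3 nearest fruits (k = 3), as in the example above.
nearest_labels = ["apple", "banana", "apple"]

# Classification: the most common label among the neighbours wins.
predicted_class = Counter(nearest_labels).most_common(1)[0][0]
print(predicted_class)  # apple

# Regression: average the neighbours' numeric targets instead.
# These weights (in grams) are made-up values for illustration.
nearest_weights = [150.0, 120.0, 180.0]
predicted_weight = sum(nearest_weights) / len(nearest_weights)
print(predicted_weight)  # 150.0
```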

### How to choose the value of k for KNN Algorithm?
- The value of k in KNN decides how many neighbors the algorithm looks at when making a prediction.
- Choosing the right k is important for good results.
- If the data has a lot of noise or outliers, using a larger k can make the predictions more stable.
- But if k is too large, the model may become too simple and miss important patterns; this is called underfitting. A very small k, on the other hand, makes predictions sensitive to individual noisy points.
- So k should be picked carefully based on the data.
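
A small sketch of this trade-off, assuming a made-up one-dimensional dataset with a single mislabelled point. The `predict` helper is a hypothetical name used only for this illustration.

```python
from collections import Counter

# Made-up 1-D data: cluster "A" near 1-3, cluster "B" near 8-10,
# plus one mislabelled (noisy) point at 2.5.
points = [1.0, 2.0, 3.0, 2.5, 8.0, 9.0, 10.0]
labels = ["A", "A", "A", "B", "B", "B", "B"]

def predict(query, k):
    # Sort training points by distance to the query and vote among the k nearest.
    ranked = sorted(zip(points, labels), key=lambda pair: abs(pair[0] - query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

for k in (1, 3, 7):
    print(k, predict(2.6, k))
# 1 B   (follows the noisy point)
# 3 A   (noise is outvoted)
# 7 B   (every point votes, so the overall majority class wins)
```

The query 2.6 sits inside cluster "A", yet both the smallest and the largest k misclassify it in this toy setup, for opposite reasons.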

### Statistical Methods for Selecting k
#### 1. Cross-Validation:
- A good way to find the best value of "k" is k-fold cross-validation. This means dividing the dataset into several parts (folds). Note that the "k" in "k-fold" is the number of folds, not the number of neighbours.
- For each candidate "k", the model is trained on all but one of these parts and tested on the remaining one. This process is repeated so that each part is used once for testing.
- The "k" value that gives the highest average accuracy during these tests is usually the best one to use.
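
The procedure above could look roughly like the following standard-library sketch. The dataset, the `cv_accuracy` helper, and the simple "every n-th sample" fold assignment are assumptions made for illustration; real projects usually shuffle the data first.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    # Rank training points by Euclidean distance and vote among the k nearest.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def cv_accuracy(xs, ys, k, n_folds):
    """Mean accuracy of KNN with k neighbours over n_folds folds."""
    fold_scores = []
    for fold in range(n_folds):
        # Every n_folds-th sample goes to the test fold; the rest is training data.
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / n_folds

# Made-up toy data: two clusters, with one mislabelled point at the end.
xs = [[1, 2], [2, 3], [3, 4], [2, 2], [3, 3], [6, 7], [7, 8], [8, 8], [7, 6], [2.5, 3]]
ys = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

# Score each candidate number of neighbours and keep the best one.
scores = {k: cv_accuracy(xs, ys, k, n_folds=5) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Here `n_folds` is the number of folds and `k` is the number of neighbours, which keeps the two meanings of "k" apart.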
#### 2. Elbow Method:
- In the Elbow Method, we draw a graph showing the error rate (or accuracy) for different k values.
- As "k" increases, the error usually drops at first. But after a certain point, the error stops decreasing quickly.
- The point where the curve changes direction and looks like an "elbow" is usually a good choice for "k".
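
As a rough sketch, the error curve can be computed on a held-out validation set and inspected by eye. The training and validation points below are made up, and the text "plot" stands in for a real chart.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    # Rank training points by Euclidean distance and vote among the k nearest.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Made-up training and validation sets (two clusters, one noisy training label).
train_x = [[1, 1], [1, 2], [2, 1], [2, 2], [2.5, 2.5], [6, 6], [6, 7], [7, 6], [7, 7]]
train_y = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
val_x = [[1.5, 1.5], [2.4, 2.4], [3, 3], [6.5, 6.5], [5.5, 6]]
val_y = ["A", "A", "A", "B", "B"]

error_rates = {}
for k in range(1, 8, 2):  # odd k values 1, 3, 5, 7
    wrong = sum(knn_predict(train_x, train_y, x, k) != y for x, y in zip(val_x, val_y))
    error_rates[k] = wrong / len(val_x)

# Crude text plot: one '#' per 10% error.
for k, err in error_rates.items():
    print(f"k={k}: {'#' * round(err * 10)} {err:.2f}")
```

The k to pick is the one where the printed error stops improving noticeably.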
#### 3. Odd Values for k:
- It's a good idea to use an odd number for "k", especially in classification problems with two classes.
- This helps avoid ties when deciding which class is the most common among the neighbors. With more than two classes, ties can still occur even when "k" is odd.
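
A short sketch of how ties arise, using made-up neighbour labels ordered from closest to farthest. The `is_tie` helper is a hypothetical name for this illustration.

```python
from collections import Counter

# Labels of the nearest neighbours, ordered from closest to farthest (made-up values).
even_k_votes = Counter(["B", "A"])            # k = 2, two classes: 1 vote each
odd_k_votes = Counter(["B", "A", "A"])        # k = 3, two classes: no tie possible
three_class_votes = Counter(["C", "A", "B"])  # k = 3, three classes: still a tie

def is_tie(votes):
    # A tie occurs when the two highest vote counts are equal.
    top = votes.most_common(2)
    return len(top) == 2 and top[0][1] == top[1][1]

print(is_tie(even_k_votes), is_tie(odd_k_votes), is_tie(three_class_votes))
# True False True

# On a tie, most_common(1) returns the label that was encountered first,
# which here is the closest neighbour's label.
print(even_k_votes.most_common(1)[0][0])  # B
```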

### Distance metrics Used in KNN algorithm:
- KNN uses distance metrics to identify the nearest neighbors; these neighbors are then used for classification and regression tasks.
- To identify the nearest neighbors, we use the distance metrics below:
#### 1. Euclidean Distance:
- It is defined as the straight-line distance between two points.
- Formula (for two points $(x_1, y_1)$ and $(x_2, y_2)$ in two dimensions), $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
 
#### 2. Manhattan Distance:
- It is a metric used to calculate the distance between two points in a grid-like space by summing the absolute differences of their coordinates across all dimensions.
- It is also called "taxicab distance".
- Formula, $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
#### 3. Minkowski Distance:
- It is a family of distances that includes both Euclidean and Manhattan distances as special cases:
- Formula, $d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
- From the formula above,
  - when p=2, it becomes the same as the Euclidean distance formula.
  - when p=1, it turns into the Manhattan distance formula.
- Minkowski distance is essentially a flexible formula that can represent either Euclidean or Manhattan distance depending on the value of p.
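
The two special cases can be checked numerically. This is a minimal standard-library sketch; `minkowski_distance` is a hypothetical helper name, and the two points are made-up values.

```python
import math

def minkowski_distance(x, y, p):
    # Sum |x_i - y_i|^p over all coordinates, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = [1, 2], [4, 6]

manhattan = sum(abs(a - b) for a, b in zip(x, y))  # |1-4| + |2-6| = 7
euclidean = math.dist(x, y)                        # sqrt(3^2 + 4^2) = 5.0

print(minkowski_distance(x, y, 1))  # 7.0, same as the Manhattan distance
print(minkowski_distance(x, y, 2))  # 5.0, same as the Euclidean distance
```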

### Implementing KNN from Scratch in Python
#### 1. Importing Libraries:
- Counter is used to count the occurrences of elements in a list or other iterable.

In [7]:
import numpy as np
from collections import Counter

#### 2. Defining the Euclidean Distance Function
- euclidean_distance - calculates the Euclidean distance between two points.

In [8]:
def euclidean_distance(point1, point2):
    return np.sqrt(np.sum((np.array(point1) - np.array(point2))**2))

#### 3. KNN Prediction Function
- distances.append - saves how far each training point is from the test point, along with its label.
- distances.sort - sorts the list so the nearest points come first.
- k_nearst_labels - picks the labels of the "k" closest points.
- Counter is then used to find which label appears most often among those "k" labels; that label becomes the prediction.

In [9]:
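def knn_predict(training_data, training_labels, test_point, k):
    # Collect a (distance, label) pair for every training point.
    distances = []
    for i in range(len(training_data)):
        dist = euclidean_distance(test_point, training_data[i])
        distances.append((dist, training_labels[i]))
    # Sort by distance so the nearest points come first.
    distances.sort(key=lambda x: x[0])
    # Keep the labels of the k nearest points and return the most common one.
    k_nearst_labels = [label for _, label in distances[:k]]
    return Counter(k_nearst_labels).most_common(1)[0][0]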

#### 4. Training Data, Labels and Test Points


In [10]:
training_data = [[1,2], [2,3], [3,4], [6,7], [7,8]]
training_labels = ['A', 'A', 'A', 'B', 'B']
test_point = [4,5]
k = 3

#### 5. Prediction

In [11]:
prediction = knn_predict(training_data, training_labels, test_point, k)
print(prediction)

A
