# K Nearest Neighbours

The K-Nearest Neighbours algorithm is a classification algorithm that takes a set of labelled points and uses them to learn how to label other points. It classifies a case based on its similarity to other cases. Cases that are near each other are called "neighbours".<br>
### KNN Algorithm
1. Pick a K value.
2. Calculate the distance from the unknown case to every case in the training data.
3. Select the K observations in the training data that are "nearest" to the unknown data point.
4. Predict the response of the unknown data point using the most popular response value among the K nearest neighbours.
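
The steps above can be sketched in plain Python. This is a minimal illustration rather than how scikit-learn implements the classifier: the function name `knn_predict` and the toy points are assumptions made for the example, and Euclidean distance is used as the distance measure.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    # Step 2: distance from the unknown case to every training case.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the k training cases with the smallest distance.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: predict the most common label among those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two small clusters labelled "A" and "B".
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_labels = ["A", "A", "B", "B"]

# Step 1: pick a K value.
prediction = knn_predict(train_points, train_labels, (1.2, 1.4), k=3)
print(prediction)  # A
```

With `k=3` the query point has two "A" neighbours and one "B" neighbour, so the vote returns "A". Ties are broken here by whichever label `Counter` encountered first, which is a simplification.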

### Euclidean distance = $\sqrt{\sum\limits_{i=1}^{n}(x_{1i}-x_{2i})^2}$
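
As a quick worked example of the distance formula, the sketch below computes it for two points with three features each. The helper name `euclidean_distance` and the sample values are assumptions chosen for illustration.

```python
import math


def euclidean_distance(x1, x2):
    """Square root of the summed squared differences between two feature vectors."""
    if len(x1) != len(x2):
        raise ValueError("both points must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


# (1-4)^2 + (2-6)^2 + (3-3)^2 = 9 + 16 + 0 = 25, and sqrt(25) = 5
d = euclidean_distance([1, 2, 3], [4, 6, 3])
print(d)  # 5.0
```

Because the squared differences are summed across features, a feature measured on a large scale can dominate the result, which is one reason to standardize the features before fitting KNN.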

### Jaccard Index
$$J(y,\hat{y}) = \frac{|y \cap \hat{y}|}{|y \cup \hat{y}|} = \frac{|y \cap \hat{y}|}{|y| + |\hat{y}| - |y \cap \hat{y}|}$$
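
A small sketch of how this formula can be applied to label vectors. It assumes one common reading of the formula for classifiers: $|y|$ and $|\hat{y}|$ are the number of samples, and the intersection is the number of positions where the predicted label equals the true label. The function name and the sample labels are hypothetical.

```python
def jaccard_index(y_true, y_pred):
    """Jaccard index, counting matching positions as the intersection."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # |y ∩ ŷ|: positions where prediction and truth agree (assumed interpretation).
    intersection = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # |y ∪ ŷ| = |y| + |ŷ| - |y ∩ ŷ|
    union = len(y_true) + len(y_pred) - intersection
    return intersection / union


y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

# 8 matching labels out of 10, so J = 8 / (10 + 10 - 8) = 8 / 12
j = jaccard_index(y_true, y_pred)
print(round(j, 2))  # 0.67
```

A perfect prediction gives 1.0 under this reading, and a prediction with no matching labels gives 0.0.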

### F1 Score
#### Precision = $\frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$
#### Recall = $\frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$
#### F1 Score = $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
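
The three formulas can be checked by hand on a small binary example. The sketch below is only an illustration: the function name `precision_recall_f1`, the `positive` parameter, and the label vectors are assumptions, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (assumed convention: report 0.0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

# TP = 3, FP = 1, FN = 2, so precision = 3/4 and recall = 3/5
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, round(f1, 3))
```

For a multi-class target, one option is to compute these per class and average the results.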

### Log loss 
Measures the performance of a classifier whose predicted output is a probability value between 0 and 1. A lower log loss indicates better predictions.<br>
$$\text{Log loss} = -\frac{1}{n}\sum_{i=1}^{n}\left(y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)\right)$$
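
To make the formula concrete, here is a minimal sketch for binary labels. The function name `binary_log_loss` and the clipping constant `eps` are assumptions added for the example. Clipping keeps `log(0)` from occurring when a predicted probability is exactly 0 or 1.

```python
import math


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability away from 0 and 1 so the logarithm is defined.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# An uninformative classifier that always predicts 0.5 scores ln(2).
uninformed = binary_log_loss([1, 0], [0.5, 0.5])

# Confident, correct probabilities score lower (better) than hesitant ones.
confident = binary_log_loss([1, 0], [0.9, 0.1])
hesitant = binary_log_loss([1, 0], [0.6, 0.4])
print(confident < hesitant)  # True
```

A confident wrong prediction is penalized heavily, since $\log(\hat{y})$ falls steeply as $\hat{y}$ approaches 0 for a true label of 1.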

### Loading the required libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics




### Loading the data

In [2]:
data = pd.read_csv("TelecomCustomers.csv")

In [3]:
"""
    X (ndarray): The feature columns extracted from the input DataFrame.
    y (ndarray): The target column extracted from the input DataFrame.
"""
X = data[["region", "tenure", "age", "marital", "address", "income", "ed", "employ", "retire", "gender", "reside"]].values
y = data["custcat"].values

In [4]:
"""
Standardize features by removing the mean and scaling to unit variance.
"""
X = preprocessing.StandardScaler().fit(X).transform(X.astype(float))

In [5]:
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Printing the shape of the training and testing sets
print("Train set: {} {}".format(X_train.shape, y_train.shape))
print("Test set: {} {}".format(X_test.shape, y_test.shape))


Train set: (800, 11) (800,)
Test set: (200, 11) (200,)


In [6]:
k = 4

# Create a K-Nearest Neighbors classifier with k neighbors
neigh = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)

In [7]:
"""
Predicts the target variable using the K-Nearest Neighbors algorithm.
"""

y_hat = neigh.predict(X_test)

In [8]:
# Train set accuracy
print("Train set accuracy : {}".format(metrics.accuracy_score(y_train, neigh.predict(X_train))))

# Test set accuracy
print("Test set accuracy : {}".format(metrics.accuracy_score(y_test, y_hat)))


Train set accuracy : 0.5475
Test set accuracy : 0.32
