# K-Nearest Neighbors Algorithm

# KNN Algorithm

* 1. Initialize K to your chosen number of neighbors.
* 2. For each example in the data:
* - 2.1 Calculate the distance between the query example and the current example from the data.
* - 2.2 Add the distance and the index of the example to a collection.
* 3. Sort the collection of distances and indices in ascending order by distance.
* 4. Pick the first K entries from the sorted collection.
* 5. Get the labels of the selected K entries.
* 6. If regression, return the mean of the K labels.
* 7. If classification, return the mode of the K labels.
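
The steps above can be sketched in plain Python. This is a minimal illustration rather than the notebook's own implementation: the function name `knn_predict`, its `task` parameter, and the toy data are assumptions made for this example, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from statistics import mean


def knn_predict(data, labels, query, k, task="classification"):
    """Predict a label for `query` from its k nearest examples in `data`."""
    # Compute the distance from the query to every example,
    # keeping the example's index alongside each distance.
    distances_and_indices = []
    for index, example in enumerate(data):
        distance = math.dist(example, query)
        distances_and_indices.append((distance, index))

    # Sort ascending by distance and keep the first k entries.
    k_nearest = sorted(distances_and_indices)[:k]

    # Look up the labels of the selected entries.
    k_labels = [labels[index] for _, index in k_nearest]

    # Mean of the labels for regression, mode for classification.
    if task == "regression":
        return mean(k_labels)
    return Counter(k_labels).most_common(1)[0][0]


# Toy one-feature data, invented for illustration.
data = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
class_labels = ["low", "low", "low", "high", "high", "high"]
numeric_labels = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

class_result = knn_predict(data, class_labels, (2.5,), k=3)
regression_result = knn_predict(data, numeric_labels, (2.5,), k=3, task="regression")
print(class_result, regression_result)
```

The same neighbor search serves both tasks; only the final aggregation of the K labels differs.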

# Pros and cons of KNN
**`Advantages`**
* It is a very simple algorithm to understand and interpret.
* It is very useful for nonlinear data, because the algorithm makes no assumptions about the data.
* It is a versatile algorithm, since it can be used for both classification and regression.
* It has relatively high accuracy, although there are supervised learning models that perform much better than KNN.

**`Disadvantages`**
* It is somewhat computationally expensive, because it stores all the training data and compares every query against it.
* It requires more memory than other supervised learning algorithms.
* Prediction is slow when N (the number of training examples) is large.
* It is very sensitive to the scale of the data, as well as to irrelevant features.
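
To illustrate the sensitivity to scale, the sketch below finds the nearest neighbor of a query before and after min-max scaling. The data and the helper names `min_max_scale` and `nearest_index` are made up for this example; min-max scaling is only one possible choice of rescaling.

```python
import math


def min_max_scale(rows, mins, maxs):
    """Rescale each feature to [0, 1] using the given per-feature min and max."""
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, mins, maxs))
        for row in rows
    ]


def nearest_index(query, points):
    """Index of the point closest to `query` by Euclidean distance."""
    distances = [math.dist(query, p) for p in points]
    return distances.index(min(distances))


# Invented data: feature 0 spans 0..1, feature 1 spans 0..1000.
train = [(0.0, 0.0), (1.0, 50.0), (0.5, 1000.0)]
query = (0.1, 40.0)

# Per-feature minimum and maximum, taken from the training data only.
mins = [min(col) for col in zip(*train)]
maxs = [max(col) for col in zip(*train)]

raw_nearest = nearest_index(query, train)
scaled_nearest = nearest_index(
    min_max_scale([query], mins, maxs)[0],
    min_max_scale(train, mins, maxs),
)
print(raw_nearest, scaled_nearest)
```

On the raw values the large-range feature dominates the distance, so the nearest neighbor differs from the one found once both features share the same range.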

In [27]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
from collections import Counter

**`Euclidean distance`**

In [7]:
def euclidean_distance(x1, x2):
    # Euclidean (L2) distance between two NumPy vectors
    return np.linalg.norm(x1 - x2)
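
For reference, with its default arguments `np.linalg.norm` returns the 2-norm of a vector, so for two points with $n$ features the function above computes:

$$
d(x_1, x_2) = \sqrt{\sum_{i=1}^{n} \left(x_{1,i} - x_{2,i}\right)^2}
$$

As a quick check, for $x_1 = (0, 0)$ and $x_2 = (3, 4)$ this gives $\sqrt{3^2 + 4^2} = 5$. The subscript notation $x_{1,i}$ for the $i$-th feature is introduced here for illustration and does not appear in the notebook.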

**`KNN classification`**

In [None]:
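class KNN:
    def __init__(self, k=3):
        # Number of neighbors that vote on each prediction
        self.k = k

    def fit(self, X, y):
        # KNN has no training step; it only stores the training data
        self.X = X
        self.y = y

    def predict(self, X):
        # Predict a label for every row of X
        predicted_labels = [self._predict(x) for x in X]
        return np.array(predicted_labels)

    def _predict(self, x):
        # Distance from x to every training example
        distances = [euclidean_distance(x, x_train) for x_train in self.X]

        # Indices and labels of the k closest training examples
        k_index = np.argsort(distances)[:self.k]
        k_nearest_label = [self.y[i] for i in k_index]

        # Majority vote among the k labels
        most_common = Counter(k_nearest_label).most_common(1)
        return most_common[0][0]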
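
One detail of the vote in `_predict` is worth illustrating. When two labels receive the same number of votes, `Counter.most_common` returns the one encountered first (Python 3.7 and later). Because `k_nearest_label` is ordered by increasing distance, a tie goes to the label whose first occurrence is closest to the query. The labels below are invented for the example:

```python
from collections import Counter

# Labels ordered from nearest to farthest neighbor, as k_nearest_label would be.
# Made-up values: with k = 4, labels 2 and 1 each receive two votes.
labels_by_distance = [2, 1, 1, 2]

# most_common(1) returns a list with one (label, count) pair.
winner = Counter(labels_by_distance).most_common(1)[0][0]
print(winner)
```

Here label 2 wins because it belongs to the nearest neighbor. Choosing an odd `k` avoids ties in two-class problems, though not necessarily with three or more classes.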

### Test

**`Load the iris dataset`**

In [10]:
data = datasets.load_iris()

In [12]:
X, y = data.data, data.target

In [13]:
X.shape

(150, 4)

In [14]:
y.shape

(150,)

In [21]:
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, random_state=42)

In [22]:
knn = KNN(k=5)

In [23]:
knn.fit(X_train, y_train)

In [24]:
predict = knn.predict(X_test)

In [25]:
predict

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0])

In [26]:
y_test

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0])

In [31]:
accuracy_score(y_test, predict)

1.0
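
With its default arguments, `accuracy_score` returns the fraction of samples whose predicted label matches the true label. A minimal plain-Python equivalent is sketched below; the helper name `accuracy` and the short label lists are assumptions for illustration, not part of scikit-learn or of the notebook's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Made-up labels: the first call has every prediction correct,
# the second has one of four wrong.
all_correct = accuracy([1, 0, 2, 1], [1, 0, 2, 1])
one_wrong = accuracy([1, 0, 2, 1], [1, 0, 2, 0])
print(all_correct, one_wrong)
```

A score of 1.0, as in the cell above, therefore means that every entry of `predict` equals the corresponding entry of `y_test`.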