# K-Nearest Neighbors



## 1. Iris data set and K-Nearest Neighbors

The Iris data set is a machine learning classic: a small data set (150 samples) covering 3 Iris species (Setosa, Versicolour, Virginica), described by 4 features: the length and width of the sepals and of the petals.

The goal is to build a classifier that can determine which species a given flower belongs to.

In [None]:
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()

print('#' * 25)
print(iris.keys())
print('#' * 25)
print(iris['DESCR'])

In [None]:
# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause
# Source : https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html

import matplotlib.pyplot as plt
from sklearn import datasets

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = iris.target

x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

plt.figure(2, figsize=(8, 6))
plt.clf()

# Plot the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1,
            edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())

plt.show()

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target)

We print the shapes to check the split (by default, `train_test_split` puts 25% of the samples in the test set).

In [None]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

Here we use the K-NN classifier and print its accuracy on the test set, as a percentage:

In [None]:
from sklearn.neighbors import KNeighborsClassifier

nbrs = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

print(100*(nbrs.predict(X_test) == y_test).mean())
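To make the decision rule concrete, here is a minimal sketch of K-NN classification written with NumPy only: compute the distance from each query point to every training point, keep the k closest, and take a majority vote over their labels. The helper `knn_predict` and the names `X_tr`, `X_te`, `y_tr`, `y_te` are hypothetical and introduced only for this illustration, and `random_state=0` is an arbitrary choice made so the split is reproducible. scikit-learn's implementation is more elaborate (tree-based search, other metrics), so a few predictions may differ when several neighbors are exactly equidistant.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: majority vote among the k nearest training points."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    diffs = X_query[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k closest training points for each query point
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote over the neighbor labels (a tie goes to the smallest label)
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


iris = datasets.load_iris()
# random_state=0 is an assumption, used only to make this sketch reproducible
X_tr, X_te, y_tr, y_te = train_test_split(iris.data, iris.target, random_state=0)

manual = knn_predict(X_tr, y_tr, X_te, k=3)
reference = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).predict(X_te)

manual_accuracy = (manual == y_te).mean()
agreement = (manual == reference).mean()
print("accuracy of the manual version:", manual_accuracy)
print("agreement with scikit-learn:", agreement)
```

The brute-force distance matrix is fine for 150 samples; on larger data sets the memory cost grows with the product of the number of query and training points, which is one reason scikit-learn can switch to tree-based search.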

In [None]:
distances, indices = nbrs.kneighbors(X_test)
prediction = nbrs.predict(X_test)

print("### Compare the predictions with the true labels ###")
print(prediction == y_test)

# Positions (in the test set) of the misclassified samples
wrong_predictions_indices = np.flatnonzero(prediction != y_test)
print(wrong_predictions_indices)

print("### Look at the neighbors found and their classes ###")
print(indices[wrong_predictions_indices])           # row indices into X_train
print(y_train[indices[wrong_predictions_indices]])  # classes of those neighbors

# True classes of the misclassified samples
print(y_test[wrong_predictions_indices])
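Another way to inspect the neighbor votes is `predict_proba`: with the default uniform weights, the probability reported for a class is the share of the k neighbors that belong to it. The sketch below recomputes those shares from `kneighbors` and compares them. The names `clf`, `X_tr`, `X_te`, `y_tr`, `y_te`, and `vote_share` are hypothetical, and `random_state=0` is an assumption made for reproducibility.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
# random_state=0 is an assumption, used only to make this sketch reproducible
X_tr, X_te, y_tr, y_te = train_test_split(iris.data, iris.target, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)

# Class probabilities, shape (n_test, n_classes)
proba = clf.predict_proba(X_te)

# Recompute the share of neighbors per class from the neighbor indices
_, idx = clf.kneighbors(X_te)
vote_share = np.stack(
    [(y_tr[idx] == c).mean(axis=1) for c in clf.classes_], axis=1
)

print(proba[:5])
print("rows matching the vote shares:",
      np.isclose(proba, vote_share).all(axis=1).mean())
```

A row such as `[0, 2/3, 1/3]` flags a test point that sits near the boundary between two species, which is typically where the misclassified samples are found.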

## 2. K-NN as regression

This example is taken from the scikit-learn documentation: https://scikit-learn.org/stable/auto_examples/neighbors/plot_regression.html#sphx-glr-auto-examples-neighbors-plot-regression-py

In [None]:
# Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>
#         Fabian Pedregosa <fabian.pedregosa@inria.fr>
#
# License: BSD 3 clause (C) INRIA


# #############################################################################
# Generate sample data
import numpy as np
import matplotlib.pyplot as plt
from sklearn import neighbors

np.random.seed(0)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
T = np.linspace(0, 5, 500)[:, np.newaxis]
y = np.sin(X).ravel()

# Add noise to targets
y[::5] += 1 * (0.5 - np.random.rand(8))

# #############################################################################
# Fit regression model
n_neighbors = 5

for i, weights in enumerate(['uniform', 'distance']):
    knn = neighbors.KNeighborsRegressor(n_neighbors, weights=weights)
    y_ = knn.fit(X, y).predict(T)

    plt.subplot(2, 1, i + 1)
    plt.scatter(X, y, color='darkorange', label='data')
    plt.plot(T, y_, color='navy', label='prediction')
    plt.axis('tight')
    plt.legend()
    plt.title("KNeighborsRegressor (k = %i, weights = '%s')" % (n_neighbors,
                                                                weights))

plt.tight_layout()
plt.show()
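The two `weights` options differ only in how the targets of the k neighbors are averaged. As a sketch, the prediction at a single point can be reproduced by hand: `'uniform'` takes the plain mean of the neighbor targets, while `'distance'` weights each target by the inverse of its distance to the query. The names `X_demo`, `y_demo`, and `query`, and the query value 2.5, are hypothetical choices for this illustration. The noise step is left out for brevity, and the code assumes the query does not coincide exactly with a training point (otherwise the inverse distance is undefined).

```python
import numpy as np
from sklearn import neighbors

# Same kind of data as above, without the added noise
rng = np.random.RandomState(0)
X_demo = np.sort(5 * rng.rand(40, 1), axis=0)
y_demo = np.sin(X_demo).ravel()

query = np.array([[2.5]])  # hypothetical query point
k = 5

# Distances from the query to every training point, then the k closest
dists = np.abs(X_demo - query).ravel()
nearest = np.argsort(dists)[:k]

# 'uniform': plain average of the k neighbor targets
uniform_pred = y_demo[nearest].mean()

# 'distance': average weighted by inverse distance
w = 1.0 / dists[nearest]
distance_pred = (w * y_demo[nearest]).sum() / w.sum()

sk_uniform = neighbors.KNeighborsRegressor(k, weights='uniform') \
    .fit(X_demo, y_demo).predict(query)[0]
sk_distance = neighbors.KNeighborsRegressor(k, weights='distance') \
    .fit(X_demo, y_demo).predict(query)[0]

print("uniform :", uniform_pred, sk_uniform)
print("distance:", distance_pred, sk_distance)
```

With inverse-distance weights the prediction is pulled toward the closest neighbors, which is why the `'distance'` curve in the plot passes exactly through the training points, including the noisy ones.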