# K-nearest neighbors (kNN) 

The goal of this exercise is to build a classification model that assigns iris flowers to a species based on their botanical measurements. The Iris dataset contains four measurements (sepal length and width, petal length and width) for three iris species: setosa, versicolor and virginica.

<table>
    <tr>
        <th>
            <img src="https://www.nassimbahri.ovh/docs/cours/content/data-science/img/iris_setosa.jpeg">
        </th>
        <th>
            <img src="https://www.nassimbahri.ovh/docs/cours/content/data-science/img/iris_versicolor.jpeg">
        </th>
        <th>
            <img src="https://www.nassimbahri.ovh/docs/cours/content/data-science/img/iris_virginica.jpeg">
        </th>
    </tr>
    <tr>
        <th style="text-align:center">iris setosa</th>
        <th style="text-align:center">iris versicolor</th>
        <th style="text-align:center">iris virginica</th>
    </tr>
</table>
<img src="https://www.nassimbahri.ovh/docs/cours/content/data-science/img/features.jpeg" style="width:500px">

### Step 1: Loading the dataset

In [None]:
import pandas as pd
iris = pd.read_csv('iris.csv')
iris

In [None]:
iris.shape

In [None]:
iris.describe()

In [None]:
iris.head()

In [None]:
iris['species'].value_counts()

### Step 2: Data analysis and visualization

In [None]:
import matplotlib.pyplot as plt
# Histogram of each numeric feature
iris.hist()
plt.show()

In [None]:
# Map each species to an integer so it can be used as a color code
species_name = {'Iris-versicolor':0,'Iris-virginica':1,'Iris-setosa':2}
color = [species_name[item] for item in iris['species']]
scatter = plt.scatter(iris['sepal_length'],iris['sepal_width'],c=color)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.legend(handles=scatter.legend_elements()[0], labels=list(species_name))
plt.show()

In [None]:
# Same plot for the petal measurements
species_name = {'Iris-versicolor':0,'Iris-virginica':1,'Iris-setosa':2}
color = [species_name[item] for item in iris['species']]
scatter = plt.scatter(iris['petal_length'],iris['petal_width'],c=color)
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.legend(handles=scatter.legend_elements()[0], labels=list(species_name))
plt.show()

### Step 3: Preparing the training and test data

In [None]:
y = iris["species"]
X = iris.drop(["species"], axis=1)

In [None]:
print("Shape X: ",X.shape)
print("Shape y: ",y.shape)

In [None]:
X.head()

In [None]:
y.head()

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [None]:
print(X_train.shape)
print(X_test.shape)
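
To make the roles of `test_size` and `random_state` concrete, here is a minimal pure-Python sketch of a seeded shuffle split. The function `shuffle_split` and the toy data are hypothetical and only illustrate the idea; this is not how scikit-learn implements `train_test_split`, and the exact rows selected will differ.

```python
import random

def shuffle_split(rows, labels, test_size=0.2, seed=0):
    """Shuffle the row indices once, then cut them into a test part and a training part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # the seed makes the split reproducible
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(rows, train_idx), pick(rows, test_idx), pick(labels, train_idx), pick(labels, test_idx)

# Toy stand-in for the dataset: 150 rows, 3 classes of 50 rows each
rows = [[float(i), float(i) * 2] for i in range(150)]
labels = [i // 50 for i in range(150)]

X_tr, X_te, y_tr, y_te = shuffle_split(rows, labels)
print(len(X_tr), len(X_te))
```

With 150 rows and `test_size=0.2`, 30 rows go to the test set and 120 to the training set. Each row keeps its label because rows and labels are selected with the same indices.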

### Step 4: Normalizing the data

In [None]:
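from sklearn.preprocessing import MinMaxScaler
s = MinMaxScaler()
# Learn the min/max of each feature on the training set only
X_train = s.fit_transform(X_train)
# Reuse the training min/max on the test set to avoid data leakage
X_test = s.transform(X_test)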

In [None]:
X_test
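
The arithmetic behind min-max scaling can be sketched in a few lines of pure Python. The helpers `fit_min_max` and `apply_min_max` and the toy values are hypothetical; the sketch assumes the usual formula (x - min) / (max - min), with the bounds learned on the training rows and then reused for the test rows.

```python
def fit_min_max(rows):
    """Return the (min, max) of each column, learned from the given rows."""
    columns = list(zip(*rows))
    return [(min(c), max(c)) for c in columns]

def apply_min_max(rows, bounds):
    """Rescale each column with previously learned bounds: (x - min) / (max - min)."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]

# Toy values, not taken from the Iris dataset
train = [[4.0, 1.0], [6.0, 3.0], [8.0, 2.0]]
test = [[5.0, 2.5], [9.0, 1.0]]

bounds = fit_min_max(train)            # learned on the training rows only
train_scaled = apply_min_max(train, bounds)
test_scaled = apply_min_max(test, bounds)

print(train_scaled)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
print(test_scaled)   # [[0.25, 0.75], [1.25, 0.0]]
```

The value 1.25 shows that a test point can fall outside [0, 1] when it exceeds the training maximum. That is expected, since the test set plays no part in choosing the bounds.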

### Step 5: Applying the kNN algorithm

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train,y_train)

In [None]:
y_pred = knn.predict(X_test)

In [None]:
y_pred
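
To show what the classifier does when it predicts, here is a minimal from-scratch sketch of kNN: compute the Euclidean distance from the query to every training point, keep the k closest, and take a majority vote. The function `knn_predict` and the toy points are hypothetical. `KNeighborsClassifier` uses Euclidean distance and uniform votes by default, but relies on faster neighbor search and may break ties differently.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])          # closest points first
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy, already-scaled points (not real Iris measurements)
train_rows = [[0.1, 0.1], [0.2, 0.1], [0.15, 0.2], [0.8, 0.9], [0.9, 0.8], [0.85, 0.85]]
train_labels = ["setosa", "setosa", "setosa", "virginica", "virginica", "virginica"]

pred_a = knn_predict(train_rows, train_labels, [0.2, 0.2])
pred_b = knn_predict(train_rows, train_labels, [0.7, 0.8])
print(pred_a)  # setosa
print(pred_b)  # virginica
```

Because the vote depends entirely on distances, features measured on larger scales would dominate it, which is why the data is normalized before fitting.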

### Step 6: Evaluation

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [None]:
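# scikit-learn metrics expect the true labels first, then the predictions
accuracy = accuracy_score(y_test, y_pred)
accuracy

In [None]:
# Rows are true species, columns are predicted species
confusion_matrix(y_test, y_pred)

In [None]:
print(classification_report(y_test, y_pred))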
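
The metrics above can be reproduced by hand on a small example, which also shows why argument order matters: scikit-learn's metric functions take the true labels first, and swapping the two arguments transposes the confusion matrix, so precision and recall trade places. The helpers and the toy labels below are hypothetical and are not results from this notebook.

```python
def confusion_counts(y_true, y_pred, labels):
    """Build a matrix whose rows are true labels and whose columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

labels = ["setosa", "versicolor", "virginica"]
# Toy labels: one versicolor flower is mistaken for virginica
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_hat = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

matrix = confusion_counts(y_true, y_hat, labels)
acc = accuracy(y_true, y_hat)

# Precision and recall for the virginica class
c = labels.index("virginica")
recall = matrix[c][c] / sum(matrix[c])                     # correct / all true virginica
precision = matrix[c][c] / sum(row[c] for row in matrix)   # correct / all predicted virginica

print(matrix)  # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
print(acc, precision, recall)
```

Here virginica has a recall of 1.0 (both true virginica flowers are found) but a precision of 2/3 (one of the three virginica predictions is wrong). With the arguments swapped, the two values would be reported the other way around.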