# Basic kNN Classifier
**Iris flowers** dataset:
This dataset contains the sepal and petal lengths and widths of three iris species (Setosa, Versicolour, and Virginica), stored in a 150x4 numpy.ndarray.

The rows are the samples and the columns are: Sepal Length, Sepal Width, Petal Length and Petal Width.

<img src='./iris.JPG'>

More info: [Wikipedia link](https://en.wikipedia.org/wiki/Iris_flower_data_set).

In [3]:
import pandas as pd

from sklearn import neighbors        # neighbors.KNeighborsClassifier
from sklearn import model_selection  # model_selection.train_test_split
from sklearn import preprocessing    # preprocessing.StandardScaler
from sklearn import metrics          # metrics.accuracy_score, metrics.confusion_matrix, metrics.classification_report

from termcolor import colored

### Read our dataset

In [4]:
df = pd.read_csv("../datasets/iris_pandas.csv")

In [5]:
df.head()

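|   | SepalLength | SepalWidth | PetalLength | PetalWidth | Name |
|---|-------------|------------|-------------|------------|------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |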


In [11]:
features = df.columns[:4].tolist()  # Extract the four feature columns
                                    # To select specific features instead:
                                    # features = ["PetalLength", "PetalWidth"]
features

['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']

In [26]:
x = df[features]  # pandas.core.frame.DataFrame, (150, 4)
x.head()

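|   | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|-------------|------------|-------------|------------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |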


In [54]:
y = df['Name']  # pandas.core.series.Series, (150,)
y.head()

0    Iris-setosa
1    Iris-setosa
2    Iris-setosa
3    Iris-setosa
4    Iris-setosa
Name: Name, dtype: object

### Split data 

In [55]:
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, train_size=0.8, test_size=0.2, stratify=y)
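
The `stratify=y` argument asks for a split in which every class keeps the same proportion in the training and test sets. The sketch below illustrates that idea in plain Python; `stratified_split` is a hypothetical helper written for this explanation, not scikit-learn's implementation, and it assumes the usual 50 samples per iris class.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    # Group sample indices by their class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Assumption: 50 samples per class, as in the standard iris dataset
labels = ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx))
print(test_counts)
```

With a 20% test fraction, each class contributes 10 of its 50 samples to the test set, which is consistent with the per-class support of 10 in the classification report further down.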

### Preprocessing

In [56]:
scaler = preprocessing.StandardScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
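
The scaler is fitted on the training data only and then applied to both sets, so no information from the test set leaks into preprocessing. As a rough sketch of what standardization does to one feature column, assuming the population standard deviation (which is what `StandardScaler` uses); `fit_scaler`, `transform` and the test values are hypothetical names and numbers chosen for illustration:

```python
import statistics


def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature column."""
    return statistics.fmean(column), statistics.pstdev(column)


def transform(column, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(value - mean) / std for value in column]


# Illustrative values only: the five SepalLength entries shown by df.head()
train_col = [5.1, 4.9, 4.7, 4.6, 5.0]
test_col = [5.4, 4.4]  # hypothetical unseen values

mean, std = fit_scaler(train_col)             # statistics come from training data only
train_scaled = transform(train_col, mean, std)
test_scaled = transform(test_col, mean, std)  # test data reuses the training statistics

print(mean, std)
print(train_scaled)
print(test_scaled)
```

After the transform, the training column has mean 0 and standard deviation 1; the test column generally does not, because it is scaled with the training statistics.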

### kNN model

In [57]:
k = 4  # Number of neighbours that vote on each prediction
p = 1  # Power parameter of the Minkowski metric (p=1 is the Manhattan/taxicab distance)

r = neighbors.KNeighborsClassifier(n_neighbors=k, p=p)

In [58]:
r.fit(x_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=4, p=1,
           weights='uniform')
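
To make the roles of `k` and `p` concrete, here is a minimal brute-force sketch of a kNN prediction with a Minkowski distance and a uniform majority vote. The functions `minkowski` and `knn_predict` and the toy points are hypothetical and only illustrate the idea; scikit-learn's classifier may use tree-based search and resolves ties differently.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance: p=1 is Manhattan (taxicab), p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(train_x, train_y, query, k=4, p=1):
    """Predict the majority label among the k training points nearest to query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: minkowski(train_x[i], query, p))
    nearest_labels = [train_y[i] for i in order[:k]]
    # Assumption: on a tied vote, the label of the closest tied neighbour wins
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D data: two well-separated groups
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(minkowski((0, 0), (3, 4), p=1))  # 7.0
print(minkowski((0, 0), (3, 4), p=2))  # 5.0
print(knn_predict(train_x, train_y, (0.5, 0.5), k=4, p=1))  # a
```

Note that an even `k` such as 4 can produce a tied vote (two neighbours from each of two classes), so an odd `k` is a common choice when ties are a concern.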

### Model evaluation

In [64]:
y_pred = r.predict(x_test)

In [66]:
# Get accuracy metric
metrics.accuracy_score(y_test, y_pred)

0.9666666666666667

In [67]:
# Create confusion matrix and display it
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
print("Confusion matrix", cnf_matrix, sep="\n")
print("\n")

Confusion matrix
[[10  0  0]
 [ 0 10  0]
 [ 0  1  9]]




In [70]:
# Get other classification metrics
class_report = metrics.classification_report(y_test, y_pred)
print("Classification report", class_report, sep="\n")

Classification report
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       0.91      1.00      0.95        10
 Iris-virginica       1.00      0.90      0.95        10

    avg / total       0.97      0.97      0.97        30
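
The accuracy and the per-class figures in the report can be re-derived by hand from the confusion matrix printed earlier. In scikit-learn's convention, rows are the true classes and columns are the predicted classes, with labels in sorted order. The sketch below hard-codes that matrix; the variable names are chosen for illustration.

```python
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
cm = [
    [10, 0, 0],   # true setosa
    [0, 10, 0],   # true versicolor
    [0, 1, 9],    # true virginica: one sample predicted as versicolor
]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / total  # diagonal / all samples

report = {}
for i, name in enumerate(labels):
    true_pos = cm[i][i]
    predicted_as_i = sum(row[i] for row in cm)  # column sum
    actually_i = sum(cm[i])                     # row sum (the "support")
    precision = true_pos / predicted_as_i
    recall = true_pos / actually_i
    f1 = 2 * precision * recall / (precision + recall)
    report[name] = (precision, recall, f1)

print(f"accuracy = {accuracy:.2f}")
for name, (precision, recall, f1) in report.items():
    print(f"{name:>15}  {precision:.2f}  {recall:.2f}  {f1:.2f}")
```

The single misclassified virginica sample lowers virginica's recall to 9/10 and versicolor's precision to 10/11, which is where the 0.90 and 0.91 in the report come from.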

