# Instance Based Method using KNN on the Iris Data set
----------
### Step (1): Environment Setup
To find out the classification of the Iris given by its properties *`[ 4.8, 2.5, 5.3, 2.4 ]`*, let's set up the environment first:

In [1]:
import numpy as np
import pandas as pd
from sklearn import datasets, neighbors
from sklearn.model_selection import train_test_split
# load the iris data set
iris = datasets.load_iris()

----------
### Step (2): Looking what's in the iris data set
The Iris data set contains the following keys to retrieve information from:

In [2]:
iris.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

... it contains the following *Iris classifications*:

In [3]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

... and features:

In [4]:
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
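A quick check of how large the data set is and how the samples are spread over the three classes can be sketched with the `data`, `target` and `target_names` keys listed above. The variable name `class_counts` is illustrative only:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# 150 samples (rows), each described by 4 features (columns)
print(iris.data.shape)        # (150, 4)

# the target holds one integer class label (0, 1 or 2) per sample
print(iris.target.shape)      # (150,)

# count how many samples belong to each class
class_counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, class_counts):
    print(name, count)        # setosa 50, versicolor 50, virginica 50
```

The classes are perfectly balanced, which keeps a simple majority vote over neighbors from being skewed towards one class.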

----------
### Step (3): Performing the KNN classification
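To perform the KNN classification, `sklearn` offers two tools: `NearestNeighbors`, which only finds the neighbors, and `KNeighborsClassifier`, which also votes on the class.<br />
But before we start, we convert the iris data set into a pandas data frame for easier handling: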

In [5]:
# assemble a data frame
iris_df = pd.DataFrame(iris.data)
iris_df.columns = iris.feature_names
iris_df['classes'] = iris.target
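As an alternative, scikit-learn 0.23 and newer can return the data set as a data frame directly via `as_frame=True`. A sketch, assuming such a version is installed; the name `iris_df_alt` is hypothetical:

```python
from sklearn import datasets

# as_frame=True (scikit-learn >= 0.23) returns the data as pandas objects
iris_bunch = datasets.load_iris(as_frame=True)

# .frame combines the four feature columns and a 'target' column;
# rename 'target' to 'classes' to match the data frame built above
iris_df_alt = iris_bunch.frame.rename(columns={'target': 'classes'})
print(iris_df_alt.shape)  # (150, 5)
```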

With this data frame in place, the data we will run the KNN on<br />
now looks like this:

In [6]:
iris_df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | classes |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


The next preparation step is to define the *Iris Sample* we want to classify, given by the properties *`[ 4.8, 2.5, 5.3, 2.4 ]`*, <br />
right away, so that both approaches can use it:

In [7]:
iris_sample = np.array([4.8, 2.5, 5.3, 2.4]).reshape(1, -1)
iris_sample

array([[4.8, 2.5, 5.3, 2.4]])
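The `reshape(1, -1)` call matters because scikit-learn estimators expect a 2-D input of shape `(n_samples, n_features)` and raise a `ValueError` for a plain 1-D vector. A small sketch of the shapes involved (`raw` is an illustrative name):

```python
import numpy as np

raw = np.array([4.8, 2.5, 5.3, 2.4])
print(raw.shape)          # (4,)   -> a 1-D vector, not accepted as input by scikit-learn

# reshape(1, -1): one row, and let NumPy infer the number of columns (4)
iris_sample = raw.reshape(1, -1)
print(iris_sample.shape)  # (1, 4) -> 1 sample with 4 features
```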

---------------
#### Approach #1:
We start by instantiating a `NearestNeighbors` model with ***`k = 10`***,<br />
which will allow us to calculate and select the *10* nearest neighbors:

In [8]:
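knn_classifier_1 = neighbors.NearestNeighbors(n_neighbors=10)
# fit on the four feature columns only; the class label is not a feature
knn_classifier_1.fit(iris_df.drop(columns=['classes']).to_numpy())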

NearestNeighbors(algorithm='auto', leaf_size=30, metric='minkowski',
         metric_params=None, n_jobs=1, n_neighbors=10, p=2, radius=1.0)
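The model's settings show `metric='minkowski'` with `p=2`. The Minkowski distance between two flowers $\mathbf{x}$ and $\mathbf{y}$ with 4 features, and its special case for $p = 2$ (the ordinary Euclidean distance), are:

$$
d_p(\mathbf{x}, \mathbf{y}) = \Big( \sum_{i=1}^{4} |x_i - y_i|^p \Big)^{1/p},
\qquad
d_2(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{4} (x_i - y_i)^2}
$$

As a worked example, for the *Iris Sample* $\mathbf{x} = (4.8, 2.5, 5.3, 2.4)$ and row 121 of the data frame, $\mathbf{y} = (5.6, 2.8, 4.9, 2.0)$:

$$
d_2 = \sqrt{(5.6-4.8)^2 + (2.8-2.5)^2 + (4.9-5.3)^2 + (2.0-2.4)^2}
= \sqrt{0.64 + 0.09 + 0.16 + 0.16} = \sqrt{1.05} \approx 1.0247
$$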

With this neighbor model in place, we can query the nearest neighbors of the *Iris Sample*:

In [9]:
result = knn_classifier_1.kneighbors(iris_sample, 10)

What we retrieved from this query is a pair of 2-D arrays with one row per query sample:<br />
an **array with all distances** and an **array containing the row index of each nearest neighbor**:

In [10]:
# get the distances and indices of the 10 nearest neighbors (row 0 = our only sample)
distances = result[0][0]
nearest_neighbors = result[1][0]
# print the distances and the indices of the neighbors found
print('Distances:\n', distances, '\n\nNeighbor Index:\n', nearest_neighbors)

Distances:
 [1.02469508 1.02956301 1.06301458 1.06770783 1.15325626 1.15325626
 1.36381817 1.43527001 1.46969385 1.51657509] 

Neighbor Index:
 [121 113 114 106 101 142 149  84  83 138]
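To see what `kneighbors` computes here, the same search can be sketched as a brute-force scan in plain NumPy: measure the Euclidean distance to all 150 flowers and keep the 10 smallest. This swaps scikit-learn's internal search structure for a simple full scan, so it is an illustration rather than how the library does it:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data                                # (150, 4) feature matrix
iris_sample = np.array([4.8, 2.5, 5.3, 2.4])

# Euclidean distance from the sample to every one of the 150 flowers
all_distances = np.linalg.norm(X - iris_sample, axis=1)

# indices of the 10 smallest distances, closest first; a stable sort keeps
# ties (e.g. the identical rows 101 and 142) in ascending index order
nearest = np.argsort(all_distances, kind='stable')[:10]

print(nearest)
print(all_distances[nearest])
```

On the unmodified data set this should list the same ten row indices as the `kneighbors` output above.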


This means that the following rows of the data frame are the nearest neighbors of our *Iris Sample*:

In [11]:
iris_df.loc[nearest_neighbors]

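|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | classes |
|---|---|---|---|---|---|
| 121 | 5.6 | 2.8 | 4.9 | 2.0 | 2 |
| 113 | 5.7 | 2.5 | 5.0 | 2.0 | 2 |
| 114 | 5.8 | 2.8 | 5.1 | 2.4 | 2 |
| 106 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 101 | 5.8 | 2.7 | 5.1 | 1.9 | 2 |
| 142 | 5.8 | 2.7 | 5.1 | 1.9 | 2 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | 2 |
| 84 | 5.4 | 3.0 | 4.5 | 1.5 | 1 |
| 83 | 6.0 | 2.7 | 5.1 | 1.6 | 1 |
| 138 | 6.0 | 3.0 | 4.8 | 1.8 | 2 |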


But what's the classification of our *Iris Sample*?<br />
KNN decides by a majority vote: the sample gets the class that occurs most often among its *k* nearest neighbors.<br />
So we look up the class of each of the 10 neighbors found above and pick the most frequent one:

In [12]:
# class label of each of the 10 nearest neighbors
neighbor_classes = iris_df['classes'].iloc[nearest_neighbors].to_numpy()
# majority vote: count the votes per class (array index = class) and take the class with the most votes
class_index = int(np.bincount(neighbor_classes).argmax())
class_index

2

With this classification index calculated, we can get the *Iris classification name*:

In [13]:
iris.target_names[class_index]

'virginica'

### ==> The classification of the given Iris is *Iris-Virginica*
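As a cross-check of approach #1, scikit-learn's `KNeighborsClassifier` can do the same vote with *k = 10*, fitted on the full data set. A sketch; the name `knn_vote` is illustrative:

```python
import numpy as np
from sklearn import datasets, neighbors

iris = datasets.load_iris()
iris_sample = np.array([4.8, 2.5, 5.3, 2.4]).reshape(1, -1)

# k = 10 as in approach #1; by default every neighbor gets one equal vote
knn_vote = neighbors.KNeighborsClassifier(n_neighbors=10)
knn_vote.fit(iris.data, iris.target)

# share of the 10 neighbors that belong to each class
proba = knn_vote.predict_proba(iris_sample)[0]
for name, p in zip(iris.target_names, proba):
    print('%s: %.1f' % (name, p))   # setosa: 0.0, versicolor: 0.2, virginica: 0.8

# the predicted class is the one with the most votes
print(iris.target_names[knn_vote.predict(iris_sample)[0]])   # virginica
```

With 8 of the 10 votes, *virginica* wins clearly. Passing `weights='distance'` would instead weight each neighbor's vote by the inverse of its distance, so closer flowers count more.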

---------------
#### Approach #2:
We start by splitting our data frame into a training set and a test set:

In [14]:
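X = iris_df.drop(columns=['classes']).to_numpy()
y = iris_df['classes'].to_numpy()
# hold out 20 % of the samples for testing; no random_state is set,
# so the split (and the accuracy below) changes from run to run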
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
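A plain random split can leave the three classes unevenly represented in the small test set. A variant, sketched here as an option rather than what this notebook does, passes `stratify=y` to keep the class ratio equal in both parts and `random_state` (the value `0` is arbitrary) to make the split reproducible:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# stratify=y keeps the 1:1:1 class ratio in both parts of the split;
# random_state fixes the shuffle so every run gives the same split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(X_train.shape, X_test.shape)   # (120, 4) (30, 4)
print(np.bincount(y_test))           # [10 10 10]
```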

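Then we train a ***KNeighborsClassifier*** on the training set and measure its accuracy on the test set. Note that it uses ***`k = 5`*** by default (`n_neighbors=5`), not the *10* neighbors of approach #1: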

In [15]:
knn_classifier_2 = neighbors.KNeighborsClassifier()
knn_classifier_2.fit(X_train, y_train)
accuracy = knn_classifier_2.score(X_test, y_test)
print('Accuracy: %.2f' % accuracy)

Accuracy: 0.97
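A single random split gives a noisy accuracy estimate, and `k = 5` is just the default. One common way to choose *k* is k-fold cross-validation; a sketch that scores *k* from 1 to 15 with 5 folds (the range is an arbitrary choice):

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

# mean 5-fold cross-validated accuracy for each candidate k
mean_scores = {}
for k in range(1, 16):
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    mean_scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# pick the k with the highest mean accuracy (on a tie, the smallest k wins)
best_k = max(mean_scores, key=mean_scores.get)
print('best k:', best_k, 'mean accuracy: %.3f' % mean_scores[best_k])
```

Every sample is used for testing exactly once across the folds, so the averaged score depends far less on one lucky or unlucky split.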


With this trained KNN classifier in place, we can now predict the class of the *Iris Sample* we declared previously:

In [16]:
prediction = knn_classifier_2.predict(iris_sample)
prediction[0]

2

What we got as prediction is the predicted classification index.<br />
As in approach #1, we can use this index to look up our *Iris classification name*:

In [17]:
iris.target_names[prediction[0]]

'virginica'

### ==> The classification of the given Iris is *Iris-Virginica*