### To implement the KNN algorithm for classification, we will use the Iris flower dataset.

Each example in the dataset has 4 attributes:
1. sepal length in cm 
2. sepal width in cm 
3. petal length in cm 
4. petal width in cm 

So each example is 4-dimensional. The dataset has 3 classes:
1. Iris Setosa 
2. Iris Versicolour 
3. Iris Virginica

So each example falls into one of the 3 classes listed above. The classification task here is therefore not a __binary classification problem__ but a __multi-class classification problem__. You can read more about the dataset at: https://archive.ics.uci.edu/ml/datasets/iris
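As a quick check of the description above, the following sketch inspects the copy of the dataset bundled with scikit-learn. The variable names (`iris`, `feature_names`, `class_names`, `class_counts`) are illustrative choices, not part of the original notebook. Note that scikit-learn labels the classes `setosa`, `versicolor` and `virginica`.

```python
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()

# Names of the 4 attributes and the 3 classes as stored by scikit-learn
feature_names = list(iris.feature_names)
class_names = list(iris.target_names)

# Number of examples per class (targets are encoded as 0, 1, 2)
class_counts = np.bincount(iris.target)

print(iris.data.shape)   # (number of examples, number of attributes)
print(feature_names)
print(class_names)
print(class_counts)
```

The targets returned by `load_iris` are integers, so `class_names[label]` maps a predicted label back to a species name.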

In [0]:
from sklearn.datasets import load_iris 
from sklearn.model_selection import train_test_split
import numpy as np

data = load_iris() #load the iris dataset
X = data.data
y = data.target
print(len(X)) #print number of examples

150


In [0]:
#split into train set and test set first by using the library function, 20% of the data goes to the test set
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2) 

In [0]:
print(len(X_train),len(X_test)) #number of examples in the train and test set

120 30


In [0]:
#split the training set again into training and validation sets by using the library function,
#with 20% of the training set examples going to the validation set
X_train, X_validation, y_train, y_validation = train_test_split(X_train,y_train,test_size=0.2) 

In [0]:
print(len(X_train),len(X_validation),len(X_test)) #number of examples in the train, validation and test set

96 24 30
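The splits above are random, so the accuracies reported in this notebook can change from run to run. A minimal sketch of a reproducible variant follows, assuming a fixed `random_state` and the `stratify` option of `train_test_split` so that each split keeps the class proportions. The seed value `42` and the shortened variable names are arbitrary choices for illustration, not taken from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

X, y = load_iris(return_X_y=True)

# First split: 20% of all examples go to the test set.
# random_state fixes the shuffle; stratify keeps class proportions equal.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Second split: 20% of the remaining training examples go to validation.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_tr, y_tr, test_size=0.2, random_state=42, stratify=y_tr
)

print(len(X_tr), len(X_val), len(X_te))  # sizes of train, validation, test
print(np.bincount(y_te))                 # examples per class in the test set
```

With only 24 validation examples, a single misclassification changes the validation accuracy by about 4 percentage points, which is one reason a fixed, stratified split makes comparisons between values of K easier to interpret.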


## Use scikit-learn to build the KNN model

In [0]:
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.metrics import accuracy_score

In [0]:
K = 3 #number of neighbors
model = KNN(n_neighbors=K) #initialize the KNN model with K as the number of neighbors
model.fit(X_train,y_train) #fit/train the model
predictions = model.predict(X_validation) #get the predictions for all examples in the validation set
accuracy = accuracy_score(y_validation,predictions) #get the accuracy on the validation set with the built-in accuracy_score function (true labels first, then predictions)
print(accuracy)

0.9166666666666666
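To make clearer what the library model does conceptually, here is a minimal from-scratch sketch of KNN prediction, assuming Euclidean distance and a simple majority vote. The function `knn_predict` and the toy data are hypothetical and for illustration only. This is not how scikit-learn implements `KNeighborsClassifier` internally, and tie-breaking may differ from the library's.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a label for each query point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)

    preds = []
    for q in X_query:
        # Euclidean distance from the query point to every training point
        dists = np.sqrt(((X_train - q) ** 2).sum(axis=1))
        # Indices of the k closest training points, nearest first
        nearest = np.argsort(dists, kind="stable")[:k]
        # Majority vote; ties go to the label that appears first among the neighbors
        votes = Counter(y_train[nearest].tolist())
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)

# Toy example: two well-separated clusters with labels 0 and 1
toy_X = [[0, 0], [0, 1], [5, 5], [5, 6]]
toy_y = [0, 0, 1, 1]
toy_pred = knn_predict(toy_X, toy_y, [[0, 0.2], [5, 5.5]], k=3)
print(toy_pred)
```

There is no real training step in this sketch: the "model" is just the stored training set, and all the work happens at prediction time.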


## Do it yourself
You can see that __K = 3__ gives fairly good accuracy on the validation set, but it may not be the optimal value. Your task now is to find the best value of K from a set of candidate values that you define yourself. Repeat the process above for each candidate and note which value of K gives the highest accuracy on the validation set.

Then, using the best value of K, calculate the accuracy of the model on the test set.

In [0]:
def evaluate_k(K):
    #fit a KNN model with K neighbors and return (validation accuracy, test accuracy)
    model = KNN(n_neighbors=K) #initialize the KNN model with K as the number of neighbors
    model.fit(X_train,y_train) #fit/train the model

    predictions = model.predict(X_validation) #get the predictions for all examples in the validation set
    accuracy = accuracy_score(y_validation,predictions) #accuracy on the validation set (true labels first, then predictions)

    testing_predictions = model.predict(X_test) #get the predictions for all examples in the test set
    testing_accuracy = accuracy_score(y_test,testing_predictions) #accuracy on the test set

    return accuracy, testing_accuracy

for i in range(2,10):
    result = evaluate_k(i) #train and evaluate once per value of K
    if result[0] > 0.9: #only report values of K with validation accuracy above 0.9
        print(i, result)

2 (0.9166666666666666, 1.0)
3 (0.9166666666666666, 1.0)
4 (0.9166666666666666, 1.0)
5 (0.9166666666666666, 1.0)
6 (0.9166666666666666, 1.0)
7 (0.9166666666666666, 1.0)
8 (0.9583333333333334, 1.0)
9 (0.9583333333333334, 1.0)
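The loop above reports the test accuracy for every K. A sketch that follows the exercise more strictly would choose K from the validation accuracy alone and evaluate on the test set only once. The version below is self-contained and rests on a few assumptions that are not in the original notebook: a fixed `random_state=0` for both splits, a candidate range of K from 1 to 15, and ties broken in favor of the smallest K.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Same 80/20 then 80/20 splitting scheme as above, with a fixed seed (assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_validation, y_train, y_validation = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0
)

# Candidate values of K (an arbitrary illustrative choice)
candidate_ks = list(range(1, 16))

# Validation accuracy for each candidate; the test set is not used here
val_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_scores[k] = accuracy_score(y_validation, model.predict(X_validation))

# max() returns the first maximal key, so ties go to the smallest K
best_k = max(val_scores, key=val_scores.get)

# Evaluate on the test set once, using only the selected K
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))

print(best_k, val_scores[best_k], test_accuracy)
```

Keeping the test set out of the selection loop means the final test accuracy is not influenced by the choice of K, so it remains a fair estimate of performance on unseen data.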
