# Implementing a Custom Classifier

In this example, we first implement our own K-Nearest Neighbors (KNN) classifier and then compare its accuracy with that of the sklearn KNN classifier. We use the Iris dataset; details about it can be found [here](http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html).

### Iris Dataset 
The Iris dataset consists of the sepal and petal lengths and widths of 3 different types of irises (Setosa, Versicolour, and Virginica), stored in a 150x4 numpy.ndarray. More detail can be [found here](https://en.wikipedia.org/wiki/Iris_flower_data_set).

So we have 4 features (sepal length, sepal width, petal length, petal width) and one output (species): the name of the iris or, in machine learning terms, the class to which the sample belongs.
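
As a quick check of the description above, the following sketch loads the dataset and prints its shape, feature names, and class names. It relies only on the `data`, `feature_names`, and `target_names` attributes of the object returned by `load_iris`.

```python
from sklearn import datasets

iris = datasets.load_iris()

# 150 samples with 4 features each
print(iris.data.shape)
# Names of the 4 feature columns
print(iris.feature_names)
# Names of the 3 classes; iris.target stores them as the integer codes 0, 1, 2
print(iris.target_names)
```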

#### Importing Dataset 
In the first step, we import the Iris dataset and then partition it into training and test sets.


In [53]:
from sklearn import datasets
iris = datasets.load_iris()

# X = inputs for the classifier
X = iris.data
#print (X)

# y = output
y = iris.target


In [42]:
# We can either partition the dataset into training and test sets manually or use train_test_split
# (sklearn.cross_validation was removed in scikit-learn 0.20; the function now lives in sklearn.model_selection)
from sklearn.model_selection import train_test_split

#help(train_test_split)
# Using 70% of the dataset for testing
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.7)
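
The comment in the cell above mentions partitioning the dataset manually. A minimal sketch of that option, assuming NumPy is available, is shown here. The helper name `manual_split` and its `seed` parameter are hypothetical and introduced only for illustration; the toy arrays stand in for `iris.data` and `iris.target`.

```python
import numpy as np

def manual_split(X, y, test_size=0.7, seed=0):
    """Shuffle the row indices and cut them into a training and a test part."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    # Number of rows that go into the test set
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 samples with 4 features each and 3 classes
X_demo = np.arange(40).reshape(10, 4)
y_demo = np.arange(10) % 3
X_tr, X_te, y_tr, y_te = manual_split(X_demo, y_demo, test_size=0.7)
print(X_tr.shape, X_te.shape)
```

Shuffling before cutting matters for Iris because the samples are stored sorted by class, so an unshuffled cut would leave whole classes out of one of the two sets.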


## K-Neighbors Classifier
#### 1) Creating custom KNN classifier

In [48]:
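from scipy.spatial import distance

def getdistance(point1 , point2):
    # Euclidean distance between two feature vectors
    return distance.euclidean(point1, point2)

# Nearest-neighbor classifier: predicts the label of the single closest training sample (k = 1)
class CustomKNNClassifier():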
    def fit(self,X_train, y_train):
        self.X_train = X_train
        self.y_train = y_train
    
    def predict(self,X_test):
        predictions = []
        for row in X_test:
            label = self.closestlabel(row)
            predictions.append(label)
        
        return predictions
    
    def closestlabel(self,row):
        # Start with the first training sample as the best candidate
        best_index = 0
        best_dist = getdistance(row,self.X_train[0])
        
        for i in range(1, len(self.X_train)):
            dist = getdistance(row,self.X_train[i])
            if best_dist > dist:
                best_dist = dist
                best_index = i
        
        return self.y_train[best_index]
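
The classifier above looks only at the single closest training sample, which corresponds to k = 1. A minimal sketch of how the same idea could be extended to a majority vote over the k nearest samples is given here. The class name `MajorityVoteKNN` and the toy data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from collections import Counter
import math

class MajorityVoteKNN:
    """k-nearest-neighbors classifier using a majority vote (illustrative sketch)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X_train, y_train):
        # KNN has no real training phase: it only memorizes the data
        self.X_train = list(X_train)
        self.y_train = list(y_train)

    def predict(self, X_test):
        return [self._vote(row) for row in X_test]

    def _vote(self, row):
        # Pair every training label with its Euclidean distance to the query row
        distances = [(math.dist(row, x), label)
                     for x, label in zip(self.X_train, self.y_train)]
        # Keep the k closest samples and return the most common label among them
        nearest = sorted(distances, key=lambda pair: pair[0])[:self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters
X_toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
y_toy = [0, 0, 0, 1, 1, 1]

knn = MajorityVoteKNN(k=3)
knn.fit(X_toy, y_toy)
toy_predictions = knn.predict([[0.1, 0.1], [5.0, 5.1]])
print(toy_predictions)  # [0, 1]
```

With k = 1 this sketch reduces to the behavior of `closestlabel`. Larger values of k make the prediction less sensitive to a single noisy training sample.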
    
    
    

In [49]:
my_classifier = CustomKNNClassifier()
my_classifier.fit(X_train, y_train)

predictions = my_classifier.predict(X_test)
print (predictions)


[2, 1, 1, 1, 1, 2, 2, 2, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 2, 1, 0, 1, 0, 1, 2, 1, 0, 1, 0, 1, 0, 1, 1, 2, 0, 2, 2, 0, 0, 1, 1, 2, 2, 0, 1, 0, 1, 2, 1, 2, 2, 1, 1, 1, 1, 0, 2, 0, 0, 2, 2, 2, 2, 2, 0, 2, 2, 1, 2, 2, 2, 2, 0, 0, 1, 1, 1, 0, 2, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0, 0, 0, 2, 2, 1, 1, 1, 0, 0, 0, 0, 1, 1, 2, 2]


In [50]:
# Checking accuracy of the classifier
from sklearn.metrics import accuracy_score
accuracy_score(y_test,predictions)

0.93333333333333335

#### 2) Using Sklearn KNN 

In [51]:
from sklearn.neighbors import KNeighborsClassifier
# Default settings use the 5 nearest neighbors (n_neighbors=5)
classifier = KNeighborsClassifier()
classifier.fit(X_train, y_train)

predict = classifier.predict(X_test)
# Checking accuracy of the classifier
from sklearn.metrics import accuracy_score
accuracy_score(y_test,predict)

0.93333333333333335
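
The custom classifier uses only the single closest sample, whereas `KNeighborsClassifier` defaults to `n_neighbors=5`. For a like-for-like comparison, the sklearn classifier can be configured with `n_neighbors=1`. The sketch below is self-contained; `random_state=0` is an assumption added only to make the split repeatable and is not part of the original cells, so the accuracies it prints will not necessarily match the values shown above.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
# random_state is fixed here only so that the sketch is repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.7, random_state=0)

# n_neighbors=1 mirrors the custom classifier, which uses only the closest sample
one_nn = KNeighborsClassifier(n_neighbors=1)
one_nn.fit(X_train, y_train)
one_nn_accuracy = accuracy_score(y_test, one_nn.predict(X_test))

# Default settings (n_neighbors=5) for comparison
five_nn = KNeighborsClassifier()
five_nn.fit(X_train, y_train)
five_nn_accuracy = accuracy_score(y_test, five_nn.predict(X_test))

print(one_nn_accuracy, five_nn_accuracy)
```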

### Summary
#### Pros:
1. Relatively easy and simple to implement and understand

#### Cons:
1. Computationally intensive: every prediction computes the distance to every training sample
2. Relationships between the features are hard to represent
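
The computational cost comes from comparing each test sample with every training sample. A minimal sketch of computing all of those distances at once with NumPy broadcasting follows, assuming NumPy is available. The function name `nearest_labels` and the toy data are hypothetical. The amount of arithmetic is unchanged, but the loops run inside NumPy rather than in Python.

```python
import numpy as np

def nearest_labels(X_train, y_train, X_test):
    """Return the label of the closest training sample for each test row."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # diffs has shape (n_test, n_train, n_features)
    diffs = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    # Squared Euclidean distances, shape (n_test, n_train)
    sq_dists = (diffs ** 2).sum(axis=2)
    # Pick the label at the index of the closest training sample for every test row
    return y_train[sq_dists.argmin(axis=1)]

X_small = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
y_small = [0, 0, 1]
labels = nearest_labels(X_small, y_small, [[0.2, 0.1], [3.5, 3.9]])
print(labels)
```

The intermediate `diffs` array grows with the product of the test set size, the training set size, and the number of features, so this approach trades memory for speed.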
