# Recipe 5 - Writing Our First Classifier

<a href="https://www.youtube.com/watch?v=AoeEHqVSNOw" target="_blank"><img src="http://img.youtube.com/vi/AoeEHqVSNOw/0.jpg" 
width="480" height="360" border="10" /></a>

---

__Description:__ We create our own KNN classifier. We reuse some of the code from Recipe 4, because classifiers tend to share the same interface (`fit` and `predict`). The classifier itself is written from scratch.

In [11]:
import random

# Let's create a random classifier first
class ScrappyKNNRandom():
    
    # This function just takes in the data
    def fit(self, x_train, y_train):
        self.x_train = x_train
        self.y_train = y_train
    
    # This function picks a random training label for each row of test data
    def predict(self, x_test):
        predictions = []
        for row in x_test:
            label = random.choice(self.y_train)  # Choose a random label
            predictions.append(label)
        
        return predictions

from scipy.spatial import distance

def euc(a,b):
    return distance.euclidean(a,b)

# Let's create a real classifier next
class ScrappyKNN():
    
    # This function just takes in the data
    def fit(self, x_train, y_train):
        self.x_train = x_train
        self.y_train = y_train
    
    # This function applies an algorithm to predict
    def predict(self, x_test):
        predictions = []
        for row in x_test:
            label = self.closest(row)  # Give each row the label of its closest training point
            predictions.append(label)
        
        return predictions
    
    # This function iterates through the training data and returns the label of the point at the smallest distance.
    def closest(self, row):
        best_dist = euc(row, self.x_train[0])
        best_index = 0
        for i in range(1, len(self.x_train)):
            dist = euc(row, self.x_train[i])
            if dist < best_dist:
                best_dist = dist
                best_index = i
        return self.y_train[best_index]

In [12]:
from sklearn import datasets
iris = datasets.load_iris()

x = iris.data
y = iris.target

from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = .5)  # Use half the data for training, half for testing


#from sklearn.neighbors import KNeighborsClassifier
my_classifier1 = ScrappyKNNRandom()
my_classifier1.fit(x_train, y_train)

predictions = my_classifier1.predict(x_test)

from sklearn.metrics import accuracy_score
print("The accuracy of our random classifier is", accuracy_score(y_test, predictions))


my_classifier2 = ScrappyKNN()
my_classifier2.fit(x_train, y_train)

predictions = my_classifier2.predict(x_test)
print("The accuracy of our real classifier is", accuracy_score(y_test, predictions))

The accuracy of our random classifier is 0.24
The accuracy of our real classifier is 0.946666666667


## KNN Classifier
__Description:__ In short, a KNN classifier labels an object using the training datapoints nearest to it. With K = 1, as in `ScrappyKNN` above, it simply takes the label of the single nearest datapoint. The accuracy of a KNN classifier can depend on the number of neighbors, "K". For instance, if a datapoint lies midway between two neighbors with different labels, the nearest point alone is a weak signal, and it can help to take the majority label among a larger group, such as the nearest 11 neighbors.
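
The majority vote described above can be sketched roughly as follows. This is a minimal illustration, not the code from the recipe: the function name `knn_predict`, the parameter `k`, and the tiny dataset are all made up for the example, and it uses `math.dist` from the standard library (Python 3.8+) in place of `scipy.spatial.distance.euclidean`.

```python
import math
from collections import Counter

def knn_predict(x_train, y_train, row, k=3):
    """Return the majority label among the k training points nearest to row."""
    # Pair each training point's distance to `row` with that point's label
    distances = [(math.dist(row, point), label)
                 for point, label in zip(x_train, y_train)]
    # Keep the k pairs with the smallest distances
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Count the labels among those neighbors and return the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up data (not iris): three points labelled 0 and two labelled 1
x_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]]
y_train = [0, 0, 0, 1, 1]

# The same query point can get a different label depending on k
print(knn_predict(x_train, y_train, [3.0, 3.0], k=1))  # nearest single point has label 1
print(knn_predict(x_train, y_train, [3.0, 3.0], k=5))  # majority of all five points is label 0
```

With `k=1` this behaves like `ScrappyKNN`. Larger values of `k` smooth out the influence of any single neighbor, at the cost of letting more distant points vote.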

__Pros__
- Relatively simple

__Cons__
- Computationally intensive (every prediction iterates through the entire training set)
- Hard to represent relationships between features
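
To get a feel for the computational cost, the sketch below counts how many distance computations a brute-force nearest-neighbor search performs. It is only an illustration under stated assumptions: the helper `count_distance_calls` and the synthetic one-dimensional data are hypothetical, and the sizes (75 training rows, 75 test rows) are chosen to mirror a 50/50 split of the 150 iris samples.

```python
import math

def count_distance_calls(x_train, x_test):
    """Brute-force 1-nearest-neighbor search that also counts distance computations."""
    calls = 0
    nearest_indices = []
    for row in x_test:
        best_index, best_dist = 0, float("inf")
        # Every test row is compared against every training row
        for i, point in enumerate(x_train):
            dist = math.dist(row, point)
            calls += 1
            if dist < best_dist:
                best_index, best_dist = i, dist
        nearest_indices.append(best_index)
    return nearest_indices, calls

# Synthetic data: training points at 0, 1, ..., 74 and test points shifted by 0.25
x_train = [[float(i), 0.0] for i in range(75)]
x_test = [[i + 0.25, 0.0] for i in range(75)]

indices, calls = count_distance_calls(x_train, x_test)
print(calls)  # 75 test rows * 75 training rows = 5625 distance computations
```

The count grows with the product of the two dataset sizes, which is why this approach becomes slow on large training sets.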