# Predict Classes with a kNN Classifier
## Mini-Lab 1: Repurposing the Classifier

Welcome to your final mini-lab! Go ahead and run the following cell to get started. You can do that by clicking on the cell and then clicking `Run` on the top bar. You can also just press `Shift` + `Enter` to run the cell.

In [None]:
from datascience import *
import numpy as np
import otter

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

grader = otter.Notebook("m12_l1_tests")

For our final lab, we'll take things into our own hands and work with a classifier built from scratch using the k-Nearest Neighbors (kNN) algorithm. A lot of the code for the classifier has already been built for you, so instead of reinventing the wheel we'll repurpose it for another dataset. Go ahead and run the cell below to load the kNN code.

In [None]:
def distance(point1, point2):
    """Returns the distance between point1 and point2
    where each argument is an array
    consisting of the coordinates of the point"""
    return np.sqrt(np.sum((point1 - point2)**2))

def all_distances(training, new_point):
    """Returns an array of distances
    between each point in the training set
    and the new point (which is a row of attributes)"""
    attributes = training.drop('Class')
    def distance_from_point(row):
        return distance(np.array(new_point), np.array(row))
    return attributes.apply(distance_from_point)

def table_with_distances(training, new_point):
    """Augments the training table
    with a column of distances from new_point"""
    return training.with_column('Distance', all_distances(training, new_point))

def closest(training, new_point, k):
    """Returns a table of the k rows of the augmented table
    corresponding to the k smallest distances"""
    with_dists = table_with_distances(training, new_point)
    sorted_by_distance = with_dists.sort('Distance')
    topk = sorted_by_distance.take(np.arange(k))
    return topk

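def majority(topkclasses):
    """Returns the majority class (1 or 0) among the rows of topkclasses;
    a tie returns 0"""
    ones = topkclasses.where('Class', are.equal_to(1)).num_rows
    zeros = topkclasses.where('Class', are.equal_to(0)).num_rows
    if ones > zeros:
        return 1
    else:
        return 0

def classify(training, new_point, k):
    """Returns the predicted class of new_point
    based on a majority vote among its k nearest neighbors"""
    closestk = closest(training, new_point, k)
    topkclasses = closestk.select('Class')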
    return majority(topkclasses)
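
To see what the `distance` function computes, here is a minimal sketch on made-up points (the coordinates are hypothetical and are not taken from the wine data). It uses plain NumPy arrays instead of a `Table`, so it only illustrates the idea behind `all_distances` and `closest` and is not a replacement for them.

```python
import numpy as np

def distance(point1, point2):
    """Euclidean distance between two coordinate arrays."""
    return np.sqrt(np.sum((point1 - point2) ** 2))

# Hypothetical toy points: a 3-4-5 right triangle.
a = np.array([0, 0])
b = np.array([3, 4])
d = distance(a, b)  # sqrt(3**2 + 4**2)
print(d)

# Distances from one new point to several (made-up) training points.
training_points = np.array([[0, 0], [3, 4], [10, 10]])
new_point = np.array([3, 4])
dists = np.sqrt(np.sum((training_points - new_point) ** 2, axis=1))

# Indices of the two closest training points, nearest first.
nearest_two = np.argsort(dists)[:2]
print(nearest_two)
```

Sorting by distance and keeping the first `k` rows is the same idea that `closest` applies to the table with the added `Distance` column.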

This code was specifically built for the wine dataset, which we'll import and test below. The classifier should output `1`.

In [None]:
wine = Table().read_table("../datasets/wine.csv")
classify(wine, wine.drop("Class").rows[0], 5)

Cool, right? Sadly, if we try to use this classifier for a different purpose, it won't work. Below we import data for 500 NBA athletes from 2013. Our final task is to repurpose this classifier so that it predicts an athlete's `Position` from `Height`, `Weight`, and `Age`, and so that it distinguishes between three positions rather than performing the standard binary classification. Go ahead and run the next cell to import our NBA dataset.

In [None]:
nba = Table().read_table("../datasets/nba2013.csv").drop("Name")
nba.show(5)

If we try running the `classify` function on this table, we'll run into issues. To get around this, we'll have to modify our current code so that it classifies based on our NBA dataset rather than the wine dataset. Below are the three functions that you need to modify. Change these functions so that we can classify a different dataset!

*Note*: Two of these functions only need minor changes, whereas the last function needs a major overhaul in order to work correctly. Can you identify which are which? Also, you may find the following code snippet useful:

```
positions = make_array("Guard", "Center", "Forward")
positions[np.argmax(make_array(x, y, z))]
```

If the variables `x`, `y`, and `z` hold the number of `Guard`, `Center`, and `Forward` athletes, then the code snippet returns the position with the largest count. For example, if `x` = 2, `y` = 6, and `z` = 4, then this code snippet would return `Center`. Where would this be useful?
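
The example above can be checked with a short, self-contained sketch. It uses `np.array` in place of `make_array` so that it runs without the `datascience` package, and the counts are the made-up values from the example rather than numbers from the NBA data.

```python
import numpy as np

positions = np.array(["Guard", "Center", "Forward"])

# Hypothetical counts of each position among the nearest neighbors.
x, y, z = 2, 6, 4
counts = np.array([x, y, z])

# np.argmax returns the index of the largest count,
# which we use to look up the matching position name.
winner = positions[np.argmax(counts)]
print(winner)

# When counts tie, np.argmax returns the first index holding the maximum.
tie_winner = positions[np.argmax(np.array([5, 5, 2]))]
print(tie_winner)
```

Because ties go to whichever position comes first in the array, the order of `positions` matters whenever two positions have the same count.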

In [None]:
def all_distances(training, new_point):
    """Returns an array of distances
    between each point in the training set
    and the new point (which is a row of attributes)"""
    attributes = training.drop('Class')
    def distance_from_point(row):
        return distance(np.array(new_point), np.array(row))
    return attributes.apply(distance_from_point)


def majority(topkclasses):
    ones = topkclasses.where('Class', are.equal_to(1)).num_rows
    zeros = topkclasses.where('Class', are.equal_to(0)).num_rows
    if ones > zeros:
        return 1
    else:
        return 0


def classify(training, new_point, k):
    closestk = closest(training, new_point, k)
    topkclasses = closestk.select('Class')
    return majority(topkclasses)

In [None]:
# This cell should output "Guard" if you implemented the above fixes correctly.
classify(nba, nba.drop("Position").rows[16], 15)

Congratulations! Not only have you repurposed a classifier and finished the final lab, you've also completed this course!

In [None]:
grader.check_all()