# A Preview of Machine Learning ٩(^ᴗ^)۶
![alt text](http://www.r2d3.us/static/pages/decision-trees-part-1/preview.png)
###### What is a Classifier?

A classifier is a function which takes an unlabeled (unclassified) data point and returns a predicted label. Each classifier is created by a learning algorithm trained on a specific set of labeled data.

###### What does it mean for a set of data to have labels? 
Think of a set of fruit widths, heights, colors, and weights. These measurements are called features, so this data set has four features. You give your learning algorithm this set of fruit observations along with the type of fruit (apple, banana, orange) corresponding to each data point. Apple, banana, and orange are the labels of this data set. The classifier trained using this data set and your learning algorithm of choice will be able to take a new data point (width, height, color, weight) and give you a label: apple, banana, or orange.
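As a rough sketch of what labeled data can look like in Python, the snippet below stores each observation as a tuple of four features and keeps the labels in a parallel list. The variable names and every measurement here are invented for illustration; they are not part of the homework data.

```python
# Hypothetical fruit observations: (width_cm, height_cm, color, weight_g).
# All values are made up for illustration only.
features = [
    (7.5, 7.0, "red", 180.0),
    (3.5, 18.0, "yellow", 120.0),
    (7.0, 7.2, "orange", 150.0),
]

# One label per observation, in the same order as `features`.
labels = ["apple", "banana", "orange"]

# Pair each observation with its label to form the labeled data set.
labeled_data = list(zip(features, labels))

# Every observation has the same number of features (four here).
num_features = len(features[0])

for observation, label in labeled_data:
    print(observation, "->", label)
```

A new, unlabeled data point would simply be another four-element tuple with no entry in `labels`.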

The goal of training a classifier on a set of data is to take a new, unlabeled (unclassified) data point and predict its label with high accuracy, using what the classifier learned from the labeled points.

In this homework, we will be using a specific learning algorithm called K-Nearest-Neighbors.


###### What is a KNN Classifier?
A KNN classifier is a classifier trained using the K-Nearest-Neighbors method. This classifier slurps in a bunch of labeled data points, e.g. hundreds of fruit observations (width, height, color, weight), and then predicts the label of an unlabeled input by looking at the K known data points closest to this input and taking the most common label among them.
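The idea can be sketched in a few lines of plain Python. This is only a minimal illustration, not the homework solution: it assumes purely numeric features, Euclidean distance, and a simple majority vote, and the function name `knn_predict` and the sample measurements are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Distance from the query to every labeled training point (Euclidean, an assumption).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance and keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Return the most common label among those neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Invented (width, height) measurements for two kinds of fruit.
train_points = [(7.0, 7.0), (7.5, 7.2), (6.8, 7.1), (3.5, 18.0), (3.8, 19.0), (3.2, 17.5)]
train_labels = ["apple", "apple", "apple", "banana", "banana", "banana"]

print(knn_predict(train_points, train_labels, (7.2, 6.9)))   # apple
print(knn_predict(train_points, train_labels, (3.4, 18.2)))  # banana
```

Non-numeric features such as color would need to be encoded as numbers before a distance can be computed, and tie votes are not handled specially in this sketch.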

Check [here](https://www.analyticsvidhya.com/blog/2014/10/introduction-k-neighbours-algorithm-clustering/) for a good visual explanation of KNN.

And click [this](http://wittawat.com/knn_boundary.html) for an interactive version. The author provides two classes, and you can add data points to either one by clicking on the grid. He displays the resulting decision regions in red and blue. A decision region is the area where the classifier predicts a given class, and the decision boundary is the border between regions. So if you input a data point that falls in the blue region, the classifier will predict blue.
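To get a feel for how a classifier splits the grid into red and blue areas, here is a small text-only sketch. It assumes a 1-nearest-neighbor rule with Euclidean distance, and the training points and the helper name `nearest_label` are invented for this example. Each grid cell is printed as `r` or `b` depending on the predicted class.

```python
import math

# Invented training points for two classes, "red" and "blue".
training = [((1, 1), "red"), ((2, 3), "red"), ((6, 5), "blue"), ((7, 2), "blue")]


def nearest_label(query):
    """Return the label of the single closest training point (1-nearest-neighbor)."""
    return min(training, key=lambda item: math.dist(item[0], query))[1]


# Classify every cell of a 9 x 7 grid; the first letter of the label marks the cell.
rows = []
for y in range(6, -1, -1):  # print the largest y first so the map is not upside down
    rows.append("".join(nearest_label((x, y))[0] for x in range(9)))

print("\n".join(rows))
```

The line where the printed letters switch from `r` to `b` is a coarse picture of the border between the two areas.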

#### Example of some decision boundaries using three different training data sets

![alt text](http://perclass.com/doc/kb/images/16_knn_decisions.png)

# Matplotlib

A useful visualization library in Python is [Matplotlib](http://matplotlib.org/).
For this assignment, you'll only need to provide scatter plots of your data with colored points. This uses matplotlib.pyplot. Check [here](http://matplotlib.org/users/pyplot_tutorial.html) for a tutorial.

In [None]:
# Sets plots to appear inline with notebook
%matplotlib inline 
# Default import statement as "plt"
import matplotlib.pyplot as plt

##### Initialize some x's and corresponding y's

In [None]:
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

##### Default pyplot is a line graph

In [None]:
plt.plot(x, y) # Default line plot
plt.axis([0, 6, 0, 20]) # axis boundaries

##### Let's change this to a simple scatter plot by specifying the symbol as 'o'

In [None]:
plt.plot(x, y, 'o') # 'o' shaped data points. If you don't specify this, it'll show a line.
plt.axis([0, 6, 0, 20]) # axis boundaries

##### Add some labels and change the symbol

In [None]:
plt.plot(x, y, 'x') # 'x' shaped data points
plt.title("Example diagram") # Add a title
plt.xlabel('Feature One') # Add axis labels
plt.ylabel('Feature Two') # Add axis labels
plt.axis([0, 6, 0, 20])

## Everything You Need For The Homework's Plot
#### A little pizzazz

In [None]:
# Multiple classes of data
x1 = [1, 2, 3, 4]
y1 = [1, 4, 9, 16]
x2 = [1, 1, 1, 2, 2.5, 3, 3, 4]
y2 = [1, 2, 4, 4, 7.5, 9, 6, 16]

In [None]:
# Organize the data
data = {"apples": [x1, y1], 
        "bananas": [x2, y2]}
styles = {"apples": 'ro', 
          "bananas": 'yv'}

# Instead of subplots, you can work with plt itself for the HW
fig, ax = plt.subplots()
ax.margins(0.05) # Add padding to graph margins

# Plot the different data groups
# Use a separate loop variable so the `data` dict is not overwritten
for group, points in data.items():
    ax.plot(points[0], points[1], styles[group], label=group)
    
# Add some customization
plt.title("Fruit Graph With Color Coding!") # Add a title
plt.xlabel('Fruit Height') # Add axis labels
plt.ylabel('Fruit Width') # Add axis labels
plt.axis([0, 6, 0, 20])

ax.legend() # Show the key

##add plt.show() here

### IMPORTANT: Don't forget "plt.show()" at the end of your Python code! Otherwise your graph won't pop up. Using IPython notebook with "%matplotlib inline" lets the graph appear without the plt.show() call, so we don't have it here.