# A Visual Introduction to Machine Learning
* Let's look at that example with the tools we're going to be using for this course
* Don't worry, at this point you're not expected to know the details...

In [None]:
# Pandas is a commonly-used data analysis tool for Python
import pandas as pd

In [None]:
# Read in the SF vs. NY data and take a look...
data = pd.read_csv('https://raw.githubusercontent.com/jadeyee/r2d3-part-1-data/master/part_1_data.csv', skiprows=2)
data.head()

In [None]:
# scikit-learn (imported as sklearn) is a popular ML package we will be using
from sklearn.tree import DecisionTreeClassifier

## Digression: Terminology
* It's common in ML for the input (i.e., the data used to train the model) to be labeled __`X`__ and the target (i.e., what you are trying to predict) to be labeled __`y`__
* Note that what we want to predict is whether the home is in SF or NY
* The first column (`in_sf`) tells us that by using a "dummy variable", an integer whose value is 0 if the home is in NY or 1 if it's in SF
* We need to remove that column from the inputs and use it as __`y`__
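To make the dummy-variable idea concrete, here is a minimal sketch on a tiny made-up table. The `homes` DataFrame, its `city` and `elevation` columns, and all of its values are assumptions for illustration only; they are not taken from the course dataset.

```python
import pandas as pd

# Hypothetical toy data: made-up values, not the SF vs. NY dataset
homes = pd.DataFrame({
    "city": ["NY", "SF", "SF", "NY"],
    "elevation": [10, 73, 60, 4],
})

# Encode the city as a dummy variable: 1 if the home is in SF, 0 if it is in NY
homes["in_sf"] = (homes["city"] == "SF").astype(int)

# The dummy column becomes the target y; everything else that is numeric stays in X
y_toy = homes["in_sf"]
X_toy = homes.drop(columns=["city", "in_sf"])

print(X_toy.columns.tolist())
print(y_toy.tolist())
```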

In [None]:
X = data
y = X.in_sf                      # target: 1 if the home is in SF, 0 if it's in NY
X = X.drop(columns=['in_sf'])    # inputs: every remaining column

In [None]:
X

In [None]:
y

## Let's create a Decision Tree...

In [None]:
tree = DecisionTreeClassifier(max_depth=2)

## Now let's train the Decision Tree on our data
* this process is called _fitting_ the model to the data
* once the tree has been fit, we have a _model_, which is essentially a predictor
  * we "feed" the model some data describing a home...
  * ...and it will predict whether the home is in SF or NY
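Here is a rough sketch of that "fit, then feed it a home" idea on a tiny made-up dataset. The single `elevation` feature, the values, and the names `toy_tree` and `new_homes` are all assumptions for illustration; the real data has more columns.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: made-up values, not the SF vs. NY dataset
X_toy = pd.DataFrame({"elevation": [2, 5, 8, 40, 60, 75]})
y_toy = pd.Series([0, 0, 0, 1, 1, 1], name="in_sf")  # 0 = NY, 1 = SF

# Fit a small tree to the toy data
toy_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
toy_tree.fit(X_toy, y_toy)

# "Feed" the model two homes it has not seen and ask for predictions
new_homes = pd.DataFrame({"elevation": [3, 70]})
predictions = toy_tree.predict(new_homes)
print(predictions)
```

`predict` returns one label per row, using the same 0/1 encoding as `y`.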

In [None]:
tree.fit(X, y)

## Let's take a look at the decision tree...

In [None]:
# export_graphviz writes the tree out in Graphviz's DOT format (.dot file)
# If Graphviz isn't installed, we'll use scikit-learn's plot_tree to see the tree...
from sklearn.tree import export_graphviz
export_graphviz(tree, out_file="sf_vs_ny_tree.dot",
                feature_names=list(X.columns),  # a plain list works across scikit-learn versions
                class_names=['NY', 'SF'],       # ordered to match the labels 0 and 1
                rounded=True,
                filled=True)

In [None]:
# The dot command will convert the tree from a .dot file to a .png
!dot -Tpng sf_vs_ny_tree.dot -o sf_vs_ny.png
from IPython.display import Image
Image('sf_vs_ny.png')

In [None]:
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(16, 7))
plot_tree(tree, feature_names=list(X.columns));
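If neither picture renders, scikit-learn can also print the tree as plain text with `export_text`. The sketch below uses a tiny made-up dataset (the `elevation` feature and its values are assumptions), so the thresholds it prints are not those of the real tree.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: made-up values, not the SF vs. NY dataset
X_toy = [[2], [5], [8], [40], [60], [75]]
y_toy = [0, 0, 0, 1, 1, 1]  # 0 = NY, 1 = SF

toy_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_toy, y_toy)

# export_text returns the tree's decision rules as an indented string
report = export_text(toy_tree, feature_names=["elevation"])
print(report)
```

Each indented line is one test on a feature, and the `class:` lines are the leaves.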

## Now let's check our accuracy...

In [None]:
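# score() returns the mean accuracy; note that here it is measured
# on the same data the tree was trained on
tree.score(X, y)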
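Accuracy on the training data tends to look better than accuracy on homes the tree has never seen. One common way to check this, sketched below, is to hold out part of the data before fitting. This example uses synthetic data from `make_classification`, not the SF vs. NY data, and the split size and variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the course dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the rows; the tree never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

demo_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
demo_tree.fit(X_train, y_train)

train_acc = demo_tree.score(X_train, y_train)  # accuracy on data used for fitting
test_acc = demo_tree.score(X_test, y_test)     # accuracy on held-out data
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```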