# Day 3
  
## Analysis With Python Pandas

# Agenda

* What is Machine Learning?
* ML Vocabulary Overview
* Closer Look at Decision Trees

* Data Viz Crash Course

* Basics of Graph Theory
* Measuring Importance
* Mapping in NetworkX

<h1> What is Machine Learning? </h1>

<img src="images/ML_robot.png" style="width: 300px;" align="middle"/>


<img src="images/ML_types.png" style="width: 1200px;" align="middle"/>

There are several different types of machine learning. In basic terms, "supervised" means you give your algorithm labeled examples (the correct answers) of the results you want to see, and "unsupervised" means you ask your algorithm to find patterns in the data on its own. (More on this later.)
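A minimal sketch of the difference, using scikit-learn's bundled iris data (`load_iris`) rather than the course CSV files. The supervised model is given the answers `y` during training, while the unsupervised model only sees the measurements `X` and has to group them by itself. Using `KMeans` with 3 clusters as the unsupervised example is an assumption chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Supervised: the model sees both the measurements (X) and the correct labels (y)
supervised = DecisionTreeClassifier(max_depth=2, random_state=0)
supervised.fit(X, y)
supervised_labels = supervised.predict(X)

# Unsupervised: the model sees only X and invents its own groups (cluster ids)
unsupervised = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = unsupervised.fit_predict(X)

print("predicted species codes:", supervised_labels[:5])
print("cluster ids:", cluster_ids[:5])
```

Note that cluster ids are arbitrary: cluster 0 does not necessarily correspond to species code 0.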

<img src="images/class_reg.png" style="width: 1200px;" align="middle"/>

http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

# Decision Trees in scikit-learn


<img src="images/irises.png" style="width: 500px;" align="middle"/>
---
<img src="images/schem_iris.gif" style="width: 500px;" align="middle"/>

The "Iris Problem" is basically the "hello world" of machine learning. By measuring the sepal length and width and the petal length and width of a flower, a model can learn to tell apart three species of iris (setosa, versicolor and virginica).
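If the course CSV files are not at hand, scikit-learn ships the same classic iris measurements. A minimal sketch, assuming scikit-learn 0.23 or later (needed for `as_frame=True`). The bundled feature names match the column names used in the CSV files below, but the species are stored as integer codes (0 = setosa, 1 = versicolor, 2 = virginica), which differs from the mapping this notebook builds by hand.

```python
from sklearn.datasets import load_iris

# Load the bundled iris data as a pandas DataFrame (features + integer target)
iris_bunch = load_iris(as_frame=True)
iris_df = iris_bunch.frame

# Add a readable species name next to the integer target
iris_df["type"] = iris_df["target"].map(dict(enumerate(iris_bunch.target_names)))

print(iris_bunch.feature_names)
print(iris_df["type"].value_counts())
```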

<img src="images/gsgscmat.gif" style="width: 500px;" align="middle"/>

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import pandas as pd

In [None]:
iris = pd.read_csv('data/iris_train.csv')
iris.head()

Given these data points, we are going to create a "classifier" that identifies which of the three iris types we are looking at.

## Convert string types to integer representation

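In order to train an algorithm, you need to convert your string data into something that a mathematical formula can process... like numbers! Scikit-learn models require numeric features, so any string fields that we deem important to our training have to be converted into integers or floats. (Class labels are more forgiving: scikit-learn classifiers also accept strings there, but integer codes keep this example simple.) We do this with the pandas `map()` method.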
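A small sketch of how `Series.map()` behaves with a dictionary, using a made-up three-row frame. Every value found in the dictionary is replaced, and any value that is *not* in it (for example the typo `'Setosa'`) becomes `NaN`, which is why the `.astype(int)` call in the next cell would fail on unexpected labels. Defining the mapping once (the name `TYPE_CODES` is just an illustrative choice) also makes it easy to reuse the exact same codes on the test data; marking unknowns with `-1` is likewise only an assumption for this sketch.

```python
import pandas as pd

# One shared mapping, reused for both training and test data
TYPE_CODES = {"versicolor": 0, "virginica": 1, "setosa": 2}

df = pd.DataFrame({"type": ["setosa", "virginica", "Setosa"]})

codes = df["type"].map(TYPE_CODES)
print(codes.tolist())  # the unknown 'Setosa' becomes nan

# Check for unmapped labels before casting to int
unmapped = df.loc[codes.isna(), "type"].unique()
print("unmapped labels:", list(unmapped))

# Hypothetical convention: -1 marks "unknown label"
df["type_code"] = codes.fillna(-1).astype(int)
```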

In [None]:
iris['type_code'] = iris['type'].map( {'versicolor': 0,
                                       'virginica': 1,
                                       'setosa': 2} ).astype(int)

In [None]:
iris.columns

# Building the model

In [None]:
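clf = DecisionTreeClassifier(max_depth=1)

# Display fit()'s signature and docstring (IPython/Jupyter help syntax; plain Python would use help(clf.fit))
clf.fit?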

Signature: clf.fit(X, y, sample_weight=None, check_input=True, X_idx_sorted=None)
Docstring:
Build a decision tree from the training set (X, y).
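In the signature above, `X` is the feature table (one row per sample, one column per feature) and `y` holds one target label per row; the other parameters are optional. A minimal sketch with made-up numbers, just to show the expected shapes (`clf_demo` is a hypothetical name so the notebook's own `clf` is left alone):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# X: 2-D, shape (n_samples, n_features); here 4 samples with 2 features each
X = np.array([[1.0, 0.5],
              [1.2, 0.4],
              [5.0, 2.0],
              [5.5, 2.1]])
# y: 1-D, shape (n_samples,); one class label per row of X
y = np.array([0, 0, 1, 1])

clf_demo = DecisionTreeClassifier(max_depth=1)
clf_demo.fit(X, y)

# predict() also expects a 2-D input, even for a single sample
print(clf_demo.predict([[5.2, 1.9]]))
```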


In [None]:
clf.fit(iris[['sepal length (cm)', 
             'sepal width (cm)', 
             'petal length (cm)',
             'petal width (cm)']].values,
              iris['type_code'])

## Testing Our Algorithm

In [None]:
iris_test = pd.read_csv('data/iris_test.csv')

# Convert the test data using the same mapping as the training data
iris_test['type_code'] = iris_test['type'].map( {'versicolor': 0,
                                                 'virginica': 1,
                                                 'setosa': 2} ).astype(int)
iris_test

In [None]:
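def iris_model_tester():
    """Print the accuracy of the current `clf` on the test set."""
    wrong = 0
    correct = 0
    # Assumes the first four columns are the features and the last column is type_code
    for row in iris_test.values:
        guess = clf.predict([row[0:4]])
        if guess[0] != row[-1]:
            wrong += 1
        else:
            correct += 1
    # correct / total is a fraction, so multiply by 100 to report a percentage
    print("{:.1f}% Accuracy Rate".format(100 * correct / len(iris_test)))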

In [None]:
iris_model_tester()
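The loop above predicts one row at a time. Scikit-learn can score a whole test set in one call: `clf.score(X, y)` returns the mean accuracy, and `sklearn.metrics.accuracy_score` computes the same value from a list of predictions. A self-contained sketch, assuming the bundled iris data and a `train_test_split` in place of the course's `iris_train.csv`/`iris_test.csv` files (the 30% test size and `random_state` are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf_sketch = DecisionTreeClassifier(max_depth=1, random_state=0)
clf_sketch.fit(X_train, y_train)

# Predict every test row at once instead of looping
predictions = clf_sketch.predict(X_test)
acc = accuracy_score(y_test, predictions)

# score() computes the same mean accuracy directly
assert acc == clf_sketch.score(X_test, y_test)
print("{:.1f}% Accuracy Rate".format(100 * acc))
```

With this notebook's own objects, and assuming the same column order the tester relies on, the one-line equivalent would be `clf.score(iris_test.iloc[:, 0:4].values, iris_test['type_code'])`.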

<img src="images/plot_d1-1.png" style="width: 400px;" align="middle"/>

In [None]:
clf = DecisionTreeClassifier(max_depth=2)  # allow the tree to grow two levels of splits deep
clf.fit(iris[['sepal length (cm)', 
             'sepal width (cm)', 
             'petal length (cm)',
             'petal width (cm)']].values,
              iris['type_code'])

<img src="images/tree_d2-2.png" style="width: 400px;" align="middle"/>
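The tree diagram can also be printed as plain-text rules with `sklearn.tree.export_text` (available in scikit-learn 0.21 and later). A self-contained sketch on the bundled iris data; in this notebook you would pass the fitted `clf` and the four feature column names instead:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris_bunch = load_iris()
clf_d2 = DecisionTreeClassifier(max_depth=2, random_state=0)
clf_d2.fit(iris_bunch.data, iris_bunch.target)

# Each indented line is one split; "class:" lines are the leaves
rules = export_text(clf_d2, feature_names=list(iris_bunch.feature_names))
print(rules)

# max_depth caps how many splits can be stacked on any root-to-leaf path
print("depth:", clf_d2.get_depth(), "leaves:", clf_d2.get_n_leaves())
```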

In [None]:
iris_model_tester()

<img src="images/plot_d2-2.png" style="width: 400px;" align="middle"/>

# Machine Learning Exercise

* Read in the "Free Day" data set
* The data consists of 150 records
* 4 features: Weather, Are Parents Visiting?, Bank Account
* 4 targets: stay in, play tennis, movies, shopping


### Steps:
1. Read the test and train datasets
2. Convert the string type features to int
3. Experiment with different levels of depth
4. Determine best depth - avoid overfitting
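Steps 3 and 4 amount to a loop over `max_depth` values, comparing accuracy on the training data with accuracy on held-out test data: when training accuracy keeps climbing but test accuracy stalls or drops, the extra depth is memorizing the training set (overfitting). A sketch of that loop on the bundled iris data; the Free Day files are not assumed here, so swap in your own training and test arrays after steps 1 and 2. The depth range 1 to 5 and the split settings are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

results = {}
for depth in range(1, 6):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print("depth {}: train {:.3f}, test {:.3f}".format(depth, train_acc, test_acc))

# Pick the shallowest depth with the best test accuracy (ties go to the simpler tree)
best_test = max(test for _, test in results.values())
best_depth = min(d for d, (_, test) in results.items() if test == best_test)
print("best depth:", best_depth)
```

Preferring the shallowest tree among ties is a simple guard against overfitting: a smaller tree that scores just as well on unseen data is the safer choice.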

# Answer