# Ch. 6 - Decision Trees

Decision Trees can be used for:

- Regression
- Classification
- Multi-output tasks

## Introduction

### Training and Visualizing a Decision Tree

To create a Decision Tree classifier, we can again use scikit-learn. The quickest way to understand a Decision Tree is to build one and look at how it makes predictions. Here, we train one on the iris flower dataset.

In [1]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset
iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target

# Train a DecisionTreeClassifier on the iris dataset
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')

In [2]:
from sklearn.tree import export_graphviz

export_graphviz(
    tree_clf,
    out_file="iris_tree.dot",
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True,
)

The resulting `.dot` file can be converted to an image format such as PNG with the `dot` command-line tool from the Graphviz package:

**dot -Tpng iris_tree.dot -o iris_tree.png**

Run this from the command line. It produces the following image:

![iris_tree.png](attachment:iris_tree.png)

This is... really freaking awesome.
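The conversion step can also be run from Python, which is handy inside a notebook. This is a minimal sketch, assuming Graphviz is installed and `dot` is on the `PATH`. `render_dot` is a hypothetical helper name, not part of scikit-learn or Graphviz.

```python
import shutil
import subprocess


def render_dot(dot_path, png_path, dot_executable="dot"):
    """Convert a Graphviz .dot file to a PNG; return True on success."""
    # Hypothetical helper: report failure instead of raising when the
    # Graphviz executable cannot be found on the PATH.
    if shutil.which(dot_executable) is None:
        return False
    # Same arguments as the command-line call: dot -Tpng <in> -o <out>
    result = subprocess.run(
        [dot_executable, "-Tpng", dot_path, "-o", png_path],
        check=False,
    )
    return result.returncode == 0
```

Calling `render_dot("iris_tree.dot", "iris_tree.png")` should write the same image as the command above, and return `False` if Graphviz is missing or the conversion fails.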

### How a Decision Tree Makes Predictions

Start at the top (the _root node_). The first line in each box is the question that node asks; at the root, it is whether the petal width is less than or equal to 0.8 cm. From there, it is a choose-your-own-adventure: follow the answer to each question until you reach an end point (a _leaf node_), which gives the predicted class.
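The walk from root to leaf can be sketched in plain Python. This is only an illustration: the 0.8 cm root threshold comes from the description above, while the second threshold (1.75 cm), the leaf classes, and the `tree` / `predict` names are assumptions made for the example, not values read from the trained model.

```python
# Hypothetical hand-written depth-2 tree. Only the 0.8 threshold is taken
# from the text; the 1.75 threshold and the leaf classes are assumed.
tree = {
    "feature": "petal width (cm)",
    "threshold": 0.8,
    "true": {"leaf": "setosa"},
    "false": {
        "feature": "petal width (cm)",
        "threshold": 1.75,
        "true": {"leaf": "versicolor"},
        "false": {"leaf": "virginica"},
    },
}


def predict(node, sample):
    """Follow the answers from the root until a leaf node is reached."""
    while "leaf" not in node:
        # scikit-learn sends a sample down the "true" branch when
        # feature value <= threshold.
        answer = sample[node["feature"]] <= node["threshold"]
        node = node["true"] if answer else node["false"]
    return node["leaf"]


print(predict(tree, {"petal width (cm)": 0.2}))  # setosa
print(predict(tree, {"petal width (cm)": 1.5}))  # versicolor
print(predict(tree, {"petal width (cm)": 2.3}))  # virginica
```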

One advantage of Decision Trees is that they need little data preparation: feature scaling or centering is usually unnecessary.

Some of the fields in the boxes...

1. _Samples_: The number of training instances the node applies to (that is, how many reached this node).
2. _Value_: How many training instances of each class the node applies to (for the versicolor node, 0 Iris-setosa, 49 Iris-versicolor, and 5 Iris-virginica).
3. _Gini_: Measures the node's impurity. A node is "pure" (gini = 0) when all the training instances it applies to belong to the same class.

To calculate the Gini impurity...

$$ \boxed { G_i = 1 - \sum_{k=1}^{n} {p_{i,k}^2} } $$

Where $p_{i,k}$ is the ratio of class _k_ instances among the training instances in the $i$-th node. For the versicolor node in our example above:

$$ G_2 = 1 - \left(\frac{0}{54}\right)^2 - \left(\frac{49}{54}\right)^2 - \left(\frac{5}{54}\right)^2 \approx 0.168 $$

In [4]:
g2 = 1.0 - ((0.0/54.0)**2) - ((49.0/54.0)**2) - ((5.0/54.0)**2)
print("G_2 (Gini Score on Node 2): ", g2)

G_2 (Gini Score on Node 2):  0.1680384087791495
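The same calculation can be wrapped in a small function that works for any node. This is a sketch: `gini` is a hypothetical helper name, and the `[50, 0, 0]` counts are an assumed example of a pure node rather than values taken from the figure.

```python
def gini(class_counts):
    """Gini impurity of a node, given its per-class training instance counts."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("a node must contain at least one training instance")
    # 1 minus the sum of squared class ratios p_{i,k}
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


print(round(gini([0, 49, 5]), 3))  # 0.168, the versicolor node from above
print(gini([50, 0, 0]))            # 0.0, an assumed pure node
```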


Scikit-learn uses the **CART** algorithm, which produces only binary trees: every nonleaf node has exactly two children (each question is answered TRUE or FALSE, like 20 Questions). Other algorithms, such as **ID3**, can produce trees whose nodes have more than two children.
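To make the "binary" part concrete, here is a simplified sketch of how one TRUE/FALSE split might be chosen for a single feature: try each candidate threshold and keep the one with the lowest size-weighted Gini impurity of the two children. This is not scikit-learn's implementation. The function names and the six-sample toy dataset are assumptions made for illustration.

```python
def gini_of_labels(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best `value <= threshold` split."""
    best = None
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Weight each child's impurity by its share of the samples.
        score = (len(left) * gini_of_labels(left)
                 + len(right) * gini_of_labels(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Toy petal widths (cm) and classes, made up for this example.
widths = [0.1, 0.2, 0.3, 1.3, 1.5, 2.1]
classes = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica"]
threshold, score = best_binary_split(widths, classes)
print(threshold, score)
```

On this toy data the chosen threshold falls between 0.3 and 1.3, which separates all the setosa samples into one pure child. A full tree would repeat the same search inside each child.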

The figure below shows how to interpret a Decision Tree's classification. Unlike neural nets, a Decision Tree is not a black box: we can follow every step that leads to a decision.

![decision_tree_classification_graph_example.PNG](attachment:decision_tree_classification_graph_example.PNG)

### Estimating Class Probabilities

