# Chapter 6. Decision Trees

- In this chapter, we will start by discussing how to train, visualize, and make predictions with decision trees.
- Then we will go through the CART training algorithm used by Scikit-Learn, and we will discuss how to regularize trees and use them in regression tasks.
- Finally, we will discuss some of the limitations of decision trees.

## Training & Visualizing a Decision Tree

- To understand decision trees, let's start by building one and taking a look at its predictions.

In [2]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

In [3]:
iris = load_iris()

In [5]:
X = iris.data[:, 2:]  # Petal length and width
y = iris.target
X.shape, y.shape

((150, 2), (150,))

In [6]:
tree_clf = DecisionTreeClassifier(max_depth=2)

In [7]:
tree_clf.fit(X, y)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')

- You can visualize the decision tree by using the `export_graphviz()` function to export a graph representation file and then taking a look at it:

In [8]:
from sklearn.tree import export_graphviz

In [9]:
export_graphviz(tree_clf, 
                out_file='models/06/iris_tree.dot', 
                feature_names=iris.feature_names[2:],
                class_names=iris.target_names,
                rounded=True,
                filled=True)

Let's convert the graph file into a .png file:

In [10]:
! dot -Tpng models/06/iris_tree.dot -o static/imgs/iris_tree.png

And here it is:

<div style="text-align:center"><img style="width:33%" src="static/imgs/iris_tree.png"></div>
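
If Graphviz's `dot` command is not installed, a plain-text dump of the same tree can serve as a rough substitute. The sketch below assumes a Scikit-Learn release that ships `sklearn.tree.export_text` (0.21 or later, as far as I know). It refits the tree so the snippet is self-contained, and the `random_state=42` is my own addition for reproducibility, not part of the notebook above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data[:, 2:], iris.target  # petal length and width, as above

# random_state is an assumption added here so repeated runs give the same tree.
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# One line per node: the split test for internal nodes, the class for leaves.
report = export_text(tree_clf, feature_names=list(iris.feature_names[2:]))
print(report)
```

Each level of indentation in the printed report corresponds to one level of depth in the rendered image.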

## Making Predictions

- To classify a new data point, start at the root node (at the top of the graph), answer the binary question at each node, and follow the matching branch until you reach a leaf.
    - That leaf gives the predicted class.
        - It's really that simple!
- One of the many qualities of decision trees is that they require very little data preparation.
    - In fact, they don't require feature scaling or centering at all!
- A node's `samples` attribute counts how many training instances are sitting on the node.
- A node's `value` attribute tells you how many training instances of each class are sitting on the node.
- A node's `gini` attribute measures the node's impurity (a pure node has `gini` = 0).
- The following equation shows how the training algorithm computes the Gini score of the $i$-th node:

$$G_i=1-\sum_{k=1}^n{p_{i,k}}^2$$

- Where $p_{i,k}$ is the ratio of class $k$ instances among the training instances in the $i$-th node, and $n$ is the number of classes.
    - In our case: $n = 3$, so $k \in \{1,2,3\}$.
- Scikit-Learn uses the CART algorithm, which produces only binary trees.
    - Non-leaf nodes always have exactly two children.
- However, other algorithms such as ID3 can produce decision trees with nodes that have more than two children.
- The following figure shows the decision boundaries of our decision tree.
    - Each split tests a single feature against a threshold, so every boundary is perpendicular to an axis, and the tree carves the feature space into nested rectangles, one split at a time.
 
<div style="text-align:center"><img style="width:50%" src="static/imgs/decision_tree_boundaries.png"></div>
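
The Gini formula given earlier in this section can be checked numerically. The sketch below defines a small helper, `gini` (a hypothetical name, not a Scikit-Learn function), recomputes the impurity of each leaf from the training labels that land in it, and compares the result with the value stored in the fitted tree's `tree_.impurity` array. The `random_state=42` is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def gini(class_counts):
    """Gini impurity G = 1 - sum_k p_k^2 for one node's class counts."""
    counts = np.asarray(class_counts, dtype=float)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


iris = load_iris()
X, y = iris.data[:, 2:], iris.target
# random_state is an assumption added here; the notebook leaves it unset.
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# Root node: all 150 training instances, 50 of each class.
root_gini = gini(np.bincount(y, minlength=3))
print("root:", round(root_gini, 3))

# apply() returns, for each instance, the index of the leaf it ends up in.
leaf_ids = tree_clf.apply(X)
for leaf in np.unique(leaf_ids):
    counts = np.bincount(y[leaf_ids == leaf], minlength=3)
    print(leaf, counts, round(gini(counts), 3),
          round(tree_clf.tree_.impurity[leaf], 3))
```

For example, a leaf holding 0, 49, and 5 instances of the three classes has $G = 1 - (49/54)^2 - (5/54)^2 \approx 0.168$.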

- Decision Trees are intuitive, and their predictions are easily interpretable.
    - These types of models are called **white box** models.
- In contrast, as we will see, Random Forests and Neural Networks are generally considered **black box** models.
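
One way to see what "white box" means in practice is to trace a single prediction by hand. The sketch below walks the fitted tree's low-level arrays (`tree_.children_left`, `tree_.children_right`, `tree_.feature`, `tree_.threshold`) from the root down to a leaf and prints each test along the way. `trace_prediction` is a hypothetical helper written for this illustration, and `random_state=42` is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data[:, 2:], iris.target
feature_names = iris.feature_names[2:]
# random_state is an assumption added here; the notebook leaves it unset.
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)


def trace_prediction(clf, x):
    """Walk from the root to a leaf, printing each test; return the class index."""
    t = clf.tree_
    node = 0  # the root is always node 0
    while t.children_left[node] != -1:  # -1 marks a leaf
        f, thr = t.feature[node], t.threshold[node]
        go_left = bool(x[f] <= thr)
        print(f"node {node}: {feature_names[f]} = {x[f]} <= {thr:.2f}? {go_left}")
        node = t.children_left[node] if go_left else t.children_right[node]
    # The leaf predicts whichever class dominates it.
    return int(np.argmax(t.value[node][0]))


predicted = trace_prediction(tree_clf, [5.0, 1.5])
print("predicted class:", iris.target_names[predicted])
```

Every printed line is a human-readable reason for the final answer, which is exactly the property that black box models lack.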

## Estimating Class Probabilities

- A decision tree can also estimate the probability that a certain instance belongs to a certain class.
    - It just returns the ratio of training instances of that class to all training instances in the leaf that the instance falls into.
- Let's check this in Scikit-Learn:

In [11]:
tree_clf.predict_proba([[5, 1.5]])

array([[0.        , 0.90740741, 0.09259259]])

In [12]:
tree_clf.predict([[5, 1.5]])

array([1])
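
The probabilities above can be reproduced by counting training instances, which makes the "ratio in the leaf" description concrete. This is a minimal sketch: it refits the same tree (with a `random_state=42` that I added for reproducibility), finds the leaf that `[5, 1.5]` falls into with `apply()`, and counts the training labels that share that leaf.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data[:, 2:], iris.target
# random_state is an assumption added here; the notebook leaves it unset.
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

x_new = np.array([[5.0, 1.5]])

# Index of the leaf the new point ends up in.
leaf = tree_clf.apply(x_new)[0]

# Training instances that ended up in that same leaf, counted per class.
in_leaf = tree_clf.apply(X) == leaf
counts = np.bincount(y[in_leaf], minlength=3)

manual_proba = counts / counts.sum()
print(counts, manual_proba)
print(tree_clf.predict_proba(x_new)[0])
```

The leaf holds 54 training instances, 49 of class 1 and 5 of class 2, so the estimates are 49/54 and 5/54.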

- Interesting insight: every point that falls into the same leaf gets exactly the same estimated probabilities.
    - It doesn't matter how close the new data point is to a decision boundary.
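
The following sketch illustrates this point with a few hand-picked test points (my own choices, not from the notebook). Three of them differ in how close they sit to the petal-width threshold but share a leaf, and the fourth crosses the threshold into a different leaf. As before, `random_state=42` is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data[:, 2:], iris.target
# random_state is an assumption added here; the notebook leaves it unset.
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

points = np.array([
    [5.0, 1.50],  # the point used above
    [6.0, 1.50],  # longer petal, same leaf
    [5.0, 1.74],  # just below the petal-width threshold, same leaf
    [5.0, 2.00],  # above the threshold, different leaf
])

leaves = tree_clf.apply(points)          # leaf index per point
probas = tree_clf.predict_proba(points)  # class probabilities per point

for p, leaf, proba in zip(points, leaves, probas):
    print(p, "leaf", leaf, np.round(proba, 3))
```

The first three rows print identical probabilities. Only the fourth changes, because it is the only point that lands in a different leaf.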

## The CART Training Algorithm

- ...