# Chapter 6 - Decision Trees

In [4]:
from sklearn.datasets import load_iris 
from sklearn.tree import DecisionTreeClassifier, export_graphviz

In [5]:
iris = load_iris() 
X = iris.data[:, 2:] # petal length and width 
y = iris.target

## Classifier

In [6]:
tree_clf = DecisionTreeClassifier(max_depth=2) 
tree_clf.fit(X, y)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

In [8]:
# Export the fitted tree to a Graphviz .dot file (rendered to an image below)
export_graphviz(
    tree_clf,
    out_file='iris_tree.dot',
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True
)

### To view the graphic (Windows instructions):
1. Download the Graphviz program from https://graphviz.gitlab.io/_pages/Download/Download_windows.html
2. Install the program, and add the `bin` folder to your system path
3. Run this command on the dot file: `$ dot -Tpng iris_tree.dot -o iris_tree.png`
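
If installing Graphviz is not convenient, a plain-text view of the same tree can be printed directly from Python. The following is a minimal sketch, assuming scikit-learn 0.21 or newer (where `sklearn.tree.export_text` is available); it refits the classifier so that the snippet runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

# Render the fitted tree as indented text rules (no Graphviz needed).
# Assumption: export_text requires scikit-learn >= 0.21.
tree_text = export_text(tree_clf, feature_names=list(iris.feature_names[2:]))
print(tree_text)
```

The text output shows the split conditions and the predicted class at each leaf, but not the Gini or sample counts that the Graphviz image includes.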

### Decision Tree Result

![Image of Decision Tree](./iris_tree.png)

| Node Content | Meaning |
| --- | --- |
| Condition | The if/else test that decides which child node to follow |
| Gini | The node's Gini impurity (0 means the node is pure) |
| Samples | Total number of **training** instances the node applies to |
| Value | Number of **training** instances the node applies to, per class |
| Class | The class the node predicts |
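
The same fields can also be read from the fitted estimator rather than from the image. The sketch below is one way to do this, assuming the low-level `tree_` attribute of a fitted `DecisionTreeClassifier`; the names `node_rows` and `condition` are made up for illustration. Because the `tree_.value` array holds per-class counts in some scikit-learn versions and per-class fractions in others, it is used here only to pick the majority class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
tree = tree_clf.tree_  # low-level array representation of the fitted tree
feature_names = iris.feature_names[2:]

node_rows = []  # hypothetical container, one dict per node
for node_id in range(tree.node_count):
    # Leaves have no children: both child ids are the same sentinel value.
    is_leaf = tree.children_left[node_id] == tree.children_right[node_id]
    condition = None
    if not is_leaf:
        name = feature_names[tree.feature[node_id]]
        condition = f"{name} <= {tree.threshold[node_id]:.2f}"
    # Counts or fractions depending on version, so only the argmax is used.
    predicted = iris.target_names[np.argmax(tree.value[node_id][0])]
    node_rows.append({
        "condition": condition,
        "gini": float(tree.impurity[node_id]),
        "samples": int(tree.n_node_samples[node_id]),
        "class": str(predicted),
    })

for row in node_rows:
    print(row)
```

Leaf nodes report `None` for the condition, since they have no test to apply.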

A node is **"pure"** if all the training instances it applies to belong to the same class, that is, when **G<sub>i</sub> = 0**.

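$G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2$ **... or**

$G_i = 1 - \sum_{k=1}^{n} \left(\frac{\text{Value}_{i,k}}{\text{Samples}_i}\right)^2$

Here $p_{i,k}$ is the fraction of the training instances in node $i$ that belong to class $k$, and $n$ is the number of classes.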

For example, for the versicolor node:

$G_2 = 1 - \left( \left(\frac{0}{54}\right)^2 + \left(\frac{49}{54}\right)^2 + \left(\frac{5}{54}\right)^2 \right) \approx 0.168$
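
The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch; `gini_impurity` is a helper name made up for illustration, not a scikit-learn function.

```python
def gini_impurity(class_counts):
    """Return 1 - sum(p_k^2) for a node with the given per-class counts."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("node has no training instances")
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Per-class counts of the versicolor node, taken from its Value field.
versicolor_node = [0, 49, 5]
print(round(gini_impurity(versicolor_node), 3))  # 0.168

# A pure node (all instances in one class) has impurity 0.
print(gini_impurity([50, 0, 0]))  # 0.0
```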

Because a decision tree is easy to interpret and each prediction can be traced node by node, decision trees are called white box models. Random Forests and neural networks are much harder to interpret, so they are generally considered black box models.