# Decision Trees

A [Decision Tree](https://scikit-learn.org/stable/modules/tree.html) (DT) is a non-parametric supervised learning model used for both classification and regression. A DT learns a series of if-else rules from the features of the training data. To make a prediction, a sample follows those rules from the root of the tree down to a leaf.
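
The following is a minimal from-scratch sketch of how the data can produce a single if-else rule. It tries a threshold midway between each pair of consecutive sorted values of one feature and keeps the threshold with the lowest weighted Gini impurity. Gini impurity is the default `criterion` of scikit-learn's `DecisionTreeClassifier`, but this sketch is not scikit-learn's implementation. The helpers `gini` and `best_threshold` are hypothetical names, and the petal-length values are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`.

    Hypothetical helper: candidate thresholds are midpoints between
    consecutive distinct sorted values of a single numeric feature.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue  # no threshold can separate equal values
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        # Weight each child's impurity by the fraction of samples it receives
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical, Iris-like petal lengths (cm) and species labels
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 6.0]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica"]

threshold, impurity = best_threshold(petal_length, species)
print(f"if petal_length <= {threshold}: ... (weighted Gini = {impurity:.3f})")
```

With these made-up values, the best rule is `petal_length <= 3.0`. It sends all three setosa samples to a pure left node. A full tree repeats this search over every feature and then recursively on each child node.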

#### Some of the benefits of Decision Trees:

- Simple to understand and to interpret. Trees can be visualised.
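- Able to handle both numerical and categorical data. However, the scikit-learn implementation currently expects numeric input, so categorical features must be encoded (for example, one-hot encoded) before fitting.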
- Able to handle multi-output problems.
- Possible to validate a model using statistical tests. That makes it possible to account for the reliability of the model.
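
scikit-learn's `DecisionTreeClassifier` expects numeric input, so categorical features are usually encoded before fitting. The following is a minimal sketch that assumes one-hot encoding with `OneHotEncoder` inside a pipeline. Ordinal encoding is another option. The weather data below is made up for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical categorical features: [outlook, temperature]
X = np.array([
    ["sunny", "hot"],
    ["sunny", "mild"],
    ["rainy", "mild"],
    ["rainy", "cool"],
    ["overcast", "hot"],
    ["overcast", "cool"],
])
y = np.array(["no", "no", "yes", "no", "yes", "yes"])

# One-hot encode each category into its own 0/1 column, then fit the tree.
# handle_unknown="ignore" encodes unseen categories as all zeros at predict time.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)

print(model.predict(np.array([["overcast", "mild"]])))
```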


#### Some of the disadvantages of Decision Trees:

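- Easy to create overly complex trees that overfit the training data and generalize poorly. Pruning, setting a minimum number of samples per leaf (`min_samples_leaf`), or limiting the tree depth (`max_depth`) helps avoid this.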
- Decision trees can be unstable because small variations in the data might result in a completely different tree being generated. This problem is mitigated by using decision trees within an ensemble.
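
Both disadvantages can be observed directly. The following minimal sketch assumes a single stratified 70/30 train/test split of Iris. It compares three models: an unconstrained tree, a tree limited with `max_depth=2`, and a `RandomForestClassifier` (an ensemble of trees). Iris is small and clean, so the gaps may be small and will change with `random_state`. Treat the printed numbers as illustrative, not as a benchmark.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    # Grows until leaves are pure, so it can memorize the training set
    "unconstrained tree": DecisionTreeClassifier(random_state=0),
    # Depth limit acts as a simple guard against overfitting
    "max_depth=2 tree": DecisionTreeClassifier(max_depth=2, random_state=0),
    # Averaging many randomized trees reduces the variance of a single tree
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    print(f"{name:20s} train={train_acc:.3f} test={test_acc:.3f}")
```

A large gap between training and test accuracy for the unconstrained tree is the usual sign of overfitting.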


### Import dependencies and load the data

[Iris](https://medium.com/codebagng/basic-analysis-of-the-iris-data-set-using-python-2995618a6342) is a standard dataset widely used to demonstrate classification models. It contains 150 samples from 3 species of Iris plants (setosa, versicolor, and virginica). Each sample is described by 4 numeric features: sepal length, sepal width, petal length, and petal width (all in cm).

```python
from sklearn import tree

# Load the Iris dataset bundled with scikit-learn
from sklearn.datasets import load_iris
iris = load_iris()
print(iris.DESCR)
```
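
Before modelling, it can help to confirm the dataset's shape and class balance. Here is a short sketch using NumPy's `bincount` on the integer-encoded targets:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Rows are samples, columns are the 4 measured features
print("Data shape:", iris.data.shape)

# Count how many samples belong to each class (targets are encoded 0, 1, 2)
class_counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, class_counts):
    print(f"{name}: {count} samples")
```

For Iris, this reports 150 samples with 4 features and 50 samples per class, so the classes are balanced.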

### View a sample of the data

```python
print("Feature names:", iris.feature_names)
print("First 5 samples:\n", iris.data[:5])
print("First 5 targets:", iris.target[:5])
print("Target names:", iris.target_names)
```
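
A quick per-class summary hints at which features a tree is likely to split on. The following minimal sketch computes the mean of each feature per class using NumPy boolean masks:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# One row per class, one column per feature
class_means = np.array([
    iris.data[iris.target == k].mean(axis=0)
    for k in range(len(iris.target_names))
])

for name, means in zip(iris.target_names, class_means):
    formatted = ", ".join(
        f"{feature}={value:.2f}" for feature, value in zip(iris.feature_names, means)
    )
    print(f"{name}: {formatted}")
```

The petal measurements differ much more between species than the sepal measurements. This is why petal features tend to appear near the top of a fitted Iris tree.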

### Fit a *DecisionTreeClassifier* to the data

```python
# Fix random_state so the fitted tree is reproducible across runs
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(iris.data, iris.target)
```
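
Once fitted, the classifier can be used for predictions and inspected. The following minimal sketch uses `predict`, `predict_proba`, `get_depth`, `get_n_leaves`, and `feature_importances_`. It refits the same model so that the block runs on its own. Predicting on training samples, as here, says nothing about how well the model generalizes; held-out data is needed for that.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Predict class indices and map them back to species names
samples = iris.data[:3]
predicted = clf.predict(samples)
print("Predicted:", iris.target_names[predicted])

# Class probabilities: the fraction of training samples of each class in the leaf reached
print("Probabilities:\n", clf.predict_proba(samples))

# Size of the learned rule set
print("Depth:", clf.get_depth(), "Leaves:", clf.get_n_leaves())

# Impurity-based importance of each feature (sums to 1)
for feature, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{feature}: {importance:.3f}")
```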

### View the decision tree

The next cell is optional. It only renders the fitted tree and has no influence on the model. It needs the `graphviz` and `pydotplus` Python packages as well as the Graphviz system binaries. If it still does not run after installing them, you can skip it. Real-world decision trees are often too large to visualize usefully anyway. A pre-rendered [image of the model](iris.png) is provided.

```python
import graphviz
import pydotplus

# Export the fitted tree in Graphviz DOT format
dot_data = tree.export_graphviz(
    clf, out_file=None,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True,
    special_characters=True)

# Save a PNG copy of the tree with pydotplus
png_graph = pydotplus.graph_from_dot_data(dot_data)
png_graph.write_png('iris.png')

# Render the tree inline in the notebook with graphviz
graph = graphviz.Source(dot_data)
graph
```
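
If graphviz or pydotplus cannot be installed, scikit-learn can print the learned if-else rules as plain text with `sklearn.tree.export_text`, which needs no extra dependencies. `sklearn.tree.plot_tree` is another option if matplotlib is available. The following is a minimal sketch; `max_depth=2` is an assumption made here only to keep the printed rules short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rule set readable
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Each line is one test; indentation shows nesting, and leaves show the predicted class
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```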