## We need graphviz for visualization

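There's a [tutorial](https://bobswift.atlassian.net/wiki/spaces/GVIZ/pages/20971549/How+to+install+Graphviz+software) to help with the installation, especially the extra steps needed on Windows or macOS. Note that `pip install graphviz` installs only the Python bindings. The Graphviz executables (such as `dot`) must be installed separately and be on your `PATH`.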
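Before rendering anything, you can check that the system binaries are reachable. The following is a minimal standard-library sketch that looks up the `dot` executable, which is the default layout engine the `graphviz` package invokes. `graphviz_binaries_available` is a hypothetical helper name, not part of any library.

```python
import shutil


def graphviz_binaries_available() -> bool:
    """Return True if the Graphviz `dot` executable is found on PATH."""
    return shutil.which("dot") is not None


if graphviz_binaries_available():
    print("Graphviz found at:", shutil.which("dot"))
else:
    print("Graphviz binaries not found; install them and add them to PATH.")
```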

In [None]:
!pip install graphviz  # graphviz for python

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
%matplotlib inline

import graphviz
import matplotlib.pyplot as plt
import numpy as np

from sklearn import tree
from sklearn import datasets
from sklearn import model_selection
from sklearn.metrics import classification_report

## Decision trees

(Example from the scikit-learn documentation.)
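`DecisionTreeClassifier` uses `criterion="gini"` by default, and each node drawn by `export_graphviz` lists its `gini` impurity: 1 minus the sum of the squared class proportions in that node. The sketch below computes that formula from a node's class counts. The helper `gini_impurity` is our own illustration, not a scikit-learn function.

```python
import numpy as np


def gini_impurity(class_counts):
    """Gini impurity 1 - sum(p_k ** 2) for the class counts in one node."""
    counts = np.asarray(class_counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0  # treat an empty node as pure (a convention of this sketch)
    proportions = counts / total
    return float(1.0 - np.sum(proportions ** 2))


# A pure node scores 0; three equally sized classes score 1 - 3 * (1/3)**2 = 2/3
print(gini_impurity([50, 0, 0]))    # 0.0
print(gini_impurity([50, 50, 50]))  # approximately 0.667
```

Iris has 50 samples per class, so the impurity at the root of a tree trained on the full dataset would be about 0.667. On the training split used here, the root value depends on the class counts that ended up in `y_train`.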

In [None]:
iris = datasets.load_iris()
# Hold out a third of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=3)

In [None]:
clf = tree.DecisionTreeClassifier(max_depth=2)
clf = clf.fit(X_train, y_train)

In [None]:
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph

In [None]:
predictions = clf.predict(X_train)
print(classification_report(y_train, predictions, target_names=["setosa", "versicolor", "virginica"]))

### Increasing the depth (no `max_depth` limit)
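Without a depth limit, the tree keeps splitting until its leaves are pure, so training accuracy can only rise. Test accuracy may not follow. The sketch below recreates the same train/test split and compares both accuracies over a few depths. The depth values are arbitrary, and `random_state=0` on the classifier is our own choice, made so the runs are repeatable.

```python
from sklearn import datasets, model_selection, tree

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=3)

results = {}
for depth in [1, 2, 3, None]:  # None = grow until the leaves are pure
    model = tree.DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    # For classifiers, .score returns the mean accuracy
    results[depth] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, "
          f"test={results[depth][1]:.3f}, leaves={model.get_n_leaves()}")
```

A widening gap between the train and test columns is the usual sign that the extra depth is fitting the training set rather than the underlying pattern.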

In [None]:
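# No max_depth: nodes are expanded until all leaves are pure
# (or contain fewer than min_samples_split samples)
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)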

In [None]:
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph

In [None]:
predictions = clf.predict(X_train)
print(classification_report(y_train, predictions, target_names=["setosa", "versicolor", "virginica"]))

### And what if we look at performance on the test data?
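`classification_report` prints per-class precision, recall, F1-score and support. The sketch below shows how those per-class numbers follow from the predictions, using only numpy. `per_class_scores` is a hypothetical helper, not part of scikit-learn. When a ratio has a zero denominator it returns 0.0, which is also what scikit-learn reports by default (along with a warning).

```python
import numpy as np


def per_class_scores(y_true, y_pred, n_classes):
    """Per-class (precision, recall, f1, support), as in classification_report."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))  # correctly predicted as k
        fp = np.sum((y_pred == k) & (y_true != k))  # wrongly predicted as k
        fn = np.sum((y_pred != k) & (y_true == k))  # class-k samples that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((float(precision), float(recall), float(f1), int(tp + fn)))
    return scores


for k, s in enumerate(per_class_scores([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], 3)):
    print(k, s)
```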

In [None]:
predictions = clf.predict(X_test)
print(classification_report(y_test, predictions, target_names=["setosa", "versicolor", "virginica"]))

## Regression Trees
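With the squared-error criterion, a regression tree picks, at each node, the threshold that minimizes the total squared error of the two children around their own means. This is equivalent to minimizing their size-weighted MSE. The brute-force numpy sketch below does this for a single feature. Like scikit-learn, it places candidate thresholds midway between consecutive distinct sorted values. Unlike scikit-learn, it ignores options such as `min_samples_leaf`. `best_split` is a hypothetical helper name.

```python
import numpy as np


def best_split(x, y):
    """Best single threshold on 1-D feature x for squared-error regression.

    Returns (threshold, total_sse); threshold is None if x has no two
    distinct values to split between.
    """
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    best_threshold, best_cost = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate equal feature values
        left, right = y[:i], y[i:]
        # Squared error of each child around its own mean (its prediction)
        cost = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if cost < best_cost:
            best_threshold, best_cost = (x[i - 1] + x[i]) / 2, cost
    if best_threshold is None:
        return None, float(best_cost)
    return float(best_threshold), float(best_cost)


x_demo = [1, 2, 3, 4, 5, 6]
y_demo = [0, 0, 0, 10, 10, 10]
print(best_split(x_demo, y_demo))  # (3.5, 0.0)
```

A full tree applies this search recursively to each child until it reaches `max_depth`.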

In [None]:
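import pandas as pd

# datasets.load_boston was removed in scikit-learn 1.2; this loads the same data
# from its original source, as suggested in scikit-learn's deprecation notice
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

X = data[:, 12]  # Only use the LSTAT feature (percentage of lower status of the population)
y = target

# Sort X and y by ascending X so the fitted curve can be drawn as a line
sort_idx = X.argsort()
X = X[sort_idx].reshape(-1, 1)
y = y[sort_idx]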

In [None]:
# "mse" was renamed "squared_error" in scikit-learn 1.0 and removed in 1.2
clf = tree.DecisionTreeRegressor(max_depth=3, criterion="squared_error")
clf = clf.fit(X, y)

### What do the leaves return in this case?
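With the squared-error criterion (the default for `DecisionTreeRegressor`), each leaf predicts the mean target of the training samples that reach it. This is the `value` shown in each leaf of the graph. The sketch below checks that on made-up data, using `apply` (the leaf index for each sample) and the fitted `tree_.value` array. It uses `_demo` names so it does not overwrite the notebook's `X` and `y`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 40, size=(200, 1))                       # synthetic feature
y_demo = 30 - 0.5 * X_demo[:, 0] + rng.normal(0, 2, size=200)    # noisy linear target

reg_demo = DecisionTreeRegressor(max_depth=3).fit(X_demo, y_demo)

leaf_ids = reg_demo.apply(X_demo)  # index of the leaf each sample lands in
matches = []
for leaf in np.unique(leaf_ids):
    leaf_value = reg_demo.tree_.value[leaf, 0, 0]  # the leaf's stored prediction
    leaf_mean = y_demo[leaf_ids == leaf].mean()    # mean target of its samples
    matches.append(bool(np.isclose(leaf_value, leaf_mean)))

print(all(matches))  # expected: True
```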

In [None]:
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=["LSTAT"],
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph

### Let's check it out
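Every sample that falls in a given leaf gets the same prediction, so the fitted curve is a step function with at most one level per leaf. A tree of depth d has at most 2^d leaves, which gives 8 for `max_depth=3`. The sketch below checks that bound on synthetic data, not the housing data. The exponential-decay target is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X_demo = np.sort(rng.uniform(0, 40, size=300)).reshape(-1, 1)
y_demo = 35 * np.exp(-X_demo[:, 0] / 15) + rng.normal(0, 2, size=300)

max_depth = 3
reg_demo = DecisionTreeRegressor(max_depth=max_depth).fit(X_demo, y_demo)

n_levels = len(np.unique(reg_demo.predict(X_demo)))  # distinct step heights
print(n_levels, reg_demo.get_n_leaves(), 2 ** max_depth)
```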

In [None]:
plt.figure(figsize=(16, 8))
plt.scatter(X, y, c='steelblue',
            edgecolor='white', s=70)
plt.plot(X, clf.predict(X),
         color='black', lw=2)
plt.xlabel('% lower status of the population [LSTAT]')
plt.ylabel('Price in $1000s [MEDV]')
plt.show()