# **Decision Trees** (DT)

Import common modules, make sure Matplotlib plots figures inline, and check that Python 3 or later is installed (Python 2.x may work, but it is deprecated in Colab, so it is better to move to Python 3). Also check that Scikit-Learn ≥ 0.20 is installed.

In [0]:
# Python ≥3 is required
import sys
assert sys.version_info >= (3,)  # (3,) is a tuple; a bare (3) is just the int 3 and cannot be compared with version_info
# for Python ≥3.5: assert sys.version_info >= (3, 5)


# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Common imports
import numpy as np
import os

# to make the notebook's output stable across subsequent runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

In [0]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

In [0]:
from graphviz import Source
from sklearn.tree import export_graphviz

# Write the fitted tree to a .dot file, then render it (rendering needs the Graphviz system binaries)
export_graphviz(
        tree_clf,
        out_file="iris_tree.dot",
        feature_names=iris.feature_names[2:],
        class_names=iris.target_names,
        rounded=True,
        filled=True
    )

Source.from_file("iris_tree.dot")
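If the Graphviz binaries are not available, the same splits can be inspected as plain text. The sketch below assumes scikit-learn ≥ 0.21, which provides `sklearn.tree.export_text`; it retrains the same depth-2 tree so the cell runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Text rendering of the split thresholds and leaf classes; no Graphviz needed
report = export_text(tree_clf, feature_names=list(iris.feature_names[2:]))
print(report)
```

Each indented `|---` line is one test on a feature, and `class: k` marks the class predicted in a leaf.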

In [0]:
tree_clf.predict_proba([[5, 1.5]])  # class probabilities for a flower with petals 5 cm long and 1.5 cm wide

In [0]:
tree_clf.predict([[5, 1.5]])  # the class with the highest probability
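A Decision Tree estimates class probabilities by finding the leaf an instance falls into and returning the fraction of training instances of each class in that leaf. The sketch below checks this by hand using `apply()`, which returns the leaf index for each instance; it assumes the tree was trained without sample weights, as above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

x_new = np.array([[5, 1.5]])  # petal length 5 cm, petal width 1.5 cm

# Leaf index reached by every training instance and by the new instance
train_leaves = tree_clf.apply(X)
new_leaf = tree_clf.apply(x_new)[0]

# Count the training instances of each class that share the new instance's leaf
counts = np.bincount(y[train_leaves == new_leaf], minlength=len(iris.target_names))
manual_proba = counts / counts.sum()

print(counts)                            # per-class counts in that leaf
print(manual_proba)                      # hand-computed class ratios
print(tree_clf.predict_proba(x_new)[0])  # should match manual_proba
```

`predict()` then simply returns the class with the largest of these ratios.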

## <font color=red>Exercise 1</font>

Train and fine-tune a Decision Tree for the moons dataset by following these steps:


1.   Use `make_moons(n_samples=10000, noise=0.4)` to generate a moons dataset.
2.   Use `train_test_split()` to split the dataset into a training set and a test set.
3.   Use grid search with cross-validation (with the help of the `GridSearchCV` class) to find good hyperparameter values for a `DecisionTreeClassifier` (hint: try various values for `max_leaf_nodes`).
4.   Train it on the full training set using these hyperparameters, and measure your model’s performance on the test set.

You should get roughly 85% to 87% accuracy.

### <font color='green'>Solution</font>

In [0]:
# type your code below

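One possible sketch of the four steps, not the only valid approach. The 80/20 split, the `random_state` values, the `max_leaf_nodes` range of 2 to 99, and `cv=3` are all assumptions chosen for illustration. Step 4 relies on `GridSearchCV`'s default `refit=True`, which retrains the best estimator on the whole training set after the search.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Generate the moons dataset (random_state fixed for reproducibility, an assumption)
X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)

# 2. Split into training and test sets (test_size=0.2 is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 3. Cross-validated grid search over max_leaf_nodes
params = {"max_leaf_nodes": list(range(2, 100))}
grid_search_cv = GridSearchCV(DecisionTreeClassifier(random_state=42), params, cv=3)
grid_search_cv.fit(X_train, y_train)

# 4. With refit=True (the default), best_estimator_ has been retrained on all of X_train
best_tree = grid_search_cv.best_estimator_
test_accuracy = accuracy_score(y_test, best_tree.predict(X_test))

print(grid_search_cv.best_params_)
print(test_accuracy)
```

Limiting `max_leaf_nodes` regularizes the tree: too few leaves underfit the curved moon boundary, while too many start fitting the noise.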
_Credits: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd Edition) by Aurélien Géron, O'Reilly Media Inc., 2019_