# Decision Trees Tutorial  
## Pruning 
If a decision tree is allowed to grow too *bushy*, it is likely to overfit the training data.  
Consequently, decision trees are often pruned to prevent overfitting.  
In the example below (Iris Data) we use the `min_samples_leaf` parameter to control the size of the tree.  
1. What does the Iris Data tree look like when no pruning is enforced?
2. What other options does sklearn provide to manage the bushiness of the tree? https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
3. Use two other pruning strategies to produce similar trees.
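
As a starting point for the questions above, here is a minimal sketch that fits the Iris data with several of the size-limiting parameters of `DecisionTreeClassifier` and reports the resulting tree sizes. The parameter values (`5`, `3`, `5`, `0.02`) and the names `settings` and `leaf_counts` are illustrative assumptions, not recommended settings; adjust them until the trees are of similar size.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_iris, y_iris = iris.data, iris.target

# One entry per way of limiting tree growth; the values are illustrative guesses.
settings = {
    'unpruned': {},
    'min_samples_leaf': {'min_samples_leaf': 5},   # each leaf must hold >= 5 samples
    'max_depth': {'max_depth': 3},                 # cap the depth of the tree
    'max_leaf_nodes': {'max_leaf_nodes': 5},       # cap the number of leaves
    'ccp_alpha': {'ccp_alpha': 0.02},              # cost-complexity (post) pruning
}

leaf_counts = {}
for name, params in settings.items():
    # Fixed random_state so repeated runs build the same tree.
    clf = DecisionTreeClassifier(criterion='entropy', random_state=0, **params)
    clf.fit(X_iris, y_iris)
    leaf_counts[name] = clf.get_n_leaves()
    print(f'{name}: {clf.get_n_leaves()} leaves, depth {clf.get_depth()}')
```

The first three parameters stop the tree from growing (pre-pruning), whereas `ccp_alpha` grows the full tree and then cuts back subtrees, so it may reach a tree of similar size by a different route.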

In [None]:
import pandas as pd
import numpy as np
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

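# Load the penguins data and select the four numeric features.
penguins_all = pd.read_csv('penguins_af.csv')
f_names = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
X = penguins_all[f_names].values
y = penguins_all['species']
# Sorted unique class labels, used to label the nodes when plotting.
species_names = np.unique(y)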

In [None]:
# No size-limiting parameters are set, so the tree is grown without pruning.
ptree = DecisionTreeClassifier(criterion='entropy')
ptree.fit(X, y)

In [None]:
fig, ax = plt.subplots(figsize=(15, 15))
# Draw the fitted tree on the axes created above.
tree.plot_tree(ptree, feature_names=f_names,
               class_names=species_names,
               filled=True, ax=ax)
plt.show()

In [None]:
# Number of leaves in the unpruned tree, a simple measure of its bushiness.
ptree.get_n_leaves()