## Chapter 14
# Trees and Forests

The basis of tree-based learners is the decision tree, in which a series of decision rules are chained together.
* https://www.stat.berkeley.edu/~breiman/RandomForests/

## 14.1 Training a Decision Tree Classifier

In [8]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import datasets

iris = datasets.load_iris()
features = iris.data
targets = iris.target

decisiontree = DecisionTreeClassifier(random_state=0)
model = decisiontree.fit(features, targets)

observation = [[5, 4, 3, 2]]
print("the prediction for {} is {}".format(observation, model.predict(observation)))
# predict_proba returns one probability per class; predict returns only the class label
print("predicted probabilities for the three classes: {}".format(model.predict_proba(observation)))

the prediction for [[5, 4, 3, 2]] is [1]
predicted probabilities for the three classes: [[0. 1. 0.]]


### Discussion
Decision tree learners attempt to find a decision rule that produces the greatest decrease in impurity at a node. While there are a number of measurements of impurity, by default `DecisionTreeClassifier` uses Gini impurity:
$$
G(t) = 1 - \sum_{i=1}^c{p_i^2}
$$
where $G(t)$ is the Gini impurity at node $t$, $c$ is the number of classes, and $p_i$ is the proportion of observations of class $i$ at node $t$.
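
The formula can be checked by hand with a few lines of NumPy. The following is only a minimal sketch: the helper name `gini_impurity` and the toy label lists are made up for illustration and are not part of scikit-learn.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity G(t) = 1 - sum_i p_i^2 for the class labels at one node."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0  # assumption made here: an empty node is treated as pure
    # p_i is the share of observations at the node that belong to class i
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)

pure_node = gini_impurity([0, 0, 0, 0])     # a single class
mixed_node = gini_impurity([0, 0, 1, 1])    # two classes, evenly split
three_class_node = gini_impurity([0, 1, 2]) # three classes, evenly split
print(pure_node, mixed_node, three_class_node)
```

A pure node scores 0, and the score grows as the classes at the node become more evenly mixed.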

This process of finding the decision rules that create splits to decrease impurity is repeated recursively until all leaf nodes are pure (i.e., contain only one class) or some arbitrary cut-off is reached.
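
To make a single step of that search concrete, the sketch below scans candidate thresholds on one feature and keeps the one with the largest decrease in weighted Gini impurity. It illustrates the idea under simplifying assumptions (one feature, midpoints as candidate thresholds) and is not scikit-learn's actual implementation. The names `gini` and `best_threshold` and the toy data are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity of the class labels at one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature_values, labels):
    """Return (threshold, impurity_decrease) for the best rule `x <= threshold`."""
    x = np.asarray(feature_values, dtype=float)
    y = np.asarray(labels)
    parent_impurity = gini(y)
    best = (None, 0.0)
    unique_values = np.unique(x)  # sorted distinct values
    # Assumed candidate thresholds: midpoints between consecutive distinct values
    for threshold in (unique_values[:-1] + unique_values[1:]) / 2:
        left, right = y[x <= threshold], y[x > threshold]
        # Child impurities are weighted by the share of observations in each child
        weighted = (left.size * gini(left) + right.size * gini(right)) / y.size
        decrease = parent_impurity - weighted
        if decrease > best[1]:
            best = (threshold, decrease)
    return best

# Toy data (made up): one feature, two classes
feature = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
classes = [0, 0, 0, 1, 1, 1]
threshold, decrease = best_threshold(feature, classes)
print(threshold, decrease)
```

A full tree learner would repeat this search over every feature, split the data on the winning rule, and recurse into each child node.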

We can change the `criterion` parameter to use a different impurity measurement:

In [9]:
# create decision tree classifier using entropy
decisiontree_entropy = DecisionTreeClassifier(criterion='entropy', random_state=0)

model_entropy = decisiontree_entropy.fit(features, targets)
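
For comparison with Gini impurity, entropy at a node can be written as $H(t) = -\sum_{i=1}^c{p_i \log_2 p_i}$. The sketch below computes it by hand. The base-2 logarithm is an assumption made here (a different base only rescales the values), and the helper name `entropy` is hypothetical rather than a scikit-learn function.

```python
import numpy as np

def entropy(labels):
    """Entropy H(t) = -sum_i p_i * log2(p_i) of the class labels at one node."""
    labels = np.asarray(labels)
    # np.unique only returns classes that are present, so every p_i is > 0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

pure_entropy = entropy([1, 1, 1, 1])   # one class only
mixed_entropy = entropy([0, 0, 1, 1])  # two classes, evenly split
```

Like Gini impurity, entropy is zero for a pure node and largest when the classes are evenly mixed, so the two criteria often lead to similar splits.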

### See Also
* Decision Tree Learning, Princeton (https://www.cs.princeton.edu/courses/archive/spr07/cos424/papers/mitchell-dectrees.pdf)

## 14.2 Training a Decision Tree Regressor

In [12]:
from sklearn.tree import DecisionTreeRegressor
from sklearn import datasets

# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an earlier version
boston = datasets.load_boston()
features = boston.data[:,0:2]
target = boston.target

decisiontree = DecisionTreeRegressor(random_state=0)
model = decisiontree.fit(features, target)

observation = [[0.02, 16]]
model.predict(observation)

array([33.])

### Discussion
Decision tree regression works similarly to decision tree classification; however, instead of reducing Gini impurity or entropy, potential splits are by default measured by how much they reduce mean squared error (MSE):
$$
MSE = \frac{1}{n} \sum_{i=1}^{n}{(y_i - \hat y_i)^2}
$$

where $y_i$ is the true value of the target, $\hat y_i$ is the predicted value, and $n$ is the number of observations.
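
The sketch below scores a candidate split by its reduction in MSE, assuming each node predicts the mean of the target values it contains. The helper names `node_mse` and `mse_reduction` and the toy target values are made up for illustration.

```python
import numpy as np

def node_mse(targets):
    """MSE at a node, assuming the node predicts the mean of its targets."""
    targets = np.asarray(targets, dtype=float)
    return np.mean((targets - targets.mean()) ** 2)

def mse_reduction(targets, left_mask):
    """Parent MSE minus the size-weighted MSE of the two child nodes."""
    targets = np.asarray(targets, dtype=float)
    left_mask = np.asarray(left_mask, dtype=bool)
    left, right = targets[left_mask], targets[~left_mask]
    weighted = (left.size * node_mse(left) + right.size * node_mse(right)) / targets.size
    return node_mse(targets) - weighted

# Toy target values (made up)
values = [10.0, 12.0, 30.0, 32.0]
good_split = mse_reduction(values, [True, True, False, False])  # groups similar targets
poor_split = mse_reduction(values, [True, False, True, False])  # mixes low and high targets
print(good_split, poor_split)
```

The split that groups similar target values together yields a much larger reduction, which is why the learner would prefer it.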

We can use the `criterion` parameter to select the desired measurement of split quality. For example, we can construct a tree whose splits reduce mean absolute error:

In [13]:
# Note: "mae" was renamed to "absolute_error" in scikit-learn 1.0 and removed in 1.2
decisiontree_mae = DecisionTreeRegressor(criterion="mae", random_state=0)
model_mae = decisiontree_mae.fit(features, target)

### See Also
* Decision Tree Regression, scikit-learn (http://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html)
* Decision Tree - Regression, Saed Sayad (http://saedsayad.com/decision_tree_reg.htm)

## 14.3 Visualizing a Decision Tree Model

## 14.4 Training a Random Forest Classifier

## 14.5 Training a Random Forest Regressor

## 14.6 Identifying Important Features in Random Forests

## 14.7 Selecting Important Features in Random Forests

## 14.8 Handling Imbalanced Classes

## 14.9 Controlling Tree Size

## 14.10 Improving Performance Through Boosting

## 14.11 Evaluating Random Forests with Out-of-Bag Errors
