# Feature importance and ensembles of decision trees

In this notebook, we look at two ways of combining several trees into a single classifier: random forests and gradient boosted trees. We compare their performance to that of a single tree, and see that ensembles can ameliorate several of the drawbacks of a single decision tree.

As always, we start by loading the relevant modules:

In [None]:
%matplotlib inline 

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import mglearn # for visualizations

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

import graphviz
from sklearn.tree import export_graphviz
import pydotplus

from sklearn.datasets import load_breast_cancer

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier

### The dataset: breast cancer

Here we load the dataset and explore it a little, before splitting it into training and test sets.

Let's also take a quick look at the data. We will look at it as a Pandas dataframe to get a nice printout :-) 
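A minimal sketch of loading and inspecting the data could look as follows. The variable names `cancer` and `df` are assumptions made for this illustration, not names taken from the original cells:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the dataset; the result bundles data, target, feature_names, ...
cancer = load_breast_cancer()

# Wrap the feature matrix in a DataFrame so that the columns carry names
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df["target"] = cancer.target  # 0 = malignant, 1 = benign

print(df.shape)                     # number of samples, number of columns
print(df["target"].value_counts())  # class balance
print(df.head())
```

In a notebook, ending the cell with `df.head()` instead of printing it gives the nicer HTML table.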

The data is split into a test and training set:
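One way to do the split is sketched below. The choices `stratify=cancer.target` and `random_state=42` are assumptions for this example; the original cell may have used different settings:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()

# By default, 25% of the samples are held out for testing.
# stratify keeps the class proportions similar in both parts;
# random_state only makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

print("Training set:", X_train.shape)
print("Test set:    ", X_test.shape)
```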

### A single decision tree

We first train a single decision tree on the data:
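A possible version of this step, assuming the same split as a stratified 75/25 split with `random_state=42` (an assumption, as are the variable names), is:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# A tree with default settings is grown until all leaves are pure
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

print("Accuracy on training set: {:.3f}".format(tree.score(X_train, y_train)))
print("Accuracy on test set:     {:.3f}".format(tree.score(X_test, y_test)))
```

A fully grown tree fits the training set perfectly, so the gap between the two scores gives a first impression of how much a single tree overfits. Limiting `max_depth` is one common way to reduce this.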

### Feature importance

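Aggregated over the whole tree, how much does each feature contribute to predicting the label? In scikit-learn, the importances are non-negative and sum to 1; a value of 0 means the feature was not used in any split.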
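The importances are available in the `feature_importances_` attribute of a fitted tree. The sketch below refits a tree so that it runs on its own; the split settings and the name `importances` are assumptions for this example:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# One importance value per feature, labelled with the feature name
importances = pd.Series(tree.feature_importances_, index=cancer.feature_names)

# Show the ten most important features first
print(importances.sort_values(ascending=False).head(10))
```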

Let's plot this in a more useful manner:
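A horizontal bar chart is one readable way to show the importances. The helper `plot_feature_importances` below is a hypothetical function written for this sketch, not part of scikit-learn or mglearn:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def plot_feature_importances(model, feature_names):
    """Draw one horizontal bar per feature for a fitted tree-based model."""
    n_features = len(feature_names)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.barh(np.arange(n_features), model.feature_importances_, align="center")
    ax.set_yticks(np.arange(n_features))
    ax.set_yticklabels(feature_names)
    ax.set_xlabel("Feature importance")
    ax.set_ylabel("Feature")
    return ax


cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

ax = plot_feature_importances(tree, cancer.feature_names)
```

The same helper works for any fitted model that exposes `feature_importances_`, so it can be reused for the ensembles later on.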

We see that the tree rates "worst radius" as the most important feature. How do you interpret this result?

### Random forest 

One way to improve on a single decision tree is simply to train a lot of them, each on a slightly different view of the data, and combine their predictions! This requires a few extra parameters:
- n_estimators: how many trees are trained
- max_features: how many features each tree may choose between at each split
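A minimal sketch of a random forest using the two parameters above follows. The concrete values (`n_estimators=100`, `max_features="sqrt"`) and the split settings are assumptions chosen for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# 100 trees; at each split a tree only considers a random subset of
# sqrt(n_features) features, which makes the trees differ from each other
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)
forest.fit(X_train, y_train)

print("Number of trees:          {}".format(len(forest.estimators_)))
print("Accuracy on training set: {:.3f}".format(forest.score(X_train, y_train)))
print("Accuracy on test set:     {:.3f}".format(forest.score(X_test, y_test)))
```

Smaller values of `max_features` make the individual trees more different from each other, while larger values make them more similar and more closely fitted to the data.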

Compare the feature importances of the random forest to those we obtained for a single tree!
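To make the comparison concrete, the importances of both models can be put side by side in one table. This is only a sketch; the column names, split settings and hyperparameters are assumptions for this example:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_train, y_train
)

# One row per feature, one column per model
comparison = pd.DataFrame(
    {
        "single tree": tree.feature_importances_,
        "random forest": forest.feature_importances_,
    },
    index=cancer.feature_names,
)

print(comparison.sort_values("random forest", ascending=False).head(10))

# How many features does each model use at all?
print((comparison > 0).sum())
```

A forest typically assigns non-zero importance to many more features than a single tree, because the randomness forces the trees to consider different explanations of the data.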

### Gradient boosted trees

Instead of just blindly building many trees, one could build them one after another, so that each new tree tries to correct the mistakes of the trees before it. This requires yet another set of parameters:
- n_estimators: how many trees to train
- max_depth: how deep each tree should be (usually small for boosted trees)
- learning_rate: how strongly each tree tries to correct the mistakes of the previous trees
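A minimal sketch of gradient boosting with these three parameters is shown below. The concrete values and the split settings are assumptions chosen for illustration, not tuned settings:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# Many very shallow trees (depth 1), each contributing a small correction
gbrt = GradientBoostingClassifier(
    n_estimators=100, max_depth=1, learning_rate=0.1, random_state=0
)
gbrt.fit(X_train, y_train)

print("Accuracy on training set: {:.3f}".format(gbrt.score(X_train, y_train)))
print("Accuracy on test set:     {:.3f}".format(gbrt.score(X_test, y_test)))
```

Increasing `max_depth` or `learning_rate` makes the model more complex and more prone to overfitting, so these are usually tuned together with `n_estimators`.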