# Improve Performance with Ensembles
<span class="mark">Ensembles</span> can give you a <span class="girk">boost in accuracy</span> on your dataset. In this lesson you will discover how to create some of the most powerful types of ensembles in Python using scikit-learn. It steps through bagging, boosting and majority voting, and shows how you can continue to ratchet up the accuracy of the models on your own datasets. After completing this lesson you will know:

1. How to use <span class="mark">bagging</span> ensemble methods such as <span class="girk">bagged decision trees, random forest and extra trees</span>.
2. How to use <span class="mark">boosting</span> ensemble methods such as <span class="girk">AdaBoost and stochastic gradient boosting</span>.
3. How to use <span class="mark">voting ensemble</span> methods to <span class="mark">combine</span> the predictions from <span class="girk">multiple algorithms</span>.

# Combine Models Into Ensemble Predictions

The three most popular methods for combining the predictions from different models are:

- <span class="mark">Bagging</span>: Building <span class="mark">multiple models</span> (typically of the <span class="girk">same type</span>) from <span class="mark">different subsamples of the training dataset</span>.
- <span class="mark">Boosting</span>: Building <span class="burk">multiple models (typically of the same type)</span>, each of which <span class="girk">learns to fix the prediction errors of a prior model</span> in the sequence of models.
- <span class="mark">Voting</span>: Building multiple models (typically of <span class="mark">differing types</span>) and using simple statistics (such as the mean) to <span class="girk">combine predictions</span>.



## Bagged Decision Trees
<span class="mark">Bagging performs best with algorithms that have high variance</span>. A popular <span class="burk">example is decision trees</span>, often constructed without pruning. The example below uses the BaggingClassifier with the Classification and Regression Trees algorithm (DecisionTreeClassifier). A total of 100 trees are created.

In [2]:
# Pima Indians Diabetes Dataset
import pandas as pd
import numpy as np
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

#Loading dataset
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv('pima-indians-diabetes.data',names=names)

# separate array into input and output components
X = df.drop('class',axis='columns')
Y = df['class']

In [3]:

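# 10-fold cross-validation; random_state is omitted because recent scikit-learn
# releases reject it unless shuffle=True
kfold = KFold(n_splits=10)
cart = DecisionTreeClassifier()

# 'estimator' was named 'base_estimator' before scikit-learn 1.2
model = BaggingClassifier(estimator=cart, n_estimators=100, random_state=7)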
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.770745044429
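
To make the idea of bagging concrete, here is a minimal hand-rolled sketch: draw bootstrap samples, fit one unpruned tree per sample, and combine the trees by majority vote. It uses synthetic data from `make_classification` as a stand-in (an assumption, not the Pima dataset), and the names `X_demo`, `y_demo`, `trees` and `bagged_pred` are illustrative. It is not meant to mirror exactly how BaggingClassifier combines its members internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the Pima dataset).
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=7)

rng = np.random.default_rng(7)
n_trees = 25  # odd, so a 0/1 majority vote cannot tie
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw row indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=int(rng.integers(0, 2**31 - 1)))
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Majority vote: labels are 0/1, so the column mean is the share of votes for 1.
votes = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)

bagged_accuracy = (bagged_pred == y_test).mean()
print(bagged_accuracy)
```

Because each tree sees a different resample, their individual errors differ, and the vote tends to smooth out the variance of any single tree.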


## Random Forest
Random forest is an extension of bagged decision trees.

<span class="mark">Samples of the training dataset are taken with replacement,</span> but the <span class="mark">trees are constructed</span> in a way that <span class="girk">reduces the correlation between individual classifiers</span>.

<span class="burk">Specifically, rather than greedily</span> choosing the best split point from all features when constructing each tree, only a <span class="girk">random subset of features is considered for each split</span>.

In [4]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, max_features=3)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.757775119617
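
As a small check on what `max_features` does, the sketch below fits a forest on synthetic stand-in data (an assumption, not the Pima dataset) and inspects the fitted trees. Each member tree exposes `max_features_`, the number of features it examines at each split. The variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with 8 input features (an assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=7)

forest = RandomForestClassifier(n_estimators=20, max_features=3, random_state=7)
forest.fit(X_demo, y_demo)

# Every fitted tree records how many features it considers per split.
per_split = {tree.max_features_ for tree in forest.estimators_}
print(per_split)

# Feature importances are averaged over the trees and normalised to sum to 1.
importances = forest.feature_importances_
print(importances.round(3))
```

Smaller values of `max_features` make the trees less alike, which is the decorrelation effect described above; the best value is usually found by tuning.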


## Extra Trees
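Extra Trees (extremely randomized trees) are another modification of bagging in which the trees are randomized further: split thresholds are drawn at random for each candidate feature instead of being optimized. Note that scikit-learn's ExtraTreesClassifier fits each tree on the whole training set by default (bootstrap=False) rather than on bootstrap samples.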

In [5]:
from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier(n_estimators=100, max_features=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.762952836637
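
The following sketch compares the default resampling settings of the two tree ensembles and shows how bootstrapping could be switched on for Extra Trees. The data is synthetic (an assumption, not the Pima dataset) and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=7)

rf = RandomForestClassifier(n_estimators=50, random_state=7)
et = ExtraTreesClassifier(n_estimators=50, random_state=7)

# The two estimators differ in whether they resample rows by default.
rf_bootstrap = rf.bootstrap
et_bootstrap = et.bootstrap
print(rf_bootstrap, et_bootstrap)

# Extra Trees can be asked to bootstrap like a bagging ensemble.
et_bagged = ExtraTreesClassifier(n_estimators=50, bootstrap=True, random_state=7)

candidates = [('random forest', rf), ('extra trees', et),
              ('extra trees + bootstrap', et_bagged)]
for name, clf in candidates:
    print(name, cross_val_score(clf, X_demo, y_demo, cv=5).mean())
```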


# Boosting Algorithms
Boosting ensemble algorithms create a sequence of models that <span class="girk">attempt to correct the mistakes of the models before them</span> in the sequence. Once created, the <span class="mark">models make predictions</span> which may be <span class="girk">weighted by their demonstrated accuracy</span> and the <span class="mark">results are combined to create a final output prediction</span>. The two most common boosting ensemble machine learning algorithms are:

- AdaBoost.
- Stochastic Gradient Boosting.
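
The "correct the mistakes of the models before them" idea can be sketched by hand for a regression problem with squared error: each new tree is fitted to the residuals left by the ensemble so far. This is a simplified illustration under stated assumptions (synthetic sine data, shallow regression trees, a fixed `learning_rate`), not the implementation used by the scikit-learn estimators in this lesson.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: a noisy sine curve (an assumption for illustration).
rng = np.random.default_rng(7)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.5
prediction = np.full_like(y_demo, y_demo.mean())  # stage 0: predict the mean
mse_per_stage = [np.mean((y_demo - prediction) ** 2)]

for _ in range(20):
    residual = y_demo - prediction  # errors of the ensemble built so far
    stage = DecisionTreeRegressor(max_depth=2, random_state=7)
    stage.fit(X_demo, residual)
    # Add a damped correction from the new tree to the running prediction.
    prediction += learning_rate * stage.predict(X_demo)
    mse_per_stage.append(np.mean((y_demo - prediction) ** 2))

print(mse_per_stage[0], mse_per_stage[-1])
```

The training error shrinks stage by stage because every tree targets what the previous stages got wrong; on held-out data the error would eventually stop improving, which is why the number of stages is tuned.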

## AdaBoost
AdaBoost was perhaps the first successful boosting ensemble algorithm. It generally works by <span class="mark">weighting instances in the dataset</span> by how easy or difficult they are to classify, allowing the algorithm to pay more or less attention to them in the construction of subsequent models.

In [7]:
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(n_estimators=30, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.76045796309
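
The instance-weighting step can be illustrated with a single round of the classic discrete AdaBoost update. The sketch below is an assumption-laden illustration (synthetic data, a depth-1 tree as the weak learner, the textbook weight formula) and is not necessarily the exact variant that AdaBoostClassifier runs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise (an assumption).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, flip_y=0.1, random_state=7)

n = len(y_demo)
weights = np.full(n, 1.0 / n)  # start with uniform instance weights

# Weak learner: a decision stump trained on the weighted data.
stump = DecisionTreeClassifier(max_depth=1, random_state=7)
stump.fit(X_demo, y_demo, sample_weight=weights)
missed = stump.predict(X_demo) != y_demo

# Weighted error of the stump and its say (alpha) in the final vote.
error = weights[missed].sum()
alpha = 0.5 * np.log((1.0 - error) / error)

# Up-weight misclassified rows, down-weight the rest, then renormalise.
new_weights = weights * np.exp(np.where(missed, alpha, -alpha))
new_weights /= new_weights.sum()

print(error, alpha)
print(new_weights[missed].sum(), new_weights[~missed].sum())
```

After the update the misclassified rows carry half of the total weight, so the next weak learner is pushed to concentrate on them.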


## Stochastic Gradient Boosting
Stochastic Gradient Boosting (also called Gradient Boosting Machines) is one of the most sophisticated ensemble techniques. It is also proving to be perhaps one of the best techniques available for improving performance via ensembles.

In [8]:
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(n_estimators=100, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.766900205058
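
In scikit-learn the "stochastic" part is controlled by the `subsample` parameter: with the default of 1.0 every tree sees all rows, while a value below 1.0 fits each tree on a random fraction of them. The sketch below shows that setting on synthetic stand-in data (an assumption, not the Pima dataset); the value 0.8 is an arbitrary illustration, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=7)
cv = KFold(n_splits=5, shuffle=True, random_state=7)

# subsample < 1.0 trains each boosting stage on a random 80% of the rows.
sgb = GradientBoostingClassifier(n_estimators=50, subsample=0.8, random_state=7)
sgb_scores = cross_val_score(sgb, X_demo, y_demo, cv=cv)
print(sgb_scores.mean())
```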


# Voting Ensemble
Voting is one of the simplest ways of combining the predictions from multiple machine learning algorithms. It works by first <span class="mark">creating two or more standalone models from your training</span> dataset. A <span class="mark">Voting Classifier</span> can then be <span class="girk">used to wrap your models and combine the predictions</span> of the sub-models when asked to make predictions for new data: by majority vote over the predicted labels (the default, voting='hard') or by averaging predicted probabilities (voting='soft'). The predictions of the sub-models can be weighted, but <span class="mark">specifying the weights for classifiers manually or even heuristically is difficult.</span> More advanced methods can learn how to best weight the predictions from sub-models; this is called <span class="burk">stacking (stacked generalization)</span> and is available in scikit-learn as StackingClassifier from version 0.22 onward.

In [9]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

estimators = []

estimators.append(('log',LogisticRegression()))
estimators.append(('cart',DecisionTreeClassifier()))
estimators.append(('svm',SVC()))


ensemble = VotingClassifier(estimators)
results = cross_val_score(ensemble, X, Y, cv=kfold)
print(results.mean())

0.734278879016
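
Weighted averaging of predictions can be sketched with soft voting. The example below uses synthetic stand-in data (an assumption) and swaps SVC for GaussianNB, because soft voting needs `predict_proba`, which SVC only provides when probability estimates are enabled. The weights are arbitrary illustrations. The last lines reproduce the ensemble's probabilities as a weighted mean of its fitted members.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=7)

members = [
    ('log', LogisticRegression(max_iter=1000)),
    ('cart', DecisionTreeClassifier(max_depth=3, random_state=7)),
    ('nb', GaussianNB()),
]
weights = [2, 1, 1]  # illustrative weights, chosen arbitrarily

soft = VotingClassifier(estimators=members, voting='soft', weights=weights)
soft.fit(X_demo, y_demo)

# Weighted mean of the fitted members' class probabilities.
member_probas = np.stack([m.predict_proba(X_demo) for m in soft.estimators_])
manual_proba = np.average(member_probas, axis=0, weights=weights)
ensemble_proba = soft.predict_proba(X_demo)

print(np.allclose(manual_proba, ensemble_proba))
```

With `voting='hard'` the members' predicted labels would be counted instead, and the weights would scale each member's vote.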
