# Improve Performance with Ensemble Learning

**Ensembles** are **methods combining the predictions from different models**. 

We will go through **Bagging**, **Boosting** and **Voting** ensembles in scikit-learn and show how you can use them to improve the accuracy of models on your own datasets.

The goal is to learn:
1. How to use bagging ensemble methods such as bagged decision trees, random forest and extra trees
2. How to use boosting ensemble methods such as AdaBoost and stochastic gradient boosting
3. How to use voting ensemble methods to combine the predictions from multiple algorithms.

## Combine Models Into Ensemble Predictions

The three most popular methods for combining the predictions from different models are:

* ***Bagging***. Building multiple models (typically of the same type) **from different subsamples of the training dataset**.
* ***Boosting***. Building multiple models (typically of the same type) **each of which learns to fix the prediction errors of a prior model in the sequence of models**.
* ***Voting***. Building multiple models (typically of differing types) and **combining their predictions with simple statistics** (like the mean or a majority vote).

*DISCLAIMER: The following assumes you are generally familiar with ML algorithms and ensemble methods (or are happy to skip the details for now). It does not go into the details of how the algorithms work or what their parameters mean.*

We will use the Pima Indians Diabetes dataset to demonstrate each algorithm. Each ensemble algorithm is evaluated using 10-fold cross-validation, with classification accuracy as the performance metric.
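For reference, `cross_val_score` with a `KFold` splitter does roughly the following. It fits on nine folds, scores accuracy on the held-out fold, repeats this ten times and averages the results. This is a minimal sketch. So that it runs offline, it uses synthetic data from `make_classification` (768 rows and 8 features, chosen only to mimic the shape of the Pima data) instead of the real dataset.

```python
# Sketch: 10-fold cross-validated accuracy written out by hand.
# Assumption: synthetic make_classification data stands in for the Pima dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=7)
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

fold_accuracies = []
for train_idx, test_idx in kfold.split(X_demo):
    model = DecisionTreeClassifier(random_state=7)
    model.fit(X_demo[train_idx], y_demo[train_idx])
    # accuracy = fraction of held-out rows that are predicted correctly
    fold_accuracies.append(np.mean(model.predict(X_demo[test_idx]) == y_demo[test_idx]))

manual_scores = np.array(fold_accuracies)
library_scores = cross_val_score(DecisionTreeClassifier(random_state=7), X_demo, y_demo, cv=kfold)
print(manual_scores.mean(), library_scores.mean())
```

The two means should match, because `cross_val_score` reuses the same (deterministic) folds and, for classifiers, scores with accuracy by default.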

## 0. Import the data

In [0]:
import pandas as pd

url = 'https://raw.githubusercontent.com/dbonacorsi/AML_basic_AA1920/master/datasets/pima-indians-diabetes.data.csv'

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv(url, names=names)
data

# BAGGING Algorithms

Bootstrap Aggregation (or Bagging) involves taking multiple samples from your training dataset (with replacement) and training a model for each sample. The final output prediction is averaged across the predictions of all of the sub-models. 
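The bagging procedure is short enough to write by hand. The sketch below illustrates the idea and is not scikit-learn's implementation. It draws bootstrap samples with NumPy, fits one `DecisionTreeClassifier` per sample and combines the trees by majority vote. scikit-learn's `BaggingClassifier` instead averages the trees' predicted class probabilities when the base model supports `predict_proba`. The data are synthetic (`make_classification`), and `bagging_fit`/`bagging_predict` are hypothetical helper names.

```python
# Sketch of bootstrap aggregation from scratch (illustrative, hypothetical helpers).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, seed=7):
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for i in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # draw n row indices WITH replacement
        trees.append(DecisionTreeClassifier(random_state=i).fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_estimators, n_samples)
    # majority vote per sample; assumes non-negative integer class labels
    return np.apply_along_axis(lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=7)
X_train, y_train = X_demo[:200], y_demo[:200]
X_test, y_test = X_demo[200:], y_demo[200:]
trees = bagging_fit(X_train, y_train)
pred = bagging_predict(trees, X_test)
print("bagged accuracy:", np.mean(pred == y_test))
```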

The three bagging models we will cover here are as follows:
* Bagged Decision Trees
* Random Forest
* Extra Trees.

## Bagged Decision Trees

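Bagging performs best with algorithms that have **high variance**, i.e. whose predictions change a lot with small changes in the training data. A popular example is decision trees, often constructed without pruning.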


The example below uses the `BaggingClassifier` (documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)) with the Classification and Regression Trees (CART) algorithm, which scikit-learn implements as `DecisionTreeClassifier`. A total of 100 trees are created.

In [0]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
#
from sklearn.ensemble import BaggingClassifier                     # <---
#
from sklearn.tree import DecisionTreeClassifier                    # <---

In [0]:
array = data.values
X = array[:,0:8]
Y = array[:,8]

First, evaluate a single decision tree as a baseline.

In [0]:
seed = 7
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)  # random_state is only allowed together with shuffle=True
model = DecisionTreeClassifier(random_state=seed)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

Then try to do better with Bagging.

In [0]:
# Bagged Decision Trees for Classification
seed = 7
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
cart = DecisionTreeClassifier(random_state=seed)
num_trees = 100
# `estimator` replaced the older `base_estimator` argument in scikit-learn 1.2
model = BaggingClassifier(estimator=cart, n_estimators=num_trees, random_state=seed, bootstrap=False)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

*(NOTE: running the cell above takes a bit longer than usual.)*

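Now change one parameter above, setting `bootstrap=True`, then rerun the cell and compare.

With `bootstrap=False`, every tree is trained on the full training set, so the 100 trees come out nearly identical and the ensemble behaves much like the single tree above. With `bootstrap=True` (the `BaggingClassifier` default), each tree is trained on a different sample drawn with replacement. This diversity is what lets bagging reduce variance, and it is the setting that makes this a genuinely bagged ensemble.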

## Random Forest

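Random Forest is **an extension of bagged decision trees**. Samples of the training dataset are still taken with replacement, but the trees are constructed in a way that reduces the correlation between them. Instead of greedily choosing the best split point among all features, each split is chosen from a random subset of the features, whose size is set by `max_features`.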


You can construct a Random Forest model for classification using the RandomForestClassifier class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html). The example below demonstrates using Random Forest for classification with 100 trees and split points chosen from a random selection of 3 features.

In [0]:
from sklearn.ensemble import RandomForestClassifier                    # <---

In [0]:
# Random Forest Classification
num_trees = 100
max_features = 3
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

*(NOTE: running the cell above takes a bit longer than usual.)*

Running the example provides a mean estimate of classification accuracy.

## Extra Trees

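Extra Trees (Extremely Randomized Trees) are **another modification of bagging** in which the trees are made even more random. For each candidate feature, a split threshold is drawn at random rather than searched for, and the best of these random splits is kept. Note that scikit-learn's `ExtraTreesClassifier` trains each tree on the whole training dataset by default (`bootstrap=False`). The diversity therefore comes from the random splits rather than from resampling.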
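The key difference from Random Forest lies in how a split is chosen. The sketch below shows a single Extra Trees split decision on synthetic data. It picks `max_features` random features and, for each one, draws one threshold uniformly between that feature's minimum and maximum. It then keeps the candidate with the lowest weighted Gini impurity. `gini` and `extra_trees_split` are hypothetical helpers written for illustration. A full tree would apply this rule recursively to each child node.

```python
# Sketch of one Extra Trees split (hypothetical helpers, illustrative only).
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def extra_trees_split(X, y, max_features, rng):
    """Random features, ONE random threshold each; keep the purest resulting split."""
    n_samples, n_features = X.shape
    candidates = rng.choice(n_features, size=max_features, replace=False)
    best = None
    for f in candidates:
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            continue  # constant feature: no split possible
        threshold = rng.uniform(lo, hi)  # random cut-point instead of an exhaustive search
        left = X[:, f] <= threshold
        # weighted average impurity of the two child nodes
        score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n_samples
        if best is None or score < best[2]:
            best = (f, threshold, score)
    return best

rng = np.random.default_rng(7)
X_demo = rng.normal(size=(200, 8))
y_demo = (X_demo[:, 0] + X_demo[:, 3] > 0).astype(int)
feature, threshold, impurity = extra_trees_split(X_demo, y_demo, max_features=7, rng=rng)
print(feature, round(threshold, 3), round(impurity, 3))
```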

You can construct an Extra Trees model for classification using the ExtraTreesClassifier class (documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html)). 

The example below provides a demonstration of extra trees with the number of trees set to 100 and splits chosen from 7 random features.

In [0]:
from sklearn.ensemble import ExtraTreesClassifier                    # <---

In [0]:
# Extra Trees Classification
num_trees = 100
max_features = 7
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
model = ExtraTreesClassifier(n_estimators=num_trees, max_features=max_features, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

*(NOTE: running the cell above takes a bit longer than usual.)*

Running the example provides a mean estimate of classification accuracy.

# BOOSTING Algorithms

Boosting ensemble algorithms **create a sequence of models that attempt to correct the mistakes of the models before them in the sequence**. Once created, the models make predictions which may be weighted by their demonstrated accuracy and the results are combined to create a final output prediction.

Two of the most common boosting ensemble algorithms are:
* AdaBoost
* Stochastic Gradient Boosting.

## AdaBoost

AdaBoost was perhaps the first successful boosting ensemble algorithm. 

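It works by weighting the instances in the dataset by how easy or difficult they are to classify. After each model is built, misclassified instances receive more weight and correctly classified ones receive less. Subsequent models therefore pay more attention to the hard cases and less to the easy ones.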
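The reweighting loop can be sketched in a few lines. This is classic discrete AdaBoost for two classes labelled -1/+1, with depth-1 trees (decision stumps) as weak learners. It is an illustration on synthetic data, not scikit-learn's exact implementation, and `adaboost_fit`/`adaboost_predict` are hypothetical names.

```python
# Sketch of discrete two-class AdaBoost with decision stumps (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_estimators=30):
    """y must contain labels -1 and +1."""
    n = len(X)
    w = np.full(n, 1.0 / n)                      # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)   # weighted training error
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - err) / err)    # more accurate stumps get a larger say
        # y * pred is -1 for misclassified rows, so their weights grow; correct rows shrink
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(stumps, alphas, X):
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))  # weighted vote
    return np.where(scores >= 0, 1, -1)

X_demo, y01 = make_classification(n_samples=400, n_features=8, random_state=7)
y_demo = np.where(y01 == 1, 1, -1)
stumps, alphas = adaboost_fit(X_demo[:300], y_demo[:300])
pred = adaboost_predict(stumps, alphas, X_demo[300:])
print("held-out accuracy:", np.mean(pred == y_demo[300:]))
```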

You can construct an AdaBoost model for classification using the AdaBoostClassifier class (documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html)).

The example below demonstrates the construction of 30 decision trees in sequence using the AdaBoost algorithm. By default, `AdaBoostClassifier` uses one-level trees (decision stumps) as its weak learners.

In [0]:
from sklearn.ensemble import AdaBoostClassifier                    # <---

In [0]:
# AdaBoost Classification
num_trees = 30
seed=7
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
model = AdaBoostClassifier(n_estimators=num_trees, random_state=seed)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

Running the example provides a mean estimate of classification accuracy.

## Stochastic Gradient Boosting

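Stochastic Gradient Boosting (a variant of Gradient Boosting Machines, or GBMs) is **one of the most sophisticated ensemble techniques**, and in practice it is often one of the best-performing ways to improve results with an ensemble. Like AdaBoost, it builds trees in sequence. Each new tree, however, is fitted to the gradient of the loss of the current ensemble (for squared error, simply the residuals). The *stochastic* variant fits each tree on a random subsample of the training rows.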
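The core idea can be sketched for binary classification with log-loss. Each regression tree is fitted to the negative gradient `y - p`, the residual between the 0/1 label and the current predicted probability. It is trained on a random subsample of the rows, which is the "stochastic" part. This is a simplified illustration on synthetic data. scikit-learn's `GradientBoostingClassifier` additionally refines the leaf values with a Newton-style step. `sgb_fit` and `sgb_predict_proba` are hypothetical names.

```python
# Sketch of stochastic gradient boosting for binary log-loss (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgb_fit(X, y, n_estimators=100, learning_rate=0.1, subsample=0.8, max_depth=3, seed=7):
    """y must contain 0/1 labels."""
    rng = np.random.default_rng(seed)
    p = y.mean()
    f0 = np.log(p / (1 - p))          # start from the log-odds of the positive class
    F = np.full(len(y), f0)           # current ensemble output (log-odds) per training row
    n_sub = int(subsample * len(y))
    trees = []
    for _ in range(n_estimators):
        residual = y - sigmoid(F)     # negative gradient of log-loss w.r.t. F
        idx = rng.choice(len(y), size=n_sub, replace=False)  # random row subsample
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X[idx], residual[idx])
        F += learning_rate * tree.predict(X)  # small step in the direction of the new tree
        trees.append(tree)
    return f0, learning_rate, trees

def sgb_predict_proba(model, X):
    f0, learning_rate, trees = model
    F = f0 + learning_rate * sum(tree.predict(X) for tree in trees)
    return sigmoid(F)

X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=7)
model = sgb_fit(X_demo[:300], y_demo[:300])
proba = sgb_predict_proba(model, X_demo[300:])
pred = (proba >= 0.5).astype(int)
print("held-out accuracy:", np.mean(pred == y_demo[300:]))
```

The `learning_rate` (shrinkage) and `subsample` values here are illustrative choices, not tuned settings.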

You can construct a Gradient Boosting model for classification using the `GradientBoostingClassifier` class (documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)). 

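The example below demonstrates gradient boosting for classification with 100 trees. Note that `GradientBoostingClassifier` only becomes *stochastic* when its `subsample` parameter is set below 1.0. With the default (1.0), every tree sees the full training set, so you may want to try, e.g., `subsample=0.8`.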

In [0]:
from sklearn.ensemble import GradientBoostingClassifier                    # <---

In [0]:
# Stochastic Gradient Boosting Classification
seed = 7
num_trees = 100
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
model = GradientBoostingClassifier(n_estimators=num_trees, random_state=seed)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

Running the example provides a mean estimate of classification accuracy.

# VOTING Algorithms

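Voting is **one of the simplest ways of combining the predictions from multiple ML algorithms**. It works by first creating two or more standalone models from your training dataset. A voting classifier then wraps these models and combines their predictions on new data.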
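The two usual ways of combining the sub-models can be shown with a few made-up numbers. In the sketch below, the predicted probabilities of the three sub-models are hypothetical values chosen for illustration. *Hard* voting (the `VotingClassifier` default, `voting='hard'`) takes the majority of the predicted labels. *Soft* voting (`voting='soft'`) averages the predicted probabilities first, so one very confident model can outvote two hesitant ones.

```python
# Sketch: hard vs soft voting by hand (hypothetical probabilities).
import numpy as np

# predicted probability of class 1 from three sub-models, for four samples
proba_class1 = np.array([
    [0.90, 0.40, 0.45, 0.20],   # model A
    [0.60, 0.45, 0.48, 0.30],   # model B
    [0.20, 0.99, 0.95, 0.10],   # model C
])

# Hard voting: each model casts a 0/1 vote and the majority (2 of 3) wins
votes = (proba_class1 >= 0.5).astype(int)
hard = (votes.sum(axis=0) >= 2).astype(int)

# Soft voting: average the probabilities, then threshold at 0.5
soft = (proba_class1.mean(axis=0) >= 0.5).astype(int)

print("hard:", hard)   # hard: [1 0 0 0]
print("soft:", soft)   # soft: [1 1 1 0]
```

For samples 2 and 3, model C is confident enough to flip the soft vote, while the hard vote follows the two hesitant models.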


You can create a voting ensemble model for classification using the `VotingClassifier` class (documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html)). 

The code below combines the predictions of logistic regression, classification and regression trees (CART) and support vector machines for a classification problem.


In [0]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier                    # <---

In [0]:
# Voting Ensemble for Classification

kfold = KFold(n_splits=10, shuffle=True, random_state=7)

# create the sub models
estimators = []
model1 = LogisticRegression(max_iter=1000)  # more iterations so the solver converges on unscaled features
estimators.append(( 'logistic' , model1))
model2 = DecisionTreeClassifier()
estimators.append(( 'cart' , model2))
model3 = SVC()
estimators.append(( 'svm' , model3))

# create the ensemble model
ensemble = VotingClassifier(estimators)
results = cross_val_score(ensemble, X, Y, cv=kfold)
print(results.mean())

Running the example provides a mean estimate of classification accuracy.

## Summary

What we did:

* We discovered ensemble ML algorithms for improving the performance of models on your problems, namely:
  * Bagging ensembles, including Bagged Decision Trees, Random Forest and Extra Trees;
  * Boosting ensembles, including AdaBoost and Stochastic Gradient Boosting;
  * Voting ensembles, for combining the predictions of arbitrary models.