# What Is Ensemble Learning?
Ensemble techniques combine several individual models to improve the stability and predictive power of the resulting model.

## Idea Behind Ensemble Learning
Certain models do well in modeling one aspect of the data, while others do well in modeling another.

Instead of learning a single complex model, learn several simple models and combine their output to produce the final decision.

In an ensemble, the variance and bias of each individual model are offset by the strengths of the other models.

Ensemble learning produces a composite prediction whose accuracy is often better than that of any individual model, provided the individual models are reasonably accurate and make different errors.
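Why combining models can help is easiest to see in an idealised setting. The sketch below assumes that every model is correct with the same probability `p` and that the models err independently of one another; neither assumption comes from this section, and real models rarely satisfy them. The function name `majority_vote_accuracy` is made up for the illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes each voter is correct with probability p, independently of the
    others, and that n is odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    majority = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(majority, n + 1)
    )


# Accuracy of the vote for ensembles of increasing size, each voter at 60%.
for n in (1, 5, 11, 51):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under these assumptions the vote improves as models are added when `p` is above 0.5 and gets worse when `p` is below 0.5. In practice the models' errors are correlated, so the gain is smaller than this calculation suggests.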

## Significance of Ensemble Learning
- Robustness: ensemble models incorporate the predictions from all the base learners, so no single model's errors dominate the result.
- Accuracy: ensemble models often deliver more accurate predictions than their individual base learners.

### Ensemble Learning Methods
There are two broad ways to create an ensemble model: combine many weak learners, or combine a smaller set of well-chosen, strong and diverse models.

### Steps Involved in Ensemble Methods
Every ensemble algorithm consists of two steps:

1. Producing a cohort of predictions using simple ML algorithms
2. Combining the predictions into one aggregated model
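The two steps can be sketched with a toy example. Everything here is hypothetical: the data values are made up, the three hand-written rules stand in for trained models, and majority voting is just one possible way of combining predictions.

```python
from collections import Counter

# Toy data: (hours_studied, hours_slept) -> passed the exam (1) or not (0).
# All values are made up for illustration.
samples = [(1, 4), (1, 8), (6, 5), (7, 8), (4, 7), (4, 6)]
labels = [0, 0, 1, 1, 1, 0]


# Step 1: a cohort of deliberately simple models (hypothetical rules).
def rule_study(x):
    return 1 if x[0] >= 5 else 0  # looks at study time only


def rule_sleep(x):
    return 1 if x[1] >= 7 else 0  # looks at sleep only


def rule_total(x):
    return 1 if x[0] + x[1] >= 10 else 0  # looks at the sum of both


models = [rule_study, rule_sleep, rule_total]


# Step 2: combine the individual predictions by majority vote.
def ensemble_predict(x):
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(predict):
    correct = sum(predict(x) == y for x, y in zip(samples, labels))
    return correct / len(labels)


for model in models + [ensemble_predict]:
    print(model.__name__, accuracy(model))
```

Each rule misclassifies at least one sample, but they are wrong on different samples, so the vote corrects every individual mistake on this particular toy data.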

The ensemble can be achieved through several techniques.

### Types of Ensemble Methods

- Averaging
- Weighted Averaging
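Both combination rules are easy to write down. The following is a minimal sketch, assuming each model outputs one numeric prediction (for example, a class probability) per sample; the helper names, the prediction values, and the weights are all illustrative.

```python
def average(predictions):
    """Unweighted mean of the per-model predictions for each sample."""
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


def weighted_average(predictions, weights):
    """Weighted mean; weights are normalised so that they sum to 1."""
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, column)) / total
        for column in zip(*predictions)
    ]


# Rows are models, columns are samples (illustrative probabilities).
predictions = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.5],
    [0.8, 0.1, 0.3],
]
weights = [0.5, 0.3, 0.2]  # hypothetical: more trust in the first model

print(average(predictions))
print(weighted_average(predictions, weights))
```

Plain averaging treats every model equally, while weighted averaging lets a model that is believed to be more reliable contribute more to the final prediction.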

## Bagging Algorithms
Bootstrap Aggregation or bagging involves taking multiple samples from your training dataset (with replacement) and training a model for each sample.

The final prediction is the average of the predictions of all the submodels; for classification, this is typically a vote or an average of the predicted class probabilities.
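The mechanics of bagging can be sketched without any library support. In this hedged illustration, `fit` is a stand-in for "train a model and return its prediction" and is simply the sample mean; the helper names and the data are made up.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return rng.choices(data, k=len(data))


def bagged_estimate(data, n_models, fit, seed=0):
    """Fit one model per bootstrap sample and average their outputs.

    `fit` is any function that maps a sample to a numeric prediction.
    """
    rng = random.Random(seed)
    outputs = [fit(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return mean(outputs)


data = list(range(1000))  # made-up "training set"

# A bootstrap sample has the same size as the data but contains repeats.
sample = bootstrap_sample(data, random.Random(7))
unique_fraction = len(set(sample)) / len(data)
print("fraction of distinct rows in one sample:", unique_fraction)

# Average the outputs of 100 "models", each fitted on its own sample.
print("bagged estimate:", bagged_estimate(data, n_models=100, fit=mean))
```

Because sampling is done with replacement, each bootstrap sample contains only about 63% of the distinct training rows on average, which is what makes the individual submodels differ from one another.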

The three bagging models covered in this section are as follows:
    
- Bagged Decision Trees
- Random Forest
- Extra Trees

1. Bagged Decision Trees

Bagging performs best with algorithms that have a high variance. A popular example is decision trees, often constructed without pruning.

Below, you can see an example of using the BaggingClassifier with the Classification and Regression Trees algorithm (DecisionTreeClassifier). A total of 100 trees are created.


In [1]:
# Bagged Decision Trees for Classification
import pandas
from sklearn import model_selection
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Load the Pima Indians diabetes dataset (8 input features, 1 class label)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:, 0:8]  # input features
Y = array[:, 8]    # class label
seed = 7
# 10-fold cross-validation with shuffling
kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
cart = DecisionTreeClassifier()
num_trees = 100
# The base model is passed positionally because its keyword was renamed
# from base_estimator to estimator in scikit-learn 1.2
model = BaggingClassifier(cart, n_estimators=num_trees, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
# Mean accuracy across the 10 folds
print(results.mean())

0.7578263841421736


2. Random Forest

Random forest is an extension of bagged decision trees.

Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers. Specifically, rather than searching all features for the best split point, each split considers only a random subset of the features and picks the best split within that subset.
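The per-split feature sampling can be illustrated in isolation. This sketch shows only how candidate features might be drawn; it does not build a tree, and the helper name `candidate_features` is made up for the example.

```python
import random


def candidate_features(n_features, max_features, rng):
    """Pick the feature indices that one split is allowed to consider."""
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(7)
# 8 input features, as in the dataset used in this section;
# a fresh subset of 3 is drawn for every split.
for split in range(4):
    print(f"split {split}: features {candidate_features(8, 3, rng)}")
```

Since different splits and different trees see different feature subsets, a single dominant feature cannot shape every tree in the same way, which is how the correlation between trees is reduced.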

You can construct a Random Forest model for classification using the RandomForestClassifier class.

The example below fits a Random Forest classifier with 100 trees, with each split chosen from a random selection of three features.

In [2]:
# Random Forest Classification
import pandas
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier

# Load the Pima Indians diabetes dataset (8 input features, 1 class label)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:, 0:8]  # input features
Y = array[:, 8]    # class label
seed = 7
num_trees = 100
max_features = 3  # number of features considered at each split
kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
# No random_state is passed to the model, so the score varies slightly between runs
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
# Mean accuracy across the 10 folds
print(results.mean())


0.7682330827067669


3. Extra Trees

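Extra Trees (extremely randomized trees) are another variation on bagged decision trees in which the split points themselves are chosen at random rather than searched for. Note that scikit-learn's ExtraTreesClassifier fits each tree on the whole training dataset by default (bootstrap=False) rather than on bootstrap samples.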
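In the extremely randomized trees approach as it is commonly described, the extra randomness comes from drawing split thresholds at random instead of searching for the best one. The sketch below contrasts the two ideas on a single made-up feature. It is a simplification: real implementations score splits with an impurity criterion rather than a raw error count, and both helper names are hypothetical.

```python
import random


def random_threshold(values, rng):
    """Draw a split threshold uniformly between the feature's min and max."""
    return rng.uniform(min(values), max(values))


def best_threshold(values, labels):
    """Exhaustive search, for contrast: the midpoint with the fewest errors."""
    ordered = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def errors(t):
        # Predict class 1 when the value is above the threshold.
        return sum((v > t) != bool(y) for v, y in zip(values, labels))

    return min(midpoints, key=errors)


values = [2.0, 3.5, 4.0, 6.5, 7.0, 9.0]  # made-up feature values
labels = [0, 0, 0, 1, 1, 1]

rng = random.Random(7)
print("searched threshold:", best_threshold(values, labels))
print("random thresholds:", [round(random_threshold(values, rng), 2) for _ in range(3)])
```

Random thresholds are cheaper to compute and make the trees less similar to one another, at the cost of weaker individual trees.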

You can construct an Extra Trees model for classification using the ExtraTreesClassifier class.

The example below demonstrates Extra Trees with 100 trees and splits chosen from seven randomly selected features.

In [3]:
# Extra Trees Classification
import pandas
from sklearn import model_selection
from sklearn.ensemble import ExtraTreesClassifier

# Load the Pima Indians diabetes dataset (8 input features, 1 class label)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:, 0:8]  # input features
Y = array[:, 8]    # class label
seed = 7
num_trees = 100
max_features = 7  # number of features considered at each split
kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
# No random_state is passed to the model, so the score varies slightly between runs
model = ExtraTreesClassifier(n_estimators=num_trees, max_features=max_features)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
# Mean accuracy across the 10 folds
print(results.mean())


0.7577922077922079
