### <b> Ensemble Learning </b>

### <b> Learning Objectives </b>
By the end of this lesson, you will be able to:
- Define ensemble learning
- List different types of ensemble methods
- Build an intuition for how ensemble methods combine models
- Apply different algorithms of ensemble learning using use cases

### <b> What Is Ensemble Learning? </b>

Ensemble techniques combine individual models to improve the stability and predictive power of the model.

#### <b> Ideology Behind Ensemble Learning: </b>

* Certain models do well in modeling one aspect of the data, while others do well in modeling another.

* Instead of learning a single complex model, learn several simple models and combine their output to produce the final decision.

* The variance and bias of an individual model are offset by the strengths of the other models in the ensemble.

* Ensemble learning provides a composite prediction whose accuracy is often better than that of any individual model.

#### <b> Working of Ensemble Learning </b>

![Ensemble_Learning_Workflow](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/Lesson_07_Ensemble_Learning/Ensemble_Learning_Workflow.png)

#### <b> Significance of Ensemble Learning </b>

* Robustness
  - Ensemble models incorporate the predictions from all the base learners, so no single model's errors dominate the result
* Accuracy
  - Ensemble models typically deliver more accurate predictions than their individual base learners

#### <b> Ensemble Learning Methods </b>

* Techniques for creating an ensemble model
* Combine all weak learners to form an ensemble, or create an ensemble of well-chosen strong and diverse models

#### <b> Steps Involved in Ensemble Methods </b>

Every ensemble algorithm consists of two steps:

* Producing a cohort of predictions using simple ML algorithms
* Combining the predictions into one aggregated model

The ensemble can be achieved through several techniques.

### <b> Types of Ensemble Methods </b>

#### <b> Averaging </b>


![Averaging](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/Lesson_07_Ensemble_Learning/Averaging.png)

#### <b> Weighted Averaging </b>

![Weighted_Averaging](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/Lesson_07_Ensemble_Learning/Weighted_Averaging.png)
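The two averaging schemes pictured above can be sketched in a few lines of plain Python. This is only an illustration: the probabilities and the weights below are made-up numbers, not outputs of any model in this lesson, and the helper names `average` and `weighted_average` are hypothetical.

```python
# Hypothetical predicted probabilities of the positive class from three models
# for the same four samples (illustrative numbers, not from the lesson's data).
model_probs = [
    [0.9, 0.4, 0.2, 0.7],  # model A
    [0.8, 0.6, 0.1, 0.5],  # model B
    [0.6, 0.5, 0.3, 0.9],  # model C
]

def average(probs_per_model):
    """Simple averaging: every model contributes equally."""
    n_models = len(probs_per_model)
    return [sum(col) / n_models for col in zip(*probs_per_model)]

def weighted_average(probs_per_model, weights):
    """Weighted averaging: models with larger weights contribute more."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, col)) / total
        for col in zip(*probs_per_model)
    ]

simple = average(model_probs)
weighted = weighted_average(model_probs, weights=[0.5, 0.3, 0.2])
print([round(p, 3) for p in simple])
print([round(p, 3) for p in weighted])
```

In practice, the weights are usually chosen from each model's validation performance rather than set by hand.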

### <b> Bagging Algorithms </b>

Bootstrap aggregation, or bagging, involves taking multiple samples from your training dataset (with replacement) and training a model on each sample.

The final output prediction is obtained by averaging the predictions of all of the submodels (or, for classification, by taking a majority vote).
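As a rough sketch of the idea, the snippet below draws bootstrap samples and averages the submodels' predictions. The "model" here is deliberately trivial (it predicts the mean of its own sample) and stands in for a real learner; the data and the helper names `bootstrap_sample`, `fit_mean_model`, and `bagged_prediction` are assumptions made for illustration.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def fit_mean_model(sample):
    """Toy learner: always predicts the mean of its training sample."""
    mean = sum(sample) / len(sample)
    return lambda x: mean  # ignores x; a stand-in for a real model

def bagged_prediction(models, x):
    """Average the predictions of all submodels for one input."""
    return sum(m(x) for m in models) / len(models)

rng = random.Random(7)
targets = [1.0, 2.0, 3.0, 4.0, 100.0]  # illustrative data with one outlier

# Train 25 submodels, each on its own bootstrap sample
models = [fit_mean_model(bootstrap_sample(targets, rng)) for _ in range(25)]
print(bagged_prediction(models, x=None))
```

Because each sample is drawn with replacement, some rows appear several times and others not at all, which is what makes the submodels differ from one another.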

The three bagging models covered in this section are as follows:

 - Bagged Decision Trees
 - Random Forest
 - Extra Trees

#### <b> 1. Bagged Decision Trees </b>

Bagging performs best with algorithms that have a high variance. A popular example is decision trees, often constructed without pruning.

Below is an example of using BaggingClassifier with Classification and Regression Trees (CART), which scikit-learn implements as DecisionTreeClassifier. A total of 100 trees are created.


- Scikit-learn is a Python library that provides a consistent interface for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.
- Pandas is a Python library for data manipulation and analysis.

In [19]:
# Bagged Decision Trees for Classification
import pandas
from sklearn import model_selection
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
import warnings
warnings.filterwarnings('ignore')

In [2]:
# url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
# names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
# dataframe = pandas.read_csv(url, names=names)
# array = dataframe.values

In [20]:
url = "pima-indians-diabetes.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
array

array([[  6.   , 148.   ,  72.   , ...,   0.627,  50.   ,   1.   ],
       [  1.   ,  85.   ,  66.   , ...,   0.351,  31.   ,   0.   ],
       [  8.   , 183.   ,  64.   , ...,   0.672,  32.   ,   1.   ],
       ...,
       [  5.   , 121.   ,  72.   , ...,   0.245,  30.   ,   0.   ],
       [  1.   , 126.   ,  60.   , ...,   0.349,  47.   ,   1.   ],
       [  1.   ,  93.   ,  70.   , ...,   0.315,  23.   ,   0.   ]])

In [21]:
X = array[:,0:8]
Y = array[:,8]

In [22]:
seed = 7
kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)

In [23]:
cart = DecisionTreeClassifier()

num_trees = 100
# scikit-learn >= 1.2 uses `estimator`; older versions call this parameter `base_estimator`
model = BaggingClassifier(estimator = cart, 
                          n_estimators = num_trees,
                          random_state = seed)

results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7578263841421736


In [28]:
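# Bag a logistic regression instead of a decision tree, for comparison.
# With n_estimators = 1, the "ensemble" is a single model fit on one bootstrap sample.
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()

num_trees = 1
model = BaggingClassifier(estimator = lr, 
                          n_estimators = num_trees,
                          random_state = seed)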

results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7747778537252221


#### <b> 2. Random Forest </b> 

Random forest is an extension of bagged decision trees.

Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers. Specifically, rather than greedily choosing the best split point in the construction of the tree, only a random subset of features is considered for each split.
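The feature-subset step can be illustrated on its own. The sketch below only shows the random draw of candidate features at each split; choosing the best split among those candidates is left out. The helper name `candidate_features` is hypothetical and is not part of scikit-learn.

```python
import random

def candidate_features(n_features, max_features, rng):
    """Pick the random subset of feature indices considered at one split."""
    return sorted(rng.sample(range(n_features), max_features))

rng = random.Random(7)
# The Pima dataset used in this lesson has 8 input features; consider 3 per split.
for split in range(3):
    print(split, candidate_features(n_features=8, max_features=3, rng=rng))
```

Since each split sees a different subset, a single dominant feature cannot appear at the top of every tree, which is what lowers the correlation between trees.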

You can construct a Random Forest model for classification using the RandomForestClassifier class.

The example below provides a sample of Random Forest for classification with 100 trees and split points chosen from a random selection of three features.



In [29]:
#Random Forest Classification
from sklearn.ensemble import RandomForestClassifier

# X, Y, and seed are reused from the bagged decision trees example above.


In [30]:
num_trees = 100
max_features = 3

kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features)

results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7747436773752563


#### <b> 3. Extra Trees </b>

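Extra Trees (extremely randomized trees) are another variation on bagged decision trees. As in a random forest, only a random subset of features is considered at each split, but the split thresholds are also drawn at random for each candidate feature, and the best of these random splits is used. In scikit-learn, each tree is fit on the whole training set by default (`bootstrap=False`) rather than on a bootstrap sample.

You can construct an Extra Trees model for classification using the ExtraTreesClassifier class.

The example below demonstrates Extra Trees with 100 trees and splits chosen from seven random features.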



In [9]:
#Extra Trees Classification
from sklearn.ensemble import ExtraTreesClassifier

# X and Y are reused from the bagged decision trees example above.


In [31]:
seed = 7

num_trees = 100
max_features = 7

kfold = model_selection.KFold(n_splits=10, 
                              random_state=seed, 
                              shuffle=True)

model = ExtraTreesClassifier(n_estimators=num_trees, 
                             max_features=max_features)

results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7643198906356801


### <b> Boosting Algorithms </b>

Boosting ensemble algorithms create a sequence of models, each of which attempts to correct the mistakes of the models before it in the sequence.

Once created, the models make predictions that may be weighted by their demonstrated accuracy, and the results are combined to create a final output prediction.


The two most common boosting ensemble machine learning algorithms are:

- AdaBoost

- Stochastic Gradient Boosting
<br>

#### <b> AdaBoost </b>

AdaBoost was the first successful boosting ensemble algorithm. It generally works by weighting instances in the dataset by how easy or difficult they are to classify, allowing the algorithm to pay more or less attention to them in the construction of subsequent models.


![AdaBoost](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/Lesson_07_Ensemble_Learning/AdaBoost.png)
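To make the instance weighting concrete, here is a minimal sketch of one round of the classic (discrete, binary) AdaBoost update. It is a textbook formulation shown under stated assumptions, not necessarily the exact variant that scikit-learn implements; the function name `adaboost_round` and the example inputs are hypothetical.

```python
import math

def adaboost_round(weights, is_correct):
    """One discrete AdaBoost update for a binary problem.

    weights    -- current sample weights (assumed to sum to 1)
    is_correct -- True where the current weak learner classified the sample correctly
    Returns (alpha, new_weights).
    """
    # Weighted error rate of the weak learner
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    # Model weight: larger when the error is smaller (assumes 0 < error < 1)
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Increase the weights of misclassified samples, decrease the rest
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, is_correct)]
    total = sum(raw)
    # Renormalize so the weights sum to 1 again
    return alpha, [w / total for w in raw]

weights = [0.2] * 5                          # start with uniform weights
is_correct = [True, True, False, True, True]  # the third sample was misclassified
alpha, new_weights = adaboost_round(weights, is_correct)
print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

After the update, the misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.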

You can construct an AdaBoost model for classification using the AdaBoostClassifier class.

The example below demonstrates the construction of 30 weak learners (by default, depth-1 decision trees) in sequence using the AdaBoost algorithm.


In [32]:
#AdaBoost Classification
from sklearn.ensemble import AdaBoostClassifier

# X and Y are reused from the bagged decision trees example above.


In [33]:
seed = 7
num_trees = 30

kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
model = AdaBoostClassifier(n_estimators = num_trees, 
                           random_state=seed)

results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7552802460697198


#### <b> Stochastic Gradient Boosting </b>

One of the most advanced ensemble approaches is Stochastic Gradient Boosting (also known as Gradient Boosting Machines). It has also proven to be one of the most effective ways of improving predictive performance with an ensemble.

#### <b> Steps of Gradient Boosting Machine </b>

![GBM_Steps](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/Lesson_07_Ensemble_Learning/GBM_Steps.PNG)
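The following is a minimal sketch of gradient boosting for regression with squared loss, where each new depth-1 tree (stump) is fit to the residuals of the current ensemble. It is a common textbook formulation and may differ in detail from the steps in the figure; it also omits the row subsampling that makes the method "stochastic". The data, the helper names, and the `learning_rate` value are assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on 1-D inputs by minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Squared-loss gradient boosting: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # start from a constant prediction
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_trees):
        # For squared loss, the negative gradient is the residual y - prediction
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a small step in the direction suggested by the new stump
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]  # illustrative, roughly y = x
baseline = lambda x: sum(ys) / len(ys)           # constant-mean predictor
model = gradient_boost(xs, ys)
print(mse(baseline, xs, ys), mse(model, xs, ys))
```

The training error of the boosted model falls below that of the constant baseline because every round removes part of the remaining residual.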

You can construct a Gradient Boosting model for classification using the **GradientBoostingClassifier** class.

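The example below demonstrates gradient boosting for classification with 100 trees. With the default `subsample=1.0`, every tree is fit on all training rows; setting `subsample` below 1.0 gives the stochastic variant.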


In [34]:
#Stochastic Gradient Boosting Classification
from sklearn.ensemble import GradientBoostingClassifier

# X and Y are reused from the bagged decision trees example above.

In [35]:
seed = 7
num_trees = 100

kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
model = GradientBoostingClassifier(n_estimators=num_trees, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)

print(results.mean())

0.7604921394395079


### <b> Voting Ensemble </b>

Voting is one of the simplest ways of combining the predictions from multiple machine learning algorithms.

It works by first creating two or more standalone models from your training dataset. A voting classifier then wraps these models and combines their predictions for new data, either by majority vote over the predicted class labels (hard voting, the VotingClassifier default) or by averaging the predicted class probabilities (soft voting).

The predictions of the submodels can be weighted, but specifying the weights for classifiers manually or even heuristically is difficult. More advanced methods can learn how best to weight the predictions from submodels; this is called stacking (stacked generalization) and is available in scikit-learn 0.22 and later as StackingClassifier and StackingRegressor.
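The difference between voting on labels and averaging (optionally weighted) probabilities can be seen in a small plain-Python sketch. The predictions and weights below are made-up values, and `hard_vote` and `soft_vote` are hypothetical helper names rather than scikit-learn functions.

```python
from collections import Counter

def hard_vote(labels):
    """Majority vote over the class labels predicted by each submodel."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probas, weights=None):
    """Weighted average of per-class probabilities.

    Returns (winning class index, averaged probabilities).
    """
    weights = weights or [1.0] * len(probas)
    total = sum(weights)
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(len(probas[0]))
    ]
    return max(range(len(avg)), key=avg.__getitem__), avg

# Hypothetical class probabilities [P(class 0), P(class 1)] from three submodels
probas = [[0.4, 0.6], [0.9, 0.1], [0.45, 0.55]]
labels = [p.index(max(p)) for p in probas]  # each model's most likely class

print(hard_vote(labels))                    # majority of the predicted labels
print(soft_vote(probas))                    # unweighted probability average
print(soft_vote(probas, weights=[1, 3, 1])) # trust the second model more
```

Here two of the three models lean slightly toward class 1, so the hard vote picks class 1, while the one confident model pulls the averaged probabilities toward class 0.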

You can create a voting ensemble model for classification using the **VotingClassifier** class.

The code below provides an example of combining the predictions of logistic regression, classification and regression trees (CART), support vector machines, and a random forest for a classification problem.


In [37]:
#Voting Ensemble for Classification
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier, RandomForestClassifier

# Load the dataset (same local CSV as in the bagging example)
url = "pima-indians-diabetes.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)

# Split into input features (first 8 columns) and target (last column)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
kfold = model_selection.KFold(n_splits=10)

#Create the sub models
estimators = []
model1 = LogisticRegression()
estimators.append(('logistic', model1))

model2 = DecisionTreeClassifier()
estimators.append(('cart', model2))

model3 = SVC()
estimators.append(('svm', model3))

model4 = RandomForestClassifier()
estimators.append(('rf', model4))

#Create the ensemble model
ensemble = VotingClassifier(estimators)
results = model_selection.cross_val_score(ensemble, X, Y, cv=kfold)
print(results.mean())

0.7681476418318524


**Note: In this lesson, we saw the use of the ensemble learning methods, and in the next lesson, we will be working on Recommender Systems.**
