# Ensemble Learning

_________________________________
_________________________________

# What is Ensemble learning?

#### Ensemble learning is a potent technique for improving the performance of machine learning models.

An ensemble combines a diverse set of learners (individual models) to improve the stability and predictive power of the overall model.

Ensemble learning is a machine learning paradigm where multiple models (often called “weak learners”) are trained to solve the same problem and combined to get better results. The main hypothesis is that when weak models are correctly combined we can obtain more accurate and/or robust models.

### Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), bias (boosting), or improve predictions (stacking).

#### <a href="https://www.datacamp.com/community/tutorials/ensemble-learning-python" target="_blank">Let's take a real example to build the intuition.</a>

Suppose you want to invest in a company XYZ, but you are not sure about its performance. So you look for advice on whether the stock price will increase by more than 6% per annum. You decide to approach various experts with diverse domain experience:

* **Employee of Company XYZ:** In the past, he has been right 70% of the time.

* **Financial Advisor of Company XYZ:** In the past, he has been right 75% of the time.

* **Stock Market Trader:** In the past, he has been right 70% of the time.

* **Employee of a competitor:** In the past, he has been right 60% of the time.

* **Market Research team in the same segment:** In the past, they have been right 75% of the time.

* **Social Media Expert:** In the past, he has been right 65% of the time.

Given the broad spectrum of access you have, you can probably combine all the information and make an informed decision.

In a scenario where all 6 experts/teams agree that it’s a good decision (assuming all the predictions are independent of each other), we get a combined accuracy rate of

$$ \text{Combined Accuracy Rate} = 1 - (30 \% \times 25 \% \times 30 \% \times 40 \% \times 25 \% \times 35 \%) $$

$$ \text{Combined Accuracy Rate} = 1 - 0.0007875 = 99.92125 \% $$

In the above example, the way we combine all the predictions together will be termed as Ensemble Learning.

**Assumption:** The assumption used here that all the predictions are completely independent is slightly extreme as they are expected to be correlated.
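
The arithmetic in this example is easy to check in a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the quantity computed is the one in the formula, i.e. the probability that at least one of the six independent experts is right.

```python
from math import prod

# Historical accuracy of each of the six experts in the example
accuracies = [0.70, 0.75, 0.70, 0.60, 0.75, 0.65]

# Under the independence assumption, the chance that every expert is wrong
# is the product of the individual error rates
p_all_wrong = prod(1 - a for a in accuracies)

# The combined accuracy rate is the complement of that product
combined_accuracy = 1 - p_all_wrong

print(f"P(all wrong)      = {p_all_wrong:.7f}")
print(f"Combined accuracy = {combined_accuracy:.5%}")
```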

Let us now change the scenario slightly. This time we have 6 experts, all of them employees of company XYZ working in the same division. Each has a 70% propensity to advise correctly.

What if we combine all this advice? Can we still raise our confidence to >99%?
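
One way to build intuition for this question is a toy correlation model. The model below is an assumption made purely for illustration and is not part of the original example: each expert repeats a shared "division view" with probability `rho` (a hypothetical parameter) and otherwise forms an independent opinion, and both kinds of opinion are right 70% of the time.

```python
def p_at_least_one_right(rho, accuracy=0.70, n_experts=6):
    """Probability that at least one expert is right under the toy model.

    rho is a hypothetical parameter: the chance that an expert simply
    repeats the shared view instead of forming an independent opinion.
    """
    error = 1 - accuracy
    # Case 1: the shared view is wrong (probability = error).
    # An expert is wrong if they copy it, or if their own opinion is wrong.
    all_wrong_given_shared_wrong = (rho + (1 - rho) * error) ** n_experts
    # Case 2: the shared view is right (probability = accuracy).
    # An expert is wrong only if they think independently and err.
    all_wrong_given_shared_right = ((1 - rho) * error) ** n_experts
    p_all_wrong = (error * all_wrong_given_shared_wrong
                   + accuracy * all_wrong_given_shared_right)
    return 1 - p_all_wrong


for rho in (0.0, 0.5, 0.9, 1.0):
    print(f"rho = {rho:.1f} -> {p_at_least_one_right(rho):.4f}")
```

Under this toy model, fully independent experts (`rho = 0`) keep the figure above 99%, while fully correlated experts (`rho = 1`) are no better than a single 70% expert.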


The following diagram presents a basic Ensemble structure:

<img src="images/fig-ensemble-structure.png" />

# Error in Ensemble Learning (Variance vs. Bias)


The expected error of any machine learning model can be broken down mathematically into three components:


<h3 align="center">Bias<sup>2</sup> + Variance + Irreducible error</h3>  


* **Bias error** quantifies how much, on average, the predicted values differ from the actual values. A high bias error means we have an **under-performing** model that keeps missing essential trends.

* **Variance** quantifies how much the predictions made for the same observation differ from each other when the model is trained on different samples. A high-variance model will **over-fit** your training population and perform poorly on any observation beyond it.
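
For squared-error loss, this decomposition is usually written as follows (note that the bias term is squared). The notation is an assumption introduced here for illustration: $y = f(x) + \varepsilon$ is the observed target with noise variance $\sigma^2$, $\hat{f}(x)$ is the model's prediction, and the expectation is taken over training sets and noise.

$$ \mathbb{E}\left[(y - \hat{f}(x))^2\right] = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Irreducible error}} $$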

The following diagram illustrates this (assume that the red spot is the real value and the blue dots are predictions):

<img src="images/fig-ensemble-bias-variance.png" />


Normally, as we increase the complexity of our model, the error falls because the bias decreases. However, this only happens up to a particular point. As we continue to make the model more complex, we end up over-fitting it, and the model starts to suffer from high variance.

A champion model should maintain a balance between these two types of error. This is known as managing the bias-variance trade-off. Ensemble learning is one way to manage this trade-off.


<img src="images/model_complexity.png" />

# Ensemble methods can be divided into two groups:

* **sequential ensemble methods** where the base learners are generated sequentially (e.g. AdaBoost).

The basic motivation of sequential methods is to exploit the dependence between the base learners. The overall performance can be boosted by giving previously mislabeled examples a higher weight.

* **parallel ensemble methods** where the base learners are generated in parallel (e.g. Random Forest).

The basic motivation of parallel methods is to exploit independence between the base learners since the error can be reduced dramatically by averaging.
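
The effect of averaging can be seen in a small simulation. This is a sketch under simplifying assumptions: each "base learner" is modeled as the true value plus independent Gaussian noise, and the true value, the noise level and the function names are made up for illustration.

```python
import random
import statistics

random.seed(7)


def noisy_prediction(true_value=10.0, noise_sd=2.0):
    """One base learner's prediction: the true value plus independent noise."""
    return random.gauss(true_value, noise_sd)


def ensemble_prediction(n_learners):
    """Average the predictions of n independent base learners."""
    return statistics.fmean(noisy_prediction() for _ in range(n_learners))


n_trials = 5000
single = [ensemble_prediction(1) for _ in range(n_trials)]
averaged = [ensemble_prediction(25) for _ in range(n_trials)]

var_single = statistics.variance(single)
var_averaged = statistics.variance(averaged)
print(f"variance of one learner:          {var_single:.3f}")
print(f"variance of a 25-learner average: {var_averaged:.3f}")
```

With independent errors, the variance of the average shrinks roughly in proportion to the number of learners. Correlated learners gain less, which is why diversity between base learners matters.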


Most ensemble methods use a single base learning algorithm to produce homogeneous base learners, i.e. learners of the same type, leading to **homogeneous ensembles**.

There are also some methods that use heterogeneous learners, i.e. learners of different types, leading to **heterogeneous ensembles**. For an ensemble to be more accurate than any of its individual members, the base learners have to be as accurate as possible and as diverse as possible.

# Different types of Ensemble learning methods:



### 1. Bagging 
Bagging (also known as **B**ootstrap **Agg**regation) often considers homogeneous weak learners, trains them independently of each other in parallel, and combines them following some kind of deterministic averaging process.

Bootstrap is a sampling technique in which we draw “n” observations, with replacement, out of a population of “n” observations. The selection is entirely random: every observation in the original population is equally likely to be selected at each draw, so the same observation can appear more than once in a sample. After the bootstrapped samples are formed, a separate model is trained on each of them. In real experiments, the bootstrapped samples are drawn from the training set, and the sub-models are tested using the testing set. The final output prediction is combined across the predictions of all the sub-models.
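
A single bootstrap draw can be sketched with the standard library alone. The population here is just a list of row indices standing in for a training set; the size of 1000 is an arbitrary choice for illustration.

```python
import random

random.seed(7)

n = 1000
population = list(range(n))  # stand-in for the row indices of a training set

# A bootstrap sample: n draws with replacement, each row equally likely per draw
bootstrap_sample = random.choices(population, k=n)

# Rows that were never drawn are "out-of-bag" for this sample
in_bag = set(bootstrap_sample)
out_of_bag_fraction = 1 - len(in_bag) / n

print(f"sample size:         {len(bootstrap_sample)}")
print(f"distinct rows drawn: {len(in_bag)}")
print(f"out-of-bag fraction: {out_of_bag_fraction:.3f}")
```

Because sampling is done with replacement, some rows appear several times and others not at all. For large n, the expected out-of-bag fraction approaches 1/e, or about 36.8%.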

### 2. Boosting 
Boosting often considers homogeneous weak learners, trains them sequentially in a very adaptive way (each base model depends on the previous ones), and combines them following a deterministic strategy.

Boosting is a sequential learning technique. The algorithm works by training a model on the entire training set; subsequent models are then constructed by fitting the residual errors of the models before them. In this way, Boosting gives more weight to the observations that were poorly estimated by the previous model. Once the sequence of models is created, the predictions made by the models are weighted by their accuracy scores and combined to create a final estimate. Commonly used Boosting algorithms include XGBoost (Extreme Gradient Boosting), GBM (Gradient Boosting Machine), and AdaBoost (Adaptive Boosting).
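
The residual-fitting idea can be sketched in pure Python. The code below is a toy version of gradient boosting with squared-error loss on one-dimensional data, using decision stumps as weak learners; it is not AdaBoost's reweighting scheme and not how any particular library implements boosting. The data, the helper names and the learning rate are all assumptions chosen for illustration.

```python
import math
import random


def fit_stump(xs, residuals):
    """Find the threshold split that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps one after another, each on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)  # start from a constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

    return predict


def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


random.seed(7)
xs = [i / 10 for i in range(60)]
ys = [math.sin(x) + random.gauss(0, 0.1) for x in xs]

mean_y = sum(ys) / len(ys)
mse_constant = mse(lambda x: mean_y, xs, ys)
mse_boosted = mse(boost(xs, ys), xs, ys)
print(f"training MSE, constant model: {mse_constant:.4f}")
print(f"training MSE, boosted stumps: {mse_boosted:.4f}")
```

Each round only has to explain what the previous rounds got wrong, which is why a sequence of very simple models can add up to a much better fit.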

### 3. Stacking
Stacking often considers heterogeneous weak learners, trains them in parallel, and combines them by training a meta-model that outputs a prediction based on the different weak models' predictions.
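
The idea can be sketched with scikit-learn's `StackingClassifier` (available in scikit-learn 0.22 and later). This is an illustration under assumptions: the synthetic data from `make_classification`, the choice of base learners, and the logistic-regression meta-model are all picked for the example rather than taken from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, used only to make the sketch runnable
X, y = make_classification(n_samples=300, n_features=8, random_state=7)

# Heterogeneous base learners
base_learners = [
    ("cart", DecisionTreeClassifier(random_state=7)),
    ("svm", SVC(random_state=7)),
]

# The meta-model is trained on cross-validated predictions of the base learners
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)

scores = cross_val_score(stack, X, y, cv=5)
print(scores.mean())
```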

----------------
The three most popular methods for combining the predictions from different models are:

* Bagging. Building multiple models (typically of the same type) from different subsamples of the training dataset.
* Boosting. Building multiple models (typically of the same type) each of which learns to fix the prediction errors of a prior model in the chain.
* Voting. Building multiple models (typically of differing types) and using simple statistics (like calculating the mean) to combine their predictions.

# 1. Bagging Algorithms
Bootstrap Aggregation or bagging involves taking multiple samples from your training dataset (with replacement) and training a model for each sample.

The final output prediction is averaged across the predictions of all of the sub-models.

The three bagging models covered in this section are as follows:

- Bagged Decision Trees
- Random Forest
- Extra Trees

## 1.1. Bagged Decision Trees (Bagging Algorithms)

Bagging performs best with algorithms that have high variance. A popular example is decision trees, often constructed without pruning.

The example below uses the BaggingClassifier with the Classification and Regression Trees algorithm (DecisionTreeClassifier). A total of 100 trees are created.

In [2]:
# Bagged Decision Trees for Classification
import pandas as pd
from sklearn import model_selection
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

url = "datasets/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

dataframe = pd.read_csv(url, names=names)
array = dataframe.values

X = array[:,0:8]
Y = array[:,8]

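seed = 7
# KFold does not shuffle by default, so random_state is omitted here;
# recent scikit-learn versions raise an error if it is set without shuffle=True
kfold = model_selection.KFold(n_splits=10)

cart = DecisionTreeClassifier()

num_trees = 100
# Recent scikit-learn versions name this parameter `estimator` (formerly `base_estimator`)
model = BaggingClassifier(estimator=cart, n_estimators=num_trees, random_state=seed)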

results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.770745044429255


## 1.2. Random Forest (Bagging Algorithms)
Random forest is an extension of bagged decision trees.

Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers. Specifically, rather than greedily choosing the best split point among all features when constructing the tree, only a random subset of features is considered for each split.

You can construct a Random Forest model for classification using the RandomForestClassifier class.

The example below demonstrates Random Forest for classification with 100 trees and split points chosen from a random selection of 3 features.

In [3]:
# Random Forest Classification
import pandas as pd
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier

url = "datasets/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

dataframe = pd.read_csv(url, names=names)
array = dataframe.values

X = array[:,0:8]
Y = array[:,8]

seed = 7
num_trees = 100
max_features = 3
# No shuffling, so random_state is omitted (recent scikit-learn rejects it otherwise)
kfold = model_selection.KFold(n_splits=10)

model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7746411483253588


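## 1.3. Extra Trees (Bagging Algorithms)
Extra Trees (extremely randomized trees) are another modification of bagging in which the split thresholds are also drawn at random instead of being optimized. Note that scikit-learn's ExtraTreesClassifier fits each tree on the whole training set by default (`bootstrap=False`) rather than on a bootstrap sample.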

You can construct an Extra Trees model for classification using the ExtraTreesClassifier class.

The example below provides a demonstration of extra trees with the number of trees set to 100 and splits chosen from 7 random features.

In [23]:
# Extra Trees Classification
import pandas as pd
from sklearn import model_selection
from sklearn.ensemble import ExtraTreesClassifier

url = "datasets/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

dataframe = pd.read_csv(url, names=names)
array = dataframe.values

X = array[:,0:8]
Y = array[:,8]

seed = 7
num_trees = 100
max_features = 7
# No shuffling, so random_state is omitted (recent scikit-learn rejects it otherwise)
kfold = model_selection.KFold(n_splits=10)

model = ExtraTreesClassifier(n_estimators=num_trees, max_features=max_features)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.766866028708134


# 2. Boosting Algorithms
Boosting ensemble algorithms create a sequence of models, each of which attempts to correct the mistakes of the models before it in the sequence.

Once created, the models make predictions which may be weighted by their demonstrated accuracy and the results are combined to create a final output prediction.

The two most common boosting ensemble machine learning algorithms are:

- AdaBoost
- Stochastic Gradient Boosting

## 2.1. AdaBoost (Boosting Algorithms)

AdaBoost was perhaps the first successful boosting ensemble algorithm. It generally works by weighting instances in the dataset by how easy or difficult they are to classify, allowing the algorithm to pay more or less attention to them in the construction of subsequent models.
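
One round of this instance reweighting can be sketched as follows. The sketch follows the classic discrete AdaBoost formulation for binary labels in {-1, +1}; scikit-learn's implementation may differ in its details, and the labels and predictions below are made up for illustration.

```python
import math

# Toy binary problem with labels in {-1, +1}; the predictions come from a
# hypothetical weak learner that gets two of the eight instances wrong
y_true = [+1, +1, -1, -1, +1, -1, +1, -1]
y_pred = [+1, -1, -1, -1, +1, +1, +1, -1]

n = len(y_true)
weights = [1 / n] * n  # every instance starts with the same weight

# Weighted error rate of the weak learner
error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)

# The learner's say in the final vote: larger when its error is smaller
alpha = 0.5 * math.log((1 - error) / error)

# Raise the weight of misclassified instances, lower the rest, then renormalise
weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
total = sum(weights)
weights = [w / total for w in weights]

for t, p, w in zip(y_true, y_pred, weights):
    status = "wrong" if t != p else "right"
    print(f"{status}: weight = {w:.4f}")
```

After the update, the two misclassified instances together carry half of the total weight, so the next weak learner is pushed to get them right.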

You can construct an AdaBoost model for classification using the AdaBoostClassifier class.

The example below demonstrates the construction of 30 decision trees in sequence using the AdaBoost algorithm.

In [4]:
# AdaBoost Classification
import pandas as pd
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier

url = "datasets/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

dataframe = pd.read_csv(url, names=names)
array = dataframe.values

X = array[:,0:8]
Y = array[:,8]

seed = 7
num_trees = 30
# No shuffling, so random_state is omitted (recent scikit-learn rejects it otherwise)
kfold = model_selection.KFold(n_splits=10)

model = AdaBoostClassifier(n_estimators=num_trees, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.760457963089542


## 2.2. Stochastic Gradient Boosting (Boosting Algorithms)

Stochastic Gradient Boosting (also called Gradient Boosting Machines) is one of the most sophisticated ensemble techniques. It is also proving to be perhaps one of the best techniques available for improving performance via ensembles.

You can construct a Gradient Boosting model for classification using the GradientBoostingClassifier class.

The example below demonstrates Gradient Boosting for classification with 100 trees. With the default `subsample=1.0` every tree sees the full training set; setting `subsample` below 1.0 gives the stochastic variant.

In [5]:
# Stochastic Gradient Boosting Classification
import pandas as pd
from sklearn import model_selection
from sklearn.ensemble import GradientBoostingClassifier

url = "datasets/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pd.read_csv(url, names=names)
array = dataframe.values

X = array[:,0:8]
Y = array[:,8]

seed = 7
num_trees = 100
# No shuffling, so random_state is omitted (recent scikit-learn rejects it otherwise)
kfold = model_selection.KFold(n_splits=10)

model = GradientBoostingClassifier(n_estimators=num_trees, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7681989063568012


# 3. Voting Ensemble
Voting is one of the simplest ways of combining the predictions from multiple machine learning algorithms.

It works by first creating two or more standalone models from your training dataset. A Voting Classifier can then be used to wrap your models and combine the predictions of the sub-models, either by majority vote (hard voting, the default) or by averaging predicted probabilities (soft voting), when asked to make predictions for new data.
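
The two common ways of combining votes can be sketched without any library. The sub-model names match the example that follows, but the predictions and probabilities are invented for illustration, and the helper functions are not part of scikit-learn.

```python
from collections import Counter

# Hypothetical class predictions from three sub-models for four samples
predictions = {
    "logistic": [1, 0, 1, 1],
    "cart":     [1, 1, 0, 1],
    "svm":      [0, 0, 1, 1],
}


def hard_vote(predictions):
    """Return the most common predicted label for each sample."""
    per_sample = zip(*predictions.values())
    return [Counter(votes).most_common(1)[0][0] for votes in per_sample]


# Hypothetical probabilities of class 1 from the same three sub-models
probabilities = {
    "logistic": [0.9, 0.4, 0.6, 0.8],
    "cart":     [1.0, 1.0, 0.0, 1.0],
    "svm":      [0.3, 0.2, 0.7, 0.9],
}


def soft_vote(probabilities, threshold=0.5):
    """Average the class-1 probabilities and threshold the mean."""
    per_sample = zip(*probabilities.values())
    return [int(sum(p) / len(p) > threshold) for p in per_sample]


print("hard vote:", hard_vote(predictions))
print("soft vote:", soft_vote(probabilities))
```

The two schemes can disagree: a single very confident sub-model can swing a soft vote even when it is outnumbered in a hard vote.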

The predictions of the sub-models can be weighted, but specifying the weights for classifiers manually or even heuristically is difficult. More advanced methods can learn how to best weight the predictions from sub-models; this is called stacking (stacked generalization). It was not available in scikit-learn when this example was first written, but recent releases provide it as StackingClassifier and StackingRegressor.

You can create a voting ensemble model for classification using the VotingClassifier class.

The code below provides an example of combining the predictions of logistic regression, classification and regression trees and support vector machines together for a classification problem.

In [6]:
# Voting Ensemble for Classification
import pandas as pd
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

url = "datasets/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

dataframe = pd.read_csv(url, names=names)
array = dataframe.values

X = array[:,0:8]
Y = array[:,8]

seed = 7
# No shuffling, so random_state is omitted (recent scikit-learn rejects it otherwise)
kfold = model_selection.KFold(n_splits=10)

# create the sub models
estimators = []
model1 = LogisticRegression()
estimators.append(('logistic', model1))
model2 = DecisionTreeClassifier()
estimators.append(('cart', model2))
model3 = SVC()
estimators.append(('svm', model3))

# create the ensemble model
ensemble = VotingClassifier(estimators)
results = model_selection.cross_val_score(ensemble, X, Y, cv=kfold)
print(results.mean())



0.725205058099795




# A case study in Python

The dataset you are going to use for this case study is popularly known as the Wisconsin Breast Cancer dataset. The associated task is classification.

The dataset contains 699 instances, each described by 10 features and labeled as either benign or malignant. 16 feature values are missing. The dataset only contains numeric values.


In [7]:
# Let's first import all the Python dependencies you will be needing for this case study.

import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

In [8]:
# Let's load the dataset in a DataFrame object.

data = pd.read_csv('breast-cancer-wisconsin.csv')
data.head()

Unnamed: 0,Sample code number,Clump Thickness,Uniformity of Cell Size,Uniformity of Cell Shape,Marginal Adhesion,Single Epithelial Cell Size,Bare Nuclei,Bland Chromatin,Normal Nucleoli,Mitoses,Class
0,1000025,5,1,1,1,2,1,3,1,1,2
1,1002945,5,4,4,5,7,10,3,2,1,2
2,1015425,3,1,1,1,2,2,3,1,1,2
3,1016277,6,8,8,1,3,4,3,7,1,2
4,1017023,4,1,1,3,2,1,3,1,1,2


In [9]:
# The column "Sample code number" is just an indicator and it's of no use in the modeling. So, let's drop it:

data.drop(['Sample code number'],axis = 1, inplace = True)
data.head()

Unnamed: 0,Clump Thickness,Uniformity of Cell Size,Uniformity of Cell Shape,Marginal Adhesion,Single Epithelial Cell Size,Bare Nuclei,Bland Chromatin,Normal Nucleoli,Mitoses,Class
0,5,1,1,1,2,1,3,1,1,2
1,5,4,4,5,7,10,3,2,1,2
2,3,1,1,1,2,2,3,1,1,2
3,6,8,8,1,3,4,3,7,1,2
4,4,1,1,3,2,1,3,1,1,2


In [10]:
# You can see that the column is dropped now. Let's get some statistics about the data 
# using pandas' describe() and info() functions:

data.describe()

Unnamed: 0,Clump Thickness,Uniformity of Cell Size,Uniformity of Cell Shape,Marginal Adhesion,Single Epithelial Cell Size,Bland Chromatin,Normal Nucleoli,Mitoses,Class
count,699.0,699.0,699.0,699.0,699.0,699.0,699.0,699.0,699.0
mean,4.41774,3.134478,3.207439,2.806867,3.216023,3.437768,2.866953,1.589413,2.689557
std,2.815741,3.051459,2.971913,2.855379,2.2143,2.438364,3.053634,1.715078,0.951273
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0
25%,2.0,1.0,1.0,1.0,2.0,2.0,1.0,1.0,2.0
50%,4.0,1.0,1.0,1.0,2.0,3.0,1.0,1.0,2.0
75%,6.0,5.0,5.0,4.0,4.0,5.0,4.0,1.0,4.0
max,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,4.0


In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 699 entries, 0 to 698
Data columns (total 10 columns):
Clump Thickness                699 non-null int64
Uniformity of Cell Size        699 non-null int64
Uniformity of Cell Shape       699 non-null int64
Marginal Adhesion              699 non-null int64
Single Epithelial Cell Size    699 non-null int64
Bare Nuclei                    699 non-null object
Bland Chromatin                699 non-null int64
Normal Nucleoli                699 non-null int64
Mitoses                        699 non-null int64
Class                          699 non-null int64
dtypes: int64(9), object(1)
memory usage: 54.7+ KB


In [12]:
# The dataset contains missing values. The column named "Bare Nuclei" contains them. Let's verify.

data['Bare Nuclei']

0       1
1      10
2       2
3       4
4       1
       ..
694     2
695     1
696     3
697     4
698     5
Name: Bare Nuclei, Length: 699, dtype: object

In [13]:
# You can spot some "?"s in it, right? Well, these are your missing values, and 
# you will be imputing them with Mean Imputation. But first, you will replace those "?"s with NaN.

data.replace('?', np.nan, inplace=True)

In [14]:
data[data['Bare Nuclei'].isnull()].head()

Unnamed: 0,Clump Thickness,Uniformity of Cell Size,Uniformity of Cell Shape,Marginal Adhesion,Single Epithelial Cell Size,Bare Nuclei,Bland Chromatin,Normal Nucleoli,Mitoses,Class
23,8,4,5,1,2,,7,3,1,4
40,6,6,6,9,6,,7,8,1,2
139,1,1,1,1,1,,2,1,1,2
145,1,1,3,1,2,,2,1,1,2
158,1,1,2,1,3,,1,1,1,2


In [15]:
# The "?"s are replaced with NaN now. Let's do the missing value treatment now.

# Impute every missing value with the mean of its column.
# (The old Imputer class and SimpleImputer's `verbose` argument have been removed from recent scikit-learn.)
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

imputer = imputer.fit(data)

imputedData = imputer.transform(data)
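
To make the idea of mean imputation concrete, here is what it does to a single column, written in plain Python. The values are a made-up excerpt in the style of the "Bare Nuclei" column, not the real data, and the sketch is not how `SimpleImputer` is implemented.

```python
# A few raw values in the style of the "Bare Nuclei" column, where "?" marks a gap
raw = ["1", "10", "?", "4", "1", "?", "2"]

# Step 1: turn "?" into a missing marker and everything else into a number
values = [None if v == "?" else float(v) for v in raw]

# Step 2: compute the mean over the observed values only
observed = [v for v in values if v is not None]
column_mean = sum(observed) / len(observed)

# Step 3: fill each gap with that mean
imputed = [column_mean if v is None else v for v in values]

print("column mean:", column_mean)
print("imputed:    ", imputed)
```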

In [16]:
# Now if you take a look at the dataset itself, you will see that the features do not all share the same range.
# This may cause a problem: for scale-sensitive models, features with larger ranges can dominate those with smaller ones.
# To address this problem, you will normalize the ranges of the features to a uniform range, in this case, 0 - 1.

scaler = MinMaxScaler(feature_range=(0, 1))
normalizedData = scaler.fit_transform(imputedData)
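
The rescaling itself is a simple linear map, x' = (x - min) / (max - min). The helper below is a hypothetical pure-Python sketch of that formula for one column, assuming the column is not constant; it is not scikit-learn's implementation.

```python
def min_max_scale(column, feature_range=(0.0, 1.0)):
    """Rescale a list of numbers linearly so they span feature_range.

    Assumes the column is not constant (max > min).
    """
    low, high = feature_range
    col_min, col_max = min(column), max(column)
    span = col_max - col_min
    return [low + (x - col_min) / span * (high - low) for x in column]


# The first five "Clump Thickness" values shown earlier, plus the column
# minimum (1) and maximum (10) reported by describe(), added here so that
# this short list spans the same range as the full column
clump_thickness = [5, 5, 3, 6, 4, 1, 10]
scaled = min_max_scale(clump_thickness)

for original, value in zip(clump_thickness, scaled):
    print(f"{original:>2} -> {value:.8f}")
```

A value of 5 maps to 4/9, roughly 0.444, which is the first entry of the normalized array.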

In [17]:
normalizedData

array([[0.44444444, 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.44444444, 0.33333333, 0.33333333, ..., 0.11111111, 0.        ,
        0.        ],
       [0.22222222, 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.44444444, 1.        , 1.        , ..., 1.        , 0.11111111,
        1.        ],
       [0.33333333, 0.77777778, 0.55555556, ..., 0.55555556, 0.        ,
        1.        ],
       [0.33333333, 0.77777778, 0.77777778, ..., 0.33333333, 0.        ,
        1.        ]])

You have performed all the preprocessing that was required in order to perform your Ensembling experiments.

You will start with Bagging based Ensembling. In this case, you will use a Bagged Decision Tree.

In [18]:
# Bagged Decision Trees for Classification - necessary dependencies

from sklearn import model_selection
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

In [19]:
# You have imported the dependencies for the Bagged Decision Trees.

# Segregate the features from the labels
X = normalizedData[:,0:9]
Y = normalizedData[:,9]

In [20]:
X

array([[0.44444444, 0.        , 0.        , ..., 0.22222222, 0.        ,
        0.        ],
       [0.44444444, 0.33333333, 0.33333333, ..., 0.22222222, 0.11111111,
        0.        ],
       [0.22222222, 0.        , 0.        , ..., 0.22222222, 0.        ,
        0.        ],
       ...,
       [0.44444444, 1.        , 1.        , ..., 0.77777778, 1.        ,
        0.11111111],
       [0.33333333, 0.77777778, 0.55555556, ..., 1.        , 0.55555556,
        0.        ],
       [0.33333333, 0.77777778, 0.77777778, ..., 1.        , 0.33333333,
        0.        ]])

In [21]:
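kfold = model_selection.KFold(n_splits=10)  # no shuffling, so random_state is not needed
cart = DecisionTreeClassifier()
num_trees = 100
# Recent scikit-learn versions name this parameter `estimator` (formerly `base_estimator`)
model = BaggingClassifier(estimator=cart, n_estimators=num_trees, random_state=7)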
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.9628571428571429


Let's see what you did in the above cell.

First, you initialized a 10-fold cross-validation splitter. After that, you instantiated a Decision Tree Classifier and wrapped it in a Bagging-based Ensemble of 100 trees. Then you evaluated your model.

Your model performed pretty well. It yielded an accuracy of 96.29%.

Brilliant! Let's implement the other ones.

(If you want a quick refresher on cross-validation then this is the <a href="https://www.youtube.com/watch?v=CRqLeHpACVI">link</a> to go for.)
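
As a rough picture of what a 10-fold split does with the 699 rows of this dataset, here is a pure-Python sketch of contiguous, unshuffled folds. The function name is hypothetical and the code is only an approximation of the idea, not scikit-learn's implementation.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for contiguous, unshuffled folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=699, n_splits=10))
test_sizes = [len(test) for _, test in folds]
print("test fold sizes:", test_sizes)
```

Every row lands in exactly one test fold, so each model is scored on data it did not see during training, and the ten scores are then averaged.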

In [22]:
# AdaBoost Classification

from sklearn.ensemble import AdaBoostClassifier
seed = 7
num_trees = 70
kfold = model_selection.KFold(n_splits=10)  # no shuffling, so random_state is not needed
model = AdaBoostClassifier(n_estimators=num_trees, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.9599792960662527


In this case, you did an AdaBoost classification (with 70 trees), which is a Boosting type of Ensembling. The model gave you an accuracy of about 96.00% for 10-fold cross-validation.

Finally, it's time for you to implement the Voting-based Ensemble technique.

In [17]:
# Voting Ensemble for Classification

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

kfold = model_selection.KFold(n_splits=10)  # no shuffling, so random_state is not needed

estimators = []

# create the sub models
model1 = LogisticRegression(solver='lbfgs')
estimators.append(('logistic', model1))

model2 = DecisionTreeClassifier()
estimators.append(('cart', model2))

model3 = SVC()
estimators.append(('svm', model3))

# create the ensemble model
ensemble = VotingClassifier(estimators)

results = cross_val_score(ensemble, X, Y, cv=kfold)  # evaluate accuracy by cross-validation
print(results.mean())



0.9642857142857142




You implemented a Voting-based Ensemble model that combines Logistic Regression, a Decision Tree and a Support Vector Machine. The model performed the best so far, with an accuracy of 96.43% for 10-fold cross-validation.

Now, let's get you familiarized with some common pitfalls of Ensemble learning.

# Pitfalls of Ensemble learning

An ensemble will not always perform better than a single model. There are several ensemble methods, each with its own advantages and weaknesses. Which one to use depends on the problem at hand.

For example, if you have models with high variance (they over-fit your data), then you are likely to benefit from using bagging. If you have biased models, it is better to combine them with Boosting. There are also different strategies to form ensembles. The topic is too broad to cover in a single post.

But the point is: if you use the wrong ensemble method for your setting, you are not going to do better. For example, using Bagging with a biased model is not going to help.

Also, if you need to work in a probabilistic setting, ensemble methods may not work either. It is known that Boosting (in its most popular forms like AdaBoost) delivers poor probability estimates. That is, if you would like a model that allows you to reason about your data, not just classify it, you might be better off with a graphical model.

So, in this post, you were introduced to the Ensemble learning technique. You covered its basics and how it improves your model's performance, as well as its three main types.

You also implemented these three types in Python with the help of scikit-learn, and along the way you gained a bit of knowledge about the necessary preprocessing steps.

That's quite a feat! Well done! In this final section, I suggest some further undertakings on Ensembles which you might want to consider.

**Take it further:**
* Try other Boosting-based Ensemble techniques viz. Gradient Boosting, XGBoost, etc.
* Play with the different parameter settings that scikit-learn offers in Ensembles and then try to find why a particular setting performed well. This will make your understanding even stronger.
* Try Ensemble learning on a variety of datasets to understand where you should and where you should not apply Ensemble learning. For finding datasets Kaggle, UCI Repository, etc. are good places to search.

# Gradient Boosting

In [18]:
'''
The following code is for Gradient Boosting
Created by - ANALYTICS VIDHYA
'''

# importing required libraries
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :',train_data.shape)
print('Shape of testing data :',test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Survived'])
train_y = train_data['Survived']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Survived'])
test_y = test_data['Survived']

'''
Create the object of the GradientBoosting Classifier model
You can also add other parameters and test your code here
Some parameters are : learning_rate, n_estimators
Documentation of sklearn GradientBoosting Classifier: 

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
'''
model = GradientBoostingClassifier(n_estimators=100,max_depth=5)

# fit the model with the training data
model.fit(train_x,train_y)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('\nTarget on train data',predict_train) 

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y,predict_train)
print('\naccuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('\nTarget on test data',predict_test) 

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y,predict_test)
print('\naccuracy_score on test dataset : ', accuracy_test)

Shape of training data : (712, 25)
Shape of testing data : (179, 25)

Target on train data [0 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0
 1 0 0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 0 1 0 0 0 0 0
 0 1 1 0 0 1 0 1 1 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 1 1 1 0 1 0 0 0
 0 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0
 0 0 0 1 1 0 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1
 0 1 1 1 1 0 0 1 0 1 0 0 1 1 1 1 1 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 0 1 0 0 0 1 0
 0 1 1 0 1 1 1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 1 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0
 0 0 1 1 0 1 1 0 1 0 1 0 1 0 0 0 1 0 1 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1
 1 0 0 1 1 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 1 1 1 0 1 0 0 1 1 1
 0 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 1 0 0 1 0 1 0 1 1 0 1 0 0 1 0 0
 0 1 0 0 

# References:

* https://www.datacamp.com/community/tutorials/ensemble-learning-python

* https://www.analyticsvidhya.com/blog/2015/08/introduction-ensemble-learning/

* https://blog.statsbot.co/ensemble-learning-d1dcd548e936

* https://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/