# Ensemble Learning

Ensemble learning combines multiple machine learning models, either different algorithms or several instances of the same algorithm, with the aim of making a better prediction than a single model.

An ensemble works by training different models on the same dataset. Each model makes its prediction individually, and these results are then combined with a statistical method to get the final prediction.

In one sentence: multiple algorithms are trained on the same dataset, and the final prediction is made from the outcomes of the individual algorithms.

Let me explain this with the example of a cricket team. In a cricket team, or any other team, different players specialize in different areas (batting, fast bowling, fielding, keeping, etc.). In the same way, every algorithm has its own strengths. Each algorithm is specialized in some way, so combining them makes it easier to get a good final prediction.

<center><img src="https://machinelearningmastery.com/wp-content/uploads/2020/11/Bagging-Ensemble.png"/></center>

# Need of Ensemble

Every model has its own strengths and weaknesses. If we combine multiple models, the strengths of one model can cover for the weaknesses of another.

Every model makes some errors. Mathematically, the error of any machine learning model can be broken down into three components:

  - Bias
  - Variance
  - Irreducible error

To understand these errors have a look at the following figure:

<center><img src="https://jason-chen-1992.weebly.com/uploads/1/0/8/5/108557741/bias-and-variance_orig.png"/></center>


Bias error quantifies how much, on average, the predicted values differ from the actual values.

Variance, on the other hand, quantifies how much the predictions made for the same observation differ from each other.
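The three-part breakdown can be sketched as a formula. The symbols here are assumptions introduced only for illustration: the data follow $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has zero mean and variance $\sigma^2$, and $\hat{f}(x)$ is the prediction of a model trained on a randomly drawn training set. Under squared-error loss, the expected error at a point $x$ is then:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible error}}
$$

The first two terms depend on the model, while the last term comes from the noise in the data and cannot be reduced by any model.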

<center><img src="https://kgptalkie.com/wp-content/uploads/2020/08/image-126.png"/></center>

Now let's look at the bias-variance trade-off.
As model complexity increases, the total error decreases up to some point and then starts to increase. We need to select the optimum model complexity to get the lowest error.

  - Low-complexity model: high bias and low variance
  - High-complexity model: low bias and high variance

If you are getting high bias, you have room to increase model complexity. On the other hand, if you are getting high variance, you need to decrease model complexity. This trade-off applies to any machine learning algorithm.


# Types of Ensemble Learning

Basic Ensemble Techniques
  - Max Voting

  - Averaging

  - Weighted Average

Advanced Ensemble Techniques
  - Stacking

  - Blending

  - Bagging

  - Boosting


## Algorithms based on Bagging
  - Bagging meta-estimator

  - Random Forest

## Boosting Algorithms
  - AdaBoost

  - GBM

  - XGB

  - Light GBM

  - CatBoost

## Max Voting

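The max voting method is generally used for classification problems. In this technique, multiple models are used to make predictions for each data point. Each model's prediction counts as a vote, and the class that receives the most votes is used as the final prediction.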
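A minimal sketch of max voting, assuming three classifiers whose predictions are already available. The lists `model_a`, `model_b`, `model_c` and the helper `max_vote` are hypothetical names used only for illustration:

```python
from collections import Counter

def max_vote(predictions):
    """Return the most common label among the models' predictions for one data point."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical class labels predicted by three classifiers for four data points.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

# Collect the three votes for each data point and keep the majority label.
final = [max_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(final)
```

With an odd number of models and two classes there is always a clear winner. With an even number of models, a tie-breaking rule would be needed.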

## Averaging

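Similar to the max voting technique, multiple predictions are made for each data point in averaging. The average of the predictions from all the models is used as the final prediction. Averaging can be used for regression problems or for predicted probabilities in classification problems.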
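As a small illustration of averaging, the sketch below assumes three models that each output a probability for the positive class. The values and the 0.5 cut-off are made up for the example:

```python
from statistics import mean

# Hypothetical predicted probabilities from three models, one value per data point.
model_a = [0.9, 0.2, 0.6]
model_b = [0.8, 0.4, 0.5]
model_c = [0.7, 0.3, 0.1]

# Average the three predictions for each data point.
averaged = [mean(preds) for preds in zip(model_a, model_b, model_c)]

# For classification, turn the averaged probability into a label (0.5 is an assumed cut-off).
labels = [int(p >= 0.5) for p in averaged]
print([round(p, 2) for p in averaged], labels)
```

For a regression problem, the averaged values themselves would be the final predictions and the thresholding step would be dropped.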

## Weighted Average

This is an extension of the averaging method. Each model is assigned a weight that defines its importance for the prediction, and the final prediction is the weighted average of the individual predictions.
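The same idea with weights can be sketched as follows. The predictions and the weights `[0.5, 0.3, 0.2]` are hypothetical. In practice the weights might be chosen from each model's validation performance:

```python
def weighted_average(predictions, weights):
    """Weighted mean of one data point's predictions; the weights need not sum to 1."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

# Hypothetical predicted probabilities from three models, one value per data point.
model_a = [0.9, 0.2, 0.6]
model_b = [0.8, 0.4, 0.5]
model_c = [0.7, 0.3, 0.1]

# Assumed importance of each model: model_a counts the most, model_c the least.
weights = [0.5, 0.3, 0.2]

combined = [weighted_average(preds, weights)
            for preds in zip(model_a, model_b, model_c)]
print([round(p, 2) for p in combined])
```

Setting all weights to the same value reduces this to plain averaging.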

## Bagging
Bagging is short for bootstrap aggregating. Bootstrapping is a sampling technique in which we create subsets of observations from the original dataset, with replacement. The size of the subsets is the same as the size of the original set. A model is trained on each subset, and the predictions of these models are then combined.

  - Combining predictions that belong to the same type.
  - Aim to decrease variance, not bias.
  - Different training data subsets are randomly drawn with replacement from the entire training dataset.

Random Forest (figure below) is the best example to explain bagging.

<center><img src="https://kgptalkie.com/wp-content/uploads/2020/08/image-127.png"/></center>

It creates multiple subsets of the data and trains a decision tree on each one. Each tree makes its own prediction. If the random forest is a classifier, it takes the max vote across the trees; if it is a regressor, it takes the average of the trees' predictions.
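The bagging procedure can be sketched in plain Python. This is a toy illustration, not how Random Forest is implemented: the one-split "stump" stands in for a full decision tree, and the dataset, the helper names, and the choice of 25 samples are all assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so the sample has the original size."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Toy weak learner: the threshold on x with the fewest training errors.
    The stump predicts class 1 when x > threshold, and class 0 otherwise."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in rows}):
        err = sum((x > t) != bool(y) for x, y in rows)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_predict(thresholds, x):
    """Aggregate the stumps by max voting."""
    votes = [int(x > t) for t in thresholds]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical 1-D dataset of (x, label) pairs.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]

rng = random.Random(0)
samples = [bootstrap_sample(data, rng) for _ in range(25)]  # bootstrapping
thresholds = [fit_stump(sample) for sample in samples]      # one model per sample

print(bagged_predict(thresholds, 0.5), bagged_predict(thresholds, 9.5))
```

Because each stump sees a slightly different sample, the individual thresholds vary. Voting across them smooths out that variation, which is the variance reduction that bagging aims for.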

## Boosting
Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model. The succeeding models are dependent on the previous model.
The key characteristics of boosting are:

- Combining predictions that belong to the different types.
- Aim to decrease bias, not variance.
- Models are weighted according to their performance.

Let’s now understand boosting from the following figure: 

<center><img src="https://kgptalkie.com/wp-content/uploads/2020/08/image-128.png"/></center>

We start with the original dataset. The first model (for example an SVM classifier or a Random Forest classifier) draws a decision boundary, and some points end up on the wrong side of it. To correct those errors we train a second model, and after that a third model that again focuses on the remaining errors.
Finally, we combine all three models, which together classify the original dataset in the figure correctly.
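The sequential idea can be sketched with a small AdaBoost-style loop in plain Python. This is a simplified illustration under stated assumptions, not the scikit-learn implementation: the weak learner is a one-split stump, the labels are +1 and -1, the inputs are sorted integers, and the toy dataset and helper names are made up for the example.

```python
import math

def stump_predict(x, threshold, polarity):
    """Predict `polarity` when x is below the threshold, and the opposite label otherwise."""
    return polarity if x < threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in [x + 0.5 for x in xs[:-1]]:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train `rounds` stumps one after another, reweighting the data after each one."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = fit_stump(xs, ys, weights)
        error = max(error, 1e-10)  # avoid dividing by zero for a perfect stump
        alpha = 0.5 * math.log((1 - error) / error)  # better stumps get a larger say
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points get heavier, correctly classified points get lighter.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy dataset that no single stump can classify perfectly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    ensemble = adaboost(xs, ys, rounds)
    correct = sum(predict(ensemble, x) == y for x, y in zip(xs, ys))
    print(rounds, "stump(s):", correct, "of", len(xs), "correct")
```

On this toy data a single stump misclassifies three points, while the weighted combination of three stumps classifies all ten correctly, mirroring the three-model picture above.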

# Algorithm Implementation in sklearn

- Bagging
  - Random Forest

- Boosting
  - XGBoost
  - AdaBoost
  - Gradient Boosting

**Random Forest** is another ensemble machine learning algorithm that follows the bagging technique

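**XGBoost (eXtreme Gradient Boosting)** is an advanced implementation of the gradient boosting algorithm. It comes from the separate `xgboost` package, which provides a scikit-learn-compatible classifier.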

**Adaptive boosting or AdaBoost** is one of the simplest boosting algorithms

**Gradient Boosting or GBM** is another ensemble machine learning algorithm that works for both regression and classification problems

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
sns.set()

In [11]:
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [3]:
cancer = datasets.load_breast_cancer()

In [5]:
print(cancer.DESCR)

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, f

In [6]:
X = cancer.data
y = cancer.target

In [7]:
X.shape, y.shape

((569, 30), (569,))

In [13]:
scaler=StandardScaler()
X_scaled=scaler.fit_transform(X)
X_scaled

array([[ 1.09706398, -2.07333501,  1.26993369, ...,  2.29607613,
         2.75062224,  1.93701461],
       [ 1.82982061, -0.35363241,  1.68595471, ...,  1.0870843 ,
        -0.24388967,  0.28118999],
       [ 1.57988811,  0.45618695,  1.56650313, ...,  1.95500035,
         1.152255  ,  0.20139121],
       ...,
       [ 0.70228425,  2.0455738 ,  0.67267578, ...,  0.41406869,
        -1.10454895, -0.31840916],
       [ 1.83834103,  2.33645719,  1.98252415, ...,  2.28998549,
         1.91908301,  2.21963528],
       [-1.80840125,  1.22179204, -1.81438851, ..., -1.74506282,
        -0.04813821, -0.75120669]])

In [14]:
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
import xgboost as xgb

In [15]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=1, stratify=y)

In [20]:
# Bagging: a random forest with 200 trees.
rfc = RandomForestClassifier(n_estimators=200, random_state=1)
# Boosting: three models with 200 estimators each and a small learning rate.
abc = AdaBoostClassifier(n_estimators=200, random_state=1, learning_rate=0.01)
gbc = GradientBoostingClassifier(n_estimators=200, random_state=1, learning_rate=0.01)
xgb_clf = xgb.XGBClassifier(n_estimators=200, learning_rate=0.01, random_state=1)

In [21]:
rfc.fit(X_train, y_train)
abc.fit(X_train, y_train)
gbc.fit(X_train, y_train)
xgb_clf.fit(X_train, y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.01, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=200, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=1,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [22]:
print('Random Forest', rfc.score(X_test, y_test))
print('AdaBoost', abc.score(X_test, y_test))
print('Gradient Boost', gbc.score(X_test, y_test))
print('XGBoost', xgb_clf.score(X_test, y_test))

Random Forest 0.9473684210526315
AdaBoost 0.9473684210526315
Gradient Boost 0.9736842105263158
XGBoost 0.956140350877193
