#### Table of Contents
Introduction to Ensemble Learning
Basic Ensemble Techniques:
1. Max Voting
2. Averaging
3. Weighted Average

#### Advanced Ensemble Techniques
1. Stacking
2. Blending
3. Bagging
4. Boosting

#### Algorithms based on Bagging and Boosting
1. Bagging meta-estimator
2. Random Forest
3. AdaBoost
4. GBM
5. XGB
6. Light GBM
7. CatBoost

#### 2. Simple Ensemble Techniques
In this section, we will look at a few simple but powerful techniques, namely:

Max Voting
Averaging
Weighted Averaging

##### 1. Max Voting
The max voting method is generally used for classification problems. In this technique, multiple models are used to make predictions for each data point. The predictions by each model are considered as a ‘vote’. The predictions which we get from the majority of the models are used as the final prediction.

In [None]:
import pandas as pd
import numpy as np
import os

df = pd.read_csv('data/bank_processed_data.csv')
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
df.head()

In [None]:
from sklearn.model_selection import train_test_split
 
X = df.drop('deposit_cat', 1)
y = df.deposit_cat


X_train, X_test, y_train, y_test = train_test_split(X,y)

In [137]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import tree

model1 = LogisticRegression(random_state=1)
model2 = tree.DecisionTreeClassifier(random_state=1)


model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')

model.fit(X_train,y_train)
model.score(X_test,y_test)



0.7613758509494805

##### 2. Averaging

Similar to the max voting technique, multiple predictions are made for each data point in averaging. In this method, we take an average of predictions from all the models and use it to make the final prediction. Averaging can be used for making predictions in regression problems or while calculating probabilities for classification problems.

For example, in the below case, the averaging method would take the average of all the values.

i.e. (5+4+5+4+4)/5 = 4.4

In [138]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(X_train,y_train)
model2.fit(X_train,y_train)
model3.fit(X_train,y_train)

pred1 = model1.predict_proba(X_test)
pred2 = model2.predict_proba(X_test)
pred3 = model3.predict_proba(X_test)

finalpred = (pred1+pred2+pred3)/3
print (finalpred)

[[0.98271867 0.01728133]
 [0.53152848 0.46847152]
 [0.98026071 0.01973929]
 ...
 [0.85739974 0.14260026]
 [0.10249435 0.89750565]
 [0.20745775 0.79254225]]




##### 3. Weighted Average

In [142]:
model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(X_train,y_train)
model2.fit(X_train,y_train)
model3.fit(X_train,y_train)

pred1 = model1.predict_proba(X_test)
pred2 = model2.predict_proba(X_test)
pred3 = model3.predict_proba(X_test)

finalpred = (pred1*0.3+pred2*0.3+pred3*0.4)

print (finalpred)

[[0.9792624  0.0207376 ]
 [0.55783418 0.44216582]
 [0.97631286 0.02368714]
 ...
 [0.84887969 0.15112031]
 [0.10299322 0.89700678]
 [0.1889493  0.8110507 ]]




In [148]:
 model3.predict_proba([X_test.iloc[-3]])

array([[0.77219923, 0.22780077]])

In [149]:
 model3.predict([X_test.iloc[0]])

array([0], dtype=int64)

#### 3. Advanced Ensemble techniques
Now that we have covered the basic ensemble techniques, let’s move on to understanding the advanced techniques.

##### 1. Stacking
Stacking is an ensemble learning technique that uses predictions from multiple models (for example decision tree, knn or svm) to build a new model. This model is used for making predictions on the test set. Below is a step-wise explanation for a simple stacked ensemble:

The train set is split into 10 parts.

<img src="images/image-11.png">

A base model (suppose a decision tree) is fitted on 9 parts and predictions are made for the 10th part. This is done for each part of the train set.

<img src="images/image-10.png">

The base model (in this case, decision tree) is then fitted on the whole train dataset.
Using this model, predictions are made on the test set.

<img src="images/image-2.png">

Steps 2 to 4 are repeated for another base model (say knn) resulting in another set of predictions for the train set and test set.

<img src="images/image-3.png">

The predictions from the train set are used as features to build a new model

<img src="images/image12.png">

This model is used to make final predictions on the test prediction set.

##### We first define a function to make predictions on n-folds of train and test dataset. This function returns the predictions for train and test for each model.

In [150]:
from sklearn.model_selection import StratifiedKFold

def Stacking(model,train,y,test,n_fold):
    
    folds=StratifiedKFold(n_splits=n_fold,random_state=1)
    
    test_pred=np.empty((test.shape[0],1),float)
    train_pred=np.empty((0,1),float)
    
    for train_indices,val_indices in folds.split(train,y.values):
        
        x_train,x_val = train.iloc[train_indices],train.iloc[val_indices]
        y_train,y_val = y.iloc[train_indices],y.iloc[val_indices]

        model.fit(X=x_train,y=y_train)
        train_pred = model.predict(x_val)
        test_pred = model.predict(test)
        
    return test_pred,train_pred, y_val#, y_train

Now we’ll create two base models – decision tree and knn.

In [151]:
model1 = tree.DecisionTreeClassifier(random_state=1)

test_pred1 ,train_pred1, y_val_1 = Stacking(model=model1,n_fold=10, train=X_train,test=X_test,y=y_train)

train_pred1 = pd.DataFrame(train_pred1)
test_pred1 = pd.DataFrame(test_pred1)

In [153]:
model2 = KNeighborsClassifier()

test_pred2 ,train_pred2, y_val_2 =Stacking(model=model2,n_fold=10,train=X_train,test=X_test,y=y_train)

train_pred2 = pd.DataFrame(train_pred2)
test_pred2 = pd.DataFrame(test_pred2)

Create a third model, logistic regression, on the predictions of the decision tree and knn models.

In [160]:
df_train = pd.concat([train_pred1, train_pred2], axis=1)
df_test = pd.concat([test_pred1, test_pred2], axis=1)
y_test_val = y_val_1

model = LogisticRegression(random_state=1)
model.fit(df_train,y_test_val)
model.score(df_test, y_test)



0.7388032963095664

In [167]:
df_test

Unnamed: 0,0,0.1
0,0,0
1,1,0
2,0,0
3,1,1
4,1,0
5,0,0
6,0,0
7,1,1
8,1,1
9,1,0


In order to simplify the above explanation, the stacking model we have created has only two levels. The decision tree and knn models are built at level zero, while a logistic regression model is built at level one. Feel free to create multiple levels in a stacking model.