# Ensemble Learning


# Learning Outcomes

* Distinguishing different Voting Classifiers
* Implementing both soft and hard Voting Classifiers in sklearn
* Implementing Stacking Classifier in sklearn
* Implementing Bagging Classifier in sklearn
* Implementing Boosting Classifiers in sklearn

"Ensemble methods work best when the predictors are as independent from one another as possible. One way to get diverse classifiers is to train them using very different algorithms. This increases the chance that they will make very different types of errors, improving the ensemble’s accuracy."
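The effect of independence can be illustrated with a small calculation. The sketch below assumes an idealized setting that the quote only hints at: a binary task, classifiers whose errors are fully independent, and the same accuracy for each classifier. The function name `majority_vote_accuracy` is hypothetical. Real classifiers trained on the same data are correlated, so the actual gain is usually smaller.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p_correct: float) -> float:
    """Probability that a strict majority of independent classifiers is right."""
    min_votes = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k)
        * p_correct ** k
        * (1 - p_correct) ** (n_classifiers - k)
        for k in range(min_votes, n_classifiers + 1)
    )


# Each classifier is right only 60% of the time (assumed value).
for n in (1, 3, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the ensemble accuracy grows with the number of voters, which is why diverse, weakly correlated predictors are desirable.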

# The Data

In [12]:
import warnings
warnings.filterwarnings("ignore")
from utils import load_titanic
from sklearn.model_selection import train_test_split
X,y = load_titanic()
X_train, X_test, y_train, y_test = train_test_split(X,y, random_state=0)

X.head()

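| | age | fare | pclass_1 | pclass_2 | pclass_3 | sex_female | sex_male | sibsp_0 | sibsp_1 | sibsp_2 | ... | sibsp_8 | parch_0 | parch_1 | parch_2 | parch_3 | parch_4 | parch_5 | parch_6 | alone_False | alone_True |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22.0 | 7.25 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 38.0 | 71.2833 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 26.0 | 7.925 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 35.0 | 53.1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 35.0 | 8.05 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |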


# Voting Classifiers

In [13]:
from sklearn.ensemble import VotingClassifier

## Majority Vote (Hard Voting) 

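Hard voting uses the predicted class labels: each classifier casts one vote, and the ensemble predicts the label that receives the most votes.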
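The voting rule itself can be sketched in a few lines of plain Python. This is a simplified illustration and not sklearn's implementation; the function `hard_vote` and the three prediction lists are hypothetical. Ties here go to the label seen first, whereas sklearn's `VotingClassifier` breaks ties by ascending class label order.

```python
from collections import Counter


def hard_vote(label_predictions):
    """Return the majority label for each sample.

    label_predictions holds one list of predicted labels per classifier,
    all of the same length (one entry per sample).
    """
    voted = []
    for sample_votes in zip(*label_predictions):
        # most_common(1) yields the label with the highest vote count.
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


predictions = [
    [0, 1, 1, 0],  # hypothetical classifier A
    [0, 1, 0, 0],  # hypothetical classifier B
    [1, 1, 0, 0],  # hypothetical classifier C
]
print(hard_vote(predictions))  # [0, 1, 0, 0]
```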

In [14]:
# VotingClassifier takes a list of (name, estimator) tuples as input.
# voting='hard' is the default, so it does not need to be passed explicitly.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier
from sklearn.metrics import classification_report

estimators = [('lr',LogisticRegression()), ('ac',AdaBoostClassifier()), ('et',ExtraTreesClassifier())]

clf = VotingClassifier(estimators)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
print(classification_report(y_test, preds))

              precision    recall  f1-score   support

           0       0.84      0.84      0.84       139
           1       0.74      0.74      0.74        84

    accuracy                           0.80       223
   macro avg       0.79      0.79      0.79       223
weighted avg       0.80      0.80      0.80       223



## Soft Voting

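Soft voting uses the predicted class probabilities: the probabilities of all classifiers are averaged, and the ensemble predicts the class with the highest average. Every estimator must therefore implement `predict_proba`.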
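A minimal sketch of probability-based voting follows, again in plain Python and not taken from sklearn. The function `soft_vote` and the probability values are assumptions chosen so that soft and hard voting disagree: two classifiers lean weakly towards class 0, while one is very confident in class 1.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities across classifiers and pick the best class.

    probabilities holds, for each classifier, a list of per-sample
    probability vectors (one probability per class).
    """
    n_classifiers = len(probabilities)
    weights = weights or [1.0] * n_classifiers
    total_weight = sum(weights)
    voted = []
    for sample_probs in zip(*probabilities):
        n_classes = len(sample_probs[0])
        averaged = [
            sum(w * p[c] for w, p in zip(weights, sample_probs)) / total_weight
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged probability.
        voted.append(max(range(n_classes), key=averaged.__getitem__))
    return voted


# One sample, three hypothetical classifiers.
probabilities = [
    [[0.55, 0.45]],
    [[0.60, 0.40]],
    [[0.10, 0.90]],
]
print(soft_vote(probabilities))  # [1]
```

A hard vote on the same sample would return class 0 (two votes against one), which shows how soft voting gives more influence to confident classifiers.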

In [15]:
clf = VotingClassifier(estimators,voting='soft')
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
print(classification_report(y_test, preds))

              precision    recall  f1-score   support

           0       0.83      0.86      0.85       139
           1       0.76      0.71      0.74        84

    accuracy                           0.81       223
   macro avg       0.80      0.79      0.79       223
weighted avg       0.81      0.81      0.81       223



# Stacking

Stacking trains a final estimator on the cross-validated predictions of the base estimators. In sklearn's `StackingClassifier` the default final estimator is a `LogisticRegression`.

In [16]:
from sklearn.ensemble import StackingClassifier

clf = StackingClassifier(estimators)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
print(classification_report(y_test, preds))

              precision    recall  f1-score   support

           0       0.84      0.88      0.86       139
           1       0.79      0.71      0.75        84

    accuracy                           0.82       223
   macro avg       0.81      0.80      0.81       223
weighted avg       0.82      0.82      0.82       223



# Bagging

```python
from sklearn.ensemble import BaggingClassifier
```

Bagging (bootstrap aggregating) trains multiple classifiers of the same type, each on a random subset of the observations drawn with replacement. sklearn's `BaggingClassifier` can additionally draw a random subset of the features for each classifier.
Aggregating classifiers this way reduces the variance of a model by introducing randomness. It is usually applied to strong, complex models (low bias, high variance).

A well-known bagging-based model is the Random Forest, which aggregates the outputs of fully grown decision trees, each trained on a bootstrap sample of the data and restricted to a random subset of the features at each split.

All classifiers contribute equally to the final prediction.
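The two ingredients described above, random resampling of the observations and an equally weighted vote, can be sketched without sklearn. The helper names `bootstrap_indices` and `bagged_predict` and the three threshold "models" are hypothetical stand-ins for fitted classifiers, and sampling with replacement is assumed.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def bagged_predict(models, x):
    """Combine the models with an unweighted majority vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


rng = random.Random(0)
n_samples = 1000
sample = bootstrap_indices(n_samples, rng)
unique_fraction = len(set(sample)) / n_samples
print(f"unique rows in one bootstrap sample: {unique_fraction:.1%}")

# Hypothetical stand-ins for classifiers fitted on different bootstrap samples.
models = [lambda x: int(x > 0.4), lambda x: int(x > 0.5), lambda x: int(x > 0.6)]
print(bagged_predict(models, 0.55))  # 1
```

A bootstrap sample of size n contains roughly 63% of the distinct rows on average, so each classifier sees a noticeably different view of the training data.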

In [28]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
# Let's see if 10 bagged Decision Tree models are better than 1.
# Note: In this toy example the dataset is very small and does not allow for much variation.

tree10 = DecisionTreeClassifier(random_state=0)
preds = BaggingClassifier(tree10, n_estimators=10, random_state=0).fit(X_train, y_train).predict(X_test)
print(classification_report(y_test,preds))


tree1 = DecisionTreeClassifier(random_state=0)
preds = tree1.fit(X_train, y_train).predict(X_test)
print(classification_report(y_test,preds))



              precision    recall  f1-score   support

           0       0.85      0.89      0.87       139
           1       0.81      0.74      0.77        84

    accuracy                           0.83       223
   macro avg       0.83      0.82      0.82       223
weighted avg       0.83      0.83      0.83       223

              precision    recall  f1-score   support

           0       0.80      0.83      0.82       139
           1       0.71      0.65      0.68        84

    accuracy                           0.77       223
   macro avg       0.75      0.74      0.75       223
weighted avg       0.76      0.77      0.76       223



# Boosting

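Boosting trains weak learners sequentially, with each new learner concentrating on the examples its predecessors got wrong. AdaBoost combines the learners using a weighted majority vote.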
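To make the weighting concrete, here is a sketch of a single reweighting round of the classic discrete AdaBoost algorithm for a binary problem. It is an illustration under stated assumptions, not sklearn's code: the function `adaboost_round`, the ten samples, and the 30% error pattern are all hypothetical, and the weighted error is assumed to lie strictly between 0 and 1.

```python
import math


def adaboost_round(weights, is_misclassified):
    """One discrete-AdaBoost reweighting step for a binary problem.

    Returns the learner's vote weight (alpha) and the normalized sample
    weights for the next round. Assumes 0 < weighted error < 1.
    """
    error = sum(w for w, wrong in zip(weights, is_misclassified) if wrong)
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [
        # Misclassified samples gain weight, correctly classified ones lose it.
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, is_misclassified)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


n = 10
weights = [1 / n] * n
is_misclassified = [True] * 3 + [False] * 7  # hypothetical weak learner, 30% error
alpha, new_weights = adaboost_round(weights, is_misclassified)
print(round(alpha, 3))                 # 0.424
print(round(sum(new_weights[:3]), 3))  # 0.5
```

After the update the misclassified samples carry half of the total weight, so the next weak learner is pushed towards them. The value `alpha` is the learner's weight in the final vote: the lower its error, the larger its say.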

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
```

In [32]:
# AdaBoost combines many weighted weak learners to produce strong predictions.
# The default base estimator is a decision stump: a decision tree with a single split (max_depth=1).

clf = AdaBoostClassifier()
preds = clf.fit(X_train, y_train).predict(X_test)
print(classification_report(y_test,preds))


from sklearn.ensemble import GradientBoostingClassifier
# GradientBoostingClassifier is an additive model: it adds new trees one at a time as it trains.
# Each new tree is fit to the negative gradient of the loss with respect to the current predictions.
# Usually a low learning rate combined with a high number of trees is used for moderately large datasets.
# State-of-the-art gradient boosting algorithms are discussed in the SOTA folder.
clf = GradientBoostingClassifier()
preds = clf.fit(X_train, y_train).predict(X_test)
print(classification_report(y_test,preds))


              precision    recall  f1-score   support

           0       0.85      0.83      0.84       139
           1       0.74      0.76      0.75        84

    accuracy                           0.81       223
   macro avg       0.79      0.80      0.80       223
weighted avg       0.81      0.81      0.81       223

              precision    recall  f1-score   support

           0       0.85      0.87      0.86       139
           1       0.78      0.75      0.76        84

    accuracy                           0.83       223
   macro avg       0.81      0.81      0.81       223
weighted avg       0.82      0.83      0.82       223
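
The additive idea behind gradient boosting can be sketched on a tiny regression problem. The example below is a simplification and not what `GradientBoostingClassifier` does internally: it assumes squared loss (where the negative gradient is simply the residual), one-dimensional inputs, and decision stumps as weak learners. The data, the helper names `fit_stump` and `mse`, and the learning rate are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_stump(x, r):
    """Best single-split regression stump for targets r on 1-D inputs x."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= threshold]
        right = [ri for xi, ri in zip(x, r) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((ri - left_mean) ** 2 for ri in left)
        sse += sum((ri - right_mean) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


# Hypothetical toy data.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

learning_rate = 0.5
prediction = [sum(y) / len(y)] * len(y)  # start from the mean
mse_history = [mse(y, prediction)]

for _ in range(20):
    # For squared loss the negative gradient equals the residual.
    residuals = [yi - pi for yi, pi in zip(y, prediction)]
    stump = fit_stump(x, residuals)
    # Add a damped copy of the new stump to the current model.
    prediction = [pi + learning_rate * stump(xi) for xi, pi in zip(x, prediction)]
    mse_history.append(mse(y, prediction))

print(round(mse_history[0], 3), round(mse_history[-1], 3))
```

Each round adds one small correction, and the training error never increases from round to round in this setting. A lower learning rate makes each correction smaller, which is why it is typically paired with more trees.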

