## Chapter 15 Improve Performance with Ensembles

The three most popular methods for combining the predictions from different models are:
- **Bagging**. Building multiple models (typically of the same type) from different subsamples of the training dataset.
- **Boosting**. Building multiple models (typically of the same type), each of which learns to fix the prediction errors of the prior model in the sequence.
- **Voting**. Building multiple models (typically of differing types) and combining their predictions with simple statistics (such as the mean).

#### 1. Bagging Algorithms

**Bootstrap Aggregation (or Bagging)** involves taking multiple samples from your training dataset (**with replacement**) and training a model for each sample. The final output prediction is averaged across the predictions of all of the sub-models.

##### (1) Bagged Decision Trees

Bagging performs best with algorithms that have high variance, such as decision trees.
The idea is to train multiple independent decision trees on different bootstrap samples of the training data, then average (for regression) or vote on (for classification) their predictions.

How it works:
- Randomly sample the training data with replacement (bootstrap sample) to train each tree.
- Each tree sees the same set of features — no randomness in feature selection.
- Aggregate predictions:
  - Classification → majority vote
  - Regression → mean of predictions

Randomness introduced: only through data sampling.

Goal: reduce variance (overfitting) of individual decision trees.
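
The steps above can be sketched by hand. This is a minimal illustration on synthetic data, not how `BaggingClassifier` is implemented internally; the names `X_demo`, `y_demo`, and `n_trees` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=7)

rng = np.random.default_rng(7)
n_trees = 25  # odd, so a binary majority vote cannot tie
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # Every tree sees all features; only the rows differ
    tree = DecisionTreeClassifier(random_state=7).fit(X_demo[idx], y_demo[idx])
    trees.append(tree)

# Aggregate by majority vote over the trees
all_votes = np.stack([t.predict(X_demo) for t in trees])  # shape: (n_trees, n_samples)
bagged_pred = (all_votes.mean(axis=0) > 0.5).astype(int)
train_accuracy = (bagged_pred == y_demo).mean()
print(train_accuracy)
```

The accuracy printed here is measured on the training data, so it only confirms that the voting step works. It is not an estimate of generalization.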


Below is an example of using the `BaggingClassifier` with the Classification and Regression Trees algorithm (`DecisionTreeClassifier`).

In [6]:
# load data
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

filename = 'data/pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv(filename, names=names)
print(df.shape)
df.head()

(768, 9)


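|     | preg | plas | pres | skin | test | mass | pedi  | age | class |
| --- | ---- | ---- | ---- | ---- | ---- | ---- | ----- | --- | ----- |
| 0   | 6    | 148  | 72   | 35   | 0    | 33.6 | 0.627 | 50  | 1     |
| 1   | 1    | 85   | 66   | 29   | 0    | 26.6 | 0.351 | 31  | 0     |
| 2   | 8    | 183  | 64   | 0    | 0    | 23.3 | 0.672 | 32  | 1     |
| 3   | 1    | 89   | 66   | 23   | 94   | 28.1 | 0.167 | 21  | 0     |
| 4   | 0    | 137  | 40   | 35   | 168  | 43.1 | 2.288 | 33  | 1     |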


In [7]:
# Bagged decision trees for classification
array = df.values
X = array[:, :-1]
Y = array[:,-1]
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
cart = DecisionTreeClassifier()
num_trees = 100
model = BaggingClassifier(estimator=cart, n_estimators=num_trees, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7578263841421736


##### (2) Random Forest

*Random Forests are an extension of bagged decision trees*. Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers. *Specifically, rather than greedily choosing the best split point from all features when constructing each tree, only a random subset of features is considered for each split*.
By restricting the set of features each tree can consider at each split, Random Forests make trees less correlated, which further reduces variance and often improves generalization.

Randomness introduced:
- Random data sampling (bootstrap)
- Random feature subset selection per split
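
Both sources of randomness can be checked on a fitted model. The sketch below uses synthetic data (`X_demo`, `y_demo` are assumptions for illustration) and inspects the first tree of a small forest.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with 8 features (illustrative only)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=7)

forest = RandomForestClassifier(n_estimators=10, max_features=3, random_state=7)
forest.fit(X_demo, y_demo)

first_tree = forest.estimators_[0]
print(forest.bootstrap)          # whether rows are resampled with replacement
print(first_tree.max_features_)  # number of features considered at each split
print(first_tree.splitter)       # how the threshold is chosen within those features
```

With these settings each tree is trained on a bootstrap sample and searches for the best split among 3 randomly chosen features at every node.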


Below is an example of a Random Forest model constructed for classification using the `RandomForestClassifier` class.

In [12]:
# Random forest classification
from sklearn.ensemble import RandomForestClassifier

num_trees = 100
max_features = 3 # split points chosen from a random selection of 3 features
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7682330827067669


##### (3) Extra Trees

Extra Trees (Extremely Randomized Trees) are even more random than Random Forests.
They randomize the split thresholds as well as the feature selection.

How it differs from Random Forest:
- Still uses a random subset of features at each split (like Random Forest).
- But does not search for the optimal threshold on those features.
  - Instead, it draws a random threshold for each candidate feature and keeps the best of these random splits.
- By default, Extra Trees in scikit-learn do not use bootstrap samples; each tree is trained on the entire dataset.
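
These differences show up in the scikit-learn defaults. The following sketch, on synthetic data (`X_demo` and `y_demo` are illustrative assumptions), compares the relevant settings of the two ensembles.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Synthetic data (illustrative only)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=7)

extra = ExtraTreesClassifier(n_estimators=10, random_state=7).fit(X_demo, y_demo)
forest = RandomForestClassifier(n_estimators=10, random_state=7).fit(X_demo, y_demo)

# Row sampling: bootstrap samples versus the full training set
print("bootstrap:", extra.bootstrap, forest.bootstrap)
# Threshold choice inside each tree: random versus exhaustive search
print("splitter:", extra.estimators_[0].splitter, forest.estimators_[0].splitter)
```

Passing `bootstrap=True` to `ExtraTreesClassifier` turns row resampling back on if it is wanted.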


Below is an example of an Extra Trees model constructed for classification using the `ExtraTreesClassifier` class.

In [13]:
# extra trees classification
from sklearn.ensemble import ExtraTreesClassifier

num_trees = 100
max_features = 7
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
model = ExtraTreesClassifier(n_estimators=num_trees, max_features=max_features)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7525974025974026


| Algorithm         | Data Sampling          | Feature Sampling | Split Selection  | Variance     | Bias     | Speed       |
| ----------------- | ---------------------- | ---------------- | ---------------- | ------------ | -------- | ----------- |
| **Bagged Trees**  | Bootstrap samples      | All features     | Best split       | ↓ Variance   | Same     | Moderate    |
| **Random Forest** | Bootstrap samples      | Random subset    | Best split       | ↓↓ Variance  | Slight ↑ | Moderate    |
| **Extra Trees**   | Full data (by default) | Random subset    | **Random split** | ↓↓↓ Variance | ↑↑ Bias  | **Fastest** |


#### 2. Boosting Algorithms

Boosting ensemble algorithms create a sequence of models, each of which attempts to correct the mistakes of the models before it in the sequence. Once created, the models make predictions, which may be weighted by their demonstrated accuracy, and the results are combined to create a final output prediction.

##### (1) AdaBoost

AdaBoost (Adaptive Boosting) adjusts sample weights to focus on hard-to-classify examples.
It generally works by weighting instances in the dataset by how easy or difficult they are to classify, allowing the algorithm to pay more or less attention to them in the construction of subsequent models.

AdaBoost “adapts” to errors — focusing more on examples the model struggles with.

Works well on clean data, but can overfit if there’s too much noise or outliers (since it keeps emphasizing hard cases).
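
To make the reweighting concrete, here is one round of the textbook binary AdaBoost update on five samples. It is a sketch of the classic formulation with labels in {-1, +1}; scikit-learn's `AdaBoostClassifier` differs in details, and the toy arrays are made up for illustration.

```python
import numpy as np

# Toy labels and the predictions of one weak learner (hypothetical values)
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, -1, -1, -1])  # only the last sample is misclassified

# Start with uniform sample weights
weights = np.full(len(y_true), 1 / len(y_true))

missed = y_pred != y_true
error = np.sum(weights[missed]) / np.sum(weights)  # weighted error rate
alpha = 0.5 * np.log((1 - error) / error)          # weight of this learner in the final vote

# Increase weights of misclassified samples, decrease the rest, then renormalize
new_weights = weights * np.exp(np.where(missed, alpha, -alpha))
new_weights /= new_weights.sum()

print(error, alpha)
print(new_weights)
```

After this round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.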

Below is an example of an AdaBoost model constructed for classification using the `AdaBoostClassifier` class.

If you don't specify a base estimator, scikit-learn uses `DecisionTreeClassifier(max_depth=1)`. That is a decision stump: a tree with only one split (a single decision rule).

In [14]:
# AdaBoost classification
from sklearn.ensemble import AdaBoostClassifier

num_trees = 30
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
model = AdaBoostClassifier(n_estimators=num_trees, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7552460697197538


##### (2) Stochastic Gradient Boosting

Stochastic Gradient Boosting or **SGB** (also called Gradient Boosting Machines) is one of the most sophisticated ensemble techniques. It is also proving to be perhaps one of the best techniques available for improving performance via ensembles.
Gradient Boosting fits each new tree to the errors (residuals) of the ensemble built so far.
The stochastic variant fits each tree on a random subsample of the training rows, which makes the process less greedy and reduces variance. In scikit-learn this is controlled by the `subsample` parameter; with the default `subsample=1.0`, every tree sees all rows.
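
The residual-fitting idea can be sketched for regression with squared error. This is a simplified illustration, not the `GradientBoostingClassifier` implementation; the data and the values of `learning_rate`, `subsample`, and `n_rounds` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
# Synthetic 1-D regression problem: noisy sine curve (illustrative only)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1  # shrinks each tree's contribution
subsample = 0.5      # fraction of rows each tree sees (the "stochastic" part)
n_rounds = 50

# Start from a constant prediction (the mean of the targets)
prediction = np.full_like(y_demo, y_demo.mean())
initial_mse = np.mean((y_demo - prediction) ** 2)

for _ in range(n_rounds):
    residuals = y_demo - prediction  # errors of the current ensemble
    # Draw a random subset of rows without replacement
    idx = rng.choice(len(X_demo), size=int(subsample * len(X_demo)), replace=False)
    tree = DecisionTreeRegressor(max_depth=2, random_state=7)
    tree.fit(X_demo[idx], residuals[idx])  # fit the residuals, not the targets
    prediction += learning_rate * tree.predict(X_demo)

final_mse = np.mean((y_demo - prediction) ** 2)
print(initial_mse, final_mse)
```

Each round nudges the prediction toward the targets by a small step, which is why the training error falls as rounds accumulate.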

Below is an example of a Gradient Boosting model constructed for classification using the `GradientBoostingClassifier` class.

In [15]:
# Stochastic gradient boosting classification
from sklearn.ensemble import GradientBoostingClassifier

num_trees = 100
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
model = GradientBoostingClassifier(n_estimators=num_trees, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7578947368421053


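#### 3. Voting Ensemble

Voting combines the predictions of several models (typically of differing types) using a simple statistic. Below is an example that combines logistic regression, a decision tree, and a support vector machine using the `VotingClassifier` class.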

In [18]:
# Voting ensemble for classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

# create sub models
estimators = []
model1 = LogisticRegression(max_iter=200)
estimators.append(('logistic', model1))
model2 = DecisionTreeClassifier()
estimators.append(('cart', model2))
model3 = SVC()
estimators.append(('svm', model3))

# create ensemble model
ensemble = VotingClassifier(estimators)
results = cross_val_score(ensemble, X, Y, cv=kfold)
print(results.mean())

0.7721633629528366
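
For classification, the simple statistic can be a plain majority vote. The sketch below computes that vote by hand on synthetic data and compares it with `VotingClassifier` in hard-voting mode; `make_models`, `X_demo`, and `y_demo` are hypothetical names introduced for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=7)

def make_models():
    # Fresh, identically configured sub-models on every call
    return [
        ("logistic", LogisticRegression(max_iter=1000)),
        ("cart", DecisionTreeClassifier(max_depth=3, random_state=7)),
        ("svm", SVC()),
    ]

# Manual majority vote: fit each model, then predict 1 where at least 2 of 3 agree
votes = np.stack([m.fit(X_demo, y_demo).predict(X_demo) for _, m in make_models()])
manual_pred = (votes.sum(axis=0) >= 2).astype(int)

# The same vote through the library class
ensemble = VotingClassifier(make_models(), voting="hard").fit(X_demo, y_demo)
library_pred = ensemble.predict(X_demo)

agreement = (manual_pred == library_pred).mean()
print(agreement)
```

With three binary classifiers there are no ties, so the two approaches should agree on every sample.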
