# Tutorial 3: ensemble learning

This tutorial shows you how to use classes from the **ensemble_learning** module to:
* make predictions from an ensemble of different algorithms trained on the same input channel using the **Ensemble** class
* make predictions from an ensemble of models trained on different input channels using the **ChannelEnsemble** class

ML models that win competitions often use ensemble learning approaches, combining the inferences of an ensemble of different ML algorithms to make a final prediction.     

Classifier voting is a powerful way to improve classification accuracy by averaging the predicted class probabilities of multiple different classifiers.  Stacked generalization combines the inferences of an ensemble of models by using them as inputs to an ML algorithm that is "stacked" on top of the base predictor ensemble.
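
As a minimal sketch of the voting idea, the snippet below averages class probabilities from three hypothetical classifiers and picks the most probable class for each sample (often called soft voting). The probability values are made up for illustration, and plain Python lists are used instead of any library API.

```python
# Hypothetical predicted probabilities for 3 samples and 2 classes,
# one table per classifier (values are illustrative only).
probas_per_classifier = [
    [[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]],  # classifier A
    [[0.80, 0.20], [0.30, 0.70], [0.40, 0.60]],  # classifier B
    [[0.70, 0.30], [0.60, 0.40], [0.35, 0.65]],  # classifier C
]


def soft_vote(probas_per_classifier):
    """Average class probabilities across classifiers, then take the argmax."""
    n_classifiers = len(probas_per_classifier)
    n_samples = len(probas_per_classifier[0])
    n_classes = len(probas_per_classifier[0][0])
    averaged = []
    for i in range(n_samples):
        row = [sum(p[i][c] for p in probas_per_classifier) / n_classifiers
               for c in range(n_classes)]
        averaged.append(row)
    # The predicted class is the index of the largest averaged probability.
    predictions = [row.index(max(row)) for row in averaged]
    return averaged, predictions


averaged, predictions = soft_vote(probas_per_classifier)
print(predictions)
```

Classifier A alone would assign the third sample to class 0, but the averaged probabilities favor class 1. In scikit-learn, the same idea is available through `VotingClassifier(voting='soft')`.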

Pipecaster makes conventional single channel voting and stacking ML approaches easy with a single class: **Ensemble**.  The **ChannelEnsemble** class provides similar functionality but with multiple input channels.  Multichannel ensembles are tricky to build with scikit-learn, which provided one of the original motivations for the development of pipecaster.

Both the **Ensemble** and **ChannelEnsemble** classes also allow screening of ensemble members using internal cross validation.  Selection can sometimes improve performance by eliminating inaccurate models.
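
The selection idea can be sketched in a few lines: score each candidate with cross validation, then keep only the best performers. The model names, the scores, and the `select_top_k` helper are hypothetical and do not reflect pipecaster's actual interface.

```python
# Hypothetical mean cross validation scores for four ensemble members.
cv_scores = {
    "gradient_boosting": 0.84,
    "random_forest": 0.81,
    "knn": 0.58,
    "naive_bayes": 0.77,
}


def select_top_k(scores, k):
    """Return the names of the k highest-scoring models, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


selected = select_top_k(cv_scores, k=3)
print(selected)
```

Here the weakest model is dropped before the remaining predictions are combined.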

In [2]:
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC 
import pipecaster as pc

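# Gradient boosting with early stopping: boosting rounds are added until the
# score on a 10% validation split fails to improve for 3 consecutive rounds.
early_stopping_GBC = GradientBoostingClassifier(n_estimators=1000,
                                                validation_fraction=0.1,
                                                n_iter_no_change=3)

# One GradientBoostingClassifier per channel, then a single SVC meta-classifier
# that spans all 10 channels.
mclf1 = pc.MultichannelPipeline(n_channels=10)
mclf1.add_layer(early_stopping_GBC)
mclf1.add_layer(pc.MultichannelPredictor(SVC()))
mclf1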

| channel | layer_0 | layer_1 |
|---|---|---|
| 0 | GradientBoostingClassifier | SVC_MC |
| 1 | GradientBoostingClassifier | ▽ |
| 2 | GradientBoostingClassifier | ▽ |
| 3 | GradientBoostingClassifier | ▽ |
| 4 | GradientBoostingClassifier | ▽ |
| 5 | GradientBoostingClassifier | ▽ |
| 6 | GradientBoostingClassifier | ▽ |
| 7 | GradientBoostingClassifier | ▽ |
| 8 | GradientBoostingClassifier | ▽ |
| 9 | GradientBoostingClassifier | ▽ |


**Notes on dataframe visualization**:  
* Inverted triangles indicate that the channels are spanned by the multichannel pipe shown directly above. 
* The _MC suffix indicates that the SVC classifier is being used for multichannel prediction (i.e. it's wrapped in the MultichannelPredictor class).
* The integer channel and layer indices seen on the left and top can be used to reference specific object instances in the pipeline (see MultichannelPipeline interface annotations).  

In [4]:
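# Xs (a list of 10 feature matrices, one per channel) and y (the labels) are assumed to be defined in an earlier cell.
mclf1.fit(Xs, y)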

| channel | layer_0 | out_0 | layer_1 | out_1 |
|---|---|---|---|---|
| 0 | {GradientBoostingClassifier}cvtr | → | {SVC_MC}tr | → |
| 1 | {GradientBoostingClassifier}cvtr | → | ▽ | |
| 2 | {GradientBoostingClassifier}cvtr | → | ▽ | |
| 3 | {GradientBoostingClassifier}cvtr | → | ▽ | |
| 4 | {GradientBoostingClassifier}cvtr | → | ▽ | |
| 5 | {GradientBoostingClassifier}cvtr | → | ▽ | |
| 6 | {GradientBoostingClassifier}cvtr | → | ▽ | |
| 7 | {GradientBoostingClassifier}cvtr | → | ▽ | |
| 8 | {GradientBoostingClassifier}cvtr | → | ▽ | |
| 9 | {GradientBoostingClassifier}cvtr | → | ▽ | |


**Notes on dataframe visualization**: 
* GradientBoostingClassifier appears in brackets followed by 'cvtr' to indicate that the classifier has been wrapped with a class that provides internal cross validation training to prevent overfitting of downstream metaclassifiers.
* SVC_MC appears in brackets followed by 'tr' to indicate that it has been wrapped to provide a transform() method that predictors ordinarily lack (transform functionality is there in case the user wants to generate inputs for a function external to the pipeline). 
* The out_0 and out_1 columns are output layers: an arrow marks each channel that emits an output after the preceding layer.
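
The internal cross validation training mentioned in the notes above can be illustrated with out-of-fold predictions: each sample is predicted by a base model that never saw it during training, so the meta-classifier learns from honest predictions rather than memorized ones. The sketch below is not pipecaster's implementation. It assumes a toy one-feature threshold model and a simple interleaved fold assignment, and all function names are hypothetical.

```python
def fit_threshold(xs, ys):
    """Toy base model: a threshold halfway between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict(threshold, xs):
    """Predict class 1 for values above the threshold, else class 0."""
    return [1 if x > threshold else 0 for x in xs]


def out_of_fold_predictions(xs, ys, n_folds):
    """Predict each sample with a model trained on the other folds only."""
    n = len(xs)
    oof = [None] * n
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        threshold = fit_threshold([xs[i] for i in train_idx],
                                  [ys[i] for i in train_idx])
        preds = predict(threshold, [xs[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            oof[i] = p
    return oof


# Illustrative single-feature data with two classes.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
ys = [0, 1, 0, 1, 0, 1, 0, 1]

# These predictions would serve as training features for a meta-classifier.
meta_features = out_of_fold_predictions(xs, ys, n_folds=4)
print(meta_features)
```

In scikit-learn, `cross_val_predict` produces the same kind of out-of-fold predictions for a real estimator.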

### Performance analysis

In [None]:
%matplotlib inline
import time
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.feature_selection import SelectPercentile
from sklearn.pipeline import Pipeline

import matplotlib.pyplot as plt
from scipy.stats import sem

n_cpus = pc.count_cpus()

# Test a MultichannelPipeline in a cross validation experiment
t = time.time()
pc_accuracies = pc.cross_val_score(mclf1, Xs, y, scorer=balanced_accuracy_score, cv=8, n_processes=n_cpus)
pc_time = time.time() - t

# Test a single channel scikit-learn pipeline in a cross validation experiment
X = np.concatenate(Xs, axis=1)
clf = Pipeline([('GradientBoostingClassifier', early_stopping_GBC)])
t = time.time()
sk_accuracies = cross_val_score(clf, X, y, scoring='balanced_accuracy', cv=8, n_jobs=n_cpus)
sk_time = time.time() - t

# Plot the cross validation results
fig, axes = plt.subplots(1, 2)
xlabels = ['concatenated', 'multichannel']
axes[0].bar(xlabels, [np.mean(sk_accuracies), np.mean(pc_accuracies)], 
        yerr=[sem(sk_accuracies), sem(pc_accuracies)], capsize=10)
axes[0].set_ylim(.60)
axes[0].set_ylabel('balanced accuracy', fontsize=12)
axes[0].set_xticklabels(xlabels, rotation=45, ha='right', fontsize=12)
axes[1].bar(xlabels, [sk_time, pc_time])
axes[1].set_ylabel('execution time [s]', fontsize=12)
axes[1].set_xticklabels(xlabels, rotation=45, ha='right', fontsize=12)
plt.tight_layout()
# plt.savefig('performance_comparison.svg')
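
The metric used in both cross validation runs above, balanced accuracy, is the mean of the per-class recall values, so a minority class counts as much as a majority class. The following minimal sketch reimplements the definition in plain Python with made-up labels; it is meant only to illustrate the metric, not to replace `balanced_accuracy_score`.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        correct = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Imbalanced toy labels: 8 negatives and 2 positives (illustrative values).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, balanced)
```

Missing one of the two positives barely moves plain accuracy, but it halves the recall of the positive class and pulls balanced accuracy down noticeably.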

**Results**  
The multichannel pipecaster pipeline performs better on this task than the pipeline architecture that uses concatenated features.  This performance enhancement, which is sometimes seen with real data, may be due to the tendency of meta-predictors to correct errors found in the base predictors and perhaps to increased diversity of the features important to the base predictors.

This example is a proxy for a more thorough demonstration of performance improvement, which would involve a full hyperparameter optimization for the two pipeline architectures.  The early stopping GradientBoostingClassifier method shown here is a decent proxy though, because performance of this classifier is not very sensitive to hyperparameter values, and model complexity is automatically set by increasing the number of boosting rounds until performance on a validation set stops increasing.
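
The early stopping behavior described above can be sketched as a simple patience rule: keep adding boosting rounds until the validation score has failed to improve for a fixed number of consecutive rounds. This is a simplified illustration with hypothetical scores and a hypothetical helper name. scikit-learn's own criterion also applies a tolerance, which is omitted here.

```python
def early_stopping_round(validation_scores, patience=3):
    """Return (best_round, stopped_at) under a simple patience rule.

    Training stops once `patience` consecutive rounds fail to beat the
    best validation score seen so far.
    """
    best_score = float("-inf")
    best_round = 0
    round_number = 0
    rounds_without_gain = 0
    for round_number, score in enumerate(validation_scores, start=1):
        if score > best_score:
            best_score = score
            best_round = round_number
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= patience:
                break
    return best_round, round_number


# Hypothetical validation score after each boosting round.
scores = [0.60, 0.68, 0.73, 0.75, 0.75, 0.74, 0.75, 0.76, 0.77]
best_round, stopped_at = early_stopping_round(scores, patience=3)
print(best_round, stopped_at)
```

With a patience of 3, training halts at round 7 and never reaches the later rounds, which shows how the rule trades a possible late improvement for a much shorter training run.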