# Tutorial 2: Ensemble learning

This tutorial shows you how to use classes from the **ensemble_learning** module to:
* make predictions from an ensemble of different algorithms trained on the same input channel using the **Ensemble** class
* make predictions from an ensemble of models trained on different input channels using the **ChannelEnsemble** class

Competition-winning models often use ensemble learning approaches that combine the inferences of several different ML algorithms to make a final prediction.  Predictor voting and stacked generalization are powerful methods for combining inferences that can create accurate predictions from less accurate models.  Both methods work best when the prediction errors of the ensemble members are only weakly correlated: if every model makes the same mistakes, combining them cannot correct those mistakes.
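The role of error correlation can be illustrated with a small simulation. This is a minimal sketch, not part of scikit-multichannel: it assumes 25 hypothetical voters that are each correct 60% of the time, and compares a majority vote over independent voters with a majority vote over voters that all copy one another.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_voters, p_correct = 20_000, 25, 0.6  # illustrative values

# Independent errors: each voter is right with probability p_correct,
# independently of all other voters.
independent = rng.random((n_samples, n_voters)) < p_correct

# Fully correlated errors: every voter copies the first voter.
correlated = np.repeat(independent[:, :1], n_voters, axis=1)


def majority_accuracy(correct):
    """Fraction of samples where more than half of the voters are right."""
    return float((correct.sum(axis=1) > correct.shape[1] / 2).mean())


acc_single = float(independent[:, 0].mean())
acc_independent = majority_accuracy(independent)
acc_correlated = majority_accuracy(correlated)
print(acc_single, acc_independent, acc_correlated)
```

Under these assumptions the independent ensemble should score well above any single voter, while the fully correlated ensemble gains nothing over a single voter.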

Scikit-multichannel makes conventional single channel voting and stacked generalization approaches easy with a single class: **Ensemble**.  In addition, the **ChannelEnsemble** class provides similar functionality for an ensemble of models where each model is in a different input channel.

Both **Ensemble** and **ChannelEnsemble** classes also allow screening of base predictors using internal cross validation.  Screening can sometimes improve performance by weeding out inaccurate models.

## Single channel ensembles

### Stacked generalization

This example illustrates the use of skmultichannel's **Ensemble** class to stack ML models for ensemble learning.  The ensemble in this example has six different ML classifier algorithms.  A support vector machine (SVC) is stacked on top of these base classifiers, using their predictions as features for meta-prediction.

**Internal Cross Validation (CV) Training**
When training a meta-predictor (SVC in this example), it is standard practice to use internal CV training of the base classifiers so that the meta-predictor is never trained on inferences that the base classifiers made about their own training samples (1).  With internal CV training, training sample inference is avoided by training each base predictor multiple times on subsets of the data and using the resulting models to make predictions about held-out samples.  In the code below, line 23 (`internal_cv=5`) activates internal CV and line 24 (`disable_cv_train=False`) specifies that the CV predictions will be used to train the meta-predictor (the default option when CV is activated, shown here to be explicit).

(1) Wolpert, David H. "Stacked generalization." Neural Networks 5.2 (1992): 241-259.

In [1]:
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
import skmultichannel as sm

X, y = make_classification(n_classes=2, n_samples=500, n_features=100,
                           n_informative=5, class_sep=0.6)

predictors = [MLPClassifier(), LogisticRegression(),
              KNeighborsClassifier(),  GradientBoostingClassifier(),
              RandomForestClassifier(), GaussianNB()]

ensemble_clf = sm.Ensemble(
                 base_predictors=predictors,
                 meta_predictor=SVC(),
                 internal_cv=5,
                 disable_cv_train=False, 
                 base_processes='max')

clf = Pipeline([('scaler', StandardScaler()),
                ('ensemble_clf', ensemble_clf)])
sm.cross_val_score(clf, X, y)

2022-12-03 15:29:16,036	INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265


[0.7069707401032703, 0.7478485370051635, 0.7289156626506024]
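To make the internal CV idea concrete, the following is a minimal sketch that uses only scikit-learn, with `cross_val_predict` standing in for internal CV training. It is an illustration of the general technique under the assumption that class-1 probabilities are used as meta-features; it is not a description of how **Ensemble** is implemented, and the helper `stacked_predict` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
base_models = [LogisticRegression(max_iter=1000), KNeighborsClassifier(),
               GaussianNB()]

# Out-of-fold probabilities: every row is predicted by a model copy
# that never saw that row during training.
meta_features = np.column_stack([
    cross_val_predict(model, X, y, cv=5, method='predict_proba')[:, 1]
    for model in base_models])

# The meta-predictor is trained on the held-out predictions ...
meta_model = SVC().fit(meta_features, y)

# ... while the base models are refit on all rows for later inference.
fitted_base = [model.fit(X, y) for model in base_models]


def stacked_predict(X_new):
    """Hypothetical helper: base model probabilities -> meta-prediction."""
    features = np.column_stack([m.predict_proba(X_new)[:, 1]
                                for m in fitted_base])
    return meta_model.predict(features)


print(stacked_predict(X[:5]))
```

The key design point is that the meta-predictor only ever sees base predictions for samples the base models were not trained on, which is the purpose of internal CV training.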

### Voting

This example illustrates the use of skmultichannel's **Ensemble** class to make a voting ensemble classifier.  The ensemble in this example has six different ML classifier algorithms.  **SoftVotingMetaClassifier** combines the predicted probabilities from these classifiers and uses argmax to choose a class.  Scikit-multichannel also provides a **HardVotingMetaClassifier** class that predicts the most frequent base prediction, and an **AggregatingMetaRegressor** class to combine the predictions of regressors.

*Note*: Scikit-multichannel's single channel classes, like Ensemble, should be fully compatible with scikit-learn.  For instance, in this example the Ensemble is used in a scikit-learn Pipeline object.

In [2]:
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
import skmultichannel as sm

X, y = make_classification(n_classes=2, n_samples=500, n_features=100,
                           n_informative=5, class_sep=0.6)

predictors = [MLPClassifier(), LogisticRegression(),
              KNeighborsClassifier(),  GradientBoostingClassifier(),
              RandomForestClassifier(), GaussianNB()]

ensemble_clf = sm.Ensemble(
                 base_predictors=predictors,
                 meta_predictor=sm.SoftVotingMetaClassifier(),
                 base_processes='max')

clf = Pipeline([('scaler', StandardScaler()), ('ensemble_clf', ensemble_clf)])
sm.cross_val_score(clf, X, y)

[0.8382099827882961, 0.8265203671830178, 0.8554216867469879]
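The difference between soft and hard voting can be shown with a few lines of NumPy. This is a sketch of the textbook definitions using made-up probabilities from three hypothetical classifiers; it assumes soft voting takes an unweighted mean, and the aggregation used by **SoftVotingMetaClassifier** and **HardVotingMetaClassifier** may differ in detail.

```python
import numpy as np

# Hypothetical class probabilities, shape (n_classifiers, n_samples, n_classes).
probas = np.array([
    [[0.90, 0.10], [0.40, 0.60], [0.20, 0.80], [0.20, 0.80]],
    [[0.60, 0.40], [0.45, 0.55], [0.30, 0.70], [0.60, 0.40]],
    [[0.70, 0.30], [0.90, 0.10], [0.60, 0.40], [0.30, 0.70]],
])

# Soft voting: average the probabilities, then take the argmax.
soft_votes = probas.mean(axis=0).argmax(axis=1)

# Hard voting: each classifier casts one vote and the most frequent class wins.
labels = probas.argmax(axis=2)  # shape (n_classifiers, n_samples)
hard_votes = np.array([np.bincount(votes, minlength=2).argmax()
                       for votes in labels.T])

print(soft_votes)  # [0 0 1 1]
print(hard_votes)  # [0 1 1 1]
```

The two methods disagree on the second sample: two classifiers lean weakly toward class 1, but the third is confident in class 0, so the averaged probability favors class 0 while the vote count favors class 1.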

### Stacked generalization with model selection

Internal cross validation was introduced in the first example in this notebook as a method for reducing overfitting during stacked generalization.  Scikit-multichannel also takes advantage of internal cross validation training to estimate model performance for the purpose of in-pipeline model selection.  This is useful because inaccurate models in the ensemble can sometimes reduce the performance of your pipeline.

Lines 25-27 in this example are the only **Ensemble** arguments that differ from the first example.  They provide methods for scoring and selecting base models.  Line 25 (`base_score_methods='predict'`) sets the method used by the base classifiers for making predictions when their performance is being assessed.  Line 26 (`scorer=balanced_accuracy_score`) sets the performance metric.  Line 27 (`score_selector=sm.RankScoreSelector(k=3)`) instructs Ensemble to select the 3 models with the highest balanced_accuracy scores.  For more information see the API documentation for **Ensemble** and for skmultichannel's **score_selection** module.

In [3]:
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
import skmultichannel as sm

X, y = make_classification(n_classes=2, n_samples=500, n_features=100,
                           n_informative=5, class_sep=0.6)

predictors = [MLPClassifier(), LogisticRegression(),
              KNeighborsClassifier(),  GradientBoostingClassifier(),
              RandomForestClassifier(), GaussianNB()]

ensemble_clf = sm.Ensemble(
                 base_predictors=predictors,
                 meta_predictor=SVC(),
                 internal_cv=5, 
                 base_score_methods='predict',
                 scorer=balanced_accuracy_score,
                 score_selector=sm.RankScoreSelector(k=3),
                 disable_cv_train=False, 
                 base_processes='max')

clf = Pipeline([('scaler', StandardScaler()),
                ('ensemble_clf', ensemble_clf)])

sm.cross_val_score(clf, X, y, score_method='predict', scorer=balanced_accuracy_score)



[0.6524670109007458, 0.611732644865175, 0.6746987951807228]

In [4]:
clf.fit(X, y)
ensemble_clf = clf.named_steps['ensemble_clf']
ensemble_clf.get_screen_results()



| model | performance | selections |
|---|---|---|
| MLPClassifier() | 0.568 | - |
| LogisticRegression() | 0.556 | - |
| KNeighborsClassifier() | 0.58 | - |
| GradientBoostingClassifier() | 0.66 | +++ |
| RandomForestClassifier() | 0.644 | +++ |
| GaussianNB() | 0.608 | +++ |
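The selections in the screen results above are consistent with a simple top-k ranking of the internal CV scores. The sketch below reproduces that ranking by hand from the reported scores; `select_top_k` is a hypothetical helper written for illustration and is not the code behind **RankScoreSelector**.

```python
# Internal CV scores copied from the screen results above.
scores = {
    'MLPClassifier': 0.568,
    'LogisticRegression': 0.556,
    'KNeighborsClassifier': 0.58,
    'GradientBoostingClassifier': 0.66,
    'RandomForestClassifier': 0.644,
    'GaussianNB': 0.608,
}


def select_top_k(scores, k):
    """Return the names of the k highest-scoring models, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


selected = select_top_k(scores, k=3)
print(selected)
# ['GradientBoostingClassifier', 'RandomForestClassifier', 'GaussianNB']
```

The three names returned match the rows marked `+++` in the screen results.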


### Model selection without ensemble prediction

The **Ensemble** class can also be used without a meta_predictor for model selection.  When the meta_predictor parameter is left at its default value of None during Ensemble initialization, Ensemble uses internal CV and scoring to select the best predictor in the ensemble during fitting.  Only the most accurate predictor in the ensemble is stored and used for inference.

In [5]:
from sklearn.metrics import balanced_accuracy_score
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
import skmultichannel as sm

X, y = make_classification(n_classes=2, n_samples=500, n_features=100,
                           n_informative=5, class_sep=0.6)

predictors = [MLPClassifier(), LogisticRegression(),
              KNeighborsClassifier(),  GradientBoostingClassifier(),
              RandomForestClassifier(), GaussianNB()]

ensemble_clf = sm.Ensemble(
                 base_predictors=predictors,
                 meta_predictor=None,
                 internal_cv=5, 
                 base_score_methods='predict',
                 scorer=balanced_accuracy_score,
                 base_processes='max')

clf = Pipeline([('scaler', StandardScaler()),
                ('ensemble_clf', ensemble_clf)])

sm.cross_val_score(clf, X, y, score_method='predict', scorer=balanced_accuracy_score)



[0.7786144578313253, 0.7664228341939185, 0.8192771084337349]

In [7]:
clf.fit(X, y)
ensemble_clf = clf.named_steps['ensemble_clf']
ensemble_clf.get_screen_results()

| model | performance | selections |
|---|---|---|
| MLPClassifier() | 0.654034 | - |
| LogisticRegression() | 0.666019 | - |
| KNeighborsClassifier() | 0.609954 | - |
| GradientBoostingClassifier() | 0.758084 | +++ |
| RandomForestClassifier() | 0.73198 | - |
| GaussianNB() | 0.694067 | - |
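Selecting a single best model by internal CV score can also be written out with plain scikit-learn. The following is a minimal sketch of that idea, assuming balanced accuracy on out-of-fold predictions as the selection criterion; it illustrates the general approach rather than the internals of **Ensemble**.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
candidates = [LogisticRegression(max_iter=1000), KNeighborsClassifier(),
              GaussianNB()]

# Score every candidate on its out-of-fold predictions.
cv_scores = [balanced_accuracy_score(y, cross_val_predict(model, X, y, cv=5))
             for model in candidates]

# Keep only the top scorer and refit it on the full training set.
best_index = max(range(len(candidates)), key=cv_scores.__getitem__)
best_model = clone(candidates[best_index]).fit(X, y)

print(type(best_model).__name__, cv_scores[best_index])
```

Only `best_model` is needed for inference afterwards; the remaining candidates can be discarded.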


## Multichannel ensembles

The **ChannelEnsemble** class supports the same functionality as **Ensemble** (voting, stacked generalization, model selection) but instead of training an ensemble of models for a single channel, it trains an ensemble of models for multiple channels with one model per channel.  Training an ML model on each input and combining inferences through voting or stacking can improve predictive accuracy over a simple concatenation -> ML pipeline.  

*Note*:  You can build multichannel ensembles manually without the **ChannelEnsemble** class by using the **make_transform** and **make_cv_transform** functions to convert predictors into transformers and provide internal CV training functionality, then **ChannelConcatenator** to combine outputs from the base predictors, followed by a meta-predictor.  For more info on this usage style, see the 3 different style examples in the **MultichannelPipeline** and **SoftVotingMetaClassifier** API documentation (or docstrings).
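The data flow described in the note (per-channel predictor, concatenation, meta-predictor) can be sketched with scikit-learn and NumPy alone. This sketch does not call **make_cv_transform** or **ChannelConcatenator**; it uses `cross_val_predict` and `np.column_stack` as stand-ins, and the four synthetic channels are an assumption made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_all, y = make_classification(n_samples=200, n_features=10, n_informative=10,
                               n_redundant=0, random_state=0)

# Four channels: two informative feature matrices and two pure-noise ones.
Xs = [X_all[:, :5], X_all[:, 5:],
      rng.normal(size=(200, 5)), rng.normal(size=(200, 5))]

# One base model per channel, converted to a feature column through
# out-of-fold class-1 probabilities (the internal CV step).
channel_features = [
    cross_val_predict(GradientBoostingClassifier(random_state=0), X, y,
                      cv=3, method='predict_proba')[:, 1]
    for X in Xs]

# Concatenate the per-channel outputs and train the meta-predictor on them.
meta_X = np.column_stack(channel_features)
meta_model = SVC().fit(meta_X, y)

print(meta_X.shape)  # (200, 4)
```

Each column of `meta_X` corresponds to one input channel, so the meta-predictor can learn to rely on the informative channels and discount the noisy ones.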

### Stacked generalization

In [8]:
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
import skmultichannel as sm

Xs, y, X_types = sm.make_multi_input_classification(n_informative_Xs=3,
                                                    n_random_Xs=7)

clf = sm.MultichannelPipeline(n_channels=10)
clf.add_layer(StandardScaler())
clf.add_layer(sm.ChannelEnsemble(base_predictors=GradientBoostingClassifier(), 
                                 meta_predictor=SVC(), 
                                 internal_cv=5),
              pipe_processes='max')

sm.cross_val_score(clf, Xs, y)

[0.9411764705882353, 0.9393382352941176, 0.9117647058823529]

In [9]:
clf.fit(Xs, y)

| channel | layer_0 | out_0 | layer_1 | out_1 |
|---|---|---|---|---|
| 0 | StandardScaler | → | ChannelEnsemble | → |
| 1 | StandardScaler | → | ▽ | |
| 2 | StandardScaler | → | ▽ | |
| 3 | StandardScaler | → | ▽ | |
| 4 | StandardScaler | → | ▽ | |
| 5 | StandardScaler | → | ▽ | |
| 6 | StandardScaler | → | ▽ | |
| 7 | StandardScaler | → | ▽ | |
| 8 | StandardScaler | → | ▽ | |
| 9 | StandardScaler | → | ▽ | |


### Voting

In [10]:
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
import skmultichannel as sm

Xs, y, X_types = sm.make_multi_input_classification(n_informative_Xs=3,
                                                    n_random_Xs=7)

clf = sm.MultichannelPipeline(n_channels=10)
clf.add_layer(StandardScaler())
clf.add_layer(sm.ChannelEnsemble(base_predictors=GradientBoostingClassifier(), 
                                 meta_predictor=sm.SoftVotingMetaClassifier()),
              pipe_processes='max')

sm.cross_val_score(clf, Xs, y)

[0.9411764705882353, 0.9393382352941176, 0.8161764705882353]

In [11]:
clf.fit(Xs, y)

| channel | layer_0 | out_0 | layer_1 | out_1 |
|---|---|---|---|---|
| 0 | StandardScaler | → | ChannelEnsemble | → |
| 1 | StandardScaler | → | ▽ | |
| 2 | StandardScaler | → | ▽ | |
| 3 | StandardScaler | → | ▽ | |
| 4 | StandardScaler | → | ▽ | |
| 5 | StandardScaler | → | ▽ | |
| 6 | StandardScaler | → | ▽ | |
| 7 | StandardScaler | → | ▽ | |
| 8 | StandardScaler | → | ▽ | |
| 9 | StandardScaler | → | ▽ | |


### Stacked generalization with model selection

In [12]:
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
import skmultichannel as sm

Xs, y, X_types = sm.make_multi_input_classification(n_informative_Xs=3,
                                                    n_random_Xs=7)

clf = sm.MultichannelPipeline(n_channels=10)
clf.add_layer(StandardScaler())
clf.add_layer(sm.ChannelEnsemble(GradientBoostingClassifier(), SVC(), 
                                 internal_cv=5, scorer='auto',
                                 score_selector=sm.RankScoreSelector(k=3),),
              pipe_processes='max')

sm.cross_val_score(clf, Xs, y)

[0.9117647058823529, 0.9080882352941176, 0.8474264705882353]

In [13]:
clf.fit(Xs, y)

| channel | layer_0 | out_0 | layer_1 | out_1 |
|---|---|---|---|---|
| 0 | StandardScaler | → | ChannelEnsemble | → |
| 1 | StandardScaler | → | ▽ | |
| 2 | StandardScaler | → | ▽ | |
| 3 | StandardScaler | → | ▽ | |
| 4 | StandardScaler | → | ▽ | |
| 5 | StandardScaler | → | ▽ | |
| 6 | StandardScaler | → | ▽ | |
| 7 | StandardScaler | → | ▽ | |
| 8 | StandardScaler | → | ▽ | |
| 9 | StandardScaler | → | ▽ | |


In [14]:
ensemble_clf = clf.get_model(1,0)
df = ensemble_clf.get_screen_results()
df['input type'] = X_types
df

| channel | performance | selections | input type |
|---|---|---|---|
| 0 | 0.787115 | +++ | informative |
| 1 | 0.393758 | - | random |
| 2 | 0.507803 | - | random |
| 3 | 0.737495 | +++ | informative |
| 4 | 0.55022 | - | random |
| 5 | 0.705882 | +++ | informative |
| 6 | 0.64946 | - | random |
| 7 | 0.542617 | - | random |
| 8 | 0.443377 | - | random |
| 9 | 0.367747 | - | random |


### Model selection without ensemble prediction

In [15]:
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
import skmultichannel as sm

Xs, y, X_types = sm.make_multi_input_classification(n_informative_Xs=3,
                                                    n_random_Xs=7)

clf = sm.MultichannelPipeline(n_channels=10)
clf.add_layer(StandardScaler())
clf.add_layer(sm.ChannelEnsemble(GradientBoostingClassifier(), internal_cv=5, 
                                 scorer='auto'), pipe_processes='max')

sm.cross_val_score(clf, Xs, y)

[0.9411764705882353, 0.9705882352941176, 0.9411764705882353]

In [16]:
clf.fit(Xs, y)

| channel | layer_0 | out_0 | layer_1 | out_1 |
|---|---|---|---|---|
| 0 | StandardScaler | → | ChannelEnsemble | → |
| 1 | StandardScaler | → | ▽ | |
| 2 | StandardScaler | → | ▽ | |
| 3 | StandardScaler | → | ▽ | |
| 4 | StandardScaler | → | ▽ | |
| 5 | StandardScaler | → | ▽ | |
| 6 | StandardScaler | → | ▽ | |
| 7 | StandardScaler | → | ▽ | |
| 8 | StandardScaler | → | ▽ | |
| 9 | StandardScaler | → | ▽ | |


In [17]:
ensemble_clf = clf.get_model(1,0)
df = ensemble_clf.get_screen_results()
df['input type'] = X_types
df

| channel | performance | selections | input type |
|---|---|---|---|
| 0 | 0.539816 | - | random |
| 1 | 0.607843 | - | random |
| 2 | 0.835934 | - | informative |
| 3 | 0.472189 | - | random |
| 4 | 0.537015 | - | random |
| 5 | 0.906963 | +++ | informative |
| 6 | 0.509804 | - | random |
| 7 | 0.586635 | - | random |
| 8 | 0.742697 | - | informative |
| 9 | 0.436174 | - | random |
