# Stacked generalization ensemble
In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to best combine the input predictions to make a better output prediction.

It may be helpful to think of the stacking procedure as having two levels: level 0 and level 1.

- Level 0: The level 0 data is the set of training dataset inputs, and the level 0 models learn to make predictions from this data.
- Level 1: The level 1 data is the output of the level 0 models, and the single level 1 model, or meta-learner, learns to make predictions from this data.

## Train and save sub-models

In [1]:
from sklearn.datasets import make_blobs
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt
from os import makedirs, listdir
from os.path import exists

In [2]:
# create datasets
def prepare_data():
    X, y = make_blobs(n_samples=1100, centers=3, cluster_std=2, random_state=2)
    #y = to_categorical(y)
    # split data
    n_data = 100
    trainX, testX = X[:n_data, :], X[n_data:, :]
    trainy, testy = y[:n_data], y[n_data:]
    return trainX, trainy, testX, testy


In [4]:
# define and fit a Keras sub-model
def fit_model(trainX, trainy):
    # one-hot encode the integer labels, as required by categorical_crossentropy
    trainy_enc = to_categorical(trainy)
    model = Sequential()
    model.add(Dense(25, input_dim=2, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(trainX, trainy_enc, epochs=500, verbose=0)
    return model

# build data
trainX, trainy, testX, testy = prepare_data()
# create models and save to disk.
if not exists("models"):
    makedirs('models')
    n_models = 5
    for i in range(n_models):
        model = fit_model(trainX, trainy)
        filename = 'models/model_' + str(i) + '.h5'
        model.save(filename)
        print("saved %s" % filename)

# Stacking model, dataset and prediction sub-model
As input for the new model, we need 1,000 examples with some number of features. Given that we have five models and each model makes three predictions (one probability per class) per example, we get 15 (3 x 5) features for each example provided to the meta-learner. The sub-model predictions are stacked into a [1000, 3, 5] shaped array (examples, classes, members) with the dstack() NumPy function, and we can transform this into a [1000, 15] shaped array for training the meta-learner by flattening the final two dimensions with the reshape() NumPy function. The stacked_dataset() function implements this step.
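The shape bookkeeping can be checked in isolation. The following is a minimal sketch that uses random probability vectors as hypothetical stand-ins for the sub-model outputs (no Keras models are involved); only the `dstack` and `reshape` calls mirror stacked_dataset().

```python
import numpy as np

# Hypothetical stand-ins for the sub-model outputs: 5 members, each
# returning class probabilities of shape [1000, 3].
rng = np.random.default_rng(0)
n_rows, n_classes, n_members = 1000, 3, 5
member_preds = [rng.dirichlet(np.ones(n_classes), size=n_rows)
                for _ in range(n_members)]

# dstack places the members along a new third axis: [rows, classes, members].
stacked = np.dstack(member_preds)

# Flatten the last two axes into one feature axis: [rows, classes * members].
flat = stacked.reshape((stacked.shape[0], stacked.shape[1] * stacked.shape[2]))

print(stacked.shape)  # (1000, 3, 5)
print(flat.shape)     # (1000, 15)
```

With this layout, column `c * n_members + m` of `flat` holds the probability of class `c` predicted by member `m`, so the 15 features are grouped by class rather than by member.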

We can use this input dataset along with the output, or y part, of the test set to train a new meta-learner.

In this case, we will train a simple logistic regression algorithm from the scikit-learn library.

Logistic regression is natively a binary classifier, but the LogisticRegression class in scikit-learn also supports multi-class classification (more than two classes), using either a one-vs-rest or a multinomial scheme depending on the library version and settings. The function fit_stacked_model() below prepares the training dataset for the meta-learner by calling the stacked_dataset() function, then fits a logistic regression model and returns it.
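As a small illustration of the multi-class behaviour, the sketch below fits LogisticRegression directly on three-class blob data with integer labels. The sample size and `max_iter=1000` are assumptions chosen for the example, not settings taken from this notebook.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Three-class toy data with integer labels 0, 1 and 2.
X, y = make_blobs(n_samples=300, centers=3, cluster_std=2, random_state=2)

# max_iter is raised here only to make convergence warnings less likely.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# One probability column per class.
probs = clf.predict_proba(X)
print(clf.classes_)  # [0 1 2]
print(probs.shape)   # (300, 3)
```

No one-hot encoding is needed for the meta-learner: the integer labels are passed as they are, which is why fit_stacked_model() receives testy unencoded.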

In [5]:
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from tensorflow.keras.models import load_model
from numpy import dstack

In [6]:
# load models
def load_models(n_models):
    all_models = list()
    for i in range(n_models):
        filename = 'models/model_' + str(i) + '.h5'
        model = load_model(filename)
        all_models.append(model)
        print('loaded %s' %  filename)
    return all_models

# create stacked model input dataset as output from the ensemble
def stacked_dataset(members, inputX):
    stackX = None
    for model in members:
        yhat = model.predict(inputX, verbose=0)
        if stackX is None:
            stackX = yhat
        else:
            stackX = dstack((stackX, yhat))
    # flatten predictions to [rows, members * probabilities]
    stackX = stackX.reshape((stackX.shape[0], stackX.shape[1] * stackX.shape[2]))
    return stackX

# fit the model based on the outputs from the ensemble members
def fit_stacked_model(members, inputX, inputy):
    # create dataset
    stackedX = stacked_dataset(members, inputX)
    # fit model
    model = LogisticRegression()
    model.fit(stackedX, inputy)
    return model

# make prediction with the stacked model
def stacked_prediction(members, model, inputX):
    stackedX = stacked_dataset(members, inputX)
    yhat = model.predict(stackedX)
    return yhat

In [7]:
## main stage: load the sub-models into a list and evaluate each one; the best reaches 81.6%
## next, the logistic regression meta-learner is trained on the predicted probabilities from each sub-model on the
## test set, then the entire stacking model is evaluated on the test set.
# setup datasets
trainX, trainy, testX, testy = prepare_data()
print(trainX.shape, testX.shape)

n_members = 5
members = load_models(n_members) # load saved models

# evaluate keras sub-models on test datasets
for model in members:
    testy_cat = to_categorical(testy)    
    _, acc = model.evaluate(testX, testy_cat, verbose=0)
    print('model acc: %.3f' % acc)

# fit stacked models using ensemble
model = fit_stacked_model(members, testX, testy)
# evaluate model on test datasets
yhat = stacked_prediction(members, model, testX)
acc = accuracy_score(testy, yhat)
print('Stacked test acc: %.3f' % acc)

(100, 2) (1000, 2)
loaded models/model_0.h5
loaded models/model_1.h5
loaded models/model_2.h5
loaded models/model_3.h5
loaded models/model_4.h5




model acc: 0.803
model acc: 0.803
model acc: 0.807
model acc: 0.816
model acc: 0.807
Stacked test acc: 0.834
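Note that the meta-learner above is fit and then scored on the same 1,000 test examples, so the stacked accuracy may be optimistic compared with the sub-model scores. One possible remedy, sketched below, is to split the test set into one part for fitting the meta-learner and one part for evaluation. The helper `split_holdout` and its parameters are hypothetical names introduced for this illustration, and the arrays are stand-ins with the same shapes as testX and testy.

```python
import numpy as np

def split_holdout(X, y, meta_fraction=0.5, seed=1):
    """Shuffle the rows and split (X, y) into a meta-training part and an evaluation part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_meta = int(len(X) * meta_fraction)
    meta_idx, eval_idx = idx[:n_meta], idx[n_meta:]
    return X[meta_idx], y[meta_idx], X[eval_idx], y[eval_idx]

# Stand-in arrays with the same shapes as testX and testy.
X = np.arange(2000, dtype=float).reshape(1000, 2)
y = np.arange(1000) % 3

metaX, metay, evalX, evaly = split_holdout(X, y)
print(metaX.shape, evalX.shape)  # (500, 2) (500, 2)
```

Under this split, fit_stacked_model(members, metaX, metay) would train the meta-learner, and the predictions from stacked_prediction(members, model, evalX) would be scored against evaly.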


# Integrated stacking model
When neural networks are used as sub-models, it may be desirable to use a neural network as the meta-learner as well.

Specifically, the sub-networks can be embedded in a larger multi-headed neural network that then learns how to best combine the predictions from each input sub-model. It allows the stacking ensemble to be treated as a single large model.

The benefit of this approach is that the outputs of the submodels are provided directly to the meta-learner. Further, it is also possible to update the weights of the submodels in conjunction with the meta-learner model, if this is desirable.
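To give a rough sense of what freezing the sub-models means, the parameter counts can be worked out by hand. The sketch below assumes the architectures used in this notebook: each sub-model is Dense(25) followed by Dense(3) on 2 inputs, and the meta-learner is Dense(10) followed by Dense(3) on the 15 concatenated outputs. `dense_params` is a hypothetical helper for this calculation.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

n_members = 5

# One sub-model: 2 inputs -> 25 hidden units -> 3 class outputs.
sub_model = dense_params(2, 25) + dense_params(25, 3)

# Meta-learner head: 15 merged inputs -> 10 hidden units -> 3 class outputs.
meta_head = dense_params(n_members * 3, 10) + dense_params(10, 3)

frozen = n_members * sub_model
print(sub_model)  # 153
print(frozen)     # 765
print(meta_head)  # 193
```

With the sub-models frozen, only the 193 meta-learner parameters are updated during fitting; leaving the sub-model layers trainable would expose all 958 parameters to training.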

In [8]:
from tensorflow.keras.utils import plot_model
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.layers import concatenate
from numpy import argmax

# stacked generalization with neural network meta-learning model
# re-define stacked model from multiple member input models
def define_stacked_model(members):
    # update all layers in all model to not be trainable
    for i in range(len(members)):
        model = members[i]
        for layer in model.layers:
            # not trainable
            layer.trainable = False
            # rename to avoid unique layer name issue
            layer._name = 'ensemble_' + str(i+1) + '_' + layer.name
    # define multi-headed input
    ensemble_visible = [model.input for model in members]
    # concatenate merge output from each model
    ensemble_outputs = [model.output for model in members]
    merge = concatenate(ensemble_outputs)
    hidden = Dense(10, activation='relu')(merge) # hidden layer of the meta-learner
    output = Dense(3, activation='softmax')(hidden)
    model = Model(inputs=ensemble_visible, outputs=output)
    # plot graph of ensemble
    plot_model(model, show_shapes=True, to_file='ensemble_stacked_model.png')
    # compile
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# fit a stacked model
def fit_stacked_model_int(model, inputX, inputy):
    # prepare input data
    X = [inputX for _ in range(len(model.input))]
    # encode output data
    y = to_categorical(inputy)
    # fit model
    model.fit(X, y, epochs=300, verbose=0)
    
# make prediction with stacked model
def predict_stacked_model(model, inputX):
    # prepare input data
    X = [inputX for _ in range(len(model.input))]
    # make prediction
    return model.predict(X, verbose=0)

In [9]:
## main stage,
# generate data
trainX, trainy, testX, testy = prepare_data()
print(trainX.shape, testX.shape)
# load sub-models
n_models = 5
members = load_models(n_models)
# define ensemble model
stacked_model = define_stacked_model(members)
# fit stacked model on test set
fit_stacked_model_int(stacked_model, testX, testy)
# predict and evaluate model
yhat = predict_stacked_model(stacked_model, testX)
yhat = argmax(yhat, axis=1)

acc = accuracy_score(testy, yhat)
print('Stacked test acc: %.3f' % acc)

(100, 2) (1000, 2)
loaded models/model_0.h5
loaded models/model_1.h5
loaded models/model_2.h5
loaded models/model_3.h5
loaded models/model_4.h5
Stacked test acc: 0.832


Once the sub-models have been prepared, we can define the stacking ensemble model.

The input layer for each of the sub-models will be used as a separate input head to this new model. This means that k copies of any input data will have to be provided to the model, where k is the number of input models, in this case, 5.

The outputs of each of the models can then be merged. In this case, we will use a simple concatenation merge, where a single 15-element vector will be created from the three class-probabilities predicted by each of the 5 models.
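The two steps just described, repeating the input once per head and concatenating the member outputs, can be mimicked with NumPy. This is a sketch with random arrays as hypothetical stand-ins for the sub-model outputs; `np.concatenate` along the last axis is used here as the NumPy counterpart of the Keras concatenate merge.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_classes, n_members = 4, 3, 5

# One copy of the same input array per input head (k = 5 here).
inputX = rng.normal(size=(n_rows, 2))
model_inputs = [inputX for _ in range(n_members)]

# Hypothetical stand-ins for the softmax outputs of the five sub-models.
member_outputs = [rng.dirichlet(np.ones(n_classes), size=n_rows)
                  for _ in range(n_members)]

# Concatenate along the last axis: one 15-element vector per example.
merged = np.concatenate(member_outputs, axis=-1)
print(len(model_inputs))  # 5
print(merged.shape)       # (4, 15)
```

Here column `m * n_classes + c` of `merged` holds class `c` from member `m`, so the features are grouped by member. This differs from the dstack-and-reshape layout used for the logistic regression meta-learner, which groups them by class.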

We will then define a hidden layer to interpret this “input” to the meta-learner and an output layer that will make its own probabilistic prediction. The define_stacked_model() function above implements this and will return a stacked generalization neural network model given a list of trained sub-models.
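What the hidden and output layers compute can be sketched as a plain NumPy forward pass. The weights below are random placeholders, not trained values, and the input is a random stand-in for the 15-element merged vector. Only the layer sizes (10 ReLU units, 3 softmax outputs) follow define_stacked_model().

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Stand-in for the merged sub-model outputs: 4 examples, 15 features.
merged = rng.random((4, 15))

# Placeholder (untrained) weights for Dense(10) and Dense(3).
W1, b1 = rng.normal(scale=0.1, size=(15, 10)), np.zeros(10)
W2, b2 = rng.normal(scale=0.1, size=(10, 3)), np.zeros(3)

hidden = relu(merged @ W1 + b1)    # interprets the 15 merged features
probs = softmax(hidden @ W2 + b2)  # the meta-learner's own class probabilities

print(hidden.shape)  # (4, 10)
print(probs.shape)   # (4, 3)
```

Each row of `probs` sums to 1, which is why taking argmax over axis 1 recovers a class label for accuracy_score().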

# Learning curves for diagnosing machine learning model performance

https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
