# Stacked generalization applied to a regression problem

This example shows how stacked generalization can combine several regressors
into a single stacked model that outperforms the best individual regressor.

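The example uses the k-fold approach to blend the regressors on the inner layers, so the final estimator is trained on predictions the base regressors made for samples they did not see during fitting.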
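The k-fold blending idea can be sketched with plain scikit-learn. This is only an illustration under stated assumptions, not wolpert's internal implementation: the synthetic data from `make_regression` and the names `X_demo`, `y_demo`, `meta_features` and `blender` are hypothetical stand-ins.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption; not the dataset used in this example).
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

base_models = [LassoCV(), RidgeCV(), SVR()]
kfold = KFold(n_splits=3, shuffle=True, random_state=0)

# One column per base model. Each row is predicted by a copy of the model
# that was fitted on the other folds, so it never saw that row.
meta_features = np.column_stack(
    [cross_val_predict(model, X_demo, y_demo, cv=kfold)
     for model in base_models]
)

# The final estimator learns how to weight the base models' predictions.
blender = LinearRegression().fit(meta_features, y_demo)
```

Training the final estimator on out-of-fold predictions, rather than on predictions for data the base models were fitted on, keeps it from simply rewarding whichever base model overfits the most.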

We use the Boston house prices dataset to compare the mean squared error (MSE)
of three regressors (SVR, Lasso and Ridge) with that of a stack that combines
their outputs through a single linear regression.

In [1]:
# utils
from time import time

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec, GridSpecFromSubplotSpec

# import base regressors
from sklearn.linear_model import LassoCV, RidgeCV, LinearRegression
from sklearn.svm import SVR

# stacking api
from wolpert.pipeline import make_stack_layer
from wolpert.pipeline import StackingPipeline

# dataset
# note: load_boston was removed in scikit-learn 1.2, so this import needs an older release
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

%matplotlib inline

RANDOM_SEED = 89555

In [2]:
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=RANDOM_SEED)

Let's first define and train the base regressors (the ones that will sit on the first layer of the stack).

In [3]:
lasso = LassoCV(random_state=RANDOM_SEED)
ridge = RidgeCV()
svr = SVR(C=1e3, gamma=1e-4, kernel='rbf')

base_regressors = [("Lasso Regressor", lasso),
                   ("Ridge Regressor", ridge),
                   ("SVR", svr)]

In [4]:
def evaluate_and_log_model(name, model):
    t0_train = time()
    model.fit(X_train, y_train)
    train_time = time() - t0_train
    y_pred = model.predict(X_test)
    score = mean_squared_error(y_test, y_pred)
    print("MSE for %s: %.3f (train time: %.3f seconds)"
          % (name, score, train_time))

    return train_time, score


train_times = []
scores = []
labels = [name for name, _ in base_regressors]

for name, regressor in base_regressors:
    train_time, score = evaluate_and_log_model(name, regressor)
    train_times.append(train_time)
    scores.append(score)

MSE for Lasso Regressor: 22.913 (train time: 0.045 seconds)
MSE for Ridge Regressor: 21.699 (train time: 0.001 seconds)
MSE for SVR: 21.227 (train time: 0.074 seconds)


The three errors are close to each other, with SVR achieving the lowest MSE (21.227). Now let's build our stacked ensemble. On the first layer we'll use the three estimators we already have and, for the final estimator, we'll use a simple linear regression.

In [5]:
layer0 = make_stack_layer(lasso, ridge, svr)
final_estimator = LinearRegression()

stacked_reg = StackingPipeline([('layer-0', layer0),
                                ('final', final_estimator)])

train_time, score = evaluate_and_log_model('Stacked ensemble', stacked_reg)

MSE for Stacked ensemble: 18.149 (train time: 0.268 seconds)


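The stacked ensemble reaches an MSE of 18.149, about 14.5% lower than SVR (21.227), the best estimator in the first layer. The price is a longer training time: 0.268 seconds, against 0.074 seconds for SVR alone.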
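For readers without wolpert installed, a comparable two-layer stack can be sketched with scikit-learn's own `StackingRegressor` (available in scikit-learn 0.22 and later). This is a different API from the `StackingPipeline` used above, swapped in for illustration only. The synthetic data and the names `X_syn`, `y_syn` and `sk_stack` are assumptions, so the resulting error is not comparable to the figures reported in this example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption; not the Boston dataset).
X_syn, y_syn = make_regression(n_samples=300, n_features=8,
                               noise=15.0, random_state=0)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, random_state=0)

# The same three base regressors on the first layer and a linear regression
# as the final estimator. cv=5 means the final estimator is trained on
# 5-fold out-of-fold predictions of the base regressors.
sk_stack = StackingRegressor(
    estimators=[("lasso", LassoCV(random_state=0)),
                ("ridge", RidgeCV()),
                ("svr", SVR(C=1e3, gamma=1e-4, kernel="rbf"))],
    final_estimator=LinearRegression(),
    cv=5,
)
sk_stack.fit(Xs_train, ys_train)

sk_mse = mean_squared_error(ys_test, sk_stack.predict(Xs_test))
print("MSE for scikit-learn stack: %.3f" % sk_mse)
```

Whether stacking beats the best base regressor depends on the data and the split, so it is worth repeating the comparison with several seeds before drawing conclusions.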