# Part 1: Stacking with sklearn

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import sem
from numpy.random import permutation

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, KFold
from sklearn.ensemble import StackingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

### Wine quality prediction

We will use a dataset of wines, with their attributes and quality. The goal is to predict the quality from the given attributes for unseen data.

In [None]:
data = pd.read_csv("data/wine_quality.csv")

Inspect the dataset with `.head()`.

In [None]:
# Your code here...
data.head()


Create an input data array `X` (with all columns except quality), and the output array `y`. Set the output array to `float64` type.

In [None]:
# Your code here...
X, y = data.iloc[:,:-1].values, data.iloc[:,-1].values
y = y.astype('float64')


Print how many rows and columns your input array contains.

In [None]:
# Your code here...
X.shape


Divide your input data into 5-fold cross-validation sets, using the `KFold` class. Assign it to the variable `cv`.

Use randomized splits by setting the `shuffle` parameter to `True`. This is required because our data has not been randomized and is in fact ordered by wine type (reds, then whites). Without shuffling, the training and test sets would have different sample distributions.

In [None]:
# Your code here...
cv = KFold(n_splits=5, shuffle=True)


### Training multiple models

Let's start by training different kinds of regression models, which we will bundle together later.

Train a Linear Regression model by iterating over the 5-fold data split you created, fitting a model to the training data, and calculating the prediction error for the test data. Use the `mean_squared_error()` metric. 

Save the final scores to `scores_linreg` and print it.

In [None]:
# Your code here...
scores_linreg = []
for train_index, test_index in cv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    
    model_linreg = LinearRegression().fit(X_train, y_train)
    
    # mean_squared_error expects (y_true, y_pred)
    error = mean_squared_error(y_test, model_linreg.predict(X_test))
    scores_linreg.append(error)
scores_linreg


Now let us train a Support Vector Regression (`SVR`) model. This time use the convenience function `cross_val_score()`, which implements the cross-validation and scoring procedure. Set the `scoring` parameter to `"neg_mean_squared_error"`; note that this returns the negated mean squared error, so the scores will be negative.

Save the final scores to `scores_svr` and print it.

In [None]:
# Your code here...
model_svr = SVR()
scores_svr = cross_val_score(
    model_svr,
    X,
    y,
    scoring='neg_mean_squared_error',
    cv=cv
)
scores_svr
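The sign convention of `"neg_mean_squared_error"` can be checked with a small sketch. The synthetic data (`X_toy`, `y_toy`) and the fixed `random_state` are assumptions made for illustration only; the fixed seed ensures both passes over `cv_toy` see the same folds.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

cv_toy = KFold(n_splits=3, shuffle=True, random_state=0)

# Scores from the convenience function: negated MSE, one value per fold
neg_scores = cross_val_score(
    LinearRegression(), X_toy, y_toy,
    scoring="neg_mean_squared_error", cv=cv_toy
)

# The same procedure written out by hand
manual = []
for train, test in cv_toy.split(X_toy):
    model = LinearRegression().fit(X_toy[train], y_toy[train])
    manual.append(mean_squared_error(y_toy[test], model.predict(X_toy[test])))
manual = np.array(manual)

print(neg_scores)
print(-manual)
```

Scikit-learn scorers follow a "higher is better" convention, which is why error metrics are negated. Flip the sign before comparing with the manually computed `scores_linreg`.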


As a third model, train a `DecisionTreeRegressor`, again using `cross_val_score()`.

Save the final scores to `scores_dtr` and print it.

In [None]:
# Your code here...
model_dtr = DecisionTreeRegressor()
scores_dtr = cross_val_score(
    model_dtr,
    X,
    y,
    scoring='neg_mean_squared_error',
    cv=cv
)
scores_dtr


### Creating a Stacking ensemble

The Stacking method takes individual prediction models and combines their predictions through an additional model.

Create a Stacking model with `StackingRegressor()` that combines the three types of models we have used (Linear Regression, SVR and Decision Tree Regressor) and uses a Linear Regression model as the final layer. 

Set it up with a 6-fold cross-validation split (`cv=6`). Note that this is the internal split that `StackingRegressor` uses to produce the out-of-fold predictions on which the final (level-2) estimator is trained. It is separate from the cross-validation split `cv` of the dataset created at the start.

Fit and score the model with `cross_val_score()`. Save the scores to `scores_stack` and print it.

In [None]:
# Your code here...
stack_linreg = LinearRegression()
model_stack = StackingRegressor(
    [
        ("linreg", LinearRegression()),
        ("svr", SVR()),
        ("dtr", DecisionTreeRegressor())
    ],
    final_estimator=stack_linreg,
    cv=6
)
scores_stack = cross_val_score(
    model_stack,
    X,
    y,
    scoring='neg_mean_squared_error',
    cv=cv
)
scores_stack
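To make the mechanics concrete, the following is a rough sketch of the stacking idea written by hand with `cross_val_predict()`. It is an illustration under simplifying assumptions (synthetic data, only two base models, hypothetical names such as `stacked_predict`), not a reproduction of the `StackingRegressor` source.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative only)
rng = np.random.default_rng(0)
X_toy = rng.uniform(-2, 2, size=(200, 2))
y_toy = X_toy[:, 0] + np.sin(3 * X_toy[:, 1]) + rng.normal(scale=0.1, size=200)

base_models = [LinearRegression(), DecisionTreeRegressor(max_depth=4, random_state=0)]

# Out-of-fold predictions: each row is predicted by a model that did not see it
meta_features = np.column_stack(
    [cross_val_predict(m, X_toy, y_toy, cv=6) for m in base_models]
)

# The meta-learner is trained on the out-of-fold predictions
meta_learner_toy = LinearRegression().fit(meta_features, y_toy)

# The base models are then refit on all the data for use at prediction time
fitted = [m.fit(X_toy, y_toy) for m in base_models]

def stacked_predict(X_new):
    """Combine base-model predictions through the meta-learner."""
    level1 = np.column_stack([m.predict(X_new) for m in fitted])
    return meta_learner_toy.predict(level1)

print(meta_learner_toy.coef_)
print(stacked_predict(X_toy[:5]))
```

Using out-of-fold predictions keeps the meta-learner from simply trusting whichever base model overfits the training data the most.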


Print the average score for each of the four approaches. Note that the sign of the score may be inverted depending on the scoring method used.

Which model has the smallest test error?

In [None]:
# Your code here...
np.mean(scores_linreg), np.mean(scores_svr), np.mean(scores_dtr), np.mean(scores_stack)


Create a bar plot to visualize the resulting scores, with `plt.bar()`.

Calculate the standard error of the mean with `sem()` and use it for the `yerr` parameter.

Name each bar with the model label.

Set the lower limit of the vertical axis with `plt.ylim(0.4)`.

In [None]:
# Your code here...
scores = np.array([scores_linreg, -scores_svr, -scores_dtr, -scores_stack])
labels = ["linreg", "svr", "dtr", "stacking"]

plt.bar(range(4), scores.mean(axis=1), yerr=sem(scores.T))
plt.xticks(range(4), labels)
plt.ylim(0.4)
plt.show()
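As a quick check of what `sem()` computes, the sketch below compares it with the sample standard deviation divided by the square root of the number of folds. The values in `toy_scores` are made up for illustration and are not results from the wine dataset.

```python
import numpy as np
from scipy.stats import sem

# Made-up scores: 2 models (rows) x 5 folds (columns)
toy_scores = np.array([
    [0.50, 0.54, 0.52, 0.56, 0.53],
    [0.70, 0.75, 0.65, 0.72, 0.68],
])

# Standard error of the mean: sample std (ddof=1) over sqrt(number of folds)
manual_sem = toy_scores.std(axis=1, ddof=1) / np.sqrt(toy_scores.shape[1])

# sem() works along axis 0 by default, so either transpose or pass axis=1
print(sem(toy_scores.T))
print(sem(toy_scores, axis=1))
print(manual_sem)
```

All three lines print the same values, one per model, which is the shape that `yerr` expects.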


The Stacking model combined the predictions of the individual models. As some models give better predictions than others, this should be reflected in how the stacking model weights each of them.

To inspect this, create a Stacking model as above (or reuse it) and now fit it to the whole dataset `X`. 

Print the final coefficients of the meta-learner layer, stored under `.final_estimator_.coef_`. Which model is given more weight and why?

In [None]:
# Your code here...
stack = model_stack.fit(X, y)
stack.final_estimator_.coef_


# Part 2: Blending implementation

Blending is a simplified version of Stacking. Instead of using K-fold cross-validation, it uses a single split of the training data: one part trains the individual models, and the other (a hold-out validation set) trains the meta-learner.

Here we will implement this ensemble method for the same data and prediction models as in Part 1. 

### Level-1 Blending model

Let us start by implementing only the level-1 model (without level-2 cross-validation).

Divide the dataset `X` and `y` into training and validation sets (`X_train`, `y_train`, `X_val`, `y_val`), keeping 500 samples for the validation set.

You may use `permutation()` to shuffle your data samples.

In [None]:
# Your code here...
inds = permutation(len(X))
X_train, y_train = X[inds[:-500]], y[inds[:-500]]
X_val, y_val = X[inds[-500:]], y[inds[-500:]]


Create the three individual prediction models, as in Part 1, and fit them to the training set.

In [None]:
# Your code here...
model_linreg = LinearRegression().fit(X_train, y_train)
model_svr = SVR().fit(X_train, y_train)
model_dtr = DecisionTreeRegressor().fit(X_train, y_train)


Stack their predictions for the validation set into a single array.

In [None]:
# Your code here...
stacked_output = np.array(
    [
        model_linreg.predict(X_val),
        model_svr.predict(X_val), 
        model_dtr.predict(X_val)
    ]
).T


Create a Linear Regression meta-learner and fit it to the stacked validation-set predictions, using `y_val` as the target.

In [None]:
# Your code here...
meta_learner = LinearRegression().fit(stacked_output, y_val)


Create a function that implements the Blending prediction by combining the predictions of the individual models and the meta-learner.

In [None]:
# Your code here...
def blending(x):
    # Level-1 predictions, one column per individual model
    level1 = np.array(
        [
            model_linreg.predict(x),
            model_svr.predict(x),
            model_dtr.predict(x)
        ]
    ).T
    return meta_learner.predict(level1)


Check the error of the Blending model on the full dataset `X`.

Note that this is the training error, not the test error, of our Blending model: `X` contains the samples used to fit both the individual models and the meta-learner, and we have not done level-2 cross-validation yet. It will typically be lower than the test error.

In [None]:
# Your code here...
mean_squared_error(y, blending(X))
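The gap between training and test error can be illustrated with a small sketch. The synthetic data and the split sizes below are assumptions chosen for illustration; an unpruned decision tree is used because it memorizes its training data almost perfectly.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (illustrative only)
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(300, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.3, size=300)

# Rows seen during fitting vs. rows held back
X_fit, y_fit = X_toy[:200], y_toy[:200]
X_new, y_new = X_toy[200:], y_toy[200:]

tree = DecisionTreeRegressor(random_state=0).fit(X_fit, y_fit)

train_mse = mean_squared_error(y_fit, tree.predict(X_fit))
test_mse = mean_squared_error(y_new, tree.predict(X_new))

print("training error:", train_mse)
print("test error:    ", test_mse)
```

The error on rows the model has already seen says little about how it behaves on unseen data, which is why the next section evaluates the Blending model on held-out folds.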


Combine the previous steps into a function named `train_blending(X_train, y_train, X_val, y_val)` that takes a split dataset and returns the Blending model.

In [None]:
# Your code here...
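def train_blending(X_train, y_train, X_val, y_val):
    # Level 1: fit the individual models on the training set
    model_linreg = LinearRegression().fit(X_train, y_train)
    model_svr = SVR().fit(X_train, y_train)
    model_dtr = DecisionTreeRegressor().fit(X_train, y_train)
    models = [model_linreg, model_svr, model_dtr]

    def stack_predictions(x):
        # One column of predictions per individual model
        return np.array([m.predict(x) for m in models]).T

    # Level 2: fit the meta-learner on the validation-set predictions
    meta_learner = LinearRegression().fit(stack_predictions(X_val), y_val)

    def blending(x):
        return meta_learner.predict(stack_predictions(x))

    return blending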


### Level-2 cross-validation

Now we will do things properly and first divide the dataset in a level-2 cross-validation split.

Create a 5-Fold randomized split of the dataset with `KFold()`. Iterate over the splits, each time creating the Blending model with `train_blending()` and the training set, and saving the error for the test set. Print the average error and compare with the models in Part 1.

In [None]:
# Your code here...
cv1 = 5
val_size = 500

splits = KFold(n_splits=cv1, shuffle=True).split(X)

scores = []
for train_index, test_index in splits:
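    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # KFold yields sorted indices, so shuffle the training rows before
    # holding out the validation set (the data is ordered by wine type)
    inds = permutation(len(X_train))
    X_train, y_train = X_train[inds], y_train[inds]

    model = train_blending(
        X_train[:-val_size],
        y_train[:-val_size],
        X_train[-val_size:],
        y_train[-val_size:]
    )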
    scores += [mean_squared_error(y_test, model(X_test))]
    
score_blending = np.mean(scores)
print(score_blending)
