# Introduction

### Brief Overview

Stacking is a technique that is popular in machine learning competitions and in real-world problems where even a small gain in performance matters. Stacking collects the predictions made by several models and then trains another model on top of them. This makes it possible to combine the insights of several different approaches.

This notebook demonstrates an implementation of stacking. The implementation has three noteworthy properties:

1. the final model is trained on out-of-fold predictions of the base models, which reduces the undue influence of high-capacity base models, i.e., models that are prone to overfitting;

2. `StackingRegressor` and `StackingClassifier` fully support the `sklearn` API, so, for example, `GridSearchCV` can be used with them;

3. all estimators with the `sklearn` API are supported as either first-stage or second-stage estimators; in particular, objects of class `sklearn.pipeline.Pipeline` are supported.
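To make the first property concrete, here is a minimal sketch of stacking on out-of-fold predictions. It uses only `sklearn` and synthetic data from `make_regression`; the variable names, the five-fold split, and the choice of `LinearRegression` as the final model are assumptions made for illustration, and the internals of `dsawl` may differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (an assumption; not the dataset used later in this notebook).
X_demo, y_demo = make_regression(
    n_samples=300, n_features=8, noise=10.0, random_state=0
)
X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, random_state=0
)

base_estimators = [
    LinearRegression(),
    KNeighborsRegressor(n_neighbors=3),
    RandomForestRegressor(n_estimators=50, random_state=0),
]
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# Each column holds one base model's out-of-fold predictions: every
# training row is predicted by a model that did not see it during fitting.
meta_train = np.column_stack([
    cross_val_predict(estimator, X_demo_train, y_demo_train, cv=kfold)
    for estimator in base_estimators
])

# The final model learns how to combine the base models' predictions.
final_estimator = LinearRegression().fit(meta_train, y_demo_train)

# At prediction time, the base models are refit on all training data.
meta_test = np.column_stack([
    estimator.fit(X_demo_train, y_demo_train).predict(X_demo_test)
    for estimator in base_estimators
])
stacked_predictions = final_estimator.predict(meta_test)
```

If the final model were trained on in-sample predictions instead, a base model that memorizes the training set would look nearly perfect and would receive too much weight.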

### References

An easy-to-read guide to stacking and some other ensembling techniques:
* [https://mlwave.com/kaggle-ensembling-guide/](https://mlwave.com/kaggle-ensembling-guide/)

The article in which stacking was first proposed:
* Wolpert, D. (1992). Stacked generalization. Neural Networks, 5, 241–259.

# General Preparations

In [1]:
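# NOTE: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this notebook needs an older scikit-learn version to run as written.
from sklearn.datasets import load_boston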

from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor

from dsawl.stacking import StackingRegressor

# Illustrative Example

### Data Loading

In [2]:
bunch = load_boston()
X, y = bunch.data, bunch.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=361)

### Benchmarks Produced by Single Models

In [3]:
linear_regression = LinearRegression().fit(X_train, y_train)
linear_regression_predictions = linear_regression.predict(X_test)
r2_score(y_test, linear_regression_predictions)

0.77601926508014896

In [4]:
k_neighbors = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)
k_neighbors_predictions = k_neighbors.predict(X_test)
r2_score(y_test, k_neighbors_predictions)

0.65217359563067423

In [5]:
random_forest = RandomForestRegressor(random_state=361).fit(X_train, y_train)
random_forest_predictions = random_forest.predict(X_test)
r2_score(y_test, random_forest_predictions)

0.91034904977352005

### Stacking Itself

In [6]:
stacking = StackingRegressor(
    base_estimators_types=[LinearRegression, KNeighborsRegressor, RandomForestRegressor],
    base_estimators_params=[{}, {'n_neighbors': 3}, {}],
    random_state=361
)
stacking.fit(X_train, y_train)
stacking_predictions = stacking.predict(X_test)
r2_score(y_test, stacking_predictions)

0.92254428978438108
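Since the stacking estimators follow the `sklearn` API, their hyperparameters can be tuned with `GridSearchCV`. The parameter names exposed by the `dsawl` classes are not shown in this notebook, so the sketch below uses `sklearn.ensemble.StackingRegressor` (imported under an alias to avoid shadowing the `dsawl` class) as a stand-in, together with synthetic data. The estimator names, the grid values, and the `Ridge` final model are assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import StackingRegressor as SklearnStackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (an assumption; kept small so that the search runs quickly).
X_grid, y_grid = make_regression(
    n_samples=200, n_features=6, noise=10.0, random_state=0
)

# Stand-in for a stacking estimator with the sklearn API.
stand_in = SklearnStackingRegressor(
    estimators=[
        ('linear', LinearRegression()),
        ('knn', KNeighborsRegressor()),
        ('forest', RandomForestRegressor(n_estimators=20, random_state=0)),
    ],
    final_estimator=Ridge(),
    cv=3,
)

# Nested parameters are addressed with the usual double-underscore syntax.
param_grid = {
    'knn__n_neighbors': [3, 5],
    'final_estimator__alpha': [0.1, 1.0],
}
search = GridSearchCV(stand_in, param_grid, cv=3, scoring='r2')
search.fit(X_grid, y_grid)
best_params = search.best_params_
```

The same pattern should carry over to any estimator that implements `get_params` and `set_params`, although the parameter keys will differ.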

### Conclusion

Although linear regression and $K$-nearest neighbors are considerably weaker than the random forest on this problem, adding their predictions raises the $R^2$ coefficient of determination on the test set from about 0.910 (random forest alone) to about 0.923 (stacking). This illustrates how stacking can extract value even from weak base models.
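One way to put the size of this gain in perspective is to compare it with the variance that the best single model leaves unexplained. The snippet below only does arithmetic on the scores reported above; the variable names are illustrative.

```python
# Test-set R^2 scores copied from the outputs above.
random_forest_r2 = 0.91034904977352005
stacking_r2 = 0.92254428978438108

# Absolute improvement in R^2.
absolute_gain = stacking_r2 - random_forest_r2

# Share of the variance left unexplained by the random forest
# that stacking additionally explains.
relative_error_reduction = absolute_gain / (1 - random_forest_r2)

print(f"absolute gain in R^2: {absolute_gain:.4f}")
print(f"reduction of unexplained variance: {relative_error_reduction:.1%}")
```

Both figures come from a single train/test split, so they describe this particular run rather than an expected improvement.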

# To Be Continued

If you need more examples of how to use `StackingRegressor` or `StackingClassifier` and cannot wait until this demo is updated, please look at the file located at `../tests/stacking_tests.py`. You can also call Python's built-in function `help` with a class or a method as its argument; all classes and public methods of the `dsawl` package are documented.