# Basic Pipelines

In this notebook, we'll see how to use pipelines to fit a predictive model which contains a random forest, followed by a quantile calibration step.

In [1]:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [2]:
from sklearn_helpers import RandomForestTransformer, QuantileCalibrator
from sklearn.pipeline import Pipeline

steps = [
    ('random_forest', RandomForestTransformer(n_estimators=100, max_features=0.33)),
    ('quantile_calibrator', QuantileCalibrator(quantiles=25, isotonic_fit=True, isotonic_lambda=1.0))
]

model = Pipeline(steps=steps)

The `Pipeline` class constructor expects the argument `steps`.
This argument must be a `list` of ordered-pairs where each is of the form `('step_name', estimator)`.

In this example, the first step is a `RandomForestTransformer` and the second is a `QuantileCalibrator`.

`sklearn`'s pipeline API enforces that only the final step in a pipeline can be a predicting estimator, either a regressor or a transformer. Since `RandomForestRegressor` is a regressor, we cannot put it at the top of the pipeline. Instead, we can use `sklearn_helper`'s `RandomForestTransformer` class, which is simply a wrapper around `RamdomForestRegressor`:

```python
class RandomForestTransformer(RandomForestRegressor, TransformerMixin):
    
    def transform(self, X, y=None):
        return self.predict(X)
```

While a hacky solution, hey, it works!

With the model defined, we can train it just like any other model.

In [3]:
model.fit(X_train, y_train)

Pipeline(memory=None,
     steps=[('random_forest', RandomForestTransformer(bootstrap=True, criterion='mse', max_depth=None,
            max_features=0.33, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_l..., ('quantile_calibrator', QuantileCalibrator(isotonic_fit=True, isotonic_lambda=1.0, quantiles=25))])

With the model trained, we can check it's accuracy:

In [4]:
from sklearn.metrics import r2_score

r2_score(model.predict(X_test), y_test)

0.80199939099595974

For more, see http://scikit-learn.org/stable/modules/pipeline.html#pipeline