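`Pipeline` can be used to chain multiple estimators into one. This is useful when the data go through a fixed sequence of processing steps, for example feature selection, normalization and classification. A pipeline offers: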

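> * Convenience and encapsulation
>> call `fit` and `predict` once on the data to fit and apply the whole sequence of estimators
> * Joint parameter selection
>> grid search over the parameters of all estimators in the pipeline at once
> * Safety
>> the same samples are used to train the transformers and the predictor, so statistics from the test data do not leak into the trained model during cross-validation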
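The safety point can be sketched with cross-validation: because the scaler lives inside the pipeline, `cross_val_score` refits it on the training folds of each split only, so the held-out fold never influences it. A minimal sketch, assuming the iris dataset and a `StandardScaler` + `SVC` pipeline chosen purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The scaler is refit inside every CV split, on that split's training folds only.
pipe = make_pipeline(StandardScaler(), SVC())

# One call fits and scores the whole chain on each of the 5 folds.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```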

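The steps are given as a list of `(key, value)` tuples, where the key is a string naming the step and the value is an estimator object. All estimators except the last must be transformers (i.e. implement `transform`).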

In [2]:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
pipe 

Pipeline(memory=None,
     steps=[('reduce_dim', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)), ('clf', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
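A single `fit` call on this pipeline runs PCA's `fit_transform` on the training data and then fits the SVC on the projected result; `predict` pushes new data through the fitted PCA before calling the SVC. A sketch that rebuilds the same two-step pipeline so it runs on its own, assuming the digits dataset and a default train/test split for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])

# fit: PCA.fit_transform on X_train, then SVC.fit on the projected data
pipe.fit(X_train, y_train)

# predict: PCA.transform on X_test, then SVC.predict
y_pred = pipe.predict(X_test)
print(pipe.score(X_test, y_test))
```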

In [3]:
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import Binarizer
make_pipeline(Binarizer(), MultinomialNB()) 

Pipeline(memory=None,
     steps=[('binarizer', Binarizer(copy=True, threshold=0.0)), ('multinomialnb', MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))])
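`make_pipeline` is shorthand for the same `Pipeline`, except that it derives each step name from the lowercased class name, as the output above shows. A quick check:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Binarizer

pipe = make_pipeline(Binarizer(), MultinomialNB())

# Step names are generated from the class names, lowercased.
names = [name for name, _ in pipe.steps]
print(names)  # ['binarizer', 'multinomialnb']
```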

In [13]:
pipe.steps

[('reduce_dim',
  PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
    svd_solver='auto', tol=0.0, whiten=False)),
 ('clf', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False))]

In [5]:
pipe.named_steps['reduce_dim']

PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
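`named_steps` returns the step object itself, so after fitting the pipeline the learned attributes of an inner step (those with a trailing underscore) can be read through it. A sketch, assuming the iris data and `PCA(n_components=2)` for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
pipe = Pipeline([('reduce_dim', PCA(n_components=2)), ('clf', SVC())])
pipe.fit(X, y)

# The fitted PCA inside the pipeline, with its learned attributes.
pca = pipe.named_steps['reduce_dim']
print(pca.explained_variance_ratio_)
print(pca.components_.shape)  # (2, 4)
```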

In [7]:
pipe.set_params(clf__C=10)  # <estimator>__<parameter>

Pipeline(memory=None,
     steps=[('reduce_dim', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)), ('clf', SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
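The `<estimator>__<parameter>` keys are the same ones `get_params` reports, and `set_params` modifies the steps in place and returns the pipeline itself (which is why the cell above echoes the whole `Pipeline`). A short sketch:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])

# Nested parameters are exposed as <step name>__<parameter name>.
params = pipe.get_params()
print(params['clf__C'])  # 1.0

# set_params changes the steps in place and returns the same pipeline object.
returned = pipe.set_params(clf__C=10, reduce_dim__n_components=2)
print(returned is pipe)           # True
print(pipe.named_steps['clf'].C)  # 10
```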

In [9]:
pipe.named_steps.reduce_dim is pipe.named_steps['reduce_dim']

True

In [10]:
from sklearn.model_selection import GridSearchCV
param_grid = dict(reduce_dim__n_components=[2, 5, 10],
                  clf__C=[0.1, 10, 100])
grid_search = GridSearchCV(pipe, param_grid=param_grid)
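The search is only run once `fit` is called. A sketch of that step, assuming the digits dataset (64 features, so all three `n_components` values are valid) and `cv=3` to keep it quick:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])

param_grid = dict(reduce_dim__n_components=[2, 5, 10],
                  clf__C=[0.1, 10, 100])

# Each of the 3 x 3 = 9 combinations is cross-validated; PCA is refit
# inside every split, so both steps are tuned jointly.
grid_search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
grid_search.fit(X, y)
print(grid_search.best_params_)
```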

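Non-final steps can be skipped by setting them to `None`. Whole estimators can also be placed in the grid, so the search below compares no reduction, `PCA(5)` and `PCA(10)`, combined with two different classifiers.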

In [11]:
from sklearn.linear_model import LogisticRegression
param_grid = dict(reduce_dim=[None, PCA(5), PCA(10)],
                  clf=[SVC(), LogisticRegression()],
                  clf__C=[0.1, 10, 100])
grid_search = GridSearchCV(pipe, param_grid=param_grid)
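A step can also be switched off directly with `set_params`. A sketch, assuming the iris data and a scikit-learn release that accepts the string `'passthrough'` (0.21 or later), which behaves like `None`:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
pipe = Pipeline([('reduce_dim', PCA(n_components=2)), ('clf', SVC())])

# Disable the non-final PCA step: X now reaches the SVC unchanged.
pipe.set_params(reduce_dim='passthrough')
pipe.fit(X, y)

# Support vectors live in the original 4-dimensional feature space.
print(pipe.named_steps['clf'].support_vectors_.shape[1])  # 4
```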

```python
from sklearn.pipeline import Pipeline
```
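> * Attribute `named_steps`: dict-like access to the steps by name; keys can also be read as attributes (e.g. `pipe.named_steps.reduce_dim`)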

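| Parameters | Type | Default | Description |
|------------|------|---------|-------------|
| steps | list of (name, estimator) tuples | required | steps chained in order; all but the last must be transformers |
| memory | None, str or object with the joblib.Memory interface | None | used to cache fitted transformers; a str is the path to the caching directory |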
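Passing a directory path as `memory` lets the pipeline cache fitted transformers, which mainly pays off when the same transformer is refit repeatedly, as in a grid search over the final step only. A sketch, assuming the digits data and a throwaway directory from `tempfile`; with caching enabled the transformers are cloned before fitting, so inspect them through `named_steps` or `steps` rather than through the original instances:

```python
import tempfile

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

with tempfile.TemporaryDirectory() as cachedir:
    # The fitted PCA is stored on disk under cachedir (via joblib).
    pipe = Pipeline([('reduce_dim', PCA(n_components=10)), ('clf', SVC())],
                    memory=cachedir)
    pipe.fit(X, y)

    # Only the SVC changes, so the second fit can reuse the cached PCA.
    pipe.set_params(clf__C=10)
    pipe.fit(X, y)
    score = pipe.score(X, y)

print(score)
```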


| Methods | Parameters | Returns |
|---------|------------|--------|
| decision_function | X | y_score |
| fit | X, y=None, \*\*fit_params | self |
| fit_predict | X, y=None, \*\*fit_params | y_pred |
| fit_transform | X, y=None, \*\*fit_params | Xt |
| get_params | deep=True | params |
| inverse_transform | Xt | Xt |
| predict | X | y_pred |
| predict_log_proba | X | y_score |
| predict_proba | X | y_proba |
| score | X, y=None, sample_weight=None | score |
| set_params | \*\*kwargs | self |
| transform | X | Xt |
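The methods in the table are only available when the underlying steps support them: `predict`, `predict_proba`, `decision_function` and `score` are delegated to the final step after the earlier steps have transformed `X`, and `inverse_transform` needs every step to implement it. A sketch with a transformer-only pipeline, assuming the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Every step is a transformer, so the pipeline behaves like one transformer.
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))

Xt = pipe.fit_transform(X)           # scale, then project to 2 components
X_back = pipe.inverse_transform(Xt)  # undo PCA, then undo the scaling
print(Xt.shape, X_back.shape)        # (150, 2) (150, 4)
print(hasattr(pipe, 'predict'))      # False: the final step has no predict
```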

