# Pipelines
Pipeline of transforms with a final estimator.

A pipeline sequentially applies a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit.

**A pipeline chains together multiple steps: the output of each step is used as the input to the next step.**

**This makes it easy to apply the same preprocessing to the training and test data.**
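
As a minimal sketch of that last point, the snippet below fits a pipeline on a training split and evaluates it on a test split. The iris dataset, the `train_test_split` call, and the names `demo_pipe`, `test_accuracy`, and `manual_pred` are assumptions chosen for illustration; they do not come from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Illustrative data: any feature matrix X and label vector y would do.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit() learns the PCA projection and the SVC on the training data only.
demo_pipe = Pipeline([('reduce_dim', PCA(n_components=2)), ('clf', SVC())])
demo_pipe.fit(X_train, y_train)

# score() pushes the test data through the already-fitted PCA before the SVC.
test_accuracy = demo_pipe.score(X_test, y_test)
print(test_accuracy)

# The same prediction done by hand: transform with the fitted PCA, then predict.
fitted_pca = demo_pipe.named_steps['reduce_dim']
fitted_clf = demo_pipe.named_steps['clf']
manual_pred = fitted_clf.predict(fitted_pca.transform(X_test))
```

The manual version gives the same predictions as `demo_pipe.predict(X_test)`, but the pipeline removes the risk of forgetting a preprocessing step or refitting it on the test data.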

## Construction

In [1]:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
pipe

Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])

## Accessing steps

In [7]:
# The estimators of a pipeline are stored as a list of (name, estimator) tuples in the steps attribute.
# An individual estimator can also be retrieved by indexing the Pipeline with a position or a step name.

print(pipe.steps[0])
print(pipe[0])
print(pipe['reduce_dim'])

('reduce_dim', PCA())
PCA()
PCA()


In [5]:
# Pipeline’s named_steps attribute allows accessing steps by name with tab completion in interactive environments:
pipe.named_steps.reduce_dim is pipe['reduce_dim']

True

In [6]:
# A sub-pipeline can also be extracted using the slicing notation commonly used for Python sequences
# such as lists (although only a step of 1 is permitted).
print(pipe[:1])
print(pipe[-1:])

Pipeline(steps=[('reduce_dim', PCA())])
Pipeline(steps=[('clf', SVC())])
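
One possible use of slicing, sketched here under assumptions: after fitting, the sub-pipeline without the final estimator can be used to look at the data exactly as the classifier receives it. The iris data and the names `fitted_pipe` and `reduced` are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Illustrative data with 4 features per sample.
X, y = load_iris(return_X_y=True)

fitted_pipe = Pipeline([('reduce_dim', PCA(n_components=2)), ('clf', SVC())])
fitted_pipe.fit(X, y)

# The slice shares the already-fitted PCA with the full pipeline,
# so transform() can be called without fitting again.
reduced = fitted_pipe[:-1].transform(X)
print(reduced.shape)
```

The slice is a shallow copy: the estimators inside it are the same objects as in the full pipeline, which is why the fitted state carries over.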


## Nested parameters
### Parameters of the estimators in the pipeline can be accessed using the "estimator__parameter" syntax, where "estimator" is the name given to the step

In [8]:
pipe.set_params(clf__C=10)

Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC(C=10))])
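
The same naming scheme works for reading values back. The sketch below builds a fresh pipeline (named `params_pipe` here purely for illustration, so that it does not overwrite `pipe`) and checks that a value set through the nested name ends up on the underlying estimator.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

params_pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])

# Several nested parameters can be set in one call.
params_pipe.set_params(clf__C=10, reduce_dim__n_components=2)

# get_params() returns a dict keyed by the same "step__parameter" names.
params = params_pipe.get_params()
print(params['clf__C'])
print(params['reduce_dim__n_components'])

# The values are stored on the step estimators themselves.
print(params_pipe['clf'].C)
print(params_pipe['reduce_dim'].n_components)
```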

## Use with cross-validation
**The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters.**<br>
**It enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.**

In [9]:
from sklearn.model_selection import GridSearchCV
param_grid = dict(reduce_dim__n_components=[2, 5, 10],
                  clf__C=[0.1, 10, 100])
grid_search = GridSearchCV(pipe, param_grid=param_grid)
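
The cell above only constructs the search object; nothing is evaluated until it is fitted. A minimal sketch of that step follows, assuming the digits dataset (chosen because it has 64 features, so 10 PCA components is a valid setting) and 3-fold cross-validation. The dataset, the `cv=3` setting, and the names `digits_pipe` and `digits_search` are assumptions for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Illustrative data: 1797 samples with 64 features each.
X, y = load_digits(return_X_y=True)

digits_pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])
param_grid = dict(reduce_dim__n_components=[2, 5, 10],
                  clf__C=[0.1, 10, 100])

# Each of the 3 x 3 parameter combinations is cross-validated;
# PCA is refitted on the training part of every fold.
digits_search = GridSearchCV(digits_pipe, param_grid=param_grid, cv=3)
digits_search.fit(X, y)

print(digits_search.best_params_)
print(digits_search.best_score_)
```

Because the whole pipeline is cloned and refitted inside each fold, the preprocessing never sees the held-out part of the data.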