In this approach, the preprocessing transformations are applied one after another to the input feature matrix.




```
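from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

si = SimpleImputer()                    # fills missing values (column mean by default)
x_imputed = si.fit_transform(x)
ss = StandardScaler()                   # rescales each column to zero mean, unit variance
x_scaled = ss.fit_transform(x_imputed)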
```



It is extremely important to apply the same transformations, in the same order, to the training, evaluation and test sets.
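
A minimal sketch of what this means in practice: the transformers are fitted on the training split only and then reused, in the same order, on the test split. The toy arrays and the names `X_train` and `X_test` are assumptions made for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration); np.nan marks a missing value.
X_train = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])
X_test = np.array([[np.nan, 20.0], [4.0, 40.0]])

si = SimpleImputer()      # default strategy: column mean
ss = StandardScaler()

# Learn the imputation and scaling statistics from the training set only.
X_train_scaled = ss.fit_transform(si.fit_transform(X_train))

# Reuse the fitted transformers, in the same order, on the test set.
X_test_scaled = ss.transform(si.transform(X_test))
```

Calling `fit_transform` again on the test set would recompute the statistics from the test data, so the two sets would no longer be transformed consistently.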

The `sklearn.pipeline` module provides utilities to build a composite estimator as a chain of transformers and estimators.

Two classes:
1. Pipeline
2. FeatureUnion

Pipeline | FeatureUnion 
-----| ------
Constructs a chain of multiple transformers to execute a fixed sequence of steps in data preprocessing and modelling | Combines output from several transformers by creating a new transformer from them

# sklearn.pipeline.Pipeline

1. Sequentially applies a list of transformers and a final estimator.

2. Intermediate steps of the pipeline must be transformers, i.e. they must implement the `fit` and `transform` methods.

3. The final estimator only needs to implement `fit`.

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters.
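
As a rough illustration of "cross-validated together", the sketch below passes a whole pipeline to `cross_val_score`. The iris dataset, the step names and the choice of `LogisticRegression` are assumptions made for the example, not part of the notes above.

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline(steps=[
    ('imputer', SimpleImputer()),
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])

# Each fold refits the whole chain on that fold's training part,
# so the imputer and scaler never see the held-out part.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```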

There are two ways to create pipeline objects:

1. Pipeline(steps=[('estimatorName', estimator()), ...])

2. make_pipeline()

## Pipeline()



```
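from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Each step is a (name, estimator) tuple
estimators = [
  ('simple_imputer', SimpleImputer()),
  ('standard_scaler', StandardScaler()),
]

pipe = Pipeline(steps = estimators)
pipe.fit_transform(x)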
```
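
The following sketch checks that a two-step pipeline gives the same result as applying the two transformers by hand. The toy matrix `X` is an assumption made for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with one missing value (assumed for illustration).
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])

pipe = Pipeline(steps=[
    ('simple_imputer', SimpleImputer()),
    ('standard_scaler', StandardScaler()),
])
X_pipe = pipe.fit_transform(X)

# The same two steps applied manually, one after another.
X_manual = StandardScaler().fit_transform(SimpleImputer().fit_transform(X))

print(np.allclose(X_pipe, X_manual))
```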




## make_pipeline()

It takes only the estimator objects as arguments; the step names are generated automatically from the class names.

```
pipe = make_pipeline(SimpleImputer(), StandardScaler())
```
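
A short sketch of the names that `make_pipeline` assigns; the variable `step_names` is introduced here only for illustration.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(SimpleImputer(), StandardScaler())

# Step names are the lowercased class names of the estimators.
step_names = [name for name, _ in pipe.steps]
print(step_names)
```

These generated names are the ones to use later when referring to a step, for example in `set_params`.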



# Accessing Individual steps in the pipeline



```
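from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

estimators = [
  ('simple_imputer', SimpleImputer()),  # step 1
  ('pca', PCA()),                       # step 2
  ('regressor', LinearRegression())     # step 3
]

pipe = Pipeline(steps = estimators)

# the second step can be accessed in the following 4 ways:

pipe.named_steps.pca   # the PCA estimator
pipe.steps[1]          # the ('pca', PCA()) tuple, i.e. name and estimator
pipe[1]                # the PCA estimator
pipe['pca']            # the PCA estimator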
```
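
A small check, offered as a sketch, that these access forms all point at the same object. It also shows slicing, which returns a new `Pipeline` containing only the selected steps. The variable names are assumptions made for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[
    ('simple_imputer', SimpleImputer()),
    ('pca', PCA()),
    ('regressor', LinearRegression()),
])

pca = pipe.named_steps.pca

# pipe.steps holds (name, estimator) tuples, so unpack to get the estimator.
name, est = pipe.steps[1]

same_by_index = pipe[1] is pca
same_by_name = pipe['pca'] is pca

# Slicing gives a sub-pipeline with the preprocessing steps only.
preprocessing = pipe[:2]
```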



# Accessing parameters of each step in the pipeline



```
estimators = [
  ('simple_imputer', SimpleImputer()),  # step 1
  ('pca', PCA()),                       # step 2
  ('regressor', LinearRegression())     # step 3
]

pipe = Pipeline(steps = estimators)

pipe.set_params(pca__n_components = 2)
```

We can use the `<estimator>__<parameterName>` syntax to access the parameters of a step: the step name, then two underscores, then the parameter name.
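
As a sketch of the same syntax in both directions, the example below sets parameters on two steps in one call and reads them back with `get_params()`. The choice of `strategy='median'` is an assumption made for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[
    ('simple_imputer', SimpleImputer()),
    ('pca', PCA()),
    ('regressor', LinearRegression()),
])

# Set parameters of two different steps in one call.
pipe.set_params(pca__n_components=2, simple_imputer__strategy='median')

# get_params() exposes the same double-underscore keys.
params = pipe.get_params()
n_components = params['pca__n_components']
strategy = params['simple_imputer__strategy']
```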



# Performing grid search with a pipeline




```
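# The keys of param_grid must match the step names of the pipeline
pipe = Pipeline(steps = [('imputer', SimpleImputer()), ('clf', SVC())])

# A whole step can be swapped ('imputer', 'clf') or skipped ('passthrough');
# clf__C tunes the parameter C of whichever classifier is selected
param_grid = dict(imputer = ['passthrough', SimpleImputer(), KNNImputer()],
                  clf = [SVC(), LogisticRegression()],
                  clf__C = [0.1, 10, 100])

grid_search = GridSearchCV(pipe, param_grid = param_grid)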
```
`C` is the inverse of the regularization strength: the lower its value, the stronger the regularization.
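
The effect of `C` can be made visible with a small experiment, sketched here under assumptions: the iris dataset, a scaler followed by `LogisticRegression`, and the coefficient norm as a rough measure of how strongly the model is regularized.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

coef_norms = {}
for C in [0.1, 10, 100]:
    # make_pipeline names the step 'logisticregression'
    pipe.set_params(logisticregression__C=C)
    pipe.fit(X, y)
    coef_norms[C] = float(np.linalg.norm(pipe[-1].coef_))

print(coef_norms)
```

A smaller `C` penalizes large coefficients more heavily, so the norm should shrink as `C` decreases.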


# sklearn.pipeline.FeatureUnion

`FeatureUnion` applies a list of transformer objects in parallel to the same input and concatenates their outputs side by side into a larger matrix. `FeatureUnion` and `Pipeline` can be combined to create complex transformers.
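
A minimal sketch of the side-by-side concatenation, assuming a random 6 x 4 matrix and two arbitrary transformers (`PCA` and `StandardScaler`) chosen only for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))   # toy data, assumed for illustration

union = FeatureUnion(transformer_list=[
    ('pca', PCA(n_components=2)),   # contributes 2 columns
    ('scaled', StandardScaler()),   # contributes 4 columns
])

# Both transformers are fitted on the same X; outputs are stacked column-wise.
X_union = union.fit_transform(X)
print(X_union.shape)
```

The output has 2 + 4 = 6 columns, in the order in which the transformers are listed.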

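For example, a numeric pipeline and a categorical transformer can be combined as follows:

```
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric branch: select the first four columns, impute, then scale
num_pipeline = Pipeline([('selector', ColumnTransformer([('select_first_4', 'passthrough', slice(0, 4))])),
                         ('imputer', SimpleImputer(strategy = 'median')),
                         ('std_scaler', StandardScaler()),])

# Categorical branch: OneHotEncoder rather than LabelBinarizer, which is meant for target labels
cat_pipeline = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [4]),])

full_pipeline = FeatureUnion(transformer_list = [("num_pipeline", num_pipeline),
                                                 ("cat_pipeline", cat_pipeline),])
```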