## Creating Pipelines

This notebook covers a few important examples of how to engineer raw, arbitrary data into well-formatted features using scikit-learn's `Pipeline`.

We'll learn how to build pipelines that chain together a series of transformation and training steps.

### Feature Pipeline

A `Pipeline` chains multiple transformation and training steps together. It passes the output of each step to the next one, so we don't have to handle the individual inputs and outputs ourselves.
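
To make the hand-off between steps concrete, here is a minimal sketch that runs the same three steps once by hand and once through a `Pipeline`, then checks that both give the same predictions. The arrays `X_demo` and `y_demo` are made-up illustrative values, not data from this notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one missing value (for illustration only)
X_demo = np.array([[1.0, 20.0],
                   [2.0, np.nan],
                   [3.0, 60.0],
                   [4.0, 80.0]])
y_demo = np.array([1.0, 2.0, 3.0, 4.0])

# Manual version: we pass each step's output to the next step ourselves
imputer = SimpleImputer(strategy='mean')
scaler = StandardScaler()
regressor = LinearRegression()

X_imputed = imputer.fit_transform(X_demo)    # fill in the missing value
X_scaled = scaler.fit_transform(X_imputed)   # standardize each column
regressor.fit(X_scaled, y_demo)              # train on the processed features
manual_pred = regressor.predict(X_scaled)

# Pipeline version: the same steps, with the hand-off done for us
pipe = Pipeline([('imputer', SimpleImputer(strategy='mean')),
                 ('scaler', StandardScaler()),
                 ('model', LinearRegression())])
pipe.fit(X_demo, y_demo)
pipeline_pred = pipe.predict(X_demo)

# Both approaches should agree up to floating-point tolerance
print(np.allclose(manual_pred, pipeline_pred))
```

The manual version needs three intermediate variables and the same sequence of calls repeated at prediction time. The pipeline bundles all of that behind a single `fit` and `predict`.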

In [1]:
import numpy as np
from numpy import nan

##create sample data with different scales and nans
X = np.array([[ nan, 1,   5  ],
              [ 8,   5,   6  ],
              [ 4,   5,   3  ],
              [ 5,   nan, 1  ],
              [ 10,  9,   9  ]])

##create labels
y = np.array([11, 15, -3,  10, -8])

In [2]:
##import the imputer, pipeline, scaler and model classes
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

##create a pipeline that strings together three steps:
##fill in missing values, scale the features,
##then fit a linear regression model
model = Pipeline([('imputer', SimpleImputer(strategy='mean')),
                  ('scaler', StandardScaler()),
                  ('model', LinearRegression())
                 ])

In [3]:
##fit the whole pipeline: each preprocessing step is fit and applied in turn,
##then the final model is trained on the transformed features
model.fit(X, y)
print(y)
print(model.predict(X))

[11 15 -3 10 -8]
[14.37256483  7.18178154 -3.68984281 11.76293161 -4.62743517]
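
Because each step in the pipeline has a name, the fitted steps can be inspected afterwards. The sketch below rebuilds the same data and pipeline so it runs on its own, then looks up steps through `named_steps`. The variable names `imputer_means` and `X_preprocessed` are introduced here for illustration only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same data and pipeline as in the cells above
X = np.array([[np.nan, 1, 5],
              [8, 5, 6],
              [4, 5, 3],
              [5, np.nan, 1],
              [10, 9, 9]])
y = np.array([11, 15, -3, 10, -8])

model = Pipeline([('imputer', SimpleImputer(strategy='mean')),
                  ('scaler', StandardScaler()),
                  ('model', LinearRegression())])
model.fit(X, y)

# Values the imputer learned: column means over the non-missing entries
# (6.75, 5.0 and 4.8 for this data)
imputer_means = model.named_steps['imputer'].statistics_
print(imputer_means)

# Re-apply only the preprocessing steps to see what the regressor was trained on
X_imputed = model.named_steps['imputer'].transform(X)
X_preprocessed = model.named_steps['scaler'].transform(X_imputed)
print(X_preprocessed.mean(axis=0))  # each column is centered near zero
```

Note that the predictions printed above are made on the same rows the pipeline was fit on, so they show how well the model fits the training data, not how it would perform on unseen data.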
