## Search for best parameters and create a pipeline

### Easy reading...create and use a pipeline

> <b>Pipelining</b> (as an aside to this section)
* `Pipeline(steps=[...])` - where `steps` is a list of `(name, estimator)` tuples applied in order. Every step except the last must be a transformer; the last step is usually the final estimator. A step's parameters are addressed as `<step name>__<parameter name>`.
* For example, here we chain a transformation (`SelectKBest`) and a classification (`SVC`) in a single pipeline.

See a full example [here](http://scikit-learn.org/stable/auto_examples/feature_stacker.html)
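
As a minimal sketch of how step names are used, the snippet below builds a two-step pipeline, changes parameters through the `<step name>__<parameter name>` convention, and looks up a fitted step by name. The step names `select` and `svc` and the parameter values are illustrative choices, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each step is a (name, estimator) tuple; the names here are illustrative.
pipe = Pipeline([("select", SelectKBest(chi2, k=2)),
                 ("svc", SVC(kernel="linear"))])

# Parameters of a step are addressed as <step name>__<parameter name>.
pipe.set_params(select__k=3, svc__C=0.5)
pipe.fit(X, y)

# Fitted steps can be looked up by name.
selected_mask = pipe.named_steps["select"].get_support()
print(selected_mask)        # boolean mask with one entry per input feature
print(pipe.predict(X[:5]))  # predictions pass through every step in order
```

The same naming convention is what `GridSearchCV` expects for the keys of its parameter grid.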

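Note: To apply <b>several transformations in parallel</b> and concatenate their outputs into one feature matrix, try [FeatureUnion](http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html#sklearn.pipeline.FeatureUnion). Transformations that run one after another can simply be added as extra pipeline steps.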
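
A small sketch of the idea, assuming PCA and univariate selection as the two transformers (this pairing and the step names are illustrative assumptions): `FeatureUnion` runs both on the same input and stacks their outputs column-wise before the classifier sees them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Two transformers applied to the same input; their outputs are concatenated.
combined = FeatureUnion([("pca", PCA(n_components=2)),
                         ("kbest", SelectKBest(chi2, k=1))])

union_pipe = Pipeline([("features", combined),
                       ("svc", SVC(kernel="linear"))])
union_pipe.fit(X, y)

# Transform with the fitted union step: 2 PCA components + 1 selected feature.
X_combined = union_pipe.named_steps["features"].transform(X)
print(X_combined.shape)
```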

In [None]:
# Imports for python 2/3 compatibility

from __future__ import absolute_import, division, print_function, unicode_literals

# For python 2, comment these out:
# from builtins import range

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)

# a feature selection instance
selection = SelectKBest(chi2, k = 2)

# classification instance
clf = SVC(kernel = 'linear')

# make a pipeline
pipeline = Pipeline([("feature selection", selection), ("classification", clf)])

# train the model on the training split only, so the test split stays unseen
pipeline.fit(X_train, y_train)

In [None]:
# Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks.
# Homepage: http://rasbt.github.io/mlxtend/

!pip install msgpack mlxtend

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions

# Obtain estimated test set labels using the pipeline we created
y_pred = pipeline.predict(X_test)

# We use mlxtend to show the decision regions of the final SVC
fig, axarr = plt.subplots(1, 2, figsize=(12,5), sharex=True, sharey=True)

# Plot the decision regions for the training and test data. The pipeline applies SelectKBest
# internally but does not return the transformed data, so we apply the fitted selector here:
X_train_transformed = selection.transform(X_train)
X_test_transformed = selection.transform(X_test)

plot_decision_regions(X_train_transformed, y_train, clf=clf, legend=2, ax= axarr[0])
axarr[0].set_title("Decision Region (Trained)")

plot_decision_regions(X_test_transformed, y_pred, clf=clf, legend=2, ax= axarr[1])
axarr[1].set_title("Decision Region (Predicted)")
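
The plots give a visual impression; a numeric score on held-out data complements them. The sketch below is self-contained and uses its own variable names; the fixed `random_state` is an assumption added only to make the split repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# random_state is fixed here only so the split is repeatable (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

pipe = Pipeline([("feature selection", SelectKBest(chi2, k=2)),
                 ("classification", SVC(kernel="linear"))])

# Fit on the training split, then score on data the model has not seen.
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)

test_accuracy = accuracy_score(y_test, y_pred)
print(test_accuracy)
```

For a classifier, `pipe.score(X_test, y_test)` reports the same accuracy in a single call.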


### Last, but not least, Searching Parameter Space with `GridSearchCV`

In [None]:
from sklearn.model_selection import GridSearchCV

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(include_bias = False)
lm = LinearRegression()

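pipeline = Pipeline([("polynomial_features", poly),
                     ("linear_regression", lm)])

# Keys are <step name>__<parameter name>. LinearRegression's `normalize`
# option was removed in scikit-learn 1.2, so `fit_intercept` is searched instead.
param_grid = dict(polynomial_features__degree = list(range(1, 30, 2)),
                  linear_regression__fit_intercept = [False, True])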

grid_search = GridSearchCV(pipeline, param_grid=param_grid)
grid_search.fit(X, y)
print(grid_search.best_params_)
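
The same search works for the classification pipeline from the first part of this section. The sketch below is one possible setup, with illustrative step names and grid values; besides `best_params_`, it reads the cross-validated score and the number of candidates that were tried.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("select", SelectKBest(chi2)),
                 ("svc", SVC(kernel="linear"))])

# Grid keys follow <step name>__<parameter name>; the values are illustrative.
param_grid = {"select__k": [1, 2, 3, 4],
              "svc__C": [0.1, 1, 10]}

# cv=5 requests 5-fold cross-validation for every parameter combination.
search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # best combination found
print(search.best_score_)   # mean cross-validated accuracy of that combination

# cv_results_ holds one entry per candidate: 4 values of k times 3 values of C.
n_candidates = len(search.cv_results_["params"])
print(n_candidates)
```

By default the search refits the best combination on all of the data, so `search.predict` and `search.best_estimator_` can be used directly afterwards.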

Created by a Microsoft Employee.
	
The MIT License (MIT)<br>
Copyright (c) 2016 Micheleen Harris