# Scikit Learn Pipelines

This notebook uses the built-in Digits dataset you saw previously in the semester, 
where the task is to recognize / classify handwritten digits.

 * Derived from Scikit Learn Documentation

In [None]:
from __future__ import print_function, division

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2


# Defining the segments of the pipe

Here we define a pipeline as an ordered list of named steps, where each step receives the output of the step before it.
In the example below:
  1. Data --> PCA --> Data_Features
  1. Data_Features --> LinearSVC --> Classifications

Therefore, 
  1. Data --> Pipeline --> Classifications

In [None]:
pipe = Pipeline([
    # Named step "reduce_dim" ... uses code module PCA
    ('reduce_dim', PCA()),
    # Named step "classify" ... uses code module LinearSVC
    ('classify', LinearSVC())
])
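
To make the Data --> PCA --> LinearSVC chain concrete, the sketch below fits the two steps by hand and then fits the same steps wrapped in a pipeline. The name `demo_pipe`, the choice of `n_components=8`, and the fixed `random_state` values are assumptions made for illustration only.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

# Manual chaining: fit the reducer, transform the data, then fit the classifier.
pca = PCA(n_components=8, random_state=0)
features = pca.fit_transform(X)            # Data --> PCA --> Data_Features
clf = LinearSVC(random_state=0).fit(features, y)
manual_pred = clf.predict(pca.transform(X))  # Data_Features --> Classifications

# The same two steps wrapped in a Pipeline: one fit call, one predict call.
demo_pipe = Pipeline([
    ('reduce_dim', PCA(n_components=8, random_state=0)),
    ('classify', LinearSVC(random_state=0)),
])
pipe_pred = demo_pipe.fit(X, y).predict(X)   # Data --> Pipeline --> Classifications

# Fraction of samples where the two approaches give the same label.
print((manual_pred == pipe_pred).mean())
```

With identical settings the two approaches should agree on essentially every sample; the pipeline simply removes the bookkeeping of passing intermediate features around.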

## Hyper-parameter Tuning 
### is part of the machine learning process

Hyperparameter tuning evaluates a collection of candidate hyperparameter settings in search of the best-performing model.
We saw a similar technique in many of the clustering libraries, which performed it automatically.

In [None]:
# Candidate numbers of features to reduce to
N_FEATURES_OPTIONS = [2, 4, 8]
# Candidate values of the classifier's C parameter to explore
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        # Candidate estimators for the "reduce_dim" step: PCA or NMF
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        
        #####################################
        # Parameters of the estimators in the 
        # pipeline can be accessed using the 
        # <estimator>__<parameter> syntax.
        # "reduce_dim" names the step; PCA and NMF both take n_components
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        # "classify" names the step; LinearSVC takes the parameter C
        'classify__C': C_OPTIONS
    },
    {
        # A second set of test cases for hyperparameters
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]
reducer_labels = ['PCA', 'NMF', 'KBest(chi2)']
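
The `<estimator>__<parameter>` syntax used in the grid is the same one accepted by `set_params` on a pipeline. The following is a small sketch of what the grid search does for each candidate; `demo_pipe` is a hypothetical name used only here.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

demo_pipe = Pipeline([
    ('reduce_dim', PCA()),
    ('classify', LinearSVC()),
])

# <step name>__<parameter> reaches into the estimator stored under that step.
demo_pipe.set_params(reduce_dim__n_components=4, classify__C=10)
print(demo_pipe.named_steps['reduce_dim'].n_components)  # 4
print(demo_pipe.named_steps['classify'].C)               # 10

# The bare step name replaces the whole estimator for that step.
demo_pipe.set_params(reduce_dim=PCA(iterated_power=7))
print(type(demo_pipe.named_steps['reduce_dim']).__name__)  # PCA
```

This is why the grid can list `'reduce_dim'` (swap the estimator) alongside `'reduce_dim__n_components'` (configure whichever estimator is currently in that step).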


# Define a search grid (collection of parameters)

In [None]:
grid = GridSearchCV(pipe, cv=3, n_jobs=1, param_grid=param_grid)

## Load the data then fit the Grid

Fitting performs an exhaustive search over the hyperparameter grid.
Each candidate setting is cross-validated by re-running the full pipeline, from raw data to classification results, on every fold.

In [None]:
digits = load_digits()
grid.fit(digits.data, digits.target)
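
To get a feel for how much work an exhaustive search involves, the sketch below enumerates the candidates with `ParameterGrid`, which expands a grid into its individual settings. It repeats the grid definition so that it runs on its own; the variable names `candidates`, `n_candidates`, and `n_fits` are illustrative.

```python
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import ParameterGrid

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS,
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS,
    },
]

# Expand both grids into the full list of candidate settings.
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)   # 2*3*4 from the first grid + 1*3*4 from the second
n_fits = n_candidates * 3        # one fit per candidate per fold, with cv=3

print(n_candidates)  # 36
print(n_fits)        # 108

# The first few candidates show the iteration order within a grid.
for params in candidates[:3]:
    print(params['classify__C'], params['reduce_dim__n_components'])
```

With the default `refit=True`, `GridSearchCV` also performs one additional fit of the best candidate on the full dataset after the search.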

# Let's see how we did

In [None]:
#  Note the grid has cross-validation results stored in .cv_results_['mean_test_score']
mean_scores = np.array(grid.cv_results_['mean_test_score'])

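# Scores follow param_grid iteration order: the first grid's candidates come
# first, then the second grid's. Within a grid the keys vary alphabetically
# (classify__C slowest, the feature-count option fastest), so reshape each
# grid separately and stack them so the three reducers line up on axis 1.
n_c, n_f = len(C_OPTIONS), len(N_FEATURES_OPTIONS)
n_first = n_c * 2 * n_f  # number of candidates in the first grid (PCA and NMF)
first = mean_scores[:n_first].reshape(n_c, 2, n_f)
second = mean_scores[n_first:].reshape(n_c, 1, n_f)
mean_scores = np.concatenate([first, second], axis=1)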

# select score for best C
mean_scores = mean_scores.max(axis=0)

bar_offsets = (np.arange(len(N_FEATURES_OPTIONS)) *
               (len(reducer_labels) + 1) + .5)



In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

COLORS = 'bgrcmyk'
for i, (label, reducer_scores) in enumerate(zip(reducer_labels, mean_scores)):
    plt.bar(bar_offsets + i, reducer_scores, label=label, color=COLORS[i])

plt.title("Comparing feature reduction techniques")
plt.xlabel('Reduced number of features')
plt.xticks(bar_offsets + len(reducer_labels) / 2, N_FEATURES_OPTIONS)
plt.ylabel('Digit classification accuracy')
plt.ylim((0, 1))
plt.legend(loc='upper left')

We can see from the plot that reducing the digits data to 8 features through 
PCA gets us the best classification performance.

We can now go back and rebuild our model using this knowledge of the 
performance and hyperparameter relationship.
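
One way to rebuild the model is sketched below: a pipeline fixed at PCA with 8 components, evaluated on a held-out split. The train/test split, the `random_state` values, and `C=1` are assumptions for illustration; the text above does not state which `C` was best, so substitute the value reported by the search.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

final_pipe = Pipeline([
    # 8 components via PCA, as suggested by the plot
    ('reduce_dim', PCA(n_components=8, random_state=0)),
    # C=1 is a placeholder, not a result from the search
    ('classify', LinearSVC(C=1, random_state=0)),
])
final_pipe.fit(X_train, y_train)

# Accuracy on data the pipeline has not seen during fitting.
test_accuracy = final_pipe.score(X_test, y_test)
print(test_accuracy)
```

If the fitted `grid` object is still in memory, `grid.best_params_` reports the winning settings and `grid.best_estimator_` is the pipeline already refit with them.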

#### Additional Pipelining Examples:
 * [Text Feature Extraction](http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_text_feature_extraction.html#sphx-glr-auto-examples-model-selection-grid-search-text-feature-extraction-py)
 * [Feature Map Approximation for RBF Kernels](http://scikit-learn.org/stable/auto_examples/plot_kernel_approximation.html#sphx-glr-auto-examples-plot-kernel-approximation-py)
 * [PCA to Logistic Regression](http://scikit-learn.org/stable/auto_examples/plot_digits_pipe.html#sphx-glr-auto-examples-plot-digits-pipe-py)
 
##### Feature Unions
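Feature Union is closely related to pipelining: rather than chaining steps one after another, it runs several transformers side by side and concatenates their outputs.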
  * [Scikit Learn Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html)
  * Read more here: http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html
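
As a minimal sketch of a feature union on the Digits data, the example below combines PCA components with features chosen by `SelectKBest`. The step names `pca` and `kbest` and the feature counts are arbitrary choices for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import FeatureUnion

X, y = load_digits(return_X_y=True)

# Each transformer sees the same input; their outputs are stacked column-wise.
combined = FeatureUnion([
    ('pca', PCA(n_components=4)),
    ('kbest', SelectKBest(chi2, k=2)),
])
X_combined = combined.fit_transform(X, y)

print(X_combined.shape)  # (1797, 6): 4 PCA columns + 2 selected columns
```

A `FeatureUnion` is itself a transformer, so it can be used as a step inside a `Pipeline` ahead of a classifier.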
  

# Save your notebook