* **Pipeline** chains a sequence of transformers with a final estimator, so the whole chain can be fit and used as a single object.
* **FeatureUnion** combines several transformer objects into a new transformer that combines their output.
* FeatureUnion and Pipeline can be combined to create complex models.


**References**
1. https://www.youtube.com/watch?v=Om_TFrFGotQ
2. https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html
3. https://michelleful.github.io/code-blog/2015/06/20/pipelines/
4. https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html
5. https://github.com/knathanieltucker/bit-of-data-science-and-scikit-learn/blob/master/notebooks/PipelinesAndFeatureUnions.ipynb

In [62]:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion
from sklearn.decomposition import KernelPCA

In [63]:
## data and train test split
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Make pipeline
* `make_pipeline` builds the same object as `Pipeline`, but the steps do not need explicit names: each step is named automatically after its class in lowercase (for example `pca` and `svc`).
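As a minimal sketch of the naming difference, the snippet below builds the same two steps both ways and prints the step names. The explicit names `reduce_dim` and `clf` are hypothetical labels chosen only for this illustration.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import SVC

# Explicit construction: the caller chooses every step name
# ("reduce_dim" and "clf" are hypothetical names for this sketch).
named_pipe = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# make_pipeline: names are generated from the lowercased class names.
auto_pipe = make_pipeline(PCA(), SVC())

named_step_names = [name for name, _ in named_pipe.steps]
auto_step_names = [name for name, _ in auto_pipe.steps]
print(named_step_names)
print(auto_step_names)
```

The generated names matter later because nested parameters are addressed through them, e.g. `pca__n_components`.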

In [64]:
## make a pipeline of PCA and an SVM classifier
pipe = make_pipeline(PCA(), SVC())

In [65]:
pipe.fit(X_train, y_train)

Pipeline(memory=None,
         steps=[('pca',
                 PCA(copy=True, iterated_power='auto', n_components=None,
                     random_state=None, svd_solver='auto', tol=0.0,
                     whiten=False)),
                ('svc',
                 SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None,
                     coef0=0.0, decision_function_shape='ovr', degree=3,
                     gamma='scale', kernel='rbf', max_iter=-1,
                     probability=False, random_state=None, shrinking=True,
                     tol=0.001, verbose=False))],
         verbose=False)

In [66]:
acc = pipe.score(X_test, y_test)
print("Accuracy for PCA+SVC Pipeline : ", acc)

Accuracy for PCA+SVC Pipeline :  0.9777777777777777


# Feature Union
* `FeatureUnion` applies a list of transformer objects in parallel to the input data, then concatenates their outputs column-wise (the **union** of the results).
* This is useful for combining **several feature extraction mechanisms into a single transformer.**
* During fitting, each of these transformers is fit to the data independently.
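The concatenation described above can be sketched on the iris data as follows. The component counts (2 and 3) and the step names `pca` and `kpca` are assumptions picked for illustration; the point is only that the union's output width is the sum of the individual output widths.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.pipeline import FeatureUnion

X_demo, _ = load_iris(return_X_y=True)

# Two transformers with fixed output widths (illustrative choices).
pca = PCA(n_components=2)
kpca = KernelPCA(n_components=3, kernel="rbf")

# Each transformer is fit on the same input; outputs are stacked side by side.
union = FeatureUnion([("pca", pca), ("kpca", kpca)])
union_out = union.fit_transform(X_demo)

# Fit a separate PCA on its own to compare with the first block of columns.
pca_alone = PCA(n_components=2).fit_transform(X_demo)

print(union_out.shape)   # 2 PCA columns + 3 KernelPCA columns
print(pca_alone.shape)
```

The first two columns of `union_out` come from the PCA step and the remaining three from the KernelPCA step, in the order the transformers were listed.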


In [67]:
## Combine PCA and Kernel PCA into a single transformer object
transf = [('linear_PCA', PCA()), ('kernel_PCA', KernelPCA())]
new_estimator = FeatureUnion(transf)
new_estimator


FeatureUnion(n_jobs=None,
             transformer_list=[('linear_PCA',
                                PCA(copy=True, iterated_power='auto',
                                    n_components=None, random_state=None,
                                    svd_solver='auto', tol=0.0, whiten=False)),
                               ('kernel_PCA',
                                KernelPCA(alpha=1.0, coef0=1, copy_X=True,
                                          degree=3, eigen_solver='auto',
                                          fit_inverse_transform=False,
                                          gamma=None, kernel='linear',
                                          kernel_params=None, max_iter=None,
                                          n_components=None, n_jobs=None,
                                          random_state=None,
                                          remove_zero_eig=False, tol=0))],
             transformer_weights=None, verbose=False)

In [68]:
## set parameters of the nested transformers using <step name>__<parameter>
params = {'linear_PCA__n_components': 3, 'kernel_PCA__kernel': 'rbf', 'kernel_PCA__gamma': 1}
new_estimator.set_params(**params)

FeatureUnion(n_jobs=None,
             transformer_list=[('linear_PCA',
                                PCA(copy=True, iterated_power='auto',
                                    n_components=3, random_state=None,
                                    svd_solver='auto', tol=0.0, whiten=False)),
                               ('kernel_PCA',
                                KernelPCA(alpha=1.0, coef0=1, copy_X=True,
                                          degree=3, eigen_solver='auto',
                                          fit_inverse_transform=False, gamma=1,
                                          kernel='rbf', kernel_params=None,
                                          max_iter=None, n_components=None,
                                          n_jobs=None, random_state=None,
                                          remove_zero_eig=False, tol=0))],
             transformer_weights=None, verbose=False)
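The `<step name>__<parameter>` convention used with `set_params` also works for reading values back through `get_params`. The sketch below uses a separate object, `demo_union`, so it does not disturb `new_estimator`; that variable name is an assumption of this example.

```python
from sklearn.decomposition import PCA, KernelPCA
from sklearn.pipeline import FeatureUnion

# A fresh union with the same step names as above.
demo_union = FeatureUnion([("linear_PCA", PCA()), ("kernel_PCA", KernelPCA())])

# Keyword form of set_params; equivalent to passing **params.
demo_union.set_params(linear_PCA__n_components=3, kernel_PCA__kernel="rbf")

# get_params() exposes nested parameters under the same double-underscore keys.
nested = demo_union.get_params()
print(nested["linear_PCA__n_components"])
print(nested["kernel_PCA__kernel"])
```

The same keys are what a parameter search would use to tune the nested transformers.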

In [69]:
## columns = 3 from PCA + every non-zero KernelPCA component (n_components=None)
new_estimator.fit_transform(X).shape

(150, 151)
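Since a `FeatureUnion` is itself a transformer, it can be used as a step inside a `Pipeline`, which is one way to combine the two as noted at the top. The following is a minimal sketch, not the notebook's own model: the step name `features`, the variable names, and `n_components=3` for `KernelPCA` are assumptions made to keep the feature count small.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=0
)

# Feature extraction: linear PCA and RBF kernel PCA side by side.
# n_components=3 for KernelPCA is an illustrative choice.
features = FeatureUnion([
    ("linear_PCA", PCA(n_components=3)),
    ("kernel_PCA", KernelPCA(n_components=3, kernel="rbf", gamma=1)),
])

# The union feeds its concatenated output to the classifier.
model = Pipeline([("features", features), ("svc", SVC())])
model.fit(X_tr, y_tr)

combined_acc = model.score(X_te, y_te)
test_features = model.named_steps["features"].transform(X_te)
print("Accuracy for FeatureUnion+SVC pipeline:", combined_acc)
print(test_features.shape)
```

Fitting through the pipeline means the transformers learn only from the training split, and the test split is transformed with those fitted transformers.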