Concatenating
Instruction 1: Provide an example of concatenating multiple feature extraction methods using
your dataset.
In many real-world problems, there are several ways to extract features from a dataset, and it is
often beneficial to combine several methods to obtain good performance. This example shows how to
use FeatureUnion to combine features obtained by PCA and univariate selection. Because
FeatureUnion is itself a transformer, it can be used as a Pipeline step, which allows
cross-validation and grid searches over the whole process. The combination used in this example is
not particularly helpful on this dataset and is only used to illustrate the usage of FeatureUnion.
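
As a minimal sketch of what "combining" means here, the block below checks that FeatureUnion places each transformer's output side by side, column-wise. It assumes the scikit-learn diabetes data and passes score_func=f_regression to SelectKBest, a choice made for this sketch because the target is continuous.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import FeatureUnion

X, y = load_diabetes(return_X_y=True)

# Assumption for this sketch: f_regression as the scoring function,
# since the diabetes target is continuous.
union = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("univ_select", SelectKBest(score_func=f_regression, k=1)),
])
X_union = union.fit_transform(X, y)

# Fit the same two transformers separately and stack their outputs by hand.
X_pca = PCA(n_components=2).fit_transform(X)
X_best = SelectKBest(score_func=f_regression, k=1).fit_transform(X, y)
X_manual = np.hstack([X_pca, X_best])

# The union has 2 PCA columns followed by 1 selected column.
print(X_union.shape)
# Compare the union's output with the hand-stacked columns.
print(np.allclose(X_union, X_manual))
```

The column order follows the order of the transformers in the list, so the PCA components come first and the selected feature last.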

In [1]:

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

In [2]:
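data = load_diabetes()
X, y = data.data, data.target

# Hold out 20% of the samples; the grid search below is fit on the training split only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The diabetes data has 10 features; start by keeping 2 principal components.
pca = PCA(n_components=2)

# Keep the single best-scoring original feature. SelectKBest defaults to
# score_func=f_classif, which is meant for classification targets; for a
# continuous target such as this one, f_regression is usually the better fit.
selection = SelectKBest(k=1)

# Build one transformer that concatenates the PCA and univariate-selection outputs.
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])

# Transform the training set with the combined features
# (2 PCA columns + 1 selected column).
X_features = combined_features.fit(X_train, y_train).transform(X_train)

svm = SVR()

# Chain feature extraction and regression so the grid search tunes both together.
pipeline = Pipeline([("features", combined_features), ("svm", svm)])

# Parameter names follow the <step>__<sub-step>__<parameter> convention.
param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10],
                  svm__epsilon=[0.1, 0.2, 0.3])

# 3 * 2 * 3 * 3 = 54 candidates, each evaluated with the default 5-fold cross-validation.
grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10, n_jobs=-1)
grid_search.fit(X_train, y_train)
print(grid_search.best_estimator_)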

Fitting 5 folds for each of 54 candidates, totalling 270 fits
Pipeline(steps=[('features',
                 FeatureUnion(transformer_list=[('pca', PCA(n_components=3)),
                                                ('univ_select',
                                                 SelectKBest(k=2))])),
                ('svm', SVR(C=10, epsilon=0.3))])
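
The search above relies on SelectKBest's default scoring function, f_classif, and does not use the held-out test split. As a hedged variant that is not part of the recorded run, the sketch below swaps in f_regression (an assumption about what suits a continuous target) and scores the refit best pipeline on the test split. For a regressor, GridSearchCV.score falls back to the estimator's own score, which is R^2.

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVR

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption for this sketch: f_regression replaces the default f_classif.
features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("univ_select", SelectKBest(score_func=f_regression, k=1)),
])
pipeline = Pipeline([("features", features), ("svm", SVR())])

# Same grid as in the cell above.
param_grid = {
    "features__pca__n_components": [1, 2, 3],
    "features__univ_select__k": [1, 2],
    "svm__C": [0.1, 1, 10],
    "svm__epsilon": [0.1, 0.2, 0.3],
}

# Default 5-fold cross-validation; the best candidate is refit on all of X_train.
grid_search = GridSearchCV(pipeline, param_grid=param_grid)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)

# Score the refit best pipeline on data the search never saw.
test_r2 = grid_search.score(X_test, y_test)
print(f"held-out R^2: {test_r2:.3f}")
```

Because the scoring function differs, the selected parameters may not match the output recorded above.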
