# Concatenating multiple feature extraction methods

Adapted from http://scikit-learn.org/stable/auto_examples/feature_stacker.html

In many real-world applications there are several ways to extract features from a dataset, and combining several of them often improves performance. This example shows how to use FeatureUnion to combine features obtained by PCA and univariate selection.
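
The core idea can be sketched in plain Julia: every extractor is applied to the same input, and the resulting feature matrices are concatenated column-wise. This is only an illustration of the concept, not the FeatureUnion implementation; `row_sums`, `first_column` and `feature_union` are hypothetical stand-ins invented for this sketch.

```julia
# Toy data: 4 samples, 3 original features.
X = [ 1.0  2.0  3.0;
      4.0  5.0  6.0;
      7.0  8.0  9.0;
     10.0 11.0 12.0]

# Hypothetical stand-ins for two feature extractors (not library code).
# Each maps an n×d matrix to an n×m matrix of new features.
row_sums(X) = X * ones(size(X, 2), 1)   # one derived feature per sample
first_column(X) = X[:, 1:1]             # keeps one original feature

# Apply every extractor to the same input and concatenate the
# results column-wise.
feature_union(extractors, X) = hcat([f(X) for f in extractors]...)

X_features = feature_union([row_sums, first_column], X)
println(size(X_features))   # (4, 2): one column per extracted feature
```

In the example further down, the same principle yields `n_components + k` columns: the PCA components followed by the selected original features.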

Combining features with this transformer has the benefit that cross-validation and grid search can be run over the whole process, including the feature extraction steps.
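
To make "over the whole process" concrete: a grid search enumerates every combination of candidate values, and nested steps are addressed with double-underscore paths of the form `step__substep__parameter`. The following is a minimal sketch in plain Julia using the same grid as the example in this notebook; `expand_grid` is a hypothetical helper written for illustration, not a library function, and the fold count of 3 is taken from the run shown in this notebook.

```julia
# Same grid as in the example, written as (name, candidate values) pairs.
param_grid = [("features__pca__n_components", [1, 2, 3]),
              ("features__univ_select__k", [1, 2]),
              ("svm__C", [0.1, 1, 10])]

# Hypothetical helper (not a library function): expand the grid into
# one Dict of settings per combination of candidate values.
function expand_grid(grid)
    candidates = [Dict{String,Any}()]
    for (name, vals) in grid
        candidates = [merge(c, Dict{String,Any}(name => v))
                      for c in candidates for v in vals]
    end
    return candidates
end

candidates = expand_grid(param_grid)
n_folds = 3

println(length(candidates))            # 3 * 2 * 3 = 18 candidates
println(length(candidates) * n_folds)  # 18 * 3 = 54 fits in total
```

Each candidate is fitted once per fold, which is where the "3 folds for each of 18 candidates, totalling 54 fits" line in the output comes from.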

The combination used in this example is not particularly helpful on this dataset and is only used to illustrate the usage of FeatureUnion.

In [1]:
include("preamble.jl")



In [34]:
# Original Python Author: Andreas Mueller <amueller@ais.uni-bonn.de>
#
# License: BSD 3 clause

using Skcore: GridSearchCV
using Skcore: Pipeline
@pyimport2 sklearn.pipeline: FeatureUnion
@pyimport2 sklearn.svm: SVC
@pyimport2 sklearn.datasets: load_iris
@pyimport2 sklearn.decomposition: PCA
@pyimport2 sklearn.feature_selection: SelectKBest

iris = load_iris()

X, y = iris["data"], iris["target"]

# This dataset is way too high-dimensional. Better do PCA:
pca = PCA(n_components=2)

# Maybe some original features were good, too?
selection = SelectKBest(k=1)

# Build estimator from PCA and Univariate selection:

combined_features = Skcore.FeatureUnion([("pca", pca), ("univ_select", selection)])

# Use combined features to transform dataset:
X_features = transform(fit!(combined_features, X, y), X)

svm = SVC(kernel="linear")

# Do grid search over k, n_components and C:

pipeline = Pipeline([("features", combined_features), ("svm", svm)])

param_grid = Dict(:features__pca__n_components=>[1, 2, 3],
                  :features__univ_select__k=>[1, 2],
                  :svm__C=>[0.1, 1, 10])

grid_search = GridSearchCV(pipeline, param_grid; verbose=10, refit=true)
fit!(grid_search, X, y)
print(grid_search.best_estimator_)


Fitting 3 folds for each of 18 candidates, totalling 54 fits
[CV] features__pca__n_components=1, svm__C=0.1, features__univ_select__k=1
[CV] features__pca__n_components=1, svm__C=0.1, features__univ_select__k=1, score=0.96078  -  0.0s
[CV] features__pca__n_components=1, svm__C=0.1, features__univ_select__k=1
[CV] features__pca__n_components=1, svm__C=0.1, features__univ_select__k=1, score=0.90196  -  0.0s
[CV] features__pca__n_components=1, svm__C=0.1, features__univ_select__k=1
[CV] features__pca__n_components=1, svm__C=0.1, features__univ_select__k=1, score=0.97917  -  0.0s
[CV] features__pca__n_components=1, svm__C=1.0, features__univ_select__k=1
[CV] features__pca__n_components=1, svm__C=1.0, features__univ_select__k=1, score=0.94118  -  0.0s
[CV] features__pca__n_components=1, svm__C=1.0, features__univ_select__k=1
[CV] features__pca__n_components=1, svm__C=1.0, features__univ_select__k=1, score=0.92157  -  0.0s
[CV] features__pca__n_components=1, svm__C=1.0, features__univ_select