Using feature selector as part of sklearn pipeline #960

jtlz2 · 2022-08-03T20:18:00Z

Discussed in #959

Originally posted by jtlz2 August 3, 2022
Awesome package, thanks!

I'm trying to use the feature-selector transformer within a sklearn pipeline but keep getting errors like

AssertionError: The index of X and y need to be the same

Now, this raises a few questions for me:

1. Although https://github.com/blue-yonder/tsfresh/blob/main/notebooks/examples/02%20sklearn%20Pipeline.ipynb mentions the feature augmenter, it does not give the exact syntax for using the feature selector in a pipeline step. What is the syntax precisely?
2. Presumably feature selection should only be carried out on the CV dataset - how do I ensure this in the context of a sklearn pipeline?
3. This raises another spectre - I want to perform (tsfresh's) feature selection on tsfresh features, while simultaneously fitting non-tsfresh-derived features. Is this even possible and if so how can we make it work?

Thanks again!

kempa-liehr · 2022-11-17T22:32:12Z

Hi @jtlz2,
Thanks for pointing out that we are missing an example of how to use FeatureSelector() in an sklearn pipeline.

Let's assume that you have already extracted the time-series features using the extract_features() function. You can join the resulting DataFrame of time-series features with another feature matrix, provided both have the same index.
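As a minimal sketch of that join, assuming both feature matrices are pandas DataFrames indexed by the same ids: the column names and values below are made up for illustration, and ts_features only stands in for what extract_features() would return.

```python
import pandas as pd

# Hypothetical stand-in for the output of extract_features():
# one row per time-series id, one column per extracted feature.
ts_features = pd.DataFrame(
    {"value__mean": [0.1, 0.5, 0.9], "value__maximum": [1.0, 2.0, 3.0]},
    index=pd.Index(["a", "b", "c"], name="id"),
)

# Hypothetical non-time-series features for the same ids (row order may differ).
other_features = pd.DataFrame(
    {"age": [41, 23, 35]},
    index=pd.Index(["c", "a", "b"], name="id"),
)

# DataFrame.join aligns rows on the index, so the differing row order is harmless.
# how="inner" keeps only ids that are present in both frames.
X = ts_features.join(other_features, how="inner")
```

The combined X can then be passed to the pipeline together with a target y that carries the same index.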

The sklearn pipeline can be built as follows:

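from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from tsfresh.transformers import FeatureSelector

# X: DataFrame of features; y: target Series with the same index as X
clf = make_pipeline(FeatureSelector(),
                    RandomForestClassifier())
# The whole pipeline, including FeatureSelector, is refit on the training part of each CV split
cross_val_score(clf, X, y)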

Then you can fit your model with clf.fit(X, y), or pass clf to the tools provided in sklearn.model_selection, such as cross_val_score.
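If you still see the AssertionError from the original post, one thing worth checking is whether X and y really carry the same index. The sketch below uses made-up data and assumes that y is already in the same row order as X and is only missing the index labels; whether that is the cause in your case is an assumption.

```python
import pandas as pd

# Made-up feature matrix indexed by sample id.
X = pd.DataFrame({"f1": [0.1, 0.2, 0.3]}, index=["a", "b", "c"])

# Target built from a plain list, so it carries a default RangeIndex (0, 1, 2).
y = pd.Series([0, 1, 0])

same_index_before = X.index.equals(y.index)

# Assumption: y's rows are already in the same order as X's rows,
# so the values can simply be relabelled with X's index.
y = pd.Series(y.to_numpy(), index=X.index)

same_index_after = X.index.equals(y.index)
```

If y is instead indexed by the same ids in a different order, y.reindex(X.index) would be the safer choice, because it matches rows by label rather than by position.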

kempa-liehr added the enhancement label Nov 17, 2022