Improve documentation for saving and loading of feature extraction calculators #22

guerrajorge · 2016-10-30T19:51:26Z

Hello Max and the other contributors,

Great work on this library; it works really well.

Is it possible to save and load the extraction calculators that were used on a dataset? For instance, say I have a dataset and I run extract_features and then select_features. As a result, some features are removed, and each remaining feature was calculated by a specific calculator. Is there a way to get those calculators?

The reason for this request is that I need to run future datasets through the same calculators, since my model is trained on those features.

Let me know if this is possible, or if I am missing something.

Thank you.

MaxBenChrist · 2016-10-30T20:28:17Z

Hola Jorge!

Yes, it is indeed possible to save and load the extraction calculators that were used on a dataset. Actually, we spent a lot of time thinking about how to make that possible.

The solution we came up with is based on the from_columns() method of the FeatureExtractionSettings class. This method deduces the feature calculators and their parameters from the feature names in the calculated time series feature matrix.
It is important not to calculate features that you will drop later, as some of the feature calculators have long runtimes.
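To illustrate how a calculator and its parameters can be read back from a column name, here is a minimal sketch. It is not tsfresh's implementation: parse_feature_name is a hypothetical helper, and it assumes that names follow the pattern kind__calculator__param_value__param_value, with double underscores between parts and a single underscore between a parameter name and its value. The example column name is illustrative.

```python
def parse_feature_name(column):
    """Split a feature column name into (kind, calculator, params).

    Hypothetical helper; assumes 'kind__calculator__param_value__...'.
    """
    parts = column.split("__")
    if len(parts) < 2:
        raise ValueError("not a feature column name: %r" % column)
    kind, calculator = parts[0], parts[1]
    params = {}
    for chunk in parts[2:]:
        # Split on the last underscore: "coeff_13" -> ("coeff", "13").
        # Values are kept as strings; no type conversion is attempted.
        name, sep, value = chunk.rpartition("_")
        if sep:
            params[name] = value
        else:
            # No underscore at all: keep the chunk as a flag without a value
            params[chunk] = None
    return kind, calculator, params


example = "feature__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_20"
kind, calculator, params = parse_feature_name(example)
print(kind, calculator, params)
```

Splitting on the last underscore is a simplification: a parameter value that itself contains an underscore would be split in the wrong place.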

For you as a user, there are two options for saving which features were calculated:

1. Use our sklearn-compatible RelevantFeatureAugmenter transformer objects; see the documentation at http://tsfresh.readthedocs.io/en/latest/text/sklearn_transformers.html
2. Start an extraction run that is limited to the old features by means of a FeatureExtractionSettings object, as shown in the following snippet:

from tsfresh.feature_extraction.settings import FeatureExtractionSettings
from tsfresh import extract_features

# the instantiated settings object will calculate all features
settings_new = FeatureExtractionSettings()
# Now we assume that X_old is the result of an earlier filtered extraction run
# We set our new settings object to only calculate the features from X_old
settings_new = settings_new.from_columns(X_old.columns)
# Now we only calculate the features that were contained in X_old
X_new = extract_features(df_new, settings_new)
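Since the settings are rebuilt from column names alone, one way to carry them across sessions is to store the names of the filtered feature matrix. The sketch below uses only the standard library; save_feature_names, load_feature_names, the file name, and the example column list are all hypothetical and stand in for list(X_old.columns).

```python
import json
import os
import tempfile


def save_feature_names(columns, path):
    """Write the feature column names to a JSON file."""
    with open(path, "w") as handle:
        json.dump([str(c) for c in columns], handle)


def load_feature_names(path):
    """Read the feature column names back as a list of strings."""
    with open(path) as handle:
        return json.load(handle)


# Made-up stand-in for list(X_old.columns)
old_columns = [
    "feature__mean",
    "feature__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_20",
]

path = os.path.join(tempfile.mkdtemp(), "selected_features.json")
save_feature_names(old_columns, path)
restored = load_feature_names(path)
print(restored == old_columns)

# In a later session, the restored list would take the place of
# X_old.columns in the snippet above:
# settings_new = settings_new.from_columns(restored)
```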

Huandao0812 · 2016-11-02T04:32:26Z

Hi Max, I tried to follow your example, but my X_new has a different number of features than my X_old. My code is here: https://github.com/Huandao0812/lstm_exp/blob/master/test_tsfresh.py#L46
Could you have a quick look?

Huandao0812 · 2016-11-02T05:04:30Z

Update: I checked the diff of the two sets of columns, and this is the difference:
X_new has 2 more columns than X_old
diff columns = set(['feature__cwt_coefficients__widths_(2, 5, 10, 20)_coeff_13__w_20', 'feature__cwt_coefficients__widths(2, 5, 10, 20)__coeff_3__w_5'])
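A comparison like this can be reproduced with a plain set difference. The sketch below is only an illustration: compare_columns is a hypothetical helper, and the two lists are made-up stand-ins for X_old.columns and X_new.columns.

```python
def compare_columns(old_columns, new_columns):
    """Return (only_in_new, only_in_old) as sorted lists."""
    old, new = set(old_columns), set(new_columns)
    return sorted(new - old), sorted(old - new)


# Made-up stand-ins for X_old.columns and X_new.columns
old_columns = ["feature__mean", "feature__median"]
new_columns = ["feature__mean", "feature__median", "feature__maximum"]

extra, missing = compare_columns(old_columns, new_columns)
print("only in new:", extra)
print("only in old:", missing)
```

Checking both directions shows whether the new run added columns, lost columns, or both.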

MaxBenChrist changed the title ~~Save and load feature extraction calculators~~ Improve documentation for saving and loading of feature extraction calculators Oct 30, 2016

MaxBenChrist added the enhancement label Oct 30, 2016

MaxBenChrist mentioned this issue Nov 2, 2016

extraction and filtering not equal to filtered extraction #29

Closed

MaxBenChrist closed this as completed Nov 7, 2016

devautor mentioned this issue Apr 1, 2017

settings_new.from_columns not sticking strictly to given feature names #183

Closed
