Improve documentation for saving and loading of feature extraction calculators #22

Closed
guerrajorge opened this issue Oct 30, 2016 · 3 comments

Comments

@guerrajorge

Hello Max and the other contributors,

Great work on this library; it works really well.

Is it possible to save and load the extraction calculators that were used on a dataset? For instance, let's say that I have a dataset and I run extract_features and then select_features. As a result, some features are removed, and each remaining feature was calculated by a specific calculator. Is there a way to get those calculators?

The reason I ask is that I need to run future datasets through the same calculators, since my model is trained on them.

Let me know if this is possible, or if I am missing something.

Thank you.

@MaxBenChrist
Collaborator

MaxBenChrist commented Oct 30, 2016

Hola Jorge!

Yes, it is indeed possible to save and load the extraction calculators that were used on a dataset. Actually, we spent a lot of time thinking about how to make that possible.

The solution we came up with is based on the from_columns() method of the FeatureExtractionSettings class. This method deduces the feature calculators and their parameters from the feature names in the calculated time series feature matrix.
It is important not to calculate features that you will drop later, as some of the feature calculators have long runtimes.
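
To illustrate the idea of deducing calculators from column names, here is a minimal sketch. It assumes the names follow the pattern `<kind>__<calculator>__<param>_<value>__...`, which matches the column names quoted later in this thread. `parse_feature_name` is a hypothetical helper written for illustration only; it is not how from_columns() is implemented.

```python
def parse_feature_name(column):
    """Split a feature column name into (kind, calculator, parameter parts).

    Hypothetical helper: assumes double underscores separate the parts.
    """
    parts = column.split("__")
    if len(parts) < 2:
        raise ValueError(f"not a feature column name: {column!r}")
    kind, calculator, params = parts[0], parts[1], parts[2:]
    return kind, calculator, params


# Example column names (assumed format, for illustration)
columns = [
    "feature__mean",
    "feature__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_20",
]

parsed = [parse_feature_name(c) for c in columns]

# The distinct calculators needed to reproduce these columns
calculators = sorted({calc for _, calc, _ in parsed})
```

Only the calculators recovered this way would need to run on new data, which is what keeps the long-running ones out of later extraction runs.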

For you as a user, there are two options for saving which features were calculated:

  1. You use our sklearn-compatible RelevantFeatureAugmenter transformer objects, see the documentation at http://tsfresh.readthedocs.io/en/latest/text/sklearn_transformers.html
  2. You start an extraction run that is limited to the old features by means of a FeatureExtractionSettings object, as shown in the following snippet:
from tsfresh.feature_extraction.settings import FeatureExtractionSettings
from tsfresh import extract_features

# The instantiated settings object would calculate all features by default
settings_new = FeatureExtractionSettings()
# Now we assume that X_old is the result of an earlier filtered extraction run
# We set our new settings object to only calculate the features from X_old
settings_new = settings_new.from_columns(X_old.columns)
# Now we only calculate the features that were contained in X_old
X_new = extract_features(df_new, settings_new)
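
Since the settings are derived from the column names alone, one way to "save" the calculators between sessions is to store those names on disk. The following is a minimal sketch using only the standard library; `save_feature_names`, `load_feature_names` and the file name are hypothetical, and the column names are made-up examples in the assumed tsfresh naming format.

```python
import json
import os
import tempfile


def save_feature_names(columns, path):
    """Write the feature column names to a JSON file."""
    with open(path, "w") as fh:
        json.dump([str(c) for c in columns], fh)


def load_feature_names(path):
    """Read the feature column names back as a list of strings."""
    with open(path) as fh:
        return json.load(fh)


# Stand-in for list(X_old.columns) from the earlier filtered run
old_columns = [
    "feature__mean",
    "feature__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_20",
]

path = os.path.join(tempfile.mkdtemp(), "feature_names.json")
save_feature_names(old_columns, path)

# In a later session: reload the names and hand them to from_columns()
restored = load_feature_names(path)
```

The restored list can then take the place of X_old.columns in the from_columns() call above.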

@MaxBenChrist MaxBenChrist changed the title Save and load feature extraction calculators Improve documentation for saving and loading of feature extraction calculators Oct 30, 2016
@Huandao0812

Huandao0812 commented Nov 2, 2016

Hi Max, I tried to do the same following your example, but my X_new has a different number of features than my X_old. My code is here: https://github.com/Huandao0812/lstm_exp/blob/master/test_tsfresh.py#L46
Can you have a quick look?

@Huandao0812

Update: I checked the diff of the two sets of columns, and X_new has 2 more columns than X_old:
diff columns = set(['feature__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_20', 'feature__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_3__w_5'])
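
A check like this can be sketched with plain set operations. The snippet below is illustrative only: `compare_columns` is a hypothetical helper, and the column lists are made-up stand-ins for list(X_old.columns) and list(X_new.columns).

```python
def compare_columns(old_columns, new_columns):
    """Return (extra, missing): names only in new, and names only in old."""
    old, new = set(old_columns), set(new_columns)
    extra = sorted(new - old)
    missing = sorted(old - new)
    return extra, missing


# Made-up stand-ins for list(X_old.columns) and list(X_new.columns)
old_cols = ["feature__mean", "feature__maximum"]
new_cols = [
    "feature__mean",
    "feature__maximum",
    "feature__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_20",
]

extra, missing = compare_columns(old_cols, new_cols)
print("only in X_new:", extra)
print("only in X_old:", missing)
```

Reporting both directions makes it clear whether the new run added columns, dropped columns, or both.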
