Replies: 1 comment
-
First of all, sorry for the massive delay in answering this!
That is correct! What you can do, however, is the following: chunk up your data into smaller partitions (e.g. with dask or spark) and then calculate features on them. Maybe choose overlapping chunks (although you probably do not want a full rolling). Additionally, you can downsample the time series and extract features on the downsampled version.
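A minimal sketch of that chunking idea, using only pandas/numpy (the column names `id`, `time`, `value` and the helper `chunk_long_series` are illustrative assumptions, not tsfresh API; the resulting frame is in the long format that `tsfresh.extract_features(column_id="id", column_sort="time")` expects):

```python
import numpy as np
import pandas as pd

def chunk_long_series(series, chunk_size, overlap=0):
    """Split one long series into (possibly overlapping) chunks and
    return a long-format frame with one synthetic id per chunk,
    so each chunk is treated as its own time series downstream."""
    step = chunk_size - overlap
    rows = []
    for chunk_no, start in enumerate(range(0, len(series) - chunk_size + 1, step)):
        window = series.iloc[start:start + chunk_size]
        rows.append(pd.DataFrame({
            "id": chunk_no,          # synthetic id: one per chunk
            "time": window.index,    # keep original positions for sorting
            "value": window.to_numpy(),
        }))
    return pd.concat(rows, ignore_index=True)

# One long series: 1000 points, chunked into windows of 200 with 50 overlap.
long_series = pd.Series(np.sin(np.linspace(0, 50, 1000)))
df = chunk_long_series(long_series, chunk_size=200, overlap=50)

# Downsampling variant: keep every 10th point before feature extraction.
downsampled = long_series.iloc[::10]
```

Each synthetic `id` then becomes an independent unit of work, so the chunks can also be distributed with dask or spark as suggested above.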
-
Thanks for the great project!
My question is about using `tsfresh` when an individual time series is very large.

I've glanced over the documentation (particularly "Large Input Data"), and it says that for big data the input is divided into chunks which are then distributed over the cluster, where the minimal unit of a chunk is an individual time series (i.e. two chunks cannot contain data from the same time series, unless the user handles this case manually). Hence, my understanding is that the whole framework assumes each individual unit of the user's input data must still fit into a single-machine pandas DataFrame. Looking at `feature_calculators` also seems to support this, since all of the functions operate on NumPy arrays.

But what do we do if I want to calculate a feature (e.g. `stddev`) on a single time series which does not fit into memory? Although the framework claims Dask support, the feature calculators still operate only on NumPy arrays, not on Dask collections/dataframes. Does this mean the user has to chunk the input manually? If so, we also need to handle the reduce step that gathers the per-chunk results into a single output feature value, which adds a whole layer of ambiguity (e.g. how do I get the `stddev` of a large time series from the 10 `stddev` values of 10 smaller ones?). In that case, is there an idea of how to handle this reduction step for all of the calculators, so the whole framework scales to large data? Or should it be done at the calculator level (e.g. one reducer per calculator, individually for each feature)?

The same question goes for rolling: is it possible to roll a larger-than-memory time series with the framework?
Thanks!