Replies: 1 comment
-
If the full dataset consists of just a single long time series with one ID, the features will be different. But please note that in this case using Dask will also not help, because it can only parallelize over different IDs.
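For illustration, here is a minimal sketch of that caveat (the toy data, the column names `id`/`time`/`value`, and the `MinimalFCParameters` setting are assumptions made for the example, not taken from this thread). Splitting a single-ID series into chunks turns each chunk into its own, shorter series, so features such as `length` or `maximum` come out differently:

```python
import numpy as np
import pandas as pd
from tsfresh import extract_features
from tsfresh.feature_extraction import MinimalFCParameters

rng = np.random.default_rng(42)

# One long time series under a single ID.
full = pd.DataFrame({
    "id": 0,
    "time": np.arange(1000),
    "value": rng.normal(size=1000).cumsum(),
})

# Splitting it in half forces each half to become its own (shorter) series.
halves = pd.concat([
    full.iloc[:500].assign(id=0),
    full.iloc[500:].assign(id=1),
])

features_full = extract_features(full, column_id="id", column_sort="time",
                                 default_fc_parameters=MinimalFCParameters())
features_halves = extract_features(halves, column_id="id", column_sort="time",
                                   default_fc_parameters=MinimalFCParameters())

# e.g. "value__length" is 1000 for the full series but 500 for each half,
# and global statistics such as "value__maximum" generally differ as well.
print(features_full["value__length"].tolist())    # [1000.0]
print(features_halves["value__length"].tolist())  # [500.0, 500.0]
```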
-
I have a dataset consisting of 4 million records. I want to run the extract_features function on the entire dataset with Dask, but I am running into memory issues. I now want to ask:
Case 1: carrying out feature extraction on the entire 4 million records at once.
Case 2: splitting the data into 4 chunks (of 1 million records each) and carrying out feature extraction on each chunk separately.
Would the extracted feature values differ between the two cases?
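For concreteness, a small sketch of the two cases, under the assumption that the 4 million records span many different IDs and that every chunk keeps whole IDs together (the toy data and column names are made up for the example). In that setting the per-chunk results stack up to the same values as a single pass; if the whole dataset is one long series under a single ID, the reply above applies and the values will differ:

```python
import numpy as np
import pandas as pd
from tsfresh import extract_features
from tsfresh.feature_extraction import MinimalFCParameters

rng = np.random.default_rng(0)

# Toy stand-in for the real dataset: many short series, each with its own ID.
df = pd.concat([
    pd.DataFrame({"id": i, "time": np.arange(200), "value": rng.normal(size=200)})
    for i in range(40)
], ignore_index=True)

# Case 1: one extraction pass over everything.
case1 = extract_features(df, column_id="id", column_sort="time",
                         default_fc_parameters=MinimalFCParameters())

# Case 2: process the data in chunks, where every chunk contains only
# complete IDs (never half of a series), then stack the per-chunk results.
chunks = [df[df["id"].isin(ids)] for ids in np.array_split(df["id"].unique(), 4)]
case2 = pd.concat([
    extract_features(c, column_id="id", column_sort="time",
                     default_fc_parameters=MinimalFCParameters())
    for c in chunks
])

# Identical results; in a real setup, loading and processing one chunk at a
# time keeps peak memory lower than a single pass over all records.
pd.testing.assert_frame_equal(case1.sort_index(), case2.sort_index())
```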