Shape of target #1009

scsv2504 · 2023-03-24T15:47:41Z

I am quite new to machine learning and I am having trouble understanding how to use tsfresh properly. I can't figure out how to transform my data so that I can benefit from the package.

I have data where the input variable is the temperature and the output variable is the power demand of a heat pump (regression problem). I only have data for 1 year (365 data points).

I came across tsfresh and tried to extract relevant features. I read the documentation about the data format, so I added two columns:

column_id, where I divided the dataset into 52 smaller groups (I decided to assign one ID per week, so I deleted one data point so that 364/7 = 52) -> is this right?
column_sort, which is an index from 1 to 364, one value per timestep

After doing so, I used roll_time_series(X, column_id="column_id", column_sort="column_sort", max_timeshift=7) to create a rolled DataFrame. This returns a DataFrame of shape (1456, 4).

Then I used extract_features(df_rolled, column_id="column_id", column_sort="column_sort", column_value="temperature") to extract the features. This returns a DataFrame of shape (52, 783).

Now I want to select the relevant features. For this, I need the target y as an argument. I tried to use make_forecasting_frame() to convert my target into the proper shape, but I can't work out how to use it.

Can someone explain this in layman's terms? Or, more generally, what does the target have to look like when extracting features, and how do I create the final dataset to train a model?

Do I have to merge the extract_features output back with the original input matrix, so that the shape of X is (364, n_features) and y is (364, 1)?

nils-braun (Maintainer) · 2023-04-01T14:41:30Z

First of all, hello to the community!

I guess you have seen our tutorial/notebook on time series forecasting? https://github.com/blue-yonder/tsfresh/blob/main/notebooks/05%20Timeseries%20Forecasting.ipynb

And I guess you have also seen our documentation on forecasting? https://tsfresh.readthedocs.io/en/latest/text/forecasting.html

What you explained sounds reasonable.

In principle, you do not even need to split your data into separate weeks (of course you can if it makes sense to you). You can also use the rolling function with just a single ID and control the rolling behavior (e.g. to use at most 7 days of data) through its arguments, such as max_timeshift.

The notebook above also uses a time series with just a single ID (in this case the Apple stock), and it does the prediction without the make_forecasting_frame function. The trick is that you roll both features and target at the same time and then shift the target by one (this is the same as what make_forecasting_frame does). If you follow this notebook, you should be able to just plug in your own data. I would recommend running the notebook locally so that you can see the outputs.
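The shift-and-align step can be sketched with pandas alone. This is only an illustration under assumptions: X is a random stand-in for an extracted feature matrix indexed by the last timestep of each window, the feature names are hypothetical, and the power values are synthetic. Shifting by one means the features up to day t are paired with the demand on day t + 1; if you want the demand on the same day, leave out the shift.

```python
import numpy as np
import pandas as pd

timesteps = np.arange(1, 365)
rng = np.random.default_rng(0)

# Stand-in for the extracted features: one row per rolling window,
# indexed by the last timestep in that window (hypothetical feature names).
X = pd.DataFrame(
    rng.normal(size=(364, 3)),
    index=pd.Index(timesteps, name="last_timestep"),
    columns=["feat_a", "feat_b", "feat_c"],
)

# Stand-in for the measured power demand, indexed by the same timesteps.
power = pd.Series(rng.normal(5.0, 1.0, size=364), index=timesteps, name="power")

# Shift by one so that the entry at timestep t holds the demand at t + 1.
# The last timestep has no successor, so it is dropped.
y = power.shift(-1).dropna()

# Keep only timesteps present in both, so X and y line up row by row.
common = X.index.intersection(y.index)
X, y = X.loc[common], y.loc[common]

print(X.shape, y.shape)  # (363, 3) (363,)
```

So the target is a one-dimensional Series with the same index as X, not a (n, 1) matrix, and there is no need to merge the features back into the original input frame. An aligned pair like this is what select_features(X, y) expects.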
