time series forecasting comment #6

umbe1987 · 2023-03-30T09:05:58Z

First, thank you for your blogs!

I wanted to leave a comment about the time series forecasting blog post at https://www.r-bloggers.com/2019/09/time-series-forecasting-with-random-forest/#google_vignette, specifically about this line:

X_test <- tax_ts_mbd[nrow(tax_ts_mbd), c(1:lag_order)]

I was wondering whether this should be instead

X_test <- tax_ts_mbd[nrow(tax_ts_mbd), -1]

because otherwise we keep the first column in X_test, which is the target and should be left out. We should only use the lagged versions for prediction. Or maybe I am just wrong :)

Anyway, I'd be happy if you could reply; otherwise, again, thanks for the neat tutorial!


lukasfeick-sw · 2023-04-06T14:07:42Z

Hi @umbe1987!

Thanks a lot for your comment. We double-checked the code snippets and realized that you are absolutely right. This should indeed be X_test <- tax_ts_mbd[nrow(tax_ts_mbd), -1].

We will update the blog next week.

Thanks again for your valuable input. Please feel free to reach out to us if you have any other questions or concerns!

umbe1987 · 2023-04-06T15:40:46Z

@lukasfeick-sw Glad I could help, and again, thanks a lot for providing such interesting analyses in the first place!

I really appreciate your reply to my comment; it shows you care about the knowledge you share with the public (even if the post is already a few years old 😄)

Dschaykib · 2023-04-14T19:21:58Z

Hey @umbe1987
We (@lukasfeick-sw and I) wanted to update the blog, but while doing so, we took another look at the issue and noticed that X_test <- tax_ts_mbd[nrow(tax_ts_mbd), c(1:lag_order)] was correct all along. Let me explain:

The difficulty lies in which data each object is created from, and for what purpose. The following image shows the time series data and the created lags. The "today" point in this example is the end of 2017, and the forecast covers each month of 2018. Therefore, the data we need for Jan 2018 are from July 2017 to Dec 2017. These data points correspond to c(1:lag_order) (the green bar in the image).

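The first column is indeed the target, but the other columns are lagged copies of that same target. In the last row, the first column holds the most recent observation (Dec 2017), which is exactly the first lag needed for the Jan 2018 forecast. If we took X_test <- tax_ts_mbd[nrow(tax_ts_mbd), -1], the window would shift back to June 2017 through Nov 2017, and we would be one period off.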
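To make the indexing concrete, here is a minimal sketch on a toy series whose values equal their time index. It assumes (this is not shown in the thread) that tax_ts_mbd was built with base R's embed() using lag_order + 1 columns; the names x and mbd are hypothetical stand-ins for the tax series and tax_ts_mbd.

```r
# Hypothetical toy series: the value at time t is simply t,
# so every entry in the embedded matrix shows its own time index
x <- 1:10
lag_order <- 3

# embed() puts the current value in column 1 and lags 1..lag_order
# in columns 2..(lag_order + 1)
mbd <- embed(x, lag_order + 1)
print(mbd[nrow(mbd), ])  # last row: 10 9 8 7

# Features for forecasting time 11: the lag_order most recent observations
x_test_correct <- mbd[nrow(mbd), c(1:lag_order)]
print(x_test_correct)  # 10 9 8

# Dropping column 1 instead gives the features that predict time 10,
# i.e. a value already in the training data: one period off
x_test_shifted <- mbd[nrow(mbd), -1]
print(x_test_shifted)  # 9 8 7
```

For the training matrix, -1 is right, because column 1 is the target and the remaining columns are its lags. For the test row, the window has to slide forward by one step so that the latest observation becomes lag 1, which is what c(1:lag_order) does.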

In addition, the following image explains why, when we loop over the horizon, we take y_train[-1] and X_train[-nrow(X_train),].
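A minimal sketch of what those two lines do to the alignment, again on a hypothetical toy series whose values equal their time index, and assuming the same embed()-based setup (not shown in the thread). No model is fitted here; the sketch only records, at each horizon step, how many periods separate each training target from its most recent feature. This matches what is commonly called a direct multi-step strategy: at step i the model learns from pairs that are i periods apart, which is the same gap as between X_test (ending at the last observation) and the i-th forecast.

```r
# Hypothetical toy series: values equal their time index
x <- 1:12
lag_order <- 3
horizon <- 3

mbd <- embed(x, lag_order + 1)
y_train <- mbd[, 1]                      # target x_t
X_train <- mbd[, -1]                     # features x_(t-1), ..., x_(t-lag_order)
X_test  <- mbd[nrow(mbd), 1:lag_order]   # latest values x_n, ..., x_(n-lag_order+1)

gaps <- integer(horizon)
for (i in seq_len(horizon)) {
  # Gap between each training target and its most recent feature
  gaps[i] <- unique(y_train - X_train[, 1])
  # Here the forecasting model (a random forest in the blog post) would be
  # fitted on (X_train, y_train) and applied to X_test to forecast time n + i

  # Drop the oldest target and the newest feature row, so each target
  # moves one more period away from its features
  y_train <- y_train[-1]
  X_train <- X_train[-nrow(X_train), , drop = FALSE]
}
print(gaps)  # 1 2 3
```

The drop = FALSE is only a safeguard so that X_train stays a matrix even if very few rows remain.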

I hope this clears things up! If you have any questions, let us know.

umbe1987 · 2023-04-15T10:02:46Z

Hi @Dschaykib

I cannot check right now since I am typing from my phone, but the problem I saw was in the X_test variable, not in X_train. Using -1 to index the columns selects all but the first column (i.e., all lagged features and not the target). Again, I don't know if this is still an issue, but I will try to understand whether your points also hold for the test set and not only for the training set.

Thank you, by the way, for getting back to me; I really appreciate it ;)

Dschaykib · 2023-04-15T11:00:54Z

@umbe1987 My mistake, I had a typo: I meant X_test. I have edited my comment above.
(since X_train and X_test are created from the same main data)

umbe1987 · 2023-04-17T12:47:45Z

Sorry to bother again.

I just wanted to let you know that I tried to check the code, but I am having trouble downloading the dataset from the indicated link: https://www-genesis.destatis.de/genesis/online/data;sid=324194910261427BD227A6DA6E868E7B.GO_1_1?operation=abruftabelleAbrufen&selectionname=71211-0006&levelindex=0&levelid=1560495247902&index=24

Do you perhaps have an updated link? In my case, the link brings me to a page that says "Die aufgerufene Methode ist nicht implementiert!" ("The requested method is not implemented!")

Dschaykib · 2023-04-17T12:51:33Z

Yes, the data was a little hidden, so @manueltilgner updated the blog post (the update did not make it to r-bloggers). You can find the data here:
https://github.com/STATWORX/blog/tree/master/time%20series%20forecasting

umbe1987 · 2023-04-17T13:50:28Z

Thank you for the link; running the example has definitely cleared up my understanding of the process (and yes, of course, you were right about the variable).

I think a few more words could be spent explaining what is happening in the loop (the neat figures you shared in this thread would be a great addition, IMHO).

But again, it's a (very nice) post and not a scientific article, and it already helps a ton as it is.

Thank you again! :)

umbe1987 closed this as completed Apr 6, 2023
