time series forecasting comment #6

Closed
umbe1987 opened this issue Mar 30, 2023 · 8 comments

Comments

@umbe1987

Hi @manueltilgner

First, thank you for your blogs!

I wanted to leave a comment about the time series forecasting blog at https://www.r-bloggers.com/2019/09/time-series-forecasting-with-random-forest/#google_vignette

X_test <- tax_ts_mbd[nrow(tax_ts_mbd), c(1:lag_order)]

I was wondering whether this should be instead

X_test <- tax_ts_mbd[nrow(tax_ts_mbd), -1]

because otherwise we keep the first column in the X_test, which is the target and should be out. We should only use the lagged versions for predicting. Or maybe I am just wrong :)

Anyway, happy if you could reply, otherwise, again, thanks for the neat tutorial!

@lukasfeick-sw
Contributor

Hi @umbe1987!

Thanks a lot for your comment. We double-checked the code snippets and realized that you are absolutely right. This should indeed be X_test <- tax_ts_mbd[nrow(tax_ts_mbd), -1].

We will update the blog next week.

Thanks again for your valuable input. Please feel free to reach out to us if you have any other questions or concerns!

@umbe1987
Author

umbe1987 commented Apr 6, 2023

@lukasfeick-sw glad I could help, but again thanks a lot for providing such interesting analyses in the first place!

I really appreciate your reply to my comment; it shows you care about the knowledge you are sharing with the public (even if it is something that is already a few years old 😄)

umbe1987 closed this as completed Apr 6, 2023

@Dschaykib
Collaborator

Dschaykib commented Apr 14, 2023

Hey @umbe1987
We (@lukasfeick-sw and I) wanted to update the blog, but while doing so, we had another look at the issue and noticed that X_test <- tax_ts_mbd[nrow(tax_ts_mbd), c(1:lag_order)] was correct all along. Let me explain:

The difficulty lies in keeping track of which data each object is created from, and for what purpose. The following image shows the time series data and the created lags. The "today" point for this example is the end of 2017, and the forecast covers each month of 2018. Therefore, the data we need for Jan 2018 run from July 2017 to Dec 2017. These data points correspond to c(1:lag_order) (the green bar in the image).

[Image: Slide1]

The first column is indeed the target, but all other columns are lagged values of the target as well. If we took X_test <- tax_ts_mbd[nrow(tax_ts_mbd), -1], we would be one period off.
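
A minimal sketch of this point on a toy series rather than the tax data. It assumes the lag matrix is built with embed(x, lag_order + 1), which places the most recent value in the first column; the series x and the names mbd, newest_lags and shifted_lags are made up for illustration.

```r
# Toy series: the values equal their time index, so lags are easy to read.
x <- 1:10
lag_order <- 3

# embed() puts the most recent value in column 1 and older values to the right.
# Row i is x[i + lag_order], x[i + lag_order - 1], ..., x[i].
mbd <- embed(x, lag_order + 1)

last_row <- mbd[nrow(mbd), ]                # 10 9 8 7

# Features for forecasting period 11: the lag_order most recent observations.
newest_lags <- mbd[nrow(mbd), 1:lag_order]  # 10 9 8

# Dropping column 1 yields the features that belong to period 10,
# which is already observed, so the forecast would be one period off.
shifted_lags <- mbd[nrow(mbd), -1]          # 9 8 7
```

In the last row, the "target" in column 1 is simply the newest observation, so it belongs in the feature set when forecasting the first unseen period.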

In addition, when we loop over the horizon, the following image explains why we take y_train[-1] and X_train[-nrow(X_train),].

[Image: Slide2]
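
The shifting in the loop can be sketched on a toy series as well. This assumes that each pass drops the first element of y_train and the last row of X_train, as described above; the model fit is left out, and the names x, mbd and steps_ahead are illustrative only.

```r
# Toy series whose values equal their time index.
x <- 1:12
lag_order <- 3
horizon <- 3

mbd <- embed(x, lag_order + 1)
y_train <- mbd[, 1]   # targets
X_train <- mbd[, -1]  # lagged features, newest lag in column 1

steps_ahead <- integer(horizon)
for (i in seq_len(horizon)) {
  # Because x holds time indices, target minus newest feature is the
  # number of periods between them; it is the same for every row.
  steps_ahead[i] <- unique(y_train - X_train[, 1])

  # A model would be fitted on (X_train, y_train) here and applied to X_test.

  # Shift the target one period further away from the features.
  y_train <- y_train[-1]
  X_train <- X_train[-nrow(X_train), , drop = FALSE]
}

steps_ahead  # 1 2 3
```

With X_test held fixed at the newest lags, each pass then appears to train a model for a target one period further ahead, which matches forecasting every month of the horizon from the same "today" point.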

I hope this clears things up! If you have any questions, let us know.

@umbe1987
Author

Hi @Dschaykib

I cannot confirm right now since I am typing from my phone, but the problem I saw was in the X_test variable, not in X_train. If you use -1 to index the columns, you select all but the first column (i.e., all lagged features and not the target). Again, I don't know if this is still an issue, but I will try to understand whether your points also hold for the test set and not only for the training set.

Thank you btw for coming back to me, really appreciate it ;)

@Dschaykib
Collaborator

@umbe1987 my mistake... I had a typo... I meant X_test. I edited my comment above.
(since X_train and X_test are created from the same main data)

@umbe1987
Author

Sorry to bother again.

I just wanted to let you know that I wanted to check the code, but I am having trouble downloading the dataset using the indicated link: https://www-genesis.destatis.de/genesis/online/data;sid=324194910261427BD227A6DA6E868E7B.GO_1_1?operation=abruftabelleAbrufen&selectionname=71211-0006&levelindex=0&levelid=1560495247902&index=24

Do you perhaps have an updated link? In my case, the link brings me to a page that says "Die aufgerufene Methode ist nicht implementiert!" ("The requested method is not implemented!")

@Dschaykib
Collaborator

Yes, the data was a little hidden, therefore @manueltilgner made an update to the blog (which did not get updated on r-bloggers). You can find the data here:
https://github.com/STATWORX/blog/tree/master/time%20series%20forecasting

@umbe1987
Author

Thank you for the link, running the example has definitely cleared up my understanding of the process (and yes, of course you were right about the variable).

I think a few more words could probably be spent to explain better what is happening in the loop (the neat figures you shared in this comment would be a great addition IMHO).

But again, it's a (very nice) post and not a scientific article, and it already helps a ton as it is.

Thank you again! :)
