time series forecasting comment #6
Hi @umbe1987! Thanks a lot for your comment. We double-checked the code snippets and realized that you are absolutely right. This should indeed be We will update the blog next week. Thanks again for your valuable input. Please feel free to reach out to us if you have any other questions or concerns!
@lukasfeick-sw glad I could help, but again, thanks a lot for providing such interesting analyses in the first place! I really appreciate your reply to my comment; it shows you care about the knowledge you are sharing with the public (even if the post is already a few years old 😄)
Hey @umbe1987, the difficulty lies in which data each object is created from, and for what purpose. The following image shows the time series data and the created lags. The "today" point for this example is the end of 2017, and the forecast covers each month of 2018. Therefore, the data we need for Jan 2018 are from July 2017 to Dec 2017. These data points correspond to The first column is indeed the target, but all other values are the lagged target as well. If we took In addition, when we loop over the horizon, the following image explains why we take I hope this clears things up! If you have any questions, let us know.
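The lag layout described above can be sketched in Python (the blog itself is written in R). This is a minimal illustration under stated assumptions: the lag matrix is assumed to follow the row layout of R's `embed()`, with the target in the first column and its six lags after it; `embed` below is a hypothetical helper written for this sketch, not an existing Python function, and month labels stand in for the observed values.

```python
# Sketch only: month labels stand in for the observed values so the layout is readable.
months = [f"2017-{m:02d}" for m in range(1, 13)]
lag_order = 6  # six lags, e.g. Jul to Dec 2017 as inputs for Jan 2018


def embed(series, dimension):
    """Hypothetical stand-in for R's embed().

    Each row is [x[t], x[t-1], ..., x[t-dimension+1]], newest value first.
    """
    return [
        [series[t - k] for k in range(dimension)]
        for t in range(dimension - 1, len(series))
    ]


lagged = embed(months, lag_order + 1)

# Column 0 is the target; columns 1..lag_order are its lagged values
# (the Python counterpart of R's [, 1] and [, -1] on the lag matrix).
y_train = [row[0] for row in lagged]
X_train = [row[1:] for row in lagged]

for target, features in zip(y_train, X_train):
    print(target, "<-", features)
```

The first printed row pairs the July 2017 target with its lags from January to June 2017; each later row moves the window forward by one month.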
Hi @Dschaykib, I cannot confirm right now since I am typing from my phone, but the problem I saw was in the X_test variable, not in X_train. If you use -1 for indexing the columns, you select all but the first column (i.e. all the lagged features and not the target). Again, I don't know if this is still an issue, but I will try to understand whether your points also hold for the test set and not only for the training set. Thank you, by the way, for coming back to me, I really appreciate it ;)
@umbe1987 my mistake... I had a typo... I meant X_test. I edited my comment above. |
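The X_test question in this exchange comes down to which slice of the last lag-matrix row is used. A minimal Python sketch, assuming the last row has the same newest-first layout as R's `embed()` output (Dec 2017 down to Jun 2017) and using month labels in place of the actual values:

```python
lag_order = 6
# Assumed last row of the lag matrix (newest first): Dec 2017, then its 6 lags.
last_row = ["2017-12", "2017-11", "2017-10", "2017-09",
            "2017-08", "2017-07", "2017-06"]

# Keep columns 1 to lag_order (R indexing): Dec back to Jul 2017,
# i.e. the six most recent months.
X_test_first_cols = last_row[:lag_order]

# Keep all but the first column (R: -1): Nov back to Jun 2017,
# which is shifted one month further into the past.
X_test_drop_first = last_row[1:]

print("first lag_order columns:", X_test_first_cols)
print("drop first column:      ", X_test_drop_first)
```

Both slices contain six values, which makes the difference easy to miss; only the first one matches the July to December 2017 window described in this thread as the input for the January 2018 forecast.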
Sorry to bother you again. I just wanted to let you know that I wanted to check the code, but I am having trouble downloading the dataset from the indicated link: https://www-genesis.destatis.de/genesis/online/data;sid=324194910261427BD227A6DA6E868E7B.GO_1_1?operation=abruftabelleAbrufen&selectionname=71211-0006&levelindex=0&levelid=1560495247902&index=24 Do you perhaps have an updated link? In my case, the link brings me to a page that says
Yes, the data was a little hidden, so @manueltilgner updated the blog (the update did not make it to r-bloggers). You can find the data here:
Thank you for the link; running the example has definitely cleared up my understanding of the process (and yes, of course, you were right about the variable). I think a few more words could be spent to better explain what is happening in the loop (the neat figures you shared in this comment would be a great addition IMHO). But again, it's a (very nice) post and not a scientific article, and it already helps a ton as it is. Thank you again! :)
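The loop over the forecast horizon discussed in this thread can also be sketched. This assumes a direct strategy (one model per forecast step), in which the training targets are shifted one month further ahead at each step while X_test stays fixed; whether the blog's loop does exactly this is an assumption, and the model fit is left out so the sketch only shows the data alignment. Month labels again stand in for the observed values.

```python
# Two years of month labels stand in for the observed values.
months = [f"{y}-{m:02d}" for y in (2016, 2017) for m in range(1, 13)]
lag_order = 6
horizon = 3  # the thread's example covers all 12 months of 2018; 3 keeps output short

# Lag matrix in embed() order: [x[t], x[t-1], ..., x[t-lag_order]].
lagged = [
    [months[t - k] for k in range(lag_order + 1)]
    for t in range(lag_order, len(months))
]
y_train = [row[0] for row in lagged]
X_train = [row[1:] for row in lagged]
X_test = lagged[-1][:lag_order]  # Dec back to Jul 2017, reused at every step

pairs_by_step = {}
for step in range(1, horizon + 1):
    # At this step, features whose newest lag is month t are paired with month t + step.
    pairs_by_step[step] = list(zip(X_train, y_train))
    # A model would be fit on (X_train, y_train) here and applied to X_test
    # to forecast the month `step` months after Dec 2017.
    # Shift for the next step: drop the oldest target and the newest feature row.
    y_train = y_train[1:]
    X_train = X_train[:-1]

for step, pairs in pairs_by_step.items():
    features, target = pairs[-1]
    print(f"step {step}: newest lag {features[0]} -> target {target}")
```

The printout shows the gap between the newest input month and the target growing by one month per step, which is what lets the step-h model map the fixed July to December 2017 inputs to the h-th month of 2018.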
Hi @manueltilgner
First, thank you for your blogs!
I wanted to leave a comment about the time series forecasting blog at https://www.r-bloggers.com/2019/09/time-series-forecasting-with-random-forest/#google_vignette
I was wondering whether this should be instead
because otherwise we keep the first column in X_test, which is the target and should be excluded. We should only use the lagged versions for predicting. Or maybe I am just wrong :)
Anyway, I'd be happy if you could reply; otherwise, again, thanks for the neat tutorial!