Why is pmdarima's predict function missing the start and end parameters of the underlying statsmodels ARIMA module? #141

Closed
poojithaamin opened this issue May 9, 2019 · 8 comments
Labels
feature request, wontfix

Comments

@poojithaamin

Description

Steps/Code to Reproduce

Expected Results

Actual Results

Versions

@poojithaamin
Author

This is what I was talking about:
https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima_model.ARIMA.predict.html#statsmodels.tsa.arima_model.ARIMA.predict
I want to be able to specify an end date up to which predictions are made, instead of a number of periods. Do you plan to include this in a future version?

@tgsmith61591
Member

Short answer: pmdarima is not statsmodels.

Longer answer: pmdarima's predict is the equivalent of statsmodels' forecast. Specifying a starting point for forecasts doesn't make much sense; if you want a forecast 5 periods into the future, that's easily achievable today. If you want in-sample predictions, that's also achievable. I don't see what utility a start/end point would provide that the package doesn't already cover. Therefore, I don't intend to add that functionality.
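For readers following along, the two cases mentioned above (an out-of-sample forecast and in-sample predictions) might look like the sketch below. It assumes a fitted pmdarima model exposing `predict(n_periods=...)` and `predict_in_sample()`; the helper name `forecast_and_fitted` is hypothetical and is not part of pmdarima.

```python
def forecast_and_fitted(model, n_periods=5):
    """Return (out-of-sample forecast, in-sample predictions) for a fitted model.

    `model` is assumed to be a fitted pmdarima ARIMA, or any object exposing
    the same two methods. This helper is illustrative only.
    """
    # Forecast the next n_periods steps after the end of the training data.
    future = model.predict(n_periods=n_periods)
    # Predictions over the range the model was trained on.
    fitted = model.predict_in_sample()
    return future, fitted


# Hypothetical usage, assuming pmdarima is installed and `y` is a 1-d series:
#   import pmdarima as pm
#   model = pm.auto_arima(y)
#   future, fitted = forecast_and_fitted(model, n_periods=5)
```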

Do you have a good, specific reason why this is necessary? And if so, can you demonstrate it with a repeatable example?

@poojithaamin
Author

Thank you for your reply.
Suppose I have an ARIMA model fit with data through Feb 2018, and I need to predict monthly data through Dec 2018. If I am given only the model, without knowing the last index of the fit (that is, the last month of the training data), I cannot work out the number of periods to forecast. An end_date parameter would be useful here. I hope that explains the case.

@tgsmith61591
Member

tgsmith61591 commented May 9, 2019

I understand the need, but date logic probably doesn't belong in a mathematical library, especially given nuances like time zones and daylight saving time. Make sense?

The best approach would be a utility function that computes the number of periods forward from today that you need, and then to forecast exactly that many periods.
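A minimal sketch of such a utility, assuming monthly data and counting from the last observed month rather than from today (the scenario described earlier in the thread). The function name `months_between` is hypothetical, and the sketch deliberately ignores days, time zones, and other calendar nuances.

```python
from datetime import date


def months_between(last_observed: date, target: date) -> int:
    """Number of monthly steps from the last observed month to the target month."""
    n = (target.year - last_observed.year) * 12 + (target.month - last_observed.month)
    if n <= 0:
        raise ValueError("target month must be after the last observed month")
    return n


# Model fit through Feb 2018, forecast wanted through Dec 2018.
n_periods = months_between(date(2018, 2, 1), date(2018, 12, 1))
print(n_periods)  # 10
```

The result can then be passed as `n_periods` to the model's predict call.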

Furthermore, we convert all timeseries arrays to numpy arrays as internal representations (because we use a lot of Cython internally) so any date information in an index would be lost.

Finally, we tend to support the philosophy that a project should address its scope very well rather than trying to solve all possible permutations of a problem. Given this type of issue can be so domain specific, we made an early decision not to deal with dates and to handle everything with slicing. We feel, in the long-term, this gives everyone more flexibility since they can pre- and post-process as needed, and aren't at the mercy of a black box.

I'll leave this issue open for a while and if there is enough interest, I may change my stance. But keep in mind, PRs are always welcome.

@tgsmith61591
Member

Also keep in mind the update function exists to continually maintain your model. Properly maintained, you should always have a good estimate of when the last observed values occurred.

@poojithaamin
Author

In that case, I'll look into saving the last date of the training data as metadata in our system and using it to calculate n_periods when forecasting.
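One way this metadata approach could be sketched, assuming monthly data and a small JSON file stored alongside the serialized model. The file name, the `last_observed` key, and both function names are hypothetical choices for illustration.

```python
import json
import os
import tempfile
from datetime import date


def save_fit_metadata(path, last_observed: date):
    """Record the last training month in a JSON sidecar file."""
    with open(path, "w") as fh:
        json.dump({"last_observed": last_observed.isoformat()}, fh)


def n_periods_until(path, target: date) -> int:
    """Read the stored last training month and count monthly steps to `target`."""
    with open(path) as fh:
        meta = json.load(fh)
    last = date.fromisoformat(meta["last_observed"])
    return (target.year - last.year) * 12 + (target.month - last.month)


# At training time: the model was fit with data through Feb 2018.
meta_path = os.path.join(tempfile.gettempdir(), "arima_fit_meta.json")
save_fit_metadata(meta_path, date(2018, 2, 1))

# At forecast time: only the model and its sidecar file are available.
n_periods = n_periods_until(meta_path, date(2018, 12, 1))
print(n_periods)  # 10
```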
Thanks once again for your time.

@tgsmith61591
Member

tgsmith61591 commented May 9, 2019

Date may not even be part of the metadata. We accept all forms of 1d arrays (tuples, lists, series, anything expressible as 1d). I take the stance that the responsibility of model documentation falls on the developer.

(Sorry I misread your comment)

That said, you can access the original endogenous array of a fitted arima:

your_model.arima_res_.model.data.endog

That indirectly solves your problem.

@poojithaamin
Author

Yeah, but like you said, model.arima_res_.model.data.endog does not give the index of the data (the dates), even if I fit the model with a series that has dates as its index.

tgsmith61591 added the feature request and wontfix labels May 11, 2019
2 participants