Support relative forecasting for ARIMA#2613
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2613 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 297 297
Lines 27027 27039 +12
=======================================
+ Hits 26983 26995 +12
Misses 44 44
Continue to review full report at Codecov.
|
| [i + 1 for i in range(len(y[15:]))], is_relative=True | ||
| ) | ||
|
|
||
| a_clf = sktime_arima.AutoARIMA(start_p=2, start_q=2, max_p=2, max_q=2) |
There was a problem hiding this comment.
These parameters reduce this test run time from 1m 14s to 20s
There was a problem hiding this comment.
Do you think it would be the same coverage to just reduce the freq_str to maybe half or so? I appreciate the thoroughness, but I feel like maybe we can trust sklearn to handle uniformity between the different time periods.
There was a problem hiding this comment.
Sure I can limit it to Seconds, Months, and Years mainly to support sub-minute intervals and because months and years have a varying number of days, it might be useful to stay up to date with any changes they make in sktime.
chukarsten
left a comment
There was a problem hiding this comment.
Nice job, Parthiv! I'm not 100% sure whether we only want only relative forecasting or we want to enable both relative and absolute. If the answer is the former, then I think we're good to go. If the latter, we might need to rework this a little bit.
| [i + 1 for i in range(len(y[15:]))], is_relative=True | ||
| ) | ||
|
|
||
| a_clf = sktime_arima.AutoARIMA(start_p=2, start_q=2, max_p=2, max_q=2) |
| [i + 1 for i in range(len(y[15:]))], is_relative=True | ||
| ) | ||
|
|
||
| a_clf = sktime_arima.AutoARIMA(start_p=2, start_q=2, max_p=2, max_q=2) |
There was a problem hiding this comment.
Do you think it would be the same coverage to just reduce the freq_str to maybe half or so? I appreciate the thoroughness, but I feel like maybe we can trust sklearn to handle uniformity between the different time periods.
| fh_ = forecasting_.ForecastingHorizon( | ||
| [i + 1 for i in range(len(dates))], is_relative=True | ||
| ) |
There was a problem hiding this comment.
So, dumb question, it seems that right now relative forecasting is all we support with this change. Is there any benefit to changing the api by having is_relative be assigned during the regressor's creation? I'm not super familiar, so maybe we only want relative forecasting, but it seems to the luddite like me that "both" might be desirable.
There was a problem hiding this comment.
Not dumb at all and this is certainly something we can discuss further. Based on the time series conversations that have been going on, we want to make sure that however we implement this, it should work well with the moving window strategy we want to leverage in Rolling Origin Cross Validation. I think especially considering gap and forecast_horizon, we'd want to look a certain number of steps ahead when making our predictions. Having relative forecasting would make this easier since we can determine how far ahead we want to look, and what those indices should be.
bchen1116
left a comment
There was a problem hiding this comment.
Left a question for my own understanding, but rest of it looks good!
| ) | ||
| fh_ = forecasting_.ForecastingHorizon(dates, is_relative=False) | ||
| fh_ = forecasting_.ForecastingHorizon( | ||
| [i + 1 for i in range(len(dates))], is_relative=True |
There was a problem hiding this comment.
Why are we forecasting on this array of ints rather than the dates? Do the dates become irrelevant when we use is_relative?
There was a problem hiding this comment.
@bchen1116 Correct, the ints represent how many "periods" after relative to the end of the training dates we want to predict for.
This reverts commit ce1830f.
Fixes: #2592
The original issue was filed under the assumption that
sktimewould not be able to support certain time units, such as those that had a frequency other than 1, such as 2 days, 4 minutes, 3 months, etc. This manifests when exogenous variables (features) are included infitandpredict. This problem doesn't seem to persist when theis_relativeparameter inForecastingHorizonis set toTrue. By setting this toTrue, predictions will be made on a set of absolute indices starting from after the last date used when fitting the estimator, as opposed to being made on specific dates.With the flag is set to
FalseThis does not work when features are passed
With the flag is set to
TrueThis works even when features are passed
More information can be found here.