Fix ARIMA not accounting for gap in prediction from end of training data #3884

Merged
jeremyliweishih merged 6 commits into main from js_3853_arima on Dec 13, 2022

Conversation

jeremyliweishih
Collaborator

Fixes #3853.

jeremyliweishih changed the title from "Fix FH not account for gap in prediction from end of training data" to "Fix ARIMA not accounting for gap in prediction from end of training data" on Dec 12, 2022
codecov bot commented Dec 12, 2022

Codecov Report

Merging #3884 (96203a4) into main (408eb9b) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3884     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        346     346             
  Lines      36304   36325     +21     
=======================================
+ Hits       36167   36188     +21     
  Misses       137     137             
Impacted Files Coverage Δ
...omponents/estimators/regressors/arima_regressor.py 100.0% <100.0%> (ø)
...alml/tests/component_tests/test_arima_regressor.py 100.0% <100.0%> (ø)

@jeremyliweishih jeremyliweishih marked this pull request as ready for review December 12, 2022 09:10
@@ -142,7 +142,25 @@ def _match_indices(self, X, y):
    def _set_forecast(self, X):
        from sktime.forecasting.base import ForecastingHorizon

        fh_ = ForecastingHorizon([i + 1 for i in range(len(X))], is_relative=True)
        # we can only calculate the difference if the indices are of the same type
Collaborator Author

Set this as a fallback to the previous behavior in case we ever receive inconsistent indices.

Contributor

Is the index type being different indicative of something?

Collaborator Author

jeremyliweishih, Dec 13, 2022

It should only be the result of inconsistent user behavior, and this is a safeguard against that!

Contributor

eccabay left a comment

Looks pretty good, just had a few questions! Thanks for taking care of this.

Comment on lines +154 to +156
units_diff = len(dates_diff) - 1
fh_ = ForecastingHorizon(
    [units_diff + i for i in range(len(X))],
Contributor

Seems like we needed an off-by-one offset in the non-gap case. Should that still be the case here as well?

Collaborator Author

We need the + 1 since range(len(X)) starts at 0. In the gap case units_diff should always be > 0 so we don't need it!
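
The offset arithmetic in this thread can be sketched with pandas alone. This is a minimal illustration, not the merged implementation: `relative_steps` is a hypothetical helper, and it assumes the input carries a `DatetimeIndex` with a set `freq`. Anything else falls back to the old numbering that starts at 1.

```python
import pandas as pd


def relative_steps(last_train_index, X_index):
    """Hypothetical helper: relative forecast steps for X_index.

    Step 1 is the first period after the end of the training data.
    """
    # Fallback (previous behavior): assume X starts right after training.
    offset = 1
    if (
        isinstance(X_index, pd.DatetimeIndex)
        and X_index.freq is not None
        and isinstance(X_index[0], type(last_train_index))
    ):
        # Periods from the last training timestamp through the first
        # requested timestamp, inclusive; subtracting 1 gives the step count.
        span = pd.date_range(
            start=last_train_index, end=X_index[0], freq=X_index.freq
        )
        offset = len(span) - 1
    return [offset + i for i in range(len(X_index))]


last_train = pd.Timestamp("2022-01-10")
no_gap = pd.date_range("2022-01-11", periods=3, freq="D")
gap = pd.date_range("2022-01-15", periods=3, freq="D")

print(relative_steps(last_train, no_gap))        # [1, 2, 3]
print(relative_steps(last_train, gap))           # [5, 6, 7]
print(relative_steps(9, pd.RangeIndex(10, 13)))  # [1, 2, 3] (fallback)
```

In the no-gap case the computed offset is 1, which matches the old `i + 1` numbering, so no extra `+ 1` is needed on top of `units_diff`.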

@@ -139,6 +140,7 @@ def test_set_forecast(ts_data):
    )

    clf = ARIMARegressor()
    clf.last_X_index = X.index[-1]
Contributor

Why was this necessary?

Collaborator Author

We never ran fit in this test, so last_X_index wasn't being set!
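
As a rough illustration of why the attribute has to be set by hand, here is a sketch with a hypothetical stand-in class. `TinyForecaster` is not evalml's `ARIMARegressor`; it only assumes, as the comment above implies, that `fit` is where `last_X_index` gets recorded.

```python
import pandas as pd


class TinyForecaster:
    """Hypothetical stand-in, not evalml's ARIMARegressor."""

    def fit(self, X, y=None):
        # Remember where the training data ends so predict-time code
        # can measure the gap to the first requested index.
        self.last_X_index = X.index[-1]
        return self


X = pd.DataFrame(
    {"feature": range(5)},
    index=pd.date_range("2022-01-01", periods=5, freq="D"),
)

unfitted = TinyForecaster()
print(hasattr(unfitted, "last_X_index"))  # False

# A test that skips fit has to set the attribute itself.
unfitted.last_X_index = X.index[-1]
fitted = TinyForecaster().fit(X)
print(fitted.last_X_index == unfitted.last_X_index)  # True
```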



@pytest.mark.parametrize("use_covariates", [True, False])
def test_arima_regressor_can_forecast_arbitrary_dates(use_covariates, ts_data):
Contributor

It might be useful to add a comment here explaining how these dates are arbitrary: how long is X_test, and how big is the gap between the training data and what we're asking to predict on?

Collaborator Author

will add!

Comment on lines +421 to +423
assert (
    arima.predict(X_test).tail(5).tolist() == arima.predict(X_test_last_5).tolist()
)
Contributor

Does this fail when run on current main?

Collaborator Author

yes!
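
Why the assertion would fail before the fix can be seen at the level of forecast steps, without fitting a model. The sketch below uses two hypothetical helpers: `old_steps` mirrors the removed `i + 1` numbering, and `gap_aware_steps` assumes a daily `DatetimeIndex`. The dates and lengths are made up for illustration.

```python
import pandas as pd


def old_steps(X_index):
    # Pre-fix numbering: always starts at 1, ignoring any gap.
    return [i + 1 for i in range(len(X_index))]


def gap_aware_steps(last_train_index, X_index, freq="D"):
    # Hypothetical helper: step 1 is the first period after training ends.
    span = pd.date_range(start=last_train_index, end=X_index[0], freq=freq)
    return [len(span) - 1 + i for i in range(len(X_index))]


last_train = pd.Timestamp("2022-01-10")
X_test = pd.date_range("2022-01-11", periods=20, freq="D")
X_test_last_5 = X_test[-5:]  # starts 15 days after the training data ends

# Old numbering: the last five rows are asked for as steps 1..5,
# i.e. the wrong dates, so predictions would not line up.
print(old_steps(X_test)[-5:])    # [16, 17, 18, 19, 20]
print(old_steps(X_test_last_5))  # [1, 2, 3, 4, 5]

# Gap-aware numbering: both requests refer to the same steps.
print(gap_aware_steps(last_train, X_test)[-5:])    # [16, 17, 18, 19, 20]
print(gap_aware_steps(last_train, X_test_last_5))  # [16, 17, 18, 19, 20]
```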

Contributor

chukarsten left a comment

Just some small things and questions. Little nit to approve.

@jeremyliweishih jeremyliweishih merged commit 9f08237 into main Dec 13, 2022
@jeremyliweishih jeremyliweishih deleted the js_3853_arima branch December 13, 2022 16:38
@christopherbunn christopherbunn mentioned this pull request Jan 3, 2023
Successfully merging this pull request may close these issues.

ARIMARegressor does not support predictions of input with a gap off of training data
3 participants