-
Notifications
You must be signed in to change notification settings - Fork 92
Drop ARIMA when month_start freq is detected
#2632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2632 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 297 297
Lines 27037 27044 +7
=======================================
+ Hits 26993 27000 +7
Misses 44 44
Continue to review full report at Codecov.
|
|
|
||
| @pytest.mark.parametrize("freq_num", ["1", "2"]) | ||
| @pytest.mark.parametrize("freq_str", ["S", "T", "H", "D", "M", "Y"]) | ||
| @pytest.mark.parametrize("freq_str", ["T", "M", "Y"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sneaking this in here because it didn't go through in my last ARIMA PR (just meant to save time from 20s to 10s)
bchen1116
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just left a question about one of the tests
| ) | ||
|
|
||
| _, y = ts_data | ||
| X = pd.DataFrame(pd.date_range("2020-10-01", "2020-10-31"), columns=["Date"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we need to add this to the test here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I had added it because the code I included in AutoMLSearch would throw an error if the data didn't have a date_index column specified, which X_y_binary didn't have. Decided to just clean up the test and replace the data with what I added, thanks!
freddyaboulton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParthivNaresh When I run this repro for Arima with MS frequency on main I get a value error but it's coming from our own arima implementation.
from evalml.pipelines.components import ARIMARegressor
from evalml.demos import load_diabetes
import pandas as pd
X, y = load_diabetes()
X.ww["Date"] = pd.Series(pd.date_range(start="1/1/2018", periods=X.shape[0], freq="MS"))
arima = ARIMARegressor(date_index="Date")
arima.fit(X, y)
When I delete the line that sets the frequency to M for MS, i.e. freq = pd.infer_freq(dates), the above repro doesn't throw an error anymore and the arima is able to fit:
Unit tests also pass. So I'm wondering whether we need to make the changes to AutoMLSearch?
|
@freddyaboulton If that line is changed to |
freddyaboulton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ParthivNaresh !
| allowed_estimators = get_estimators( | ||
| self.problem_type, self.allowed_model_families | ||
| ) | ||
| if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence as to whether this should live here or in get_estimators. I'm ok keeping this here. Is this a bug that sktime can fix in the future or is it a limitation of the MS frequency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason I chose to put it here instead of get_estimators was to avoid passing problem_configuration to get_estimators just for this check
It's a limitation with MS unfortunately, due to the unreliability of forecasting over intervals that randomly change in values (different days in a month). This doesn't seem to be an issue with month end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea let's not change the api of get_estimators yet! Ok, I think there will be other frequencies with that limitation so I will file an issue for following up about that. Might make sense to add yet another data check for this hehe

Fixes #2631