-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop ARIMA when month_start
freq is detected
#2632
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2632 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 297 297
Lines 27037 27044 +7
=======================================
+ Hits 26993 27000 +7
Misses 44 44
Continue to review full report at Codecov.
|
@@ -379,7 +379,7 @@ def test_fit_predict_date_index_named_out_of_sample( | |||
|
|||
|
|||
@pytest.mark.parametrize("freq_num", ["1", "2"]) | |||
@pytest.mark.parametrize("freq_str", ["S", "T", "H", "D", "M", "Y"]) | |||
@pytest.mark.parametrize("freq_str", ["T", "M", "Y"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sneaking this in here because it didn't go through in my last ARIMA PR (just meant to save time from 20s to 10s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just left a question about one of the tests
@@ -3452,7 +3452,8 @@ def test_automl_validates_problem_configuration(X_y_binary): | |||
problem_type="time series regression", | |||
problem_configuration={"max_delay": 2, "gap": 3}, | |||
) | |||
|
|||
_, y = ts_data | |||
X = pd.DataFrame(pd.date_range("2020-10-01", "2020-10-31"), columns=["Date"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we need to add this to the test here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I had added it because the code I included in AutoMLSearch
would throw an error if the data didn't have a date_index
column specified, which X_y_binary
didn't have. Decided to just clean up the test and replace the data with what I added, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParthivNaresh When I run this repro for Arima with MS frequency on main I get a value error but it's coming from our own arima implementation.
from evalml.pipelines.components import ARIMARegressor
from evalml.demos import load_diabetes
import pandas as pd
X, y = load_diabetes()
X.ww["Date"] = pd.Series(pd.date_range(start="1/1/2018", periods=X.shape[0], freq="MS"))
arima = ARIMARegressor(date_index="Date")
arima.fit(X, y)
When I delete the line that sets the frequency to M
for MS
, i.e. freq = pd.infer_freq(dates)
, the above repro doesn't throw an error anymore and the arima is able to fit:
Unit tests also pass. So I'm wondering whether we need to make the changes to AutoMLSearch?
@freddyaboulton If that line is changed to
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ParthivNaresh !
@@ -505,6 +505,16 @@ def __init__( | |||
allowed_estimators = get_estimators( | |||
self.problem_type, self.allowed_model_families | |||
) | |||
if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence as to whether this should live here or in get_estimators
. I'm ok keeping this here. Is this a bug that sktime can fix in the future or is it a limitation of the MS
frequency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason I chose to put it here instead of get_estimators
was to avoid passing problem_configuration
to get_estimators
just for this check
It's a limitation with MS
unfortunately, due to the unreliability of forecasting over intervals that randomly change in values (different days in a month). This doesn't seem to be an issue with month end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea let's not change the api of get_estimators
yet! Ok, I think there will be other frequencies with that limitation so I will file an issue for following up about that. Might make sense to add yet another data check for this hehe
Fixes #2631