-
Notifications
You must be signed in to change notification settings - Fork 89
Add ARIMA to AutoML #2009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ARIMA to AutoML #2009
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2009 +/- ##
=======================================
- Coverage 99.9% 99.9% -0.0%
=======================================
Files 280 280
Lines 24105 24288 +183
=======================================
+ Hits 24080 24260 +180
- Misses 25 28 +3
Continue to review full report at Codecov.
|
Performance tests here |
377ca75
to
a244248
Compare
# Conflicts: # docs/source/release_notes.rst # evalml/tests/automl_tests/test_automl_search_regression.py # evalml/tests/component_tests/test_arima_regressor.py # evalml/tests/component_tests/test_estimators.py
X_t = self._component_obj.transform(X) | ||
X_t_df = pd.DataFrame(X_t, columns=X.columns, index=X.index) | ||
return _retain_custom_types_and_initalize_woodwork(X_ww, X_t_df, ltypes_to_ignore=[Integer, Categorical]) | ||
|
||
def fit_transform(self, X, y=None): | ||
X_ww = infer_feature_types(X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Necessary otherwise datetime columns are standardized which results in an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did this come up? Is this a bug that we've always had or triggered by another change in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good to me - just left a clarifying question. I wonder if there's a way to clean up the tests due to the exception in pipeline generation for ARIMA but I couldn't think of anything off the top of my head. Moving forward with TS problems for performance testing, I'm hoping that we can add datasets with seasonality or other characteristics that models like ARIMA can leverage.
…valml into 1946-Add-ARIMA-To-AutoML
# Conflicts: # docs/source/release_notes.rst
…valml into 1946-Add-ARIMA-To-AutoML
# Conflicts: # docs/source/release_notes.rst
…valml into 1946-Add-ARIMA-To-AutoML # Conflicts: # docs/source/release_notes.rst
Due to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Let me know when we've got the corresponding conda-forge change approved/merged and are ready to merge this PR and I will gladly do so :)
Other comments blocking merge:
- Please add a note to the ARIMA docstring mentioning that component isn't supported yet on conda.
- I left one question about pmdarima version
@@ -84,40 +91,56 @@ def _match_indices(self, X, y, date_col): | |||
y.index = date_col | |||
return X, y | |||
|
|||
def _format_dates(self, dates, X, y, predict=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Heads up @freddyaboulton we'll eventually need to update this ARIMA impl on your accessor feature branch once this is merged.
evalml/pipelines/components/estimators/regressors/arima_regressor.py
Outdated
Show resolved
Hide resolved
|
||
|
||
@pytest.fixture | ||
def has_minimal_dependencies(pytestconfig): | ||
return pytestconfig.getoption("--has-minimal-dependencies") | ||
|
||
|
||
@pytest.fixture | ||
def is_using_conda(pytestconfig): | ||
return pytestconfig.getoption("--is-using-conda") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a separate conda feedstock PR to add this there, right?
I'm guessing we need that to be merged first in order for CI to work properly, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsherry Yes we'll have a separate PR for the feedstock that adds --is-using-conda
to the command
@@ -11,4 +11,5 @@ seaborn>=0.11.1 | |||
category_encoders>=2.0.0 | |||
statsmodels >= 0.12.2 | |||
imbalanced-learn>=0.8.0 | |||
pmdarima==1.8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we've disabled arima for conda installs while pmdarima sorts out the dependency issue: is there any reason we shouldn't do
pmdarima>=1.8.0
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unfortunately causes latest_dependency_versions
to throw an error as it expects 1.8.2
, but that version pins numpy
to a lower version than what we can accept.
Fixes #1946