Add ARIMA to AutoML #2009

Merged
merged 40 commits into from
May 11, 2021

Conversation

ParthivNaresh
Contributor

Fixes #1946

@ParthivNaresh ParthivNaresh self-assigned this Mar 22, 2021
@codecov

codecov bot commented Mar 23, 2021

Codecov Report

Merging #2009 (931d6a5) into main (e4e8025) will decrease coverage by 0.1%.
The diff coverage is 99.0%.


@@           Coverage Diff           @@
##            main   #2009     +/-   ##
=======================================
- Coverage   99.9%   99.9%   -0.0%     
=======================================
  Files        280     280             
  Lines      24105   24288    +183     
=======================================
+ Hits       24080   24260    +180     
- Misses        25      28      +3     
| Impacted Files | Coverage Δ |
|---|---|
| ...lml/automl/automl_algorithm/iterative_algorithm.py | 99.0% <ø> (ø) |
| evalml/utils/gen_utils.py | 99.6% <ø> (ø) |
| evalml/tests/component_tests/test_utils.py | 97.5% <75.0%> (-2.5%) ⬇️ |
| evalml/tests/pipeline_tests/test_pipelines.py | 100.0% <80.0%> (-<0.1%) ⬇️ |
| evalml/model_family/model_family.py | 100.0% <100.0%> (ø) |
| ...omponents/estimators/regressors/arima_regressor.py | 100.0% <100.0%> (ø) |
| ...components/transformers/scalers/standard_scaler.py | 100.0% <100.0%> (ø) |
| evalml/pipelines/utils.py | 100.0% <100.0%> (ø) |
| ...ests/automl_tests/test_automl_search_regression.py | 100.0% <100.0%> (ø) |
| ...alml/tests/component_tests/test_arima_regressor.py | 100.0% <100.0%> (ø) |

... and 4 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@CLAassistant

CLAassistant commented Mar 26, 2021

CLA assistant check
All committers have signed the CLA.

@ParthivNaresh
Contributor Author

Performance tests here

@ParthivNaresh ParthivNaresh force-pushed the 1946-Add-ARIMA-To-AutoML branch from 377ca75 to a244248 Compare April 29, 2021 16:04
# Conflicts:
#	docs/source/release_notes.rst
#	evalml/tests/automl_tests/test_automl_search_regression.py
#	evalml/tests/component_tests/test_arima_regressor.py
#	evalml/tests/component_tests/test_estimators.py
        X_t = self._component_obj.transform(X)
        X_t_df = pd.DataFrame(X_t, columns=X.columns, index=X.index)
        return _retain_custom_types_and_initalize_woodwork(X_ww, X_t_df, ltypes_to_ignore=[Integer, Categorical])

    def fit_transform(self, X, y=None):
        X_ww = infer_feature_types(X)
Contributor Author

This is necessary; otherwise datetime columns are standardized, which results in an error.
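
To illustrate the failure mode described here, the following is a minimal sketch, not EvalML's implementation. It assumes a table stored as a plain dict of columns and uses a hypothetical helper, `standardize_numeric_columns`, that scales only numeric columns and passes datetime columns through unchanged.

```python
from datetime import datetime
from statistics import mean, pstdev


def standardize_numeric_columns(table):
    """Scale numeric columns to zero mean and unit variance.

    Hypothetical helper for illustration only. Columns holding non-numeric
    values (for example datetimes) are returned unchanged, because
    subtracting a mean from them is not meaningful.
    """
    result = {}
    for name, values in table.items():
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if not is_numeric:
            result[name] = list(values)  # pass through untouched
            continue
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # avoid dividing by zero for constant columns
        result[name] = [(v - mu) / sigma for v in values]
    return result


table = {
    "date": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
    "sales": [10.0, 20.0, 30.0],
}
scaled = standardize_numeric_columns(table)
```

In this sketch the `date` column comes back as-is, while `sales` is centered and scaled. Excluding certain logical types from the scaler in the diff above serves the same purpose.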

Contributor

How did this come up? Is this a bug that we've always had or triggered by another change in this PR?

Collaborator

@jeremyliweishih jeremyliweishih left a comment

This PR looks good to me; I just left one clarifying question. I wonder whether the tests could be cleaned up, given the ARIMA-specific exception in pipeline generation, but I couldn't think of anything off the top of my head. Going forward with performance testing on time series problems, I hope we can add datasets with seasonality or other characteristics that models like ARIMA can leverage.

@ParthivNaresh
Contributor Author

Due to pmdarima conda installation issues, #2237 has been filed

Contributor

@dsherry dsherry left a comment

LGTM! Let me know when we've got the corresponding conda-forge change approved/merged and are ready to merge this PR and I will gladly do so :)

Other comments blocking merge:

  • Please add a note to the ARIMA docstring mentioning that the component isn't supported on conda yet.
  • I left one question about the pmdarima version.

@@ -84,40 +91,56 @@ def _match_indices(self, X, y, date_col):
        y.index = date_col
        return X, y

    def _format_dates(self, dates, X, y, predict=False):
Contributor

Makes sense. Heads up @freddyaboulton we'll eventually need to update this ARIMA impl on your accessor feature branch once this is merged.



@pytest.fixture
def has_minimal_dependencies(pytestconfig):
    return pytestconfig.getoption("--has-minimal-dependencies")


@pytest.fixture
def is_using_conda(pytestconfig):
    return pytestconfig.getoption("--is-using-conda")
Contributor

We have a separate conda feedstock PR to add this there, right?

I'm guessing we need that to be merged first in order for CI to work properly, yes?

Contributor Author

@dsherry Yes, we'll have a separate PR for the feedstock that adds --is-using-conda to the command.
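
For context, fixtures like `is_using_conda` can only read a flag that has been registered with pytest. A minimal sketch of that registration, using pytest's `pytest_addoption` hook in a `conftest.py`, might look like the following. The help strings, the `store_true` defaults, and the `arima_is_testable` helper are assumptions for illustration, not code from this PR.

```python
# conftest.py (sketch)


def pytest_addoption(parser):
    """Register the custom command-line flags that the fixtures read."""
    parser.addoption(
        "--has-minimal-dependencies",
        action="store_true",
        default=False,
        help="Assumed help text: only core dependencies are installed.",
    )
    parser.addoption(
        "--is-using-conda",
        action="store_true",
        default=False,
        help="Assumed help text: tests are running from a conda install.",
    )


def arima_is_testable(config):
    """Hypothetical helper: ARIMA tests run only outside conda installs."""
    return not config.getoption("--is-using-conda")
```

With a registration along these lines, the flag defaults to `False`, so only a test command that passes `--is-using-conda` explicitly (as the feedstock would) changes the behavior.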

@@ -11,4 +11,5 @@ seaborn>=0.11.1
category_encoders>=2.0.0
statsmodels >= 0.12.2
imbalanced-learn>=0.8.0
pmdarima==1.8.0
Contributor

Now that we've disabled arima for conda installs while pmdarima sorts out the dependency issue: is there any reason we shouldn't do

pmdarima>=1.8.0

here?

Contributor Author

This unfortunately causes latest_dependency_versions to throw an error as it expects 1.8.2, but that version pins numpy to a lower version than what we can accept.
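
The trade-off between the two requirement styles can be sketched as follows. This is a simplified illustration, not pip's resolver: the list of released versions and the helper `newest_allowed` are assumptions, and version strings are compared as integer tuples rather than with full PEP 440 rules.

```python
def parse(version):
    """Turn a dotted version string such as "1.8.2" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def newest_allowed(released, operator, pinned):
    """Return the newest release permitted by `operator` and `pinned`.

    Hypothetical helper; supports only the "==" and ">=" operators.
    """
    if operator == "==":
        allowed = [v for v in released if parse(v) == parse(pinned)]
    elif operator == ">=":
        allowed = [v for v in released if parse(v) >= parse(pinned)]
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return max(allowed, key=parse) if allowed else None


# Assumed release list, for illustration only.
released = ["1.8.0", "1.8.1", "1.8.2"]

exact = newest_allowed(released, "==", "1.8.0")     # stays on the pinned version
floating = newest_allowed(released, ">=", "1.8.0")  # moves to the newest release
```

Under these assumptions, `>=1.8.0` selects 1.8.2, the version reported above as pinning numpy too low, while `==1.8.0` avoids it at the cost of a manual bump later.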

@dsherry dsherry merged commit 50d2b0a into main May 11, 2021
@chukarsten chukarsten mentioned this pull request May 17, 2021
@freddyaboulton freddyaboulton deleted the 1946-Add-ARIMA-To-AutoML branch May 13, 2022 15:35
Development

Successfully merging this pull request may close these issues.

Time series: add ARIMA to automl
7 participants