Refactor ARIMA/Prophet #3104
Conversation
Codecov Report
@@            Coverage Diff            @@
##             main   #3104     +/-   ##
========================================
- Coverage    99.8%   99.7%    -0.0%
========================================
  Files         315     315
  Lines       30711   30547    -164
========================================
- Hits        30620   30447    -173
- Misses         91     100      +9
Continue to review full report at Codecov.
if y is None:
    raise ValueError("ARIMA Regressor requires y as input.")
if X is not None:
This addresses #2960. The issue is that since TimeSeriesFeaturizer isn't included in ARIMA-based pipelines, target leakage can occur.
I think we should back this out of this PR and tackle it in its own issue, because I don't think this is the best fix for the problem.
I think the root cause of the problem is including features during fit that will not be known at prediction time. Whether or not a feature is known in advance is independent of its correlation with the target (which is what TargetLeakage is checking). It's possible that a feature that will not be known in advance is not picked up by TargetLeakage, and it's also possible that a feature that will be known in advance gets dropped because it's correlated with the target.
I think we need to think this through a bit more; it's possible #2960 is blocked by #2511.
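To illustrate the point above, here is a minimal sketch (with made-up feature names, not from this PR) of why a correlation-based check like TargetLeakage is orthogonal to whether a feature is known in advance: a scheduled feature can be strongly correlated with the target and get dropped, while a same-day measurement can be weakly correlated and slip through.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Hypothetical features, for illustration only:
# "promo" is known in advance (it's scheduled), but it drives the target,
# so it is highly correlated. "same_day_sensor" is NOT known in advance,
# but it is noisy, so it is weakly correlated.
promo = rng.integers(0, 2, n)
same_day_sensor = rng.normal(size=n)
y = 10 * promo + 0.3 * same_day_sensor + rng.normal(size=n)
X = pd.DataFrame({"promo": promo, "same_day_sensor": same_day_sensor})

# A pure correlation threshold would flag "promo" (safe, known in advance)
# and pass "same_day_sensor" (unavailable at predict time).
corr = X.apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
print(corr.round(2))
```

Here the correlation check reaches exactly the wrong conclusion for both features, which is the argument for handling known-in-advance-ness separately.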
fh_ = self._set_forecast(X)
X = X[self.cols_to_keep]
X = X.select_dtypes(exclude=["datetime64"])
`_remove_datetime` isn't called here because ARIMA doesn't have a problem predicting on data that has an arbitrary temporal index, even if it was fit on data that doesn't have a temporal index (because the forecast horizon is relative to the end of the training data).
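The "forecast horizon is relative" point can be sketched as follows. `set_forecast_horizon` is a hypothetical helper standing in for the idea behind `_set_forecast`, not the actual evalml code: the horizon is just integer steps past the end of training, so the index type at predict time is irrelevant.

```python
import pandas as pd

def set_forecast_horizon(train_index, predict_index):
    """Hypothetical helper: horizon = 1..len(predict_index) steps ahead
    of the END of the training index, regardless of index dtype."""
    return list(range(1, len(predict_index) + 1))

train = pd.RangeIndex(0, 40)                       # fit on a plain integer index
predict = pd.date_range("2021-01-01", periods=5)   # predict on a datetime index
print(set_forecast_horizon(train, predict))        # [1, 2, 3, 4, 5]
```

Because only the number of steps matters, mixing an integer training index with a datetime prediction index (or vice versa) is harmless here.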
if datetime_feature:
    X_train["Dates"] = dates[:40]
    X_test["Dates"] = dates[40:]
if train_none:
The difference between `train_none` and `no_features` is that `train_none` passes in None as X, while `no_features` passes in a DataFrame with only an index set.
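Concretely, the two cases could be constructed like this (hypothetical fixture values, mirroring the parameter names above):

```python
import pandas as pd

dates = pd.date_range("2021-01-01", periods=50)

# train_none: X is simply None
X_train_none = None

# no_features: an empty DataFrame that still carries a (datetime) index
X_no_features = pd.DataFrame(index=dates)

print(X_no_features.shape)  # (50, 0)
```

The distinction matters because an estimator may branch on `X is None` but still has to handle a zero-column frame whose index participates in alignment.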
This looks good to me, Parthiv! Just a few minor string-related things.
Just for completeness, I filed this: #3121
@ParthivNaresh Thanks for this PR, I think it looks great! It's awesome that you added some new seasonal datasets from Wikipedia for us to test on. Looking forward to the LG PR to add those so I can use them too. Can you upload the HTML report to your results page? I'm curious to hover over those datasets and look at the rankings table.
The one thing is that I think we should back out the changes related to #2960 and tackle that problem in its own PR.
)
def test_fit_predict_sk_failure(
    mock_get_dates, mock_format_dates, ts_data_seasonal_train
nit: This is basically testing that without `remove_datetime`, our ARIMA component would fail, right? Can we rename the test to reflect that, and add a comment to the `pytest.raises` to explain why/what is happening?
Actually, this is testing a few different use cases in which the original `sktime.AutoARIMA` fails but our implementation succeeds:
- When the X index is of datetime type and the y index is not (and vice versa)
- When X has no features but still has an index
- When X has a datetime feature
So I named it `test_fit_predict_sk_failure` to represent the difference in behaviour between `sktime` and `evalml`. Also, because of the different use cases, the error from `sktime` varies, so I can't pin it down with an error message.
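A small sketch of the "can't pin it down with a message" point: when the raised message differs per case, `pytest.raises` can assert only the exception type and omit the `match=` argument. `flaky_sk_fit` below is a hypothetical stand-in for the varying sktime failures, not the real test code.

```python
import pytest

def flaky_sk_fit(case):
    """Hypothetical stand-in: raises with a different message per case."""
    raise ValueError(f"sktime failed for case {case}")

for case in ["index_mismatch", "no_features", "datetime_feature"]:
    # No match= argument: the exception type is stable, the message is not.
    with pytest.raises(ValueError):
        flaky_sk_fit(case)
```

The trade-off is weaker assertions; a per-case comment (as requested in the review) is the usual way to keep the intent clear when `match=` can't be used.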
Fixes: #3091 #3067 #3046
Perf tests here: https://alteryx.atlassian.net/wiki/spaces/PS/pages/1152516105/ARIMA+-+No+longer+relying+on+date+index
I've also added 4 new datasets to the S3 bucket:
I got these by downloading the Wikipedia page views from here for different articles over various time horizons.
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of `docs/source/release_notes.rst` to include this pull request by adding `:pr:123`.