
Select Smarter Lags for Time Series #3005

Merged 11 commits into main on Nov 4, 2021
Conversation

freddyaboulton (Contributor)

Pull Request Description

Fixes #2733



codecov bot commented Nov 2, 2021

Codecov Report

Merging #3005 (b9930e6) into main (ae89b27) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3005     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        312     312             
  Lines      29775   29853     +78     
=======================================
+ Hits       29684   29762     +78     
  Misses        91      91             
Impacted Files Coverage Δ
evalml/tests/automl_tests/test_automl.py 99.5% <ø> (ø)
.../automl_tests/test_automl_search_classification.py 100.0% <ø> (ø)
evalml/tests/component_tests/test_components.py 98.9% <ø> (ø)
...rmers/preprocessing/delayed_feature_transformer.py 100.0% <100.0%> (ø)
...mponent_tests/test_delayed_features_transformer.py 100.0% <100.0%> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py 99.8% <100.0%> (+0.1%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

X=X, y=y
),
DelayedFeatureTransformer(
    max_delay=3, gap=0, forecast_horizon=1, conf_level=1.0
Contributor Author

Using a conf_level of 1.0 selects all lags. Some of these tests rely on the DelayedFeatureTransformer selecting all lags.

@bchen1116 (Contributor) left a comment

Great work on this! You got this out so fast. I think one thing that we should add before merging this would be some documentation, maybe here, that would allow users to understand what this component is doing and how changing conf_level could change the lags used. This would help with un-black-boxing the component.

Otherwise, left some nits and some questions. Good stuff though!

@@ -14,6 +18,9 @@ class DelayedFeatureTransformer(Transformer):
date_index (str): Name of the column containing the datetime information used to order the data. Ignored.
max_delay (int): Maximum number of time units to delay each feature. Defaults to 2.
forecast_horizon (int): The number of time periods the pipeline is expected to forecast.
conf_level (float, None): Float between 0 and 1 that determines the confidence interval size used to select
Contributor

nit: add default value here

Contributor Author

Absolutely. Will get rid of the outdated None and add some validation that conf_level is in the range (0, 1].

# Return lags that are significant peaks or the first 10 significant lags
index = np.arange(len(acf_values))
significant = np.logical_or(ci_intervals[:, 0] > 0, ci_intervals[:, 1] < 0)
first_significant_10 = index[:10][significant[:10]]
Contributor

This won't necessarily return the first 10 significant lags, right? It will only return 10 lags if all of them are significant; if a few of them aren't significant, it returns fewer lags. If that's what's happening, can we update the comment above to say that?

Contributor Author

Yeah, I can see why this is confusing now. The intention is to select the significant lags in the range [0, 10], not the first 10 significant lags.
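The distinction is easy to illustrate with a small, self-contained numpy example (the significance mask here is made up for demonstration):

```python
import numpy as np

# Hypothetical significance mask over 15 lags (True = CI excludes zero).
significant = np.zeros(15, dtype=bool)
significant[[2, 5, 12]] = True
index = np.arange(15)

# index[:10][significant[:10]] keeps the significant lags *within*
# the range [0, 10), not the first 10 significant lags overall.
first_significant_10 = index[:10][significant[:10]]
print(first_significant_10)  # lag 12 falls outside the window: [2 5]
```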

significant_lags = (
    set(index[significant]).intersection(peaks).union(first_significant_10)
)
# If not lags are significant get the first lag
Contributor

nit: if no lags are significant...

@@ -14,6 +18,9 @@ class DelayedFeatureTransformer(Transformer):
date_index (str): Name of the column containing the datetime information used to order the data. Ignored.
max_delay (int): Maximum number of time units to delay each feature. Defaults to 2.
forecast_horizon (int): The number of time periods the pipeline is expected to forecast.
conf_level (float, None): Float between 0 and 1 that determines the confidence interval size used to select
which lags to compute from the set of [1, max_delay]. A delay of 1 will always be computed. If 1,
selects all possible lags in the set of [1, max_delay], inclusive.
Contributor

I think we should clarify here that the range is actually (0, 1], since passing in 0 here throws an error, while passing 1 works.

Contributor Author

Good catch. Added some parameter validation as well.
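A minimal sketch of what such validation could look like — the function name and error messages here are assumptions for illustration, not the actual evalml code:

```python
def _check_conf_level(conf_level):
    # Hypothetical validation: conf_level must lie in the half-open
    # interval (0, 1]. None and 0 are rejected; 1.0 is allowed and
    # selects all lags.
    if conf_level is None:
        raise ValueError("Parameter conf_level cannot be None.")
    if not 0 < conf_level <= 1:
        raise ValueError(
            f"Parameter conf_level must be in range (0, 1], received {conf_level}."
        )

_check_conf_level(1.0)   # ok: selects all lags
_check_conf_level(0.05)  # ok
```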


first_significant_10 = [l for l in significant_lags if l < 10]
expected_lags = (
    set(significant_lags + peaks).intersection(peaks).union(first_significant_10)
Contributor

Doesn't set(significant_lags + peaks).intersection(peaks) just reduce to set(peaks)?
I think since we wanted the significant peaks, it should just be set(significant_lags).intersection(peaks)?

Contributor Author

Yep, making this change now!
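The set-algebra point is easy to verify in isolation (the lists below are arbitrary sample values):

```python
significant_lags = [1, 3, 7]
peaks = [3, 5]

# set(significant_lags + peaks) is always a superset of peaks, so
# intersecting it with peaks collapses to set(peaks) regardless of
# which lags were significant.
reduced = set(significant_lags + peaks).intersection(peaks)
print(reduced == set(peaks))  # True

# The intended "significant peaks" drop the union with peaks first:
significant_peaks = set(significant_lags).intersection(peaks)
print(sorted(significant_peaks))  # [3]
```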

@freddyaboulton (Contributor Author)

@bchen1116 Added details on the algorithm to the component docstring and added a link from the user guide to the component reference since I didn't want to jump deep into the specifics of a component in the user guide. Thanks for the feedback though, the documentation definitely needed improvement.

@ParthivNaresh (Contributor) left a comment

Looks good! In your perf tests it showed that Daily Female Births had slightly lower validation results while Southern Oscillations had slightly better. It would be interesting to see how Prophet and ARIMA perform (once we enable DelayedFeatureTransformer for ARIMA).

@staticmethod
def _find_significant_lags(y, conf_level, max_delay):
    all_lags = np.arange(max_delay + 1)
    if conf_level is not None and y is not None:
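Piecing this truncated hunk together with the fragments quoted earlier, the selection logic might look roughly like the sketch below. This is an illustration under assumptions, not the actual evalml implementation: acf_values, ci_intervals, and peaks are taken as precomputed inputs (in the real component they would be derived from y), and the function name is invented.

```python
import numpy as np

def find_significant_lags(acf_values, ci_intervals, peaks, max_delay):
    # A lag is "significant" when its confidence interval excludes zero,
    # i.e. the whole interval sits above or below the x-axis.
    index = np.arange(len(acf_values))
    significant = np.logical_or(ci_intervals[:, 0] > 0, ci_intervals[:, 1] < 0)
    # Significant lags within [0, 10), not the first 10 significant lags.
    first_significant_10 = index[:10][significant[:10]]
    # Keep peaks that are significant, plus the early significant lags.
    significant_lags = (
        set(index[significant]).intersection(peaks).union(first_significant_10)
    )
    # If no lags are significant, fall back to the first lag.
    if not significant_lags:
        significant_lags = {1}
    return sorted(lag for lag in significant_lags if lag <= max_delay)
```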
Contributor

Is it necessary to check that conf_level is not None here?


Contributor Author

I think what happened is that originally I set the default value to None to mean "no lags", but that doesn't work well with our tuners, so I made it a float. I'll move the None check to the init! Thank you.

@bchen1116 (Contributor) left a comment

LGTM! Thanks for making the changes! Agreed with @ParthivNaresh about conf_level=None, but other than that, great work!

@freddyaboulton freddyaboulton merged commit 8985ea2 into main Nov 4, 2021
@freddyaboulton freddyaboulton deleted the 2733-smarter-lags-time-series branch November 4, 2021 17:34
@chukarsten chukarsten mentioned this pull request Nov 9, 2021
Successfully merging this pull request may close these issues.

Smarter lagging for time series
3 participants