Clean up decomposer index logic #3829

eccabay · 2022-11-09T16:40:44Z

Ensures we set y's time index as well as X's during drop_time_index, and saves the frequency in STLDecomposer so we don't assume the wrong frequency when determining the seasonality offset in inverse_transform.

codecov · 2022-11-09T16:48:59Z

Codecov Report

Merging #3829 (ca03f1e) into main (9057491) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3829     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        344     344             
  Lines      36080   36116     +36     
=======================================
+ Hits       35942   35979     +37     
+ Misses       138     137      -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...nents/transformers/preprocessing/stl_decomposer.py | `100.0% <100.0%> (ø)` | |
| .../pipelines/time_series_classification_pipelines.py | `100.0% <100.0%> (ø)` | |
| evalml/pipelines/time_series_pipeline_base.py | `100.0% <100.0%> (+0.9%)` | ⬆️ |
| ...valml/pipelines/time_series_regression_pipeline.py | `100.0% <100.0%> (ø)` | |
| ...omponent_tests/decomposer_tests/test_decomposer.py | `99.7% <100.0%> (+0.1%)` | ⬆️ |
| ...peline_tests/test_time_series_baseline_pipeline.py | `100.0% <100.0%> (ø)` | |
| .../tests/pipeline_tests/test_time_series_pipeline.py | `99.9% <100.0%> (+0.1%)` | ⬆️ |
| evalml/tests/utils_tests/test_gen_utils.py | `99.5% <100.0%> (+0.1%)` | ⬆️ |
| evalml/utils/gen_utils.py | `99.3% <100.0%> (ø)` | |

fjlanasa · 2022-11-10T15:11:40Z

evalml/tests/pipeline_tests/test_time_series_pipeline.py

@@ -1072,7 +1072,7 @@ def test_binary_predict_pipeline_objective_mismatch(
        ValueError,
        match="Objective Precision Micro is not defined for time series binary classification.",
    ):
-        binary_pipeline.predict(X[30:32], "precision micro", X[:30], y[:30])


What's this change for? Is it because we can't infer frequencies with only two values? Should we handle this somewhere?

That's correct - pandas needs 3 or more values to infer a frequency. I'm not sure where we could handle this, though.
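For reference, a minimal sketch of the limitation described above, using only pandas. `try_infer_freq` is a hypothetical helper written for this illustration and is not part of evalml; the daily frequency is also just an assumption for the example.

```python
import pandas as pd


def try_infer_freq(index):
    """Return the inferred frequency string, or None if pandas cannot infer one."""
    try:
        return pd.infer_freq(index)
    except ValueError:
        # pandas raises ValueError when given fewer than 3 timestamps.
        return None


# Two timestamps: too short for frequency inference.
two_days = pd.date_range("2022-01-01", periods=2, freq="D")
# Three timestamps: the minimum pandas accepts.
three_days = pd.date_range("2022-01-01", periods=3, freq="D")

short_result = try_infer_freq(two_days)
long_result = try_infer_freq(three_days)
print(short_result, long_result)
```

A two-row slice such as `X[30:32]` falls into the first case, which is why any code path that re-infers the frequency at predict time would fail on it.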

OK, fair enough. How does this affect our ability to forecast out 1 unit? Is it still possible?

@eccabay I thought we infer from y_train in fit? How does this predict line play into that?

@jeremyliweishih we infer from y_train in the decomposer's fit, so the pipeline has no frequency knowledge.

@fjlanasa that was a good catch - this change would have prevented us from forecasting out 1 unit. Fortunately, the frequency detection isn't actually required within drop_time_index thanks to the other changes here, so I've just reverted it.

jeremyliweishih

LGTM, just a little suggestion!

jeremyliweishih · 2022-11-10T15:16:53Z

evalml/pipelines/time_series_pipeline_base.py

            if X.ww.schema is not None:
-                X = X.ww.drop([self.time_index])
+                X.ww.set_index(self.time_index)
+                X = X.ww.drop(self.time_index)


I thought setting the index dropped the column in woodwork as well? I might be remembering incorrectly though!

Looking at the _set_underlying_index() function, it looks like it doesn't drop the column automatically.

jeremyliweishih · 2022-11-10T15:20:46Z

evalml/tests/component_tests/decomposer_tests/test_decomposer.py

+
+    X, _, y = ts_data()
+    datetime_index = pd.date_range(start="01-01-2002", periods=len(X), freq="M")
+    datetime_index.freq = None


Should we check the case where there is a freqstr as well?
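A rough sketch of the two cases such a test would distinguish, using only pandas. This is an illustration and not evalml's actual test or implementation: the test in the diff uses `freq="M"`, while the sketch below assumes a daily frequency, and the variable names are made up for the example.

```python
import pandas as pd

# Case 1: the index carries its frequency, so nothing needs to be inferred.
index_with_freq = pd.date_range(start="2002-01-01", periods=10, freq="D")
has_freq = index_with_freq.freq is not None

# Case 2: same timestamps, but the frequency attribute has been cleared,
# mirroring `datetime_index.freq = None` in the test above.
index_without_freq = pd.date_range(start="2002-01-01", periods=10, freq="D")
index_without_freq.freq = None
was_cleared = index_without_freq.freq is None

# The frequency can still be recovered from the timestamps themselves
# and attached to a new index.
recovered = pd.infer_freq(index_without_freq)
restored = pd.DatetimeIndex(index_without_freq, freq=recovered)
print(has_freq, was_cleared, recovered, restored.freqstr)
```

Covering both cases checks that the code behaves the same whether the frequency is read directly from the index or has to be inferred.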

christopherbunn · 2022-11-14T20:09:58Z

evalml/pipelines/time_series_pipeline_base.py

            if X.ww.schema is not None:
-                X = X.ww.drop([self.time_index])
+                X.ww.set_index(self.time_index)
+                X = X.ww.drop(self.time_index)


Looking at the _set_underlying_index() function, it looks like it doesn't drop the column automatically.

jeremyliweishih and others added 4 commits November 4, 2022 11:18

First approach:

eda7775

Drop the column as well

6832aaf

Update drop_time_index and saving frequencies during fit

db57cca

Merge branch 'main' into drop_time_index

35b7f6b

eccabay added 3 commits November 9, 2022 13:52

test fixes and release notes

e9fa4dd

More testing fix

4fe620f

Merge branch 'main' into drop_time_index

bfd883c

eccabay marked this pull request as ready for review November 9, 2022 19:45

auto-assign bot assigned eccabay Nov 9, 2022

eccabay requested review from jeremyliweishih, christopherbunn and fjlanasa November 9, 2022 19:45

fjlanasa reviewed Nov 10, 2022

jeremyliweishih approved these changes Nov 10, 2022

eccabay added 5 commits November 10, 2022 15:09

Fix ww time index bug

0b23ef1

Merge branch 'main' into drop_time_index

167ee75

Revert unnecessary test change

400b5e7

Capitalize on frequency knowledge in pipeline

c5f81f9

Have get_time_index set freq in all unknown cases

f727a60

christopherbunn approved these changes Nov 14, 2022

eccabay added 2 commits November 14, 2022 16:25

Update test to check both with and without freqstr

f839019

Empty commit

ca03f1e

eccabay enabled auto-merge (squash) November 15, 2022 13:05

eccabay merged commit 46a6564 into main Nov 15, 2022

eccabay deleted the drop_time_index branch November 15, 2022 14:26

chukarsten mentioned this pull request Nov 23, 2022

Release v0.63.0 #3860

Merged
