Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up decomposer index logic #3829

Merged
merged 14 commits into from
Nov 15, 2022
Merged

Clean up decomposer index logic #3829

merged 14 commits into from
Nov 15, 2022

Conversation

eccabay
Copy link
Contributor

@eccabay eccabay commented Nov 9, 2022

Ensures we set y's time index as well as X during drop_time_index, and saves the frequency in STLDecomposer so we don't mis-assume the frequency when determining the seasonality offset in inverse_transform

@codecov
Copy link

codecov bot commented Nov 9, 2022

Codecov Report

Merging #3829 (ca03f1e) into main (9057491) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3829     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        344     344             
  Lines      36080   36116     +36     
=======================================
+ Hits       35942   35979     +37     
+ Misses       138     137      -1     
Impacted Files Coverage Δ
...nents/transformers/preprocessing/stl_decomposer.py 100.0% <100.0%> (ø)
.../pipelines/time_series_classification_pipelines.py 100.0% <100.0%> (ø)
evalml/pipelines/time_series_pipeline_base.py 100.0% <100.0%> (+0.9%) ⬆️
...valml/pipelines/time_series_regression_pipeline.py 100.0% <100.0%> (ø)
...omponent_tests/decomposer_tests/test_decomposer.py 99.7% <100.0%> (+0.1%) ⬆️
...peline_tests/test_time_series_baseline_pipeline.py 100.0% <100.0%> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py 99.9% <100.0%> (+0.1%) ⬆️
evalml/tests/utils_tests/test_gen_utils.py 99.5% <100.0%> (+0.1%) ⬆️
evalml/utils/gen_utils.py 99.3% <100.0%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@eccabay eccabay marked this pull request as ready for review November 9, 2022 19:45
@@ -1072,7 +1072,7 @@ def test_binary_predict_pipeline_objective_mismatch(
ValueError,
match="Objective Precision Micro is not defined for time series binary classification.",
):
binary_pipeline.predict(X[30:32], "precision micro", X[:30], y[:30])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this change for? Is it because we can't infer frequencies with only two values? Should we handle this somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct - pandas needs 3 or more values to infer a frequency. I'm not sure where we could handle this, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK fair enough. How does this affect our ability to forecast out 1 unit? is it still possible?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eccabay I thought we infer from y_train in fit? How does this predict line play into that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyliweishih we infer from y_train in the decomposer's fit, so the pipeline has no frequency knowledge.

@fjlanasa that was a good catch - this change would have prevented us from forecasting out 1 unit. Fortunately, the frequency detection isn't actually required within drop_time_index thanks to the other changes here, so I've just reverted it.

Copy link
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just a little suggestion!

if X.ww.schema is not None:
X = X.ww.drop([self.time_index])
X.ww.set_index(self.time_index)
X = X.ww.drop(self.time_index)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought setting the index dropped the column in woodwork as well? I might be remembering incorrectly though!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the _set_underlying_index() function here it looks like it doesn't drop it automatically


X, _, y = ts_data()
datetime_index = pd.date_range(start="01-01-2002", periods=len(X), freq="M")
datetime_index.freq = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check the case where there is a freqstr as well?

if X.ww.schema is not None:
X = X.ww.drop([self.time_index])
X.ww.set_index(self.time_index)
X = X.ww.drop(self.time_index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the _set_underlying_index() function here it looks like it doesn't drop it automatically

@eccabay eccabay enabled auto-merge (squash) November 15, 2022 13:05
@eccabay eccabay merged commit 46a6564 into main Nov 15, 2022
@eccabay eccabay deleted the drop_time_index branch November 15, 2022 14:26
@chukarsten chukarsten mentioned this pull request Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants