Add STLDecomposer to multiseries pipelines #4299

Merged
merged 13 commits into from Sep 8, 2023

Conversation

remyogasawara
Collaborator

Resolves #4298

Acceptance Criteria (AC)

  • Pipelines generated during AutoMLSearch via make_pipeline for multiseries time series regression include one pipeline with the STLDecomposer and one pipeline without, following the with/without pattern we use for current time series regression problems


codecov bot commented Sep 2, 2023

Codecov Report

Patch coverage: 100.0% and project coverage change: +0.1% 🎉

Comparison is base (1329988) 99.7% compared to head (28e2cdb) 99.7%.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #4299     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        357     357             
  Lines      39577   39587     +10     
=======================================
+ Hits       39457   39467     +10     
  Misses       120     120             
Files Changed Coverage Δ
evalml/pipelines/component_graph.py 99.8% <ø> (ø)
...nents/transformers/preprocessing/stl_decomposer.py 100.0% <100.0%> (ø)
evalml/pipelines/time_series_pipeline_base.py 100.0% <100.0%> (ø)
evalml/pipelines/utils.py 99.7% <100.0%> (+0.1%) ⬆️
...valml/tests/automl_tests/test_default_algorithm.py 100.0% <100.0%> (ø)
...lml/tests/automl_tests/test_iterative_algorithm.py 100.0% <100.0%> (ø)
evalml/tests/pipeline_tests/test_pipeline_utils.py 99.6% <100.0%> (ø)


@@ -806,10 +806,11 @@ def graph(self, name=None, graph_format=None):
[
key + " : " + "{:0.2f}".format(val)
if (isinstance(val, float))
- else key + " : " + str(val)
+ else key + " : " + str(val).replace("{", "").replace("}", "")
Contributor

This is hard to follow 😅 can you add an explanatory comment?
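For reference, a minimal sketch of what the changed expression appears to do, pulled out into a standalone helper. The name `format_parameter` and the sample parameters are hypothetical and not part of the PR, and the thread does not state why the braces are stripped.

```python
def format_parameter(key, val):
    """Render one parameter as a 'key : value' label line."""
    if isinstance(val, float):
        # Floats are rounded to two decimal places.
        return key + " : " + "{:0.2f}".format(val)
    # Everything else is stringified, then curly braces are dropped,
    # so a dict value no longer carries "{" or "}" into the label.
    return key + " : " + str(val).replace("{", "").replace("}", "")


# Hypothetical parameters: an int, a float, and a dict value.
sample_parameters = {"degree": 3, "alpha": 0.12345, "periods": {"a": 7}}
labels = [format_parameter(key, val) for key, val in sample_parameters.items()]
```

Splitting the conditional into a named helper with one comment per branch is one way to address the readability concern; an inline comment above the original expression would work as well.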

@@ -442,6 +442,7 @@ def inverse_transform(
y.append(y_series)
y_df = pd.DataFrame(y).T
y_df.index = original_index
+ y_df.columns = y_t.columns
Contributor

Out of curiosity, why is this necessary? What was the situation where the columns weren't the same?

Collaborator Author

The predictions weren't getting the corresponding series ID values as their column names, which is needed because the decomposer uses the column names to select the correct values. Previously, this caused the decomposer to return NaN values. @christopherbunn figured that out, so he might have more info.

Contributor

The predictions that are generated do not have the series ID values as their column names. Copying these names over is required so we can inverse_transform from the decomposer.
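To illustrate the failure mode described above, here is a simplified pure-Python analogue, not the EvalML code. The dictionaries stand in for DataFrame columns, and `add_back_seasonality`, the series IDs, and the numbers are all hypothetical; the only assumption taken from the thread is that the decomposer looks up its stored values by series ID.

```python
import math

# Hypothetical per-series seasonal components stored at fit time,
# keyed by series ID.
seasonal_by_series = {"store_a": 2.0, "store_b": -1.0}


def add_back_seasonality(predictions):
    """Add the stored component for each label; NaN when the label is unknown."""
    return {
        label: value + seasonal_by_series.get(label, math.nan)
        for label, value in predictions.items()
    }


# Predictions labeled positionally (0, 1) miss every lookup.
raw = {0: 10.0, 1: 20.0}
unlabeled = add_back_seasonality(raw)

# Relabeling with the series IDs, analogous to `y_df.columns = y_t.columns`.
relabeled = dict(zip(seasonal_by_series.keys(), raw.values()))
restored = add_back_seasonality(relabeled)
```

With positional labels every lookup misses and the result is all NaN; once the series IDs are copied over, each prediction is matched with its own component.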

Comment on lines 242 to 247
seasonal_period = STLDecomposer.determine_periodicity(
    X,
    y,
    rel_max_order=order,
)
if seasonal_period is not None and seasonal_period <= DECOMPOSER_PERIOD_CAP:
Contributor

The way determine_periodicity is set up, we're currently detecting a "period" on the single stacked target data column. I'm worried that's too brittle and could cause weird issues in the future. Could you put this in a conditional branch so that we only run it in the single series case, and for now just always add the decomposer for multiseries? We'll have to come back and revisit this, but that should be OK for the MVP.
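A rough sketch of the branch being requested, under stated assumptions: `should_add_decomposer`, `is_multiseries`, and `detect_period` are hypothetical names, and `detect_period` is a stand-in callable for the `STLDecomposer.determine_periodicity` call shown above. It is not the code that was merged.

```python
def should_add_decomposer(is_multiseries, detect_period, period_cap):
    """Decide whether a pipeline should include the decomposer.

    detect_period is a zero-argument callable returning a period or None
    (a hypothetical stand-in for the periodicity check in the diff).
    """
    if is_multiseries:
        # Skip period detection entirely: the stacked target column mixes
        # several series, so a detected "period" may not be meaningful.
        return True
    # Single series: keep the existing check against the period cap.
    seasonal_period = detect_period()
    return seasonal_period is not None and seasonal_period <= period_cap


def must_not_run():
    raise AssertionError("period detection should be skipped for multiseries")


# The cap value 1000 is a placeholder, not the real DECOMPOSER_PERIOD_CAP.
multiseries_choice = should_add_decomposer(True, must_not_run, 1000)
single_series_choice = should_add_decomposer(False, lambda: 7, 1000)
```

Passing the detection step in as a callable keeps the sketch self-contained and makes it easy to confirm that the multiseries branch never invokes it.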

remyogasawara merged commit 81abfca into main Sep 8, 2023
24 checks passed
remyogasawara deleted the 4298_add_stl_to_ms_pipeline branch September 8, 2023 20:40
Development

Successfully merging this pull request may close these issues.

Add STLDecomposer to multiseries pipelines (EvalML)
4 participants