Integrate Decomposition into AutoMLSearch #3781

Closed
eccabay wants to merge 93 commits into main from stl_in_automl

Contributor

eccabay commented Oct 25, 2022

Includes some other small code changes to allow integration to be successful!

chukarsten and others added 30 commits October 11, 2022 10:42
…different classes. Moved the test to project the seasonal signal up to the parent Decomposer class. Moved the testing for the seasonal projection to the decomposer test module.
… to move that up to the base Decomposer class.
* Updated get_trend_df() to work out of sample.

* Fixed transform() to work with in sample, but not spanning the sample.

* Fixed inverse_transform to work with smaller than sample, in sample data.
…ple. Also updated test for transform to return same if y is None and moved that to parent class.
…etter reflect what's going on. Docstring changes.
…e seasonal sample to match the STLDecomposer.
self.gap,
)

# Properly fill in the dates in the gap
Contributor Author

This was a pre-existing bug: we fill in the gap between the training and testing data with the last row of training data, which creates an irregularly spaced time index.
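To illustrate the problem described above, here is a minimal pandas sketch. It is not the code from this PR: the function names, the `date` and `feature` columns, and the use of `pd.infer_freq` to recover the spacing are all assumptions made for the example. It contrasts padding the gap with plain copies of the last row against padding with copies whose dates continue at the training frequency.

```python
import pandas as pd


def pad_gap_repeat_last_row(X_train, gap):
    """Pad with `gap` copies of the last training row, dates included.

    Hypothetical helper; assumes gap >= 1.
    """
    padding = pd.concat([X_train.iloc[[-1]]] * gap, ignore_index=True)
    return pd.concat([X_train, padding], ignore_index=True)


def pad_gap_with_regular_dates(X_train, gap, time_index):
    """Pad with copies of the last row, advancing the date one period per row.

    Hypothetical helper; assumes gap >= 1 and a regularly spaced time column.
    """
    dates = pd.DatetimeIndex(X_train[time_index])
    freq = pd.infer_freq(dates)  # needs at least 3 regularly spaced dates
    padding = pd.concat([X_train.iloc[[-1]]] * gap, ignore_index=True)
    # Start at the last training date and skip it, keeping only the gap dates.
    padding[time_index] = pd.date_range(start=dates[-1], periods=gap + 1, freq=freq)[1:]
    return pd.concat([X_train, padding], ignore_index=True)


X_train = pd.DataFrame(
    {"date": pd.date_range("2022-01-01", periods=5, freq="D"), "feature": range(5)}
)
naive = pad_gap_repeat_last_row(X_train, gap=2)
regular = pad_gap_with_regular_dates(X_train, gap=2, time_index="date")
```

In the first result the last date appears three times, so no regular frequency can be inferred from the column. In the second, the dates run on past the training data at the same daily spacing.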

Comment on lines +152 to +157
X_schema = X.ww.schema
y_schema = y.ww.schema
X = X.set_index(X[self.time_index])
y = y.set_axis(X[self.time_index])
X.ww.init(schema=X_schema)
y.ww.init(schema=y_schema)
Contributor Author

Feels weird to be saving the time index in the data's indices when we're trying to drop it, but the decomposer needs the time index data to decompose successfully. If anyone has suggestions on a better or cleaner way to get around this I am very open to it.

Collaborator

Can we drop it after fit or transform in the decomposer, then? The problem is that estimators that are not time-series native will fail with a datetime column. Maybe we can amend the logic here to save the time index only when the decomposer is in the pipeline, and then drop the time index later?
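The trade-off in this thread can be sketched in plain pandas. This is an illustrative assumption, not the pipeline code under review: `move_time_index_to_axis` is a hypothetical helper, and the Woodwork schema handling shown in the diff is left out. The idea is that the datetime column is removed from the features, so estimators never see it, while the same dates survive as the row index for a component that needs them.

```python
import pandas as pd


def move_time_index_to_axis(X, y, time_index):
    """Drop the datetime column from X but keep its values as the index.

    Hypothetical helper: estimators that cannot handle datetime columns no
    longer see one, while index-aware components can still read the dates
    from X.index and y.index.
    """
    dates = pd.DatetimeIndex(X[time_index])
    X_out = X.drop(columns=[time_index]).set_axis(dates, axis=0)
    y_out = y.set_axis(dates)
    return X_out, y_out


X = pd.DataFrame(
    {
        "date": pd.date_range("2022-01-01", periods=4, freq="D"),
        "feature": [1.0, 2.0, 3.0, 4.0],
    }
)
y = pd.Series([10, 11, 12, 13])
X_out, y_out = move_time_index_to_axis(X, y, "date")
```

Whether this happens unconditionally or only when a decomposer is present is the open design question raised above; the sketch only shows the mechanics.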

codecov Bot commented Oct 25, 2022

Codecov Report

Merging #3781 (6f4f2de) into main (ed2a248) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3781     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        343     343             
  Lines      35680   35754     +74     
=======================================
+ Hits       35544   35617     +73     
- Misses       136     137      +1     
| Impacted Files | Coverage Δ |
|---|---|
| .../integration_tests/test_time_series_integration.py | 100.0% <ø> (ø) |
| evalml/utils/__init__.py | 100.0% <ø> (ø) |
| ...omponents/transformers/preprocessing/decomposer.py | 99.1% <100.0%> (-0.1%) ⬇️ |
| ...nents/transformers/preprocessing/stl_decomposer.py | 100.0% <100.0%> (ø) |
| .../pipelines/time_series_classification_pipelines.py | 100.0% <100.0%> (ø) |
| evalml/pipelines/time_series_pipeline_base.py | 99.2% <100.0%> (-0.8%) ⬇️ |
| ...valml/pipelines/time_series_regression_pipeline.py | 100.0% <100.0%> (ø) |
| evalml/pipelines/utils.py | 99.6% <100.0%> (+0.1%) ⬆️ |
| evalml/tests/automl_tests/test_automl.py | 99.5% <100.0%> (+0.1%) ⬆️ |
| evalml/tests/conftest.py | 98.0% <100.0%> (+0.1%) ⬆️ |

... and 5 more

eccabay marked this pull request as ready for review October 25, 2022 19:17
Collaborator

jeremyliweishih left a comment

I think we need to figure out when and where we're dropping the time index, so that the pipelines don't fail for estimators that can't accept time indexes, but everything else LGTM!

Collaborator

Can we also add to test_automl_supports_time_series_regression to make sure that the decomposer and its parameters are correct?

Collaborator

jeremyliweishih left a comment

nvm - we're good but think we should add to test_automl_supports_time_series_regression still!

Contributor Author

eccabay commented Oct 26, 2022

Closing in favor of #3785

eccabay closed this Oct 26, 2022
eccabay deleted the stl_in_automl branch November 15, 2022 16:31