
Integrate Decomposition into AutoMLSearch #3781


Closed. eccabay wants to merge 93 commits into main from stl_in_automl.

Conversation

@eccabay (Contributor) commented Oct 25, 2022

Includes some other small code changes needed for the integration to be successful!

chukarsten and others added 30 commits October 11, 2022 10:42
…different classes. Moved the test to project the seasonal signal up to the parent Decomposer class. Moved the testing for the seasonal projection to the decomposer test module.
… to move that up to the base Decomposer class.
* Updated get_trend_df() to work out of sample.

* Fixed transform() to work with in sample, but not spanning the sample.

* Fixed inverse_transform to work with smaller than sample, in sample data.
…ple. Also updated test for transform to return same if y is None and moved that to parent class.
…etter reflect what's going on. Docstring changes.
…e seasonal sample to match the STLDecomposer.
@@ -116,6 +117,16 @@ def _add_training_data_to_X_Y(self, X, y, X_train, y_train):
self.gap,
)

# Properly fill in the dates in the gap
@eccabay (Contributor, Author)

This was a pre-existing bug where we filled the gap between the training and testing data with the last row of training data, which created an irregularly spaced time index.
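For illustration, a minimal pandas sketch of the difference (the dates, frequency, and variable names are made up for the example and are not the evalml code):

import pandas as pd

# Training data ends 2022-01-05 with daily frequency, and gap=2 before the test data.
train_dates = pd.date_range("2022-01-01", periods=5, freq="D")
gap = 2

# Old behavior: repeating the last training row to span the gap duplicates its
# timestamp, so the combined time index is no longer evenly spaced.
irregular_fill = pd.DatetimeIndex([train_dates[-1]] * gap)  # 2022-01-05, 2022-01-05

# Properly filling in the dates in the gap keeps the index regularly spaced.
regular_fill = pd.date_range(
    start=train_dates[-1] + pd.Timedelta(days=1), periods=gap, freq="D"
)  # 2022-01-06, 2022-01-07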

Comment on lines +152 to +157
X_schema = X.ww.schema
y_schema = y.ww.schema
X = X.set_index(X[self.time_index])
y = y.set_axis(X[self.time_index])
X.ww.init(schema=X_schema)
y.ww.init(schema=y_schema)
@eccabay (Contributor, Author)

It feels weird to be saving the time index in the data's index when we're trying to drop it, but the decomposer needs the time index data to decompose successfully. If anyone has suggestions on a better or cleaner way to get around this, I'm very open to them.

@jeremyliweishih (Collaborator)

Can we drop it after fit or transform in the decomposer, then? The problem is that non-time-series-native estimators will fail with a datetime column. Maybe we can amend the logic here to save the time index only when the decomposer is in the pipeline, and then drop the time index later?
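A rough sketch of that suggestion, using a hypothetical helper (the name, signature, and flag are assumptions for illustration, not existing evalml code):

import pandas as pd

def _prepare_features(X: pd.DataFrame, time_index: str, pipeline_has_decomposer: bool) -> pd.DataFrame:
    if pipeline_has_decomposer:
        # Stash the timestamps in the DataFrame index so the decomposer can
        # still reach them after the raw datetime column is removed.
        X = X.set_index(pd.DatetimeIndex(X[time_index]))
    # Non-time-series-native estimators fail on datetime feature columns,
    # so drop the column before the estimator sees it.
    return X.drop(columns=[time_index])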

@codecov codecov bot commented Oct 25, 2022

Codecov Report

Merging #3781 (6f4f2de) into main (ed2a248) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3781     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        343     343             
  Lines      35680   35754     +74     
=======================================
+ Hits       35544   35617     +73     
- Misses       136     137      +1     
Impacted Files Coverage Δ
.../integration_tests/test_time_series_integration.py 100.0% <ø> (ø)
evalml/utils/__init__.py 100.0% <ø> (ø)
...omponents/transformers/preprocessing/decomposer.py 99.1% <100.0%> (-0.1%) ⬇️
...nents/transformers/preprocessing/stl_decomposer.py 100.0% <100.0%> (ø)
.../pipelines/time_series_classification_pipelines.py 100.0% <100.0%> (ø)
evalml/pipelines/time_series_pipeline_base.py 99.2% <100.0%> (-0.8%) ⬇️
...valml/pipelines/time_series_regression_pipeline.py 100.0% <100.0%> (ø)
evalml/pipelines/utils.py 99.6% <100.0%> (+0.1%) ⬆️
evalml/tests/automl_tests/test_automl.py 99.5% <100.0%> (+0.1%) ⬆️
evalml/tests/conftest.py 98.0% <100.0%> (+0.1%) ⬆️
... and 5 more


@eccabay eccabay marked this pull request as ready for review October 25, 2022 19:17
@jeremyliweishih (Collaborator) left a comment
I think we need to figure out when and where we're dropping the time index so that the pipelines don't fail for estimators that can't accept time indexes, but everything else LGTM!

@jeremyliweishih (Collaborator)

Can we also add to test_automl_supports_time_series_regression to make sure that the decomposer and its parameters are correct?

@jeremyliweishih (Collaborator) left a comment

nvm - we're good, but I think we should still add to test_automl_supports_time_series_regression!
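A minimal sketch of the kind of assertion being requested, assuming the component is registered as "STL Decomposer" and that the test reuses an existing AutoMLSearch run; the fixture name and the "date" time index are illustrative, not the actual evalml test:

def test_time_series_regression_includes_decomposer(automl):
    # `automl` is assumed to be an AutoMLSearch instance already run on a
    # time series regression problem with time_index="date".
    for _, row in automl.rankings.iterrows():
        parameters = automl.get_pipeline(row["id"]).parameters
        if "STL Decomposer" in parameters:
            # The decomposer should be configured with the search's time index.
            assert parameters["STL Decomposer"]["time_index"] == "date"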

@eccabay (Contributor, Author) commented Oct 26, 2022

Closing in favor of #3785

@eccabay eccabay closed this Oct 26, 2022
@eccabay eccabay deleted the stl_in_automl branch November 15, 2022 16:31