Rolling Mean features for time series #3028

Merged
merged 11 commits into from Nov 30, 2021

Contributor

@freddyaboulton freddyaboulton commented Nov 9, 2021

Pull Request Description

Fixes #2510

Perf tests here


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

codecov bot commented Nov 9, 2021

Codecov Report

Merging #3028 (f5f496d) into main (5de7049) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3028     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        313     313             
  Lines      30490   30567     +77     
=======================================
+ Hits       30400   30477     +77     
  Misses        90      90             
| Impacted Files | Coverage Δ |
| --- | --- |
| evalml/pipelines/__init__.py | 100.0% <ø> (ø) |
| evalml/pipelines/components/__init__.py | 100.0% <ø> (ø) |
| ...alml/pipelines/components/transformers/__init__.py | 100.0% <ø> (ø) |
| evalml/tests/automl_tests/test_automl.py | 99.5% <ø> (ø) |
| evalml/tests/component_tests/test_utils.py | 95.7% <ø> (ø) |
| evalml/tests/conftest.py | 98.4% <ø> (ø) |
| ...s/prediction_explanations_tests/test_explainers.py | 100.0% <ø> (ø) |
| evalml/tests/pipeline_tests/test_pipeline_utils.py | 99.7% <ø> (ø) |
| evalml/tests/pipeline_tests/test_pipelines.py | 99.8% <ø> (ø) |
| ...ators/regressors/time_series_baseline_estimator.py | 100.0% <100.0%> (ø) |

... and 9 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@freddyaboulton freddyaboulton force-pushed the 2510-rolling-windows-time-series branch 3 times, most recently from 836955e to d22d9aa Compare November 22, 2021 20:26
gap=self.start_delay,
min_periods=size + 1,
)
rolling_mean = rolling_mean.get_function()
Contributor Author

I tried computing the features with dfs but I think it's a bit awkward/confusing to have one set of features go through featuretools while the other does not: #3088

I spoke with the featuretools team, and it's probably best to wait until they release Lagged rolling primitives so we can refactor the whole component to use dfs at that point.
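
For readers unfamiliar with lagged rolling features, the idea in the hunk above can be sketched in plain pandas. This is only an illustration, not the featuretools primitive the PR calls: `lagged_rolling_mean` and its `window` and `gap` arguments are hypothetical names, and the `min_periods` choice here differs from the `size + 1` used in the diff.

```python
import pandas as pd


def lagged_rolling_mean(series: pd.Series, window: int, gap: int) -> pd.Series:
    """Mean of the `window` observations that end `gap` rows before each row.

    Hypothetical helper for illustration; not part of evalml or featuretools.
    """
    # Shift first so that row t only sees values up to row t - gap,
    # then average over a full window (rows without a full window stay NaN).
    return series.shift(gap).rolling(window=window, min_periods=window).mean()


y = pd.Series(range(10), dtype="float64")
features = lagged_rolling_mean(y, window=3, gap=2)
# Row 4 averages y[0], y[1], y[2]; rows 0 through 3 have no full window.
```

Shifting before rolling keeps values from the gap out of each window, which is what prevents the feature from leaking data that would not be available at prediction time.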

@freddyaboulton freddyaboulton marked this pull request as ready for review November 29, 2021 21:48
@freddyaboulton freddyaboulton changed the title Draft: Rolling Mean features for time series Rolling Mean features for time series Nov 29, 2021
Collaborator

@chukarsten chukarsten left a comment

I think this just needs a few docstring changes! Great work!

@@ -59,6 +66,7 @@ def __init__(
gap=0,
forecast_horizon=1,
conf_level=0.05,
rolling_window_size=0.25,
Collaborator

I think this should be added to the docstring!

Contributor Author

Absolutely.
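
As a rough idea of what a documented `rolling_window_size` could look like, here is a sketch. The thread does not say how the value is interpreted; treating it as a fraction of `max_delay` is an assumption, and `resolve_window_size` is a hypothetical helper, not a function from this PR.

```python
def resolve_window_size(rolling_window_size: float, max_delay: int) -> int:
    """Convert a fractional window size into a number of rows.

    Args:
        rolling_window_size (float): Size of the rolling window as a fraction
            of ``max_delay``. Must be in (0, 1]. (Assumed semantics.)
        max_delay (int): Largest delay, in rows, used for feature engineering.

    Returns:
        int: Window length in rows, never smaller than 1.

    Raises:
        ValueError: If ``rolling_window_size`` is outside (0, 1].
    """
    if not 0 < rolling_window_size <= 1:
        raise ValueError("rolling_window_size must be in (0, 1]")
    # Truncate toward zero, but keep at least one row in the window.
    return max(1, int(rolling_window_size * max_delay))
```

Under this assumption, the default of 0.25 with a `max_delay` of 8 gives a window of 2 rows.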

def _compute_delays(self, X_ww, y, original_features):
"""Computes the delayed features for all features in X and y.

Use the autocorrelation to determine delays.
Collaborator

I guess if we're going to keep this docstring, we should update it.

max_delay=max_delay,
forecast_horizon=forecast_horizon,
gap=gap,
delay_features=False,
Collaborator

Can this just be parameterized into the other test? If you feel it's better tested out explicitly, that's cool too. I assume that's why you did this!

Contributor Author

Yea I liked testing both halves of the feature engineering computation separately to make it easier to verify the output matched the expected value!
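
The trade-off discussed here can be illustrated with a small sketch: one test per half of the feature engineering, each compared against a hand-written expected result. The helpers `delayed` and `rolling` are hypothetical stand-ins written in plain pandas, not the component under review.

```python
import pandas as pd

NAN = float("nan")


def delayed(series: pd.Series, delay: int) -> pd.Series:
    """Hypothetical stand-in for the delay half of the computation."""
    return series.shift(delay)


def rolling(series: pd.Series, window: int, gap: int) -> pd.Series:
    """Hypothetical stand-in for the rolling-mean half of the computation."""
    return series.shift(gap).rolling(window, min_periods=window).mean()


def test_delay_only():
    y = pd.Series([1.0, 2.0, 3.0, 4.0])
    expected = pd.Series([NAN, 1.0, 2.0, 3.0])
    pd.testing.assert_series_equal(delayed(y, delay=1), expected)


def test_rolling_only():
    y = pd.Series([1.0, 2.0, 3.0, 4.0])
    expected = pd.Series([NAN, NAN, 1.5, 2.5])
    pd.testing.assert_series_equal(rolling(y, window=2, gap=1), expected)
```

Keeping the two tests separate means each expected frame stays small enough to verify by hand, at the cost of some duplicated setup that a parametrized test would share.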

},
index=pd.RangeIndex(50, 81),
)
rolling_features_target_only = pd.DataFrame(
Contributor

nit: rolling_features_target_only = rolling_features?

Contributor

@bchen1116 bchen1116 left a comment

LGTM. I like the tests you added, and the perf test results look good! Exciting to get this in!

@@ -59,6 +66,7 @@ def __init__(
gap=0,
forecast_horizon=1,
conf_level=0.05,
rolling_window_size=0.25,
Contributor

add to docstring

@freddyaboulton freddyaboulton merged commit 14981c8 into main Nov 30, 2021
@freddyaboulton freddyaboulton deleted the 2510-rolling-windows-time-series branch November 30, 2021 20:02
@chukarsten chukarsten mentioned this pull request Dec 9, 2021
Successfully merging this pull request may close these issues.

Spike - Compute rolling windows features for time series
3 participants