Add Time Series Baseline Regression Components and Pipelines #1496

Merged
jeremyliweishih merged 27 commits into main from js_1482_ts_baseline on Dec 7, 2020

Conversation

jeremyliweishih commented Dec 2, 2020

Fixes #1482.

  • Add regressor
  • Add pipeline
  • Add to automl

@jeremyliweishih jeremyliweishih changed the title from "Js 1482 ts baseline" to "Add Time Series Baseline Regression Components and Pipelines" Dec 2, 2020

codecov bot commented Dec 2, 2020

Codecov Report

Merging #1496 (924a2c1) into main (f88a866) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1496     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         223      227      +4     
  Lines       15316    15483    +167     
=========================================
+ Hits        15309    15476    +167     
  Misses          7        7             
Impacted Files                                         Coverage Δ
evalml/pipelines/components/__init__.py                100.0% <ø> (ø)
evalml/pipelines/components/estimators/__init__.py     100.0% <ø> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py  100.0% <ø> (ø)
evalml/automl/automl_search.py                         99.7% <100.0%> (+0.1%) ⬆️
evalml/pipelines/__init__.py                           100.0% <100.0%> (ø)
...lines/components/estimators/regressors/__init__.py  100.0% <100.0%> (ø)
...ators/regressors/time_series_baseline_regressor.py  100.0% <100.0%> (ø)
evalml/pipelines/regression/__init__.py                100.0% <100.0%> (ø)
...ines/regression/time_series_baseline_regression.py  100.0% <100.0%> (ø)
...valml/pipelines/time_series_regression_pipeline.py  100.0% <100.0%> (ø)
... and 12 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jeremyliweishih jeremyliweishih marked this pull request as ready for review December 3, 2020 22:14
@jeremyliweishih jeremyliweishih requested review from dsherry, freddyaboulton and angela97lin and removed request for dsherry December 4, 2020 15:25

@freddyaboulton freddyaboulton left a comment

@jeremyliweishih Thank you so much! This is great.

I think this is almost ready to merge, but we need to tweak the regressor's implementation so that the predictions and targets are offset by the right amount by the time we get to the pipeline's score method.

evalml/tests/automl_tests/test_automl.py (outdated, resolved)
y = _convert_to_woodwork_structure(y)
y = _convert_woodwork_types_wrapper(y.to_series())

first = y.iloc[0]

I don't think we should do any shifting here; predict should basically be a no-op that returns the y that is passed in.

The score method in TimeSeriesRegressionPipeline shifts the target by -gap, which ensures the target is gap time periods ahead of the prediction. Right now, with this additional shift, they would be offset by gap + 1, which I don't think is intended. Say the user has dates October 1-8 and the gap is 3. I think the desired pairs of prediction and target dates are:

Prediction Date    Target Date
2020-10-01         2020-10-04
2020-10-02         2020-10-05
2020-10-03         2020-10-06
2020-10-04         2020-10-07
2020-10-05         2020-10-08

We need to be mindful of the gap=0 case, because there we should shift by 1 in this predict method to avoid perfect overlap between predictions and targets in the pipeline score method.
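
For reference, the date pairing described above can be sketched in plain Python. This is only an illustration of the intended offset, not the evalml implementation: the helper name prediction_target_pairs is hypothetical, and list slicing stands in for shifting the target by -gap and dropping the rows left without a target.

```python
from datetime import date, timedelta


def prediction_target_pairs(dates, gap):
    """Pair each prediction date with the target date `gap` periods later.

    Hypothetical helper: zip stops at the shorter sequence, so the last
    `gap` prediction dates (which have no target) are dropped.
    """
    return list(zip(dates, dates[gap:]))


# October 1-8, 2020, as in the example above.
dates = [date(2020, 10, 1) + timedelta(days=i) for i in range(8)]
pairs = prediction_target_pairs(dates, gap=3)

for prediction_date, target_date in pairs:
    print(prediction_date, target_date)
```

With gap=0 the same helper pairs every date with itself, which is the perfect-overlap case the comment warns about.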

Contributor Author

@freddyaboulton got it! will make the changes.

evalml/pipelines/time_series_regression_pipeline.py (outdated, resolved)
    baseline = MeanBaselineRegressionPipeline(parameters={})

else:
    baseline = TimeSeriesBaselineRegressionPipeline(parameters={"pipeline": {"gap": 0, "max_delay": 0}})

I think we should use the problem configuration parameters passed by the user to ensure the shifting between targets and predictions is accurate.
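
A minimal sketch of that suggestion, assuming the user's settings arrive as a dict with "gap" and "max_delay" keys. The helper name baseline_pipeline_parameters and the argument name problem_configuration are hypothetical; only the {"pipeline": {"gap": ..., "max_delay": ...}} layout is taken from the snippet under review.

```python
def baseline_pipeline_parameters(problem_configuration):
    """Build the baseline pipeline parameters from the user's settings.

    Hypothetical helper: forwards the user's gap and max_delay instead of
    hardcoding both to 0.
    """
    return {
        "pipeline": {
            "gap": problem_configuration["gap"],
            "max_delay": problem_configuration["max_delay"],
        }
    }


parameters = baseline_pipeline_parameters({"gap": 3, "max_delay": 2})
print(parameters)
```

The resulting dict would take the place of the hardcoded parameters argument in the line above.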

Contributor Author

right, thanks!


@freddyaboulton freddyaboulton left a comment

@jeremyliweishih I think this is great! Thank you so much for helping on this. Before merging, I suggest we add a test that verifies that the predictions and targets are offset by the right amount in the baseline pipeline (that's a pretty big deal, and I want something in place to guard against regressions in future refactorings). I think we should also add the baseline pipeline and estimator to the API reference.

Other than that, I left some comments that don't need to be resolved before merging!

np.testing.assert_allclose(clf.predict(X, y), y)


def test_time_series_baseline_no_X(ts_data):

Can you please add a test that verifies that the predictions and targets are offset by the right amount in score for both the gap=0 and gap > 0 cases? This can be similar to test_score_drop_nans in test_time_series_pipeline.
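
One possible shape for such a test, sketched without evalml so it runs on its own. Everything here is an assumption for illustration: baseline_predict and align_for_score are hypothetical stand-ins for the estimator's predict and the pipeline's score alignment, following the behavior discussed in this review (echo y, delay by one step only when gap is 0, then shift the target by -gap and drop missing pairs).

```python
def baseline_predict(y, gap):
    """Hypothetical naive baseline: echo y, delayed one step only if gap == 0."""
    if gap == 0:
        return [None] + y[:-1]
    return list(y)


def align_for_score(y, predictions, gap):
    """Shift the target by -gap, then drop pairs where either side is missing."""
    shifted_target = y[gap:] + [None] * gap
    return [
        (p, t)
        for p, t in zip(predictions, shifted_target)
        if p is not None and t is not None
    ]


def observed_offset(gap, n=8):
    """Return the single offset between target and prediction positions."""
    y = list(range(n))  # each value equals its own position
    pairs = align_for_score(y, baseline_predict(y, gap), gap)
    offsets = {t - p for p, t in pairs}
    assert len(offsets) == 1, "offset should be constant across all pairs"
    return offsets.pop()


def test_baseline_offsets():
    assert observed_offset(gap=0) == 1  # no perfect overlap when gap is 0
    assert observed_offset(gap=3) == 3  # target is gap periods ahead


test_baseline_offsets()
```

Because y holds its own positions, the difference between each target and prediction is exactly the offset being checked, so an accidental gap + 1 shift would fail the second assertion.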

@jeremyliweishih jeremyliweishih merged commit ed909a0 into main Dec 7, 2020
@dsherry dsherry mentioned this pull request Dec 29, 2020
@freddyaboulton freddyaboulton deleted the js_1482_ts_baseline branch May 13, 2022 15:17

Successfully merging this pull request may close these issues.

Baseline Pipeline for Time Series Regression problems
3 participants