Add Time Series Baseline Regression Components and Pipelines #1496
Conversation
Codecov Report
```diff
@@           Coverage Diff            @@
##             main    #1496    +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%
=========================================
  Files         223      227      +4
  Lines       15316    15483    +167
=========================================
+ Hits        15309    15476    +167
  Misses          7        7
```
Continue to review full report at Codecov.
|
evalml/pipelines/components/estimators/regressors/timeseries_baseline_regressor.py
@jeremyliweishih Thank you so much! This is great.

I think this is almost good to merge, but we need to tweak the implementation of the regressor a bit to ensure the predictions and targets are offset by the right amount by the time we get to the pipeline's `score` method.
evalml/pipelines/components/estimators/regressors/timeseries_baseline_regressor.py
```python
y = _convert_to_woodwork_structure(y)
y = _convert_woodwork_types_wrapper(y.to_series())

first = y.iloc[0]
```
I don't think we should do any shifting here; `predict` should basically be a no-op that returns the `y` that is passed in.

The `score` method in `TimeSeriesRegressionPipeline` will shift the target by `-gap`, which ensures the target is `gap` time periods ahead of the prediction. Right now, with this shift, they would be offset by `gap + 1`, which I don't think is intended. Let's say the user has dates October 1-8 and the gap is 3. I think the desired pairs of prediction and target dates should be:
| Prediction Date | Target Date |
|---|---|
| 2020-10-01 | 2020-10-04 |
| 2020-10-02 | 2020-10-05 |
| 2020-10-03 | 2020-10-06 |
| 2020-10-04 | 2020-10-07 |
| 2020-10-05 | 2020-10-08 |
We need to be mindful of the `gap=0` case, because we should shift by `1` in this `predict` method to avoid a perfect overlap in the pipeline's `score` method.
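The alignment described above can be sketched with pandas. This is a hypothetical illustration using the dates from the example; the `-gap` shift mirrors what the pipeline's `score` method is described as doing, and is not the merged implementation itself:

```python
import pandas as pd

# Hypothetical illustration of the alignment described above.
# With gap=3, the prediction made on each date should be scored against
# the target observed `gap` periods later; shifting the target by -gap
# lines the two series up on the same index.
dates = pd.date_range("2020-10-01", "2020-10-08")
y = pd.Series(range(8), index=dates)

gap = 3
target_for_scoring = y.shift(-gap)

# The prediction for 2020-10-01 is paired with the target observed on
# 2020-10-04 (value 3); the last `gap` rows have no target and get NaN.
print(target_for_scoring.loc["2020-10-01"])  # -> 3.0
```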
@freddyaboulton got it! will make the changes.
evalml/automl/automl_search.py
```python
    baseline = MeanBaselineRegressionPipeline(parameters={})

else:
    baseline = TimeSeriesBaselineRegressionPipeline(parameters={"pipeline": {"gap": 0, "max_delay": 0}})
```
I think we should use the problem configuration parameters passed by the user to ensure the shifting between targets and predictions is accurate.
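A minimal sketch of what that could look like, assuming a `problem_configuration` dict that holds the user's `gap` and `max_delay` (the variable names here are illustrative, not the merged implementation):

```python
# Hypothetical sketch: forward the user's problem configuration into the
# baseline pipeline parameters instead of hardcoding gap=0 / max_delay=0.
problem_configuration = {"gap": 2, "max_delay": 5}  # assumed user input

pipeline_parameters = {"pipeline": dict(problem_configuration)}
# e.g. TimeSeriesBaselineRegressionPipeline(parameters=pipeline_parameters)
print(pipeline_parameters["pipeline"]["gap"])  # -> 2
```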
right, thanks!
@jeremyliweishih I think this is great! Thank you so much for helping on this. Before merging, I suggest we add a test that verifies that the predictions and targets are offset by the right amount in the baseline pipeline (just because that's a pretty big deal and I want to make sure we have something in place to guard against regressions in future refactorings). I think we should also add the baseline pipeline and estimator to the API reference.

Other than that, I left some comments that don't need to be resolved before merging!
```python
    np.testing.assert_allclose(clf.predict(X, y), y)


def test_time_series_baseline_no_X(ts_data):
```
Can you please add a test that verifies that the predictions and targets are offset by the right amount in `score` for both the `gap=0` and `gap > 0` cases? This can be similar to `test_score_drop_nans` in `test_time_series_pipeline`.
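A self-contained sketch of the invariant such a test could check, using plain pandas rather than the evalml pipeline. The helper name and the gap=0 shift-by-1 behavior are assumptions drawn from the discussion above, not the merged code:

```python
import numpy as np
import pandas as pd

def delayed_target_baseline_predict(y, gap):
    # Assumed behavior from the discussion above: predict is a no-op
    # returning y, except when gap=0, where we shift by 1 so prediction
    # and target never overlap perfectly in `score`.
    return y.shift(1) if gap == 0 else y

y = pd.Series(np.arange(10, dtype=float))

for gap in (0, 1, 3):
    preds = delayed_target_baseline_predict(y, gap)
    target = y.shift(-gap)  # mirrors the -gap shift done by pipeline score
    paired = pd.concat([preds, target], axis=1).dropna()
    # Each prediction should lag its paired target by exactly max(gap, 1).
    offset = gap if gap > 0 else 1
    assert (paired.iloc[:, 1] - paired.iloc[:, 0] == offset).all()
```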
Fixes #1482.