Naive prediction intervals #4127

Merged
merged 14 commits into main from TML-7064_naieve_PI on Apr 10, 2023
Conversation

Contributor

@eccabay eccabay commented Apr 6, 2023

Closes #4126


codecov bot commented Apr 6, 2023

Codecov Report

Merging #4127 (eb86336) into main (8d49f35) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #4127     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        349     349             
  Lines      37704   37740     +36     
=======================================
+ Hits       37587   37623     +36     
  Misses       117     117             
Impacted Files Coverage Δ
evalml/pipelines/time_series_pipeline_base.py 100.0% <100.0%> (ø)
...valml/pipelines/time_series_regression_pipeline.py 100.0% <100.0%> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py 100.0% <100.0%> (ø)


* Remove drop_time_index from get_prediction_intervals

* drop time index only for transform and not for predict

* Maintain datetime frequency in drop_time_index

* Make sure indices for residuals match
@eccabay eccabay marked this pull request as ready for review April 7, 2023 13:34
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Some questions

predictions = self._estimator_predict(features)
predictions.index = y.index
if not calculating_residuals:
Collaborator

why don't we want this when calculating_residuals?

Contributor

So, from what I've found (and correct me if I'm wrong @eccabay), when we do a truly in-sample prediction, the prediction size is not guaranteed to match the size of the target values array... because of the lags, I think.

Contributor Author

I think I need to go back and do a little more experimentation here. You're right @chukarsten that the lags and dropping null rows mean that the prediction size won't match the size of the target values array. However, that's not the case if we've precomputed features with the DFS transformer; I had to add the index alignment @jeremyliweishih commented on to get this to work in the product. I'll do some experimenting to see if I can consolidate the two cases and get something that makes a bit more sense.

Contributor Author

I revisited this; it looks like a simpler way to keep this safe is to only set the indices to be equal when len(predictions) == len(y), which removes the need for the dfs check as well. I've updated accordingly but can always change it again if desired!
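For reference, a minimal sketch of what that guard amounts to (hypothetical function name; not the actual diff from this PR):

import pandas as pd

def align_prediction_index(predictions: pd.Series, y: pd.Series) -> pd.Series:
    # Lagging features and dropping null rows can leave fewer predictions than
    # target rows; only re-align the index when the lengths actually match
    # (e.g. when features were precomputed with the DFS transformer).
    if len(predictions) == len(y):
        predictions.index = y.index
    return predictions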

)
if self.component_graph.has_dfs:
Collaborator

why do we only need to do this for the has_dfs case?


res_dict = {}
cov_to_mult = {0.75: 1.15, 0.85: 1.44, 0.95: 1.96}
for cov in coverage:
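The multipliers above are (approximately) the two-sided normal z-scores for each coverage level: 1.15 for 75%, 1.44 for 85%, and 1.96 for 95%. A minimal sketch of how such a table could be used to build naive intervals from in-sample residuals; the sqrt(step) widening and all names here are illustrative assumptions, not necessarily what this PR implements:

import numpy as np
import pandas as pd

def naive_prediction_intervals(predictions: pd.Series, residuals, coverage=(0.75, 0.85, 0.95)):
    # Map each coverage level to its two-sided normal z-score.
    cov_to_mult = {0.75: 1.15, 0.85: 1.44, 0.95: 1.96}
    std = np.std(residuals)
    # Assumption for illustration only: let the band grow with the forecast step
    # so later predictions get wider intervals.
    steps = np.sqrt(np.arange(1, len(predictions) + 1))
    res_dict = {}
    for cov in coverage:
        half_width = cov_to_mult[cov] * std * steps
        res_dict[f"{cov}_lower"] = predictions - half_width
        res_dict[f"{cov}_upper"] = predictions + half_width
    return res_dict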
Collaborator

Do we have explicit test coverage of the math here? If not, can we add it? If my understanding is correct, in test_time_series_pipeline_get_prediction_intervals it looks like we're only testing against the estimator prediction intervals, which this implementation shouldn't be tested against.

Contributor Author

I'm always hesitant about testing the math - it's too easy to just create a repeat of what we call in the function, which doesn't serve any purpose other than checking that we haven't changed our implementation. Do you have any suggestions for how to test the math?

Collaborator

We can test whether the PIs logically make sense:

  • whether the interval widths increase as the number of predictions grows
  • whether the interval widths increase with the coverage level that is selected

My main concern is that the test is asserting against the estimator PIs, which we don't use anymore! Let me know what you think.
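For what it's worth, the kind of property check being suggested might look something like this (the function name and the layout of the intervals dictionary are hypothetical):

import numpy as np

def check_interval_properties(intervals, coverages=(0.75, 0.85, 0.95)):
    # `intervals` is assumed to map keys like "0.95_lower"/"0.95_upper" to
    # per-step series of interval bounds.
    for cov in coverages:
        width = intervals[f"{cov}_upper"] - intervals[f"{cov}_lower"]
        # 1. Interval widths should not shrink as we predict further ahead.
        assert (np.diff(width) >= 0).all()
    # 2. Higher coverage should give wider intervals at every step.
    for lo, hi in zip(coverages, coverages[1:]):
        width_lo = intervals[f"{lo}_upper"] - intervals[f"{lo}_lower"]
        width_hi = intervals[f"{hi}_upper"] - intervals[f"{hi}_lower"]
        assert (width_hi > width_lo).all()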

Contributor

If we decide to go down that route, and I'm not super in love with it, I think we'd have to functionalize both of these branches and unit test them individually. I would want the tests to be about as simple as we could make them.

Contributor Author

I'm not quite sure what you mean @chukarsten - I added some testing to check the math (ish), do you want to take a look and see what you think? I've been on the fence about splitting the current test up into two, so I'm also happy to do that if you'd prefer simpler tests.

Collaborator

@jeremyliweishih jeremyliweishih left a comment

great work!

Contributor

@chukarsten chukarsten left a comment

I think the only thing that's confusing to me is the parameterization of the test using features. Not sure what that's meant to say or what it's really testing. I think we need some additional explanation there for future devs coming down this road.

Comment on lines 1870 to 1873
@pytest.mark.parametrize("set_coverage", [True, False])
@pytest.mark.parametrize("add_decomposer", [True, False])
@pytest.mark.parametrize("no_preds_pi_estimator", [True, False])
@pytest.mark.parametrize("features", [True, False])
Contributor

We might need some inline comment coverage on these. It always takes me a while to understand no_preds_pi_estimator and what that's testing for. Now I'm a little confused what features being True or False means.

Contributor Author

👍 I've renamed features to featuretools_first and no_preds_pi_estimator to ts_native_estimator and added a couple comments, let me know what you think!
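A sketch of what the renamed parametrization would look like; the inline notes are one reading of this thread, not the comments actually added in the PR:

import pytest

@pytest.mark.parametrize("set_coverage", [True, False])
@pytest.mark.parametrize("add_decomposer", [True, False])
# Renamed from no_preds_pi_estimator.
@pytest.mark.parametrize("ts_native_estimator", [True, False])
# Renamed from features: presumably whether features are precomputed with the
# DFS transformer before being passed to the pipeline.
@pytest.mark.parametrize("featuretools_first", [True, False])
def test_time_series_pipeline_get_prediction_intervals(
    set_coverage, add_decomposer, ts_native_estimator, featuretools_first
):
    ...  # body omitted; see the PR's test for the real assertions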

@eccabay eccabay requested a review from chukarsten April 10, 2023 13:39
@chukarsten chukarsten merged commit 401bf9a into main Apr 10, 2023
@chukarsten chukarsten deleted the TML-7064_naieve_PI branch April 10, 2023 17:35
@chukarsten chukarsten mentioned this pull request Apr 10, 2023
Development

Successfully merging this pull request may close these issues.

Implement Naïve PI Approach