Naive prediction intervals #4127
Conversation
Codecov Report
@@           Coverage Diff           @@
##            main   #4127    +/-   ##
=======================================
+ Coverage    99.7%   99.7%   +0.1%
=======================================
  Files         349     349
  Lines       37704   37740     +36
=======================================
+ Hits        37587   37623     +36
  Misses        117     117
* Remove drop_time_index from get_prediction_intervals
* drop time index only for transform and not for predict
* Maintain datetime frequency in drop_time_index
* Make sure indices for residuals match
Some questions
predictions = self._estimator_predict(features)
predictions.index = y.index
if not calculating_residuals:
Why don't we want this when calculating_residuals?
So, what I've found (and correct me if I'm wrong, @eccabay) is that when we do a truly in-sample prediction, the prediction size is not guaranteed to be the same as the size of the target values array... because of the lags, I think.
I think I need to go back and do a little more experimentation here. You're right @chukarsten that the lags and dropping null rows mean that the prediction size won't be the same as the size of the target values array. However, that's not the case if we've precomputed features with the DFS transformer; I had to add the index alignment @jeremyliweishih commented on to get this to work in the product. I'll do some experimenting to see if I can consolidate the two cases and come up with something that makes a bit more sense.
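For concreteness, a toy illustration of how lag features plus dropped null rows shrink the feature matrix relative to the target (plain pandas only, not the pipeline's actual delayed-feature transformer):

```python
import pandas as pd

y = pd.Series(range(10))
# A single lag-2 feature: the first two rows have no history and become NaN.
X = pd.DataFrame({"target_lag_2": y.shift(2)})
X_trimmed = X.dropna()

# Predictions made from X_trimmed have 8 rows while y has 10, so blindly
# assigning predictions.index = y.index would fail.
assert len(X_trimmed) == len(y) - 2
```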
I revisited this; it looks like a simpler way to keep this safe is to just limit setting the indices to be equal to the case when len(predictions) == len(y), which removes the need for the dfs check as well. I've updated accordingly but can always change it again if desired!
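Roughly, the guard described here might look like the following (a sketch based on the snippet above; _estimator_predict comes from that snippet, everything else is an illustrative assumption rather than the actual pipeline code):

```python
predictions = self._estimator_predict(features)
if len(predictions) == len(y):
    # Lags and dropped null rows can make in-sample predictions shorter than y,
    # so only re-align the index when the lengths already match. This also
    # removes the need for a separate DFS-specific branch.
    predictions.index = y.index
```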
)
if self.component_graph.has_dfs:
Why do we only need to do this for the has_dfs case?
res_dict = {}
cov_to_mult = {0.75: 1.15, 0.85: 1.44, 0.95: 1.96}
for cov in coverage:
Do we have explicit coverage of the math here? If not, can we add it? If my understanding is correct, in test_time_series_pipeline_get_prediction_intervals it looks like we're only testing against the estimator prediction intervals, which this implementation shouldn't be tested against.
I'm always hesitant about testing the math - it's too easy to just create a repeat of what we call in the function, which doesn't serve any purpose other than checking that we haven't changed our implementation. Do you have any suggestions for how to test the math?
We can test whether the PIs logically make sense:
- whether the interval values increase with the number of predictions
- whether the intervals widen based on the coverage that is selected
My main concern is that the test is asserting against the estimator PIs, which we don't use anymore! Let me know what you think.
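A sketch of the kind of sanity checks being suggested, written against the illustrative naive_prediction_intervals helper above (not the repository's actual test code):

```python
import numpy as np
import pandas as pd

def test_prediction_intervals_widen_with_coverage_and_horizon():
    rng = np.random.default_rng(0)
    point = pd.Series(np.zeros(10))
    residuals = pd.Series(rng.standard_normal(50))
    intervals = naive_prediction_intervals(point, residuals)

    widths = {
        cov: intervals[f"{cov}_upper"] - intervals[f"{cov}_lower"]
        for cov in (0.75, 0.85, 0.95)
    }
    # Higher coverage should give wider intervals at every forecast step.
    assert (widths[0.85] >= widths[0.75]).all()
    assert (widths[0.95] >= widths[0.85]).all()
    # Interval widths should not shrink as we predict further into the horizon.
    for width in widths.values():
        assert (width.diff().dropna() >= 0).all()
```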
If we decide to go down that route (and I'm not super in love with it), I think we'd have to functionalize both of these branches and unit test them individually. I would want the tests to be about as simple as we can make them.
I'm not quite sure what you mean @chukarsten - I added some testing to check the math (ish); do you want to take a look and see what you think? I've been on the fence about splitting the current test up into two, so I'm also happy to do that if you'd prefer simpler tests.
great work!
I think the only thing that's confusing to me is the parameterization of the test using features. I'm not sure what that's meant to say or what it's really testing. I think we need some additional explanation there for future devs coming down this road.
@pytest.mark.parametrize("set_coverage", [True, False]) | ||
@pytest.mark.parametrize("add_decomposer", [True, False]) | ||
@pytest.mark.parametrize("no_preds_pi_estimator", [True, False]) | ||
@pytest.mark.parametrize("features", [True, False]) |
We might need some inline comment coverage on these. It always takes me a while to understand no_preds_pi_estimator and what it's testing for, and now I'm a little confused about what features being True or False means.
👍 I've renamed features to featuretools_first and no_preds_pi_estimator to ts_native_estimator, and added a couple of comments; let me know what you think!
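A sketch of what the renamed, commented parametrization might look like (the inline comment wording and the test signature are illustrative guesses; only the flag names and the test name come from this thread):

```python
import pytest

@pytest.mark.parametrize("set_coverage", [True, False])  # explicit coverage levels vs. defaults
@pytest.mark.parametrize("add_decomposer", [True, False])  # include a decomposer component or not
# Estimator with native time-series prediction intervals vs. one that needs the naive fallback.
@pytest.mark.parametrize("ts_native_estimator", [True, False])
# Whether features are precomputed with the DFS/Featuretools transformer before fitting the pipeline.
@pytest.mark.parametrize("featuretools_first", [True, False])
def test_time_series_pipeline_get_prediction_intervals(
    set_coverage, add_decomposer, ts_native_estimator, featuretools_first
):
    ...
```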
Closes #4126