bug: getQuantiles() returns values that exceed max #4744
Conversation
Force-pushed 486914d to 6dfaa74
Force-pushed bc9b11c to 812c492
Updated the PR to include a proposed fix. It may need discussion to ensure this is the right way to do things. I fed getQuantiles() more synthetic data sets to check that the numbers generated are sane. Since the algorithm is based on the BH/TT uniform() procedure, we know that […]
In addition to the one comment, @ginoledesma could you please fill out our CLA located at: http://druid.io/community/cla.html
Thanks for the contribution.
@@ -1581,7 +1581,7 @@ public double sum(final float b)
         z = (-b + Math.sqrt(b * b - 4 * a * c)) / (2 * a);
       }
       final double uj = this.positions[i - 1] + (this.positions[i] - this.positions[i - 1]) * z;
-      quantiles[j] = (float) uj;
+      quantiles[j] = ((float) uj < this.max()) ? (float) uj : this.max();
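The effect of the one-line change can be shown in isolation. The following is a hypothetical standalone sketch (the class and method names are illustrative, not Druid's actual ApproximateHistogram API): the interpolated quantile position uj is computed in double precision and then cast to float, and rounding in the quadratic solve or in the cast itself can push the result slightly past the histogram's tracked maximum, so the value is clamped.

```java
// Hypothetical sketch of the clamping fix, not Druid's actual code.
public class QuantileClampSketch {
    // Mirrors the patched line:
    //   quantiles[j] = ((float) uj < this.max()) ? (float) uj : this.max();
    static float clampToMax(double uj, float max) {
        return ((float) uj < max) ? (float) uj : max;
    }

    public static void main(String[] args) {
        float max = 190.0f;
        // An interpolated value that overshoots the known maximum
        // after the float cast gets clamped back to max:
        System.out.println(clampToMax(190.00002, max)); // prints 190.0
        // A value inside the range passes through unchanged:
        System.out.println(clampToMax(120.5, max));     // prints 120.5
    }
}
```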
It's nice how simple, yet effective this tweak is. Would it make sense to do a similar tweak for mins?
In addition, if you have a clearly worded explanation of why the uj values can be greater than the max, a comment would be useful, explaining why it's necessary to clamp them to the max.
Does this help with the issue reported in #3972 (with regard to the median, rather than high percentiles)? It seems like max-clamping might not help with the low/mid quantiles like the median.
Force-pushed 812c492 to 1db9b41
@gianm I've updated the PR to include a comment on why clamping to max needs to happen. It's mostly based on the BH/TT paper, but I hope it's clear and makes sense. The short version is that we don't expect any "points" (values) to exist to the left of the known minimum or to the right of the known maximum. Clamping to min was already handled as a special case (line 1571: https://github.com/druid-io/druid/pull/4744/files#diff-de22260542704e07a950d869cd124aefL1571).

As for #3972: yes, it handles that situation. I've tried to replicate the original poster's situation, though I took a guess in filling in the blanks for the full ingestion spec. Given the time series query:

{
  "queryType": "timeseries",
  "dataSource": "test_quantile_fix",
  "granularity": "all",
  "intervals": [
    "2017-09-01T00:00:00+08:00/2017-10-01T00:00:00+08:00"
  ],
  "aggregations": [
    {
      "type": "approxHistogramFold",
      "name": "hist_fold",
      "fieldName": "hist_fold_api_duration",
      "resolution": 10000,
      "numBuckets": 100
    }
  ],
  "postAggregations": [
    {
      "type": "quantile",
      "name": "myResult",
      "fieldName": "hist_fold",
      "probability": 0.5
    }
  ],
  "context": {}
}

Druid 0.10.1 result for q=0.5: 1111.9833
Druid 0.10.1 + patch result for q=0.5: 190.86847
Reference (computed using SciPy for q=0.5): 190.0

You'll notice the histogram breaks are completely different with and without the clamping to max, but I think that has more to do with the folding of multiple approximate histograms, which gets exacerbated when the original values weren't clamped to begin with.
I'll update the PR with the CLA tomorrow. I did use some personal/company time, so I'll check with the company to get that ironed out.
Force-pushed 1db9b41 to 420ae6a
Updated the unit tests to be more thorough, computing a range of quantiles (q = 0.10 through 0.90 in 0.10 increments) and including explicit tests for "outliers" (q = 0.05 and q = 0.95).
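The kind of range sweep described above can be sketched as a plain assertion loop. This is a hypothetical stand-in, not Druid's actual test code: the quantile function below is a simple nearest-rank computation over a sorted array rather than ApproximateHistogram.getQuantiles(), and the invariant checked is the one the patch guarantees, that every returned quantile lies within [min, max] of the ingested data, including the outlier probabilities.

```java
import java.util.Arrays;

// Hypothetical sketch of a quantile range-sweep test.
public class QuantileRangeSweep {
    // Nearest-rank quantile over a sorted array (a stand-in for the
    // approximate histogram, used only to illustrate the sweep).
    static float quantile(float[] sorted, float q) {
        int idx = (int) Math.ceil(q * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
    }

    public static void main(String[] args) {
        float[] data = {12f, 45f, 88f, 120f, 150f, 163f, 175f, 182f, 188f, 190f};
        Arrays.sort(data);
        float min = data[0];
        float max = data[data.length - 1];
        // Sweep q = 0.05, 0.10, ..., 0.95, covering the outlier probes
        // (q = 0.05 and q = 0.95) as well as the mid-range quantiles.
        for (int i = 1; i <= 19; i++) {
            float q = i * 0.05f;
            float v = quantile(data, q);
            if (v < min || v > max) {
                throw new AssertionError("quantile out of range at q=" + q);
            }
        }
        System.out.println("all quantiles within [min, max]");
    }
}
```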
Force-pushed c1d1694 to 28de800
Not sure why the Travis builds are failing, but it seems to be affecting the other PRs as well, particularly those with the […]
@ginoledesma We've been having some unreliability with one of the tests recently; sorry about that. I re-ran it, and it may pass on the second run. The patch looks good to me; thanks for the additional tests as well. Have you had a chance to get the CLA sorted?
@ginoledesma If you merge master into your branch, it should help with the CI issues. We just made a commit there that tweaks some Travis settings and seems to be helping.
Force-pushed 28de800 to 773489f
LGTM.
@ginoledesma Hmm, this change does seem better, but I'm wondering how it helps with the test you describe in #4744 (comment). The patch only affects the […]
@ginoledesma, any thoughts on the question from #4744 (comment)?
Waiting for answers to @gianm's questions before merging.
I don't have a solid answer, unfortunately. I'm trying to come up with a smaller dataset that replicates the approximation breaks so the issue can be reproduced properly in the tests, but I haven't found any good leads to explain it.
@ginoledesma, thanks for the details. If you do find any new info, please raise a new issue. In the meantime, I'll merge this patch since it does look like an improvement. Thanks!
Fixes #3972