Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: getQuantiles() returns values that exceed max #4744

Merged
merged 1 commit into from Sep 26, 2017

Conversation

ginoledesma
Copy link
Contributor

@ginoledesma ginoledesma commented Sep 1, 2017

Fixes #3972

@drcrallen drcrallen added the WIP label Sep 3, 2017
@ginoledesma ginoledesma changed the title [WIP] bug: getQuantiles() returns values that exceed max bug: getQuantiles() returns values that exceed max Sep 5, 2017
@ginoledesma ginoledesma force-pushed the histogram-quantile-bug branch 2 times, most recently from bc9b11c to 812c492 Compare September 5, 2017 18:58
@ginoledesma ginoledesma changed the title bug: getQuantiles() returns values that exceed max [WIP] bug: getQuantiles() returns values that exceed max Sep 5, 2017
@ginoledesma
Copy link
Contributor Author

ginoledesma commented Sep 5, 2017

Updated PR to include a proposed fix. May need discussion to ensure this is the right way to do things.

I fed getQuantiles() with more synthetic data sets to see that the numbers generated are sane. Since the algorithm is based off the BH/TT uniform() procedure, we know that uj should not exceed s given that uj should fall in the set of real numbers {u1 < . . . < u~B}, where s = u~B.

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the one comment, @ginoledesma could you please fill out our CLA located at: http://druid.io/community/cla.html

Thanks for the contribution.

@@ -1581,7 +1581,7 @@ public double sum(final float b)
z = (-b + Math.sqrt(b * b - 4 * a * c)) / (2 * a);
}
final double uj = this.positions[i - 1] + (this.positions[i] - this.positions[i - 1]) * z;
quantiles[j] = (float) uj;
quantiles[j] = ((float) uj < this.max()) ? (float) uj : this.max();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice how simple, yet effective this tweak is. Would it make sense to do a similar tweak for mins?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, if you have a clearly worded explanation of why the uj's can be greater than the max, a comment would be useful. Something about why it's necessary to clamp them to the max.

@gianm gianm added this to the 0.11.0 milestone Sep 6, 2017
@gianm
Copy link
Contributor

gianm commented Sep 6, 2017

Does this help with the issue reported in #3972 (with regard to the median, rather than high percentiles)? It seems like max-clamping might not help with the low/mid quantiles like median.

@ginoledesma
Copy link
Contributor Author

@gianm I've updated the PR to include a comment on why clamping to max needs to happen. It's mostly based on the BH/TT paper but I hope it's clear and makes sense. The short version is that we don't expect any "points" (values) to exist to the left of the known minimum or the right of the known maximum.

The clamping to min was already handled as a special case (line 1571 - https://github.com/druid-io/druid/pull/4744/files#diff-de22260542704e07a950d869cd124aefL1571).

As for #3972, yes, it handles that situation. I've tried to replicate the original poster's situation, but I took a guess in filling in the blanks for the full ingestion spec. Here's what I have:

Given the time series query:

{
  "queryType": "timeseries",
  "dataSource": "test_quantile_fix",
  "granularity": "all",
  "intervals": [
    "2017-09-01T00:00:00+08:00/2017-10-01T00:00:00+08:00"
  ],
  "aggregations": [
    {
      "type": "approxHistogramFold",
      "name": "hist_fold",
      "fieldName": "hist_fold_api_duration",
      "resolution": 10000,
      "numBuckets": 100
    }
  ],
  "postAggregations": [
    {
      "type": "quantile",
      "name": "myResult",
      "fieldName": "hist_fold",
      "probability": 0.5
    }
  ],
  "context": {}
}

Druid 0.10.1 Result for q=0.5: 1111.9833
druid_quantiles_stock.json.txt

Druid 0.10.1 + patch Result for q=0.5: 190.86847
druid_quantiles_patched.json.txt

Reference (computed using SciPy for q=0.5): 190.0

You'll notice the breaks are completely different between the behavior without the clamped values for max, but I think that has more to do with the folding of multiple approximate histograms, which gets exacerbated when the original values weren't clamped to begin with.

@ginoledesma
Copy link
Contributor Author

I'll update the PR with the CLA tomorrow — I did use some personal/company time so I'll check with the company to get that ironed out.

@ginoledesma
Copy link
Contributor Author

Updated unit tests to be more thorough with computing a range of quantiles (q=[.10, ..., 0.90] in 0.10 increments) including the explicit test for "outliers" (q=0.05 and q=0.95).

@ginoledesma ginoledesma force-pushed the histogram-quantile-bug branch 3 times, most recently from c1d1694 to 28de800 Compare September 11, 2017 19:08
@ginoledesma ginoledesma changed the title [WIP] bug: getQuantiles() returns values that exceed max bug: getQuantiles() returns values that exceed max Sep 12, 2017
@ginoledesma
Copy link
Contributor Author

Not sure why the travis builds are failing, but it seems to be affecting the other PRs as well — particularly on those with the -Xmx512m maven opts set.

@gianm
Copy link
Contributor

gianm commented Sep 12, 2017

@ginoledesma We've been having some unreliability with one of the tests recently, sorry about that. I re-ran it and it may pass when run a second time.

The patch looks good to me, thanks for the additional tests as well. Have you had a chance to get the CLA sorted?

@gianm
Copy link
Contributor

gianm commented Sep 13, 2017

@ginoledesma If you merge master into your branch, it should help with the CI issues. We just did a commit there that tweaks some Travis settings and seems to be helping.

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@gianm
Copy link
Contributor

gianm commented Sep 14, 2017

@ginoledesma Hmm, this change does seem better, but I'm wondering how it helps with the test you describe in #4744 (comment). The patch only affects the getQuantiles(float[]) method, which is only called in the post-aggregation step, and so it shouldn't affect the breaks or folding behavior at all (which happen before that, in the aggregation step).

@jon-wei jon-wei added the Bug label Sep 20, 2017
@gianm
Copy link
Contributor

gianm commented Sep 21, 2017

@ginoledesma, any thoughts on the question from #4744 (comment)?

@drcrallen
Copy link
Contributor

waiting for answers to @gianm 's questions before merging.

@ginoledesma
Copy link
Contributor Author

I don’t have a solid answer, unfortunately. I’m trying to come up with a smaller dataset that replicates the approximation breaks do it can be replicated in the tests properly, but I haven’t found any good leads to explain it.

@gianm
Copy link
Contributor

gianm commented Sep 26, 2017

@ginoledesma thanks for the details. If you do find any new info please raise a new issue. In the meantime I'll merge this patch since it does look like an improvement. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants