Incorrect quantiles result produced by quantilesDoublesSketch #8095
Comments
This is a known issue that was fixed in druid-0.14.1; however, at the same time a bug in the Theta sketch was introduced. So if you do not use the Theta sketch, you may want to upgrade to 0.14.1, 0.14.2, or 0.15.0; but if you do use the Theta sketch, I would recommend either waiting for 0.16.0 or building from source yourself.
Thanks. @AlexanderSaydakov
We do use the Theta sketch, and I think the bug #7607 has been resolved in 0.14.2? So, are there other bug fixes for the Theta sketch to be released in 0.16.0?
Confirmed and fixed by upgrading to 0.14.2. Thanks to @AlexanderSaydakov. Issue is closed now.
Yes, the fix in 0.14.2 was not complete. In some cases the issue can still show up.
@AlexanderSaydakov
Related information in
Any suggestions? Thanks!
Thanks for the quick reply!
In the log above I see NonNativeWritableMemoryImpl, which suggests big-endian byte order. I believe this can happen when a ByteBuffer passed by Druid into the quantiles sketch aggregator has big-endian order set and is wrapped as Memory without forcing little-endian order. It seems to me that the only place where this was not forced was fixed by #8085. I don't see how this is possible now. I need some way to reproduce the problem.
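The byte-order mismatch described above is easy to demonstrate with plain java.nio (a minimal sketch, not Druid or DataSketches code): the same 8 bytes decode to a completely different double when read with the wrong endianness.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    // Serialize 46.0 with the given byte order, then read the same
    // 8 bytes back assuming little-endian order.
    public static double roundTrip(ByteOrder writeOrder) {
        ByteBuffer buf = ByteBuffer.allocate(Double.BYTES).order(writeOrder);
        buf.putDouble(46.0);
        return buf.order(ByteOrder.LITTLE_ENDIAN).getDouble(0);
    }

    public static void main(String[] args) {
        // Matching byte orders: the value survives the round trip
        System.out.println(roundTrip(ByteOrder.LITTLE_ENDIAN)); // 46.0
        // Mismatched byte orders: the bytes decode to a meaningless double
        System.out.println(roundTrip(ByteOrder.BIG_ENDIAN));
    }
}
```

This is why forcing little-endian order at the point where a ByteBuffer is wrapped as Memory matters: the sketch bytes themselves are fine, but interpreting them under the wrong order produces garbage values.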
Incorrect quantiles result returned by quantilesDoublesSketch
Affected Version
0.13.0
Description
I'm encountering a problem when using the quantiles sketch in Druid: for some specific intervals I get strange quantile results, sometimes even impossible ones, e.g. p50 > p99.
Here's the case:
Query:
{
"queryType": "select",
"dataSource": "actiondata",
"intervals": ["2019-07-12T08:01:00.000Z/2019-07-12T08:03:00.000Z"],
"dimensions":["app_id", "app_version_id"],
"metrics":["count", "duration_quantiles"],
"filter": {"dimension": "app_id", "type": "selector", "value": "14"},
"granularity": "minute",
"pagingSpec":{"pagingIdentifiers": {}, "threshold":5000}
}
Response:
/// @see attachment select-result.json.txt
Query:
{
"queryType": "groupBy",
"dataSource": "actiondata",
"intervals": ["2019-07-12T08:01:00.000Z/2019-07-12T08:03:00.000Z"],
"dimensions":["app_id"],
"filter": {"dimension": "app_id", "type": "selector", "value": "14"},
"granularity":"all",
"aggregations": [
{ "fieldName": "duration_quantiles", "k": 256, "name": "duration_quantiles", "type": "quantilesDoublesSketch"}
],
"postAggregations": [
{
"type": "quantilesDoublesSketchToQuantiles",
"name": "duration_q",
"field": {
"fieldName": "duration_quantiles",
"name": null,
"type": "fieldAccess"
},
"fractions": [0.50, 0.80, 0.95, 0.99]
}
]
}
Response:
[
{
"version": "v1",
"timestamp": "2019-07-12T08:01:00.000Z",
"event": {
"appId": "14",
"duration_quantiles": 34740,
"duration_q": [0.0, 0.0, 0.0, 46.0]
}
}
]
As we can see, the actual merged result of the 4 duration_quantiles sketches is [0.0, 0.0, 0.0, 46.0] for fractions [0.5, 0.8, 0.95, 0.99].
BUT, if we union the 4 sketches (from the select query of step 1) using Yahoo's DoublesSketch library directly, we get a quite different result:
// imports added for clarity; package names assumed per druid 0.13 and sketches-core
import java.util.Arrays;
import com.yahoo.sketches.quantiles.DoublesSketch;
import com.yahoo.sketches.quantiles.DoublesUnion;
import com.yahoo.sketches.quantiles.UpdateDoublesSketch;
import org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations;

String[] sketches_base64 = new String[] {
    // Base64-encoded sketches from the select-query result
    "AgMIGoAAAABXIAAAAA......",
    "AgMIGoAAAAAUKwAAAA......",
    "AgMIGoAAAADLHgAAAA......",
    "AgMIGoAAAAB+HQAAAA......"
};
DoublesUnion union = DoublesUnion.builder().setMaxK(256).build();
for (String encodedSketch : sketches_base64) {
  DoublesSketch sketch = DoublesSketchOperations.deserializeFromBase64EncodedString(encodedSketch);
  union.update(sketch);
}
UpdateDoublesSketch unioned = union.getResult();
System.out.println("expected quantiles: " + Arrays.toString(unioned.getQuantiles(new double[] {0.5, 0.8, 0.95, 0.99})));
With output (expected quantiles) like this:
expected quantiles: [1.0, 7.0, 60.0, 368.0]
The actual merged result from quantilesDoublesSketch deviates significantly from the expected result.
Note that different orderings of the input sketches may produce slightly different merged results, but not differences of this magnitude.
I tried all arrangements of the 4 sketches in the union; none of them produces the result [0.0, 0.0, 0.0, 46.0]!
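For reference, "all arrangements" of 4 sketches means 4! = 24 orderings. They can be enumerated with a small recursive helper; a stdlib-only sketch where Permutations and forEachPermutation are hypothetical names, and the consumer stands in for the union call:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class Permutations {
    // Generate every ordering of the input list and hand each one to the
    // consumer (e.g. a routine that unions the sketches in that order).
    public static <T> void forEachPermutation(List<T> items, Consumer<List<T>> action) {
        permute(new ArrayList<>(items), 0, action);
    }

    private static <T> void permute(List<T> items, int k, Consumer<List<T>> action) {
        if (k == items.size()) {
            action.accept(new ArrayList<>(items));
            return;
        }
        for (int i = k; i < items.size(); i++) {
            Collections.swap(items, k, i);   // place each candidate at slot k
            permute(items, k + 1, action);   // permute the remaining slots
            Collections.swap(items, k, i);   // restore for the next candidate
        }
    }

    public static void main(String[] args) {
        List<String> sketches = List.of("s1", "s2", "s3", "s4");
        int[] count = {0};
        forEachPermutation(sketches, order -> count[0]++);
        System.out.println(count[0]); // 24
    }
}
```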
Query:
{
"queryType": "groupBy",
"dataSource": "actiondata",
"intervals": ["2019-07-12T08:01:00.000Z/2019-07-12T08:02:00.000Z"],
...
}
Response:
[
{
"version": "v1",
"timestamp": "2019-07-12T08:01:00.000Z",
"event": {
"appId": "14",
"duration_quantiles": 19307,
"duration_q": [1.0, 7.0, 51.0, 685.0]
}
}
]
Query:
{
"queryType": "groupBy",
"dataSource": "actiondata",
"intervals": ["2019-07-12T08:02:00.000Z/2019-07-12T08:03:00.000Z"],
...
}
Response:
[
{
"version": "v1",
"timestamp": "2019-07-12T08:02:00.000Z",
"event": {
"appId": "14",
"duration_quantiles": 15433,
"duration_q": [1.0, 7.0, 35.0, 799.0]
}
}
]
I'm trying to debug this issue by adding debug logging to:
DoublesSketchOperations#deserializeFromBase64EncodedString
DoublesSketchOperations#deserializeFromByteArray
DoublesSketchAggregatorFactory#combine
DoublesSketchAggregatorFactory#makeAggregateCombiner#fold
DoublesSketchToQuantilesPostAggregator#compute
like this:
public static DoublesSketch deserializeFromBase64EncodedString(final String str)
{
log.debug("deser sketch: " + str);
return deserializeFromByteArray(Base64.decodeBase64(str.getBytes(StandardCharsets.UTF_8)));
}
As the raw data in the segments seems OK, something must have happened at the merging/combining phase of the sketches.
So I added debug logging to all the likely places: deserializing, combining, folding.
But, sadly, it didn't help.
The debug log for deserializing the final/merged sketch (deser sketch(bin): ...) does show up, but not logs for the original 4 sketches.
The debug log for combining/merging does not show up either, neither on the broker node nor on the historical nodes.
I also tried disabling the cache by setting useCache/populateCache to false in the query context; nothing changed.
So when did the original 4 sketches get deserialized and merged?
I'm not very familiar with Druid's architecture or source code. Can anyone give me some hints to help me proceed? Thanks a lot!
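One likely explanation (an assumption on my part, not confirmed in this thread): Druid's groupBy engine aggregates off-heap through a BufferAggregator-style interface, writing aggregation state directly into a shared ByteBuffer, so the on-heap combine()/fold() methods instrumented above are never invoked. A toy, stdlib-only sketch of that interaction pattern; ToyBufferAggregator and ToyCountAggregator are hypothetical names, not Druid classes:

```java
import java.nio.ByteBuffer;

// Simplified shape of an off-heap aggregation path: state lives in a
// ByteBuffer at a given position, not in on-heap objects, so on-heap
// combine()/fold() methods never run (and their debug logs never fire).
interface ToyBufferAggregator {
    void init(ByteBuffer buf, int position);
    void aggregate(ByteBuffer buf, int position);
    Object get(ByteBuffer buf, int position);
}

class ToyCountAggregator implements ToyBufferAggregator {
    public void init(ByteBuffer buf, int position) {
        buf.putLong(position, 0L); // zero the slot for this grouping key
    }
    public void aggregate(ByteBuffer buf, int position) {
        buf.putLong(position, buf.getLong(position) + 1); // mutate in place
    }
    public Object get(ByteBuffer buf, int position) {
        return buf.getLong(position);
    }
}

public class BufferAggDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        ToyBufferAggregator agg = new ToyCountAggregator();
        agg.init(buf, 0);
        for (int i = 0; i < 3; i++) {
            agg.aggregate(buf, 0);
        }
        System.out.println(agg.get(buf, 0)); // 3
    }
}
```

If that assumption holds, placing the debug logging in the buffer-based merge aggregator (rather than in combine/fold) would be the way to catch the 4 sketches being merged.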