
Incorrect quantiles result produced by quantilesDoublesSketch #8095

Closed

burningice2012 opened this issue Jul 17, 2019 · 8 comments

Comments

@burningice2012

burningice2012 commented Jul 17, 2019

Incorrect quantiles result returned by quantilesDoublesSketch

Affected Version

0.13.0

Description

I'm encountering a problem when using the quantiles sketch in Druid: for some specific intervals I get strange quantile results, sometimes even impossible ones, e.g. p50 > p99.
Here's the case:

  1. The data stored in the Druid segment looks like this:
    Query:
    {
      "queryType": "select",
      "dataSource": "actiondata",
      "intervals": ["2019-07-12T08:01:00.000Z/2019-07-12T08:03:00.000Z"],
      "dimensions": ["app_id", "app_version_id"],
      "metrics": ["count", "duration_quantiles"],
      "filter": {"dimension": "app_id", "type": "selector", "value": "14"},
      "granularity": "minute",
      "pagingSpec": {"pagingIdentifiers": {}, "threshold": 5000}
    }
  • Here the metric count is of type longSum and duration_quantiles is of type quantilesDoublesSketch, so SUM(count) should equal getN() of the DoublesSketch built from duration_quantiles (a quick check sketch follows below).

Response:
/// @see attachment select-result.json.txt
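
A quick way to check that claim (a minimal sketch of mine, not part of the attachment; the Base64 strings and the count are placeholders, and it reuses the DoublesSketchOperations helper shown later in this issue):

import com.yahoo.sketches.quantiles.DoublesSketch;
import org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations;

// Base64-encoded sketches from select-result.json.txt (placeholders here)
String[] sketches_base64 = new String[] {
    "AgMIGoAAAABXIAAAAA......",
    "AgMIGoAAAAAUKwAAAA......",
    "AgMIGoAAAADLHgAAAA......",
    "AgMIGoAAAAB+HQAAAA......"
};
long sumCount = 34740L; // SUM(count) over the same rows (placeholder value)

long totalN = 0;
for (String encodedSketch : sketches_base64) {
  DoublesSketch sketch = DoublesSketchOperations.deserializeFromBase64EncodedString(encodedSketch);
  totalN += sketch.getN(); // number of items this sketch has seen
}
System.out.println("SUM(count) = " + sumCount + ", total getN() = " + totalN);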

  2. When I run a groupBy query like this:
    Query:
    {
      "queryType": "groupBy",
      "dataSource": "actiondata",
      "intervals": ["2019-07-12T08:01:00.000Z/2019-07-12T08:03:00.000Z"],
      "dimensions": ["app_id"],
      "filter": {"dimension": "app_id", "type": "selector", "value": "14"},
      "granularity": "all",
      "aggregations": [
        {"fieldName": "duration_quantiles", "k": 256, "name": "duration_quantiles", "type": "quantilesDoublesSketch"}
      ],
      "postAggregations": [
        {
          "type": "quantilesDoublesSketchToQuantiles",
          "name": "duration_q",
          "field": {
            "fieldName": "duration_quantiles",
            "name": null,
            "type": "fieldAccess"
          },
          "fractions": [0.50, 0.80, 0.95, 0.99]
        }
      ]
    }
    Response:
    [
      {
        "version": "v1",
        "timestamp": "2019-07-12T08:01:00.000Z",
        "event": {
          "appId": "14",
          "duration_quantiles": 34740,
          "duration_q": [0.0, 0.0, 0.0, 46.0]
        }
      }
    ]

As we can see, the actual merged result of the 4 sketches of duration_quantiles is [0.0, 0.0, 0.0, 46.0] for fractions: [0.5, 0.8, 0.95, 0.99].

BUT, if we union the 4 sketches (from the select query in step 1) using Yahoo's DoublesSketch library directly, we get a quite different result:

import java.util.Arrays;

import com.yahoo.sketches.quantiles.DoublesSketch;
import com.yahoo.sketches.quantiles.DoublesUnion;
import com.yahoo.sketches.quantiles.UpdateDoublesSketch;
import org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations;

// Base64-encoded sketches taken from the select-result attachment
String[] sketches_base64 = new String[] {
    "AgMIGoAAAABXIAAAAA......",
    "AgMIGoAAAAAUKwAAAA......",
    "AgMIGoAAAADLHgAAAA......",
    "AgMIGoAAAAB+HQAAAA......"
};

// Union the 4 sketches with the same k (256) used by the aggregator
DoublesUnion union = DoublesUnion.builder().setMaxK(256).build();
for (String encodedSketch : sketches_base64) {
  DoublesSketch sketch = DoublesSketchOperations.deserializeFromBase64EncodedString(encodedSketch);
  union.update(sketch);
}
UpdateDoublesSketch unioned = union.getResult();
System.out.println("expected quantiles: " + Arrays.toString(unioned.getQuantiles(new double[] {0.5, 0.8, 0.95, 0.99})));

The output (expected quantiles) looks like this:

expected quantiles: [1.0, 7.0, 60.0, 368.0]

The actual merged result from quantilesDoublesSketch deviates significantly from the expected result.
Note that different input orders to the union may change the merged result slightly, but not by that much.
I tried unioning all arrangements of the 4 sketches, and none of them produces [0.0, 0.0, 0.0, 46.0]!
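
A minimal sketch of that permutation check (assuming Guava's Collections2 is available, as it is on a Druid classpath, and reusing the sketches_base64 array from the snippet above):

import java.util.Arrays;
import java.util.List;

import com.google.common.collect.Collections2;
import com.yahoo.sketches.quantiles.DoublesUnion;
import org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations;

// Union every ordering of the 4 sketches and print the quantiles for each
for (List<String> ordering : Collections2.permutations(Arrays.asList(sketches_base64))) {
  DoublesUnion union = DoublesUnion.builder().setMaxK(256).build();
  for (String encodedSketch : ordering) {
    union.update(DoublesSketchOperations.deserializeFromBase64EncodedString(encodedSketch));
  }
  double[] quantiles = union.getResult().getQuantiles(new double[] {0.5, 0.8, 0.95, 0.99});
  System.out.println(Arrays.toString(quantiles));
}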

  3. When I split the 2-minute interval into two 1-minute intervals, both returned correct results:
    Query:
    {
      "queryType": "groupBy",
      "dataSource": "actiondata",
      "intervals": ["2019-07-12T08:01:00.000Z/2019-07-12T08:02:00.000Z"],
      ...
    }
    Response:
    [
      {
        "version": "v1",
        "timestamp": "2019-07-12T08:01:00.000Z",
        "event": {
          "appId": "14",
          "duration_quantiles": 19307,
          "duration_q": [1.0, 7.0, 51.0, 685.0]
        }
      }
    ]

Query:
{
  "queryType": "groupBy",
  "dataSource": "actiondata",
  "intervals": ["2019-07-12T08:02:00.000Z/2019-07-12T08:03:00.000Z"],
  ...
}
Response:
[
  {
    "version": "v1",
    "timestamp": "2019-07-12T08:02:00.000Z",
    "event": {
      "appId": "14",
      "duration_quantiles": 15433,
      "duration_q": [1.0, 7.0, 35.0, 799.0]
    }
  }
]

I'm trying to debug this issue by adding some debug logging to:
DoublesSketchOperations#deserializeFromBase64EncodedString
DoublesSketchOperations#deserializeFromByteArray
DoublesSketchAggregatorFactory#combine
DoublesSketchAggregatorFactory#makeAggregateCombiner#fold
DoublesSketchToQuantilesPostAggregator#compute

like this:
public static DoublesSketch deserializeFromBase64EncodedString(final String str)
{
  log.debug("deser sketch: " + str);
  return deserializeFromByteArray(Base64.decodeBase64(str.getBytes(StandardCharsets.UTF_8)));
}

public static DoublesSketch deserializeFromByteArray(final byte[] data)
{
  log.debug("deser sketch(bin): " + Base64.encodeBase64String(data));
  return DoublesSketch.wrap(Memory.wrap(data));
}

As the raw data in the segment seems OK, something must have happened during the merging/combining phase of the sketches.
So I added debug logging to every place that seemed relevant: deserializing, combining, folding.
But, sadly, it didn't help.
The debug log for deserializing the final/merged sketch (deser sketch(bin): ...) does appear, but the original 4 sketches never show up.
The debug log for combining/merging doesn't appear either, neither on the broker node nor on the historical nodes.
I also tried disabling the cache by setting useCache/populateCache to false in the query context, but nothing changed.
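
For reference, the cache flags were set in the query context like this (an illustrative snippet only; the rest is the same groupBy query as in step 2):

{
  "queryType": "groupBy",
  "dataSource": "actiondata",
  ...
  "context": {
    "useCache": false,
    "populateCache": false
  }
}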

But when did the original 4 sketches get deserialized and merged?
I'm not very familiar with the architecture or source code of Druid; can anyone give me some hints to help me proceed? Thanks a lot!

@AlexanderSaydakov
Contributor

This is a known issue that was fixed in Druid 0.14.1; however, a bug in the Theta sketch was introduced at the same time. So if you do not use the Theta sketch, you may want to upgrade to 0.14.1, 0.14.2, or 0.15.0; but if you do use the Theta sketch, I would recommend either waiting for 0.16.0 or building from source yourself.

@burningice2012
Author

burningice2012 commented Jul 18, 2019

Thanks. @AlexanderSaydakov
I found the issues you mentioned in the release notes of 0.14.1.
To confirm:

  1. The quantiles-related bug you mentioned above should be #7320, or the "0.13.1, Apr 2, 2019: Fix Direct DoublesUnion Quantiles Bug" item;
  2. and the newly introduced Theta sketch bug is #7607.

We do use the Theta sketch, and I think bug #7607 has been resolved in 0.14.2?
0.14.2-incubating release notes #7628
Bug Fixes
#7607 thetaSketch(with sketches-core-0.13.1) in groupBy always return value no more than 16384
#6483 Exception during sketch aggregations while using Result level cache
#7621 NPE when both populateResultLevelCache and grandTotal are set

So, are there other Theta sketch bug fixes still to be released in 0.16.0?
Should I upgrade to 0.14.2+, or cherry-pick the datasketches module from master and build it myself?

burningice2012 changed the title from "Incorrect quantiles result produed by quantilesDoublesSketch" to "Incorrect quantiles result produced by quantilesDoublesSketch" on Jul 18, 2019
@burningice2012
Author

Confirmed and fixed by upgrading to 0.14.2. Thanks to @AlexanderSaydakov. Issue closed now.

@AlexanderSaydakov
Contributor

AlexanderSaydakov commented Jul 22, 2019

are there other Theta sketch bug fixes still to be released in 0.16.0?

Yes, the fix in 0.14.2 was not complete; in some cases the issue can still show up.
The following pull request fixed the second part: #7666.
It is scheduled to be part of 0.16.0.

@quenlang

@AlexanderSaydakov
Hi, I merged #7666 into 0.14.2 and built from source. Then a fatal error occurred during data ingestion.
It seems a groupBy query executed on a realtime peon task triggered it.

2019-07-31T04:09:25,691 INFO [task-runner-0-priority-0] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations - deser sketch: AgMIAIAAAAABAAAAAAAAAAAAAAAAACxAAAAAAAAALEAAAAAAAAAsQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
2019-07-31T04:09:25,691 INFO [task-runner-0-priority-0] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations - deser sketch(bin): AgMIAIAAAAABAAAAAAAAAAAAAAAAACxAAAAAAAAALEAAAAAAAAAsQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
2019-07-31T04:09:25,691 INFO [task-runner-0-priority-0] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchMergeAggregator - aggregate sketch by merge aggregator: AgMIGoAAAAABAAAAAAAAAAAAAAAAACxAAAAAAAAALEAAAAAAAAAsQA==
2019-07-31T04:09:25,738 INFO [task-runner-0-priority-0] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations - deser sketch: AgMIGoAAAAASAAAAAAAAAAAAAAAAAAAAAAAAAACAQEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAACAQEA=
2019-07-31T04:09:25,738 INFO [task-runner-0-priority-0] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations - deser sketch(bin): AgMIGoAAAAASAAAAAAAAAAAAAAAAAAAAAAAAAACAQEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAACAQEA=
2019-07-31T04:09:25,738 INFO [task-runner-0-priority-0] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchMergeAggregator - aggregate sketch by merge aggregator: AgMIGoAAAAASAAAAAAAAAAAAAAAAAAAAAAAAAACAQEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAACAQEA=
2019-07-31T04:09:25,741 INFO [task-runner-0-priority-0] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations - deser sketch: AgMIAIAAAAASAAAAAAAAAAAAAAAAAAAAAAAAAACAQEAAAAAAAAAAAAAAAAAAAPA/AAAAAAAA8D8AAAAAAAAAAAAAAAAAAPA/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADwPwAAAAAAAAAAAAAAAAAA8D8AAAAAAIBAQAAAAAAAAAAAAAAAAAAA8D8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
2019-07-31T04:09:25,741 INFO [task-runner-0-priority-0] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchOperations - deser sketch(bin): AgMIAIAAAAASAAAAAAAAAAAAAAAAAAAAAAAAAACAQEAAAAAAAAAAAAAAAAAAAPA/AAAAAAAA8D8AAAAAAAAAAAAAAAAAAPA/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADwPwAAAAAAAAAAAAAAAAAA8D8AAAAAAIBAQAAAAAAAAAAAAAAAAAAA8D8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
2019-07-31T04:09:25,741 INFO [task-runner-0-priority-0] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchMergeAggregator - aggregate sketch by merge aggregator: AgMIGoAAAAASAAAAAAAAAAAAAAAAAAAAAAAAAACAQEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAACAQEA=
2019-07-31T04:09:25,744 INFO [groupBy_SVR_ACTION_DATA_MIN_[2019-07-31T03:39:00.000Z/2019-07-31T04:09:00.000Z]] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchMergeBufferAggregator - aggregate sketch by mergebuf aggregator: AgMIGoAAAACZAQAAAAAAAAAAAAAAAAAAAAAAAIBBx0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAIQAAAAAAAAAhAAAAAAAAACEAAAAAAAAAIQAAAAAAAAAhAAAAAAAAACEAAAAAAAAAIQAAAAAAAABBAAAAAAAAAEEAAAAAAAAAQQAAAAAAAABRAAAAAAAAAFEAAAAAAAAAUQAAAAAAAABRAAAAAAAAAFEAAAAAAAAAYQAAAAAAAABhAAAAAAAAAHEAAAAAAAAAgQAAAAAAAACBAAAAAAAAAIkAAAAAAAAAiQAAAAAAAAChAAAAAAAAAKkAAAAAAAAAwQAAAAAAAADRAAAAAAAAAOkAAAAAAAAA9QAAAAAAAAD5AAAAAAAAAP0AAAAAAANCHQAAAAACAJsVAAAAAAIBBx0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAPA/AAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAAhAAAAAAAAACEAAAAAAAAAIQAAAAAAAAAhAAAAAAAAACEAAAAAAAAAIQAAAAAAAAAhAAAAAAAAACEAAAAAAAAAIQAAAAAAAAAhAAAAAAAAACEAAAAAAAAAIQAAAAAAAAAhAAAAAAAAAEEAAAAAAAAAQQAAAAAAAABBAAAAAAAAAEEAAAAAAAAAQQAAAAAAAABBAAAAAAAAAEEAAAAAAAAAUQAAAAAAAABRAAAAAAAAAGEAAAAAAAAAYQAAAAAAAABxAAAAAAAAAHEAAAAAAAAAcQAAAAAAAABxAAAAAAAAAIEAAAAAAAAAgQAAAAAAAACBAAAAAAAAAIkAAAAAAAAAkQAAAAAAAAChAAAAAAAAALkAAAAAAAAA4QAAAAAAAADtAAAAAAAAAPkAAAAAAAIBAQAAAAAAAgEFAAAAAAACAQkAAAAAAAIBDQAAAAAAAgENAAAAAAACASUAAAAAAAABQQAAAAAAAgFJAAAAAAABAXUAAAAAAAMBkQAAAAAAAWKFA
2019-07-31T04:09:25,744 INFO [groupBy_SVR_ACTION_DATA_MIN_[2019-07-31T03:39:00.000Z/2019-07-31T04:09:00.000Z]] org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchMergeBufferAggregator - aggregate sketch by mergebuf aggregator: AgMIGgABAAAHAAAAAAAAAAAAAAAAAAAAAAAAAAAAHEAAAAAAAAAAAAAAAAAAAPA/AAAAAAAA8D8AAAAAAAAAQAAAAAAAAAhAAAAAAAAAGEAAAAAAAAAcQA==
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f3aadbfe3e1, pid=10710, tid=139879948633856
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 11067 C2 com.tingyun.com.yahoo.memory.NonNativeWritableMemoryImpl.getDouble(J)D (30 bytes) @ 0x00007f3aadbfe3e1 [0x00007f3aadbfe3a0+0x41]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /data/tingyun/druid-0.12.3/hs_err_pid10710.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

Related information from /data/tingyun/druid-0.12.3/hs_err_pid10710.log:

Register to memory mapping:

RAX=0x000000010086ac78 is pointing into metadata
RBX={method} {0x00007f38571e3b68} 'getDouble' '(J)D' in 'com/tingyun/com/yahoo/memory/NonNativeWritableMemoryImpl'
RCX=0x0000000000000080 is an unknown value
RDX=0x00007f38f00dea34 is an unknown value
RSP=0x00007f3856a61e20 is pointing into the stack for thread: 0x00007f3880010000
RBP=0x0000000000a80048 is an unknown value
RSI=0x000000009b3e8a20 is an oop
com.tingyun.com.yahoo.memory.BBNonNativeWritableMemoryImpl
 - klass: 'com/tingyun/com/yahoo/memory/BBNonNativeWritableMemoryImpl'
RDI=0x0000000000000000 is an unknown value
R8 =0x0000000000000100 is an unknown value
R9 =0x000000009b3fabd8 is an oop
com.tingyun.com.yahoo.sketches.quantiles.DirectDoublesSketchAccessor
 - klass: 'com/tingyun/com/yahoo/sketches/quantiles/DirectDoublesSketchAccessor'
R10=0x000000002010d58f is an unknown value
R11=0x0000000000000000 is an unknown value
R12=0x0000000000000000 is an unknown value
R13=0x00007f3856a61ed8 is pointing into the stack for thread: 0x00007f3880010000
R14=0x00007f3856a62020 is pointing into the stack for thread: 0x00007f3880010000
R15=0x00007f3880010000 is a thread
...
...
Stack: [0x00007f3856964000,0x00007f3856a65000],  sp=0x00007f3856a61e20,  free space=1015k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 11067 C2 com.tingyun.com.yahoo.memory.NonNativeWritableMemoryImpl.getDouble(J)D (30 bytes) @ 0x00007f3aadbfe3e1 [0x00007f3aadbfe3a0+0x41]
J 9754 C2 com.tingyun.com.yahoo.sketches.quantiles.DirectDoublesSketchAccessor.get(I)D (77 bytes) @ 0x00007f3aadd90ba8 [0x00007f3aadd90b20+0x88]
J 12669 C1 com.tingyun.com.yahoo.sketches.quantiles.DoublesMergeImpl.justZipWithStride(Lcom/tingyun/com/yahoo/sketches/quantiles/DoublesBufferAccessor;Lcom/tingyun/com/yahoo/sketches/quantiles/DoublesBufferAccessor;II)V (48 bytes) @ 0x00007f3aae91d6cc [0x00007f3aae91d500+0x1cc]
j  com.tingyun.com.yahoo.sketches.quantiles.DoublesMergeImpl.downSamplingMergeInto(Lcom/tingyun/com/yahoo/sketches/quantiles/DoublesSketch;Lcom/tingyun/com/yahoo/sketches/quantiles/UpdateDoublesSketch;)V+198
J 13391 C2 com.tingyun.com.yahoo.sketches.quantiles.DoublesUnionImpl.updateLogic(ILcom/tingyun/com/yahoo/sketches/quantiles/UpdateDoublesSketch;Lcom/tingyun/com/yahoo/sketches/quantiles/DoublesSketch;)Lcom/tingyun/com/yahoo/sketches/quantiles/UpdateDoublesSketch; (552 bytes) @ 0x00007f3aaebd302c [0x00007f3aaebd1940+0x16ec]
J 7158 C2 com.tingyun.com.yahoo.sketches.quantiles.DoublesUnionImpl.update(Lcom/tingyun/com/yahoo/sketches/quantiles/DoublesSketch;)V (17 bytes) @ 0x00007f3aadc86d64 [0x00007f3aadc86d20+0x44]
J 13468 C1 org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchMergeBufferAggregator.aggregate(Ljava/nio/ByteBuffer;I)V (83 bytes) @ 0x00007f3aadf51a0c [0x00007f3aadf51380+0x68c]
J 13278 C2 org.apache.druid.query.groupby.epinephelinae.Grouper.aggregate(Ljava/lang/Object;)Lorg/apache/druid/query/groupby/epinephelinae/AggregateResult; (27 bytes) @ 0x00007f3aaeb60e10 [0x00007f3aaeb5ffc0+0xe50]
J 10418 C2 org.apache.druid.query.groupby.epinephelinae.GroupByQueryEngineV2$HashAggregateIterator.aggregateSingleValueDims(Lorg/apache/druid/query/groupby/epinephelinae/Grouper;)V (117 bytes) @ 0x00007f3aae2afb70 [0x00007f3aae2afa20+0x150]
J 13638 C2 org.apache.druid.query.groupby.epinephelinae.GroupByQueryEngineV2$GroupByEngineIterator.initNewDelegate()Lorg/apache/druid/query/groupby/epinephelinae/CloseableGrouperIterator; (48 bytes) @ 0x00007f3aae217c6c [0x00007f3aae215880+0x23ec]
J 10400 C2 org.apache.druid.query.groupby.epinephelinae.GroupByQueryEngineV2$GroupByEngineIterator.hasNext()Z (57 bytes) @ 0x00007f3aae273748 [0x00007f3aae273640+0x108]
J 13110 C2 org.apache.druid.java.util.common.guava.ConcatSequence$$Lambda$136.accumulate(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (13 bytes) @ 0x00007f3aaea8987c [0x00007f3aaea89800+0x7c]
J 13399 C2 org.apache.druid.java.util.common.guava.FilteringAccumulator.accumulate(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (27 bytes) @ 0x00007f3aad6b39e0 [0x00007f3aad6b3940+0xa0]

Any suggestions? Thanks!

@AlexanderSaydakov
Contributor

I think this is the same issue as #8032.
It should be fixed by #8055 and included in the upcoming 0.15.1 release.

@quenlang

quenlang commented Aug 1, 2019

I think this is the same issue as #8032.
It should be fixed by #8055 and included in the upcoming 0.15.1 release.

Thanks for the quick reply!
I'm sure I had already merged #8055 before building with #7666, since #8032 was reported by me. But the crash still occurred.
Do you need any other information about this crash? @AlexanderSaydakov

@AlexanderSaydakov
Contributor

In the log above I see NonNativeWritableMemoryImpl, which suggests big-endian byte order. I believe this can happen when a ByteBuffer passed by Druid into the quantiles sketch aggregator has big-endian order set and is wrapped as Memory without forcing little-endian order. It seems to me that the only place where this was not forced was fixed by #8085. I don't see how this is possible now. I need some way to reproduce the problem.
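
To illustrate the byte-order mismatch described here, a minimal standalone sketch (my own illustration, not the Druid code path; it assumes the bundled com.yahoo.memory library's WritableMemory.wrap(ByteBuffer) overload honors the buffer's byte order and that the two-argument wrap(ByteBuffer, ByteOrder) overload used by #8085 is available):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import com.yahoo.memory.WritableMemory;

// The sketch writes its values into the buffer in little-endian byte order.
ByteBuffer buf = ByteBuffer.allocate(Double.BYTES);
buf.order(ByteOrder.LITTLE_ENDIAN);
buf.putDouble(0, 46.0);

// Suppose the same buffer later arrives with big-endian order set (ByteBuffer's default).
buf.order(ByteOrder.BIG_ENDIAN);

// Wrapping without forcing an order follows the buffer's (now big-endian) flag,
// which is the NonNativeWritableMemoryImpl path seen in the crash log, and misreads the bytes.
System.out.println(WritableMemory.wrap(buf).getDouble(0));

// Forcing little-endian order, as #8085 does, reads the value back correctly.
System.out.println(WritableMemory.wrap(buf, ByteOrder.LITTLE_ENDIAN).getDouble(0));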
