HLLSketchMerge aggregator failing for some metrics after upgrade to v0.18 #9736

Closed
scrawfor opened this issue Apr 21, 2020 · 33 comments · Fixed by #9880

Comments

@scrawfor
Contributor

scrawfor commented Apr 21, 2020

Affected Version

v0.18 (upgraded from 0.16.0)

Description

The HLLSketchMerge aggregator is failing for some of our metrics after upgrading to Druid 0.18.0. Reverting to 0.16.0 fixes the issue. I have isolated specific segments where the issue occurs, moved those segments back to our 0.16 historical, and was able to query the same metric successfully.

Re-indexing data does not seem to fix the issue.

Error Message.

{
  "error": "Unknown exception",
  "errorMessage": "java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.datasketches.SketchesArgumentException: Incomming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set.",
  "errorClass": "java.lang.RuntimeException",
  "host": null
}
@gianm
Contributor

gianm commented Apr 21, 2020

Hi @scrawfor, would you be able to upload one of the specific segments that has this problem?

It sounds like it might be a backwards compatibility issue in DataSketches, but having a copy of the actual sketch image would help debug.

/cc @AlexanderSaydakov for fyi

@gianm
Contributor

gianm commented Apr 21, 2020

By the way, Druid 0.16 used DataSketches 0.13.4, and Druid 0.18 uses DataSketches 1.2.0-incubating.

@scrawfor
Contributor Author

Unfortunately I can't upload it. I can try to recreate the issue with generic data, but I'm not sure how productive that would be. Are there any other debugging steps I could take?

I tried to load a 0.17.1 historical server to see if it occurred in that release as well, but I ran into issues with the authentication extension and was unable to get the node started.

clintropolis added this to the 0.18.1 milestone Apr 21, 2020
@gianm
Contributor

gianm commented Apr 21, 2020

@scrawfor Perhaps you could extract and upload just the HLL column? It only contains sketches of data, so it's less sensitive than the entire segment.

A good way to do it is to unzip the segment and look at the meta.smoosh file. It has one line per column, where each line has four parts: column name, smoosh file index (usually "0" except for large multipart segments), start byte offset within that smoosh file, end byte offset. So for this column:

diffUrl,0,3316799,5665555

You could extract it by running:

dd bs=1 if=00000.smoosh skip=3316799 count=2348756 of=diffUrl
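
The same extraction can be scripted, which avoids working out the byte count by hand. This is a minimal Python sketch, assuming the meta.smoosh line layout described above (column name, smoosh file index, start offset, end offset). The helper names `parse_smoosh_entry` and `extract_column`, and the five-digit file naming inferred from `00000.smoosh`, are illustrative assumptions rather than anything provided by Druid.

```python
from pathlib import Path
from typing import NamedTuple


class SmooshEntry(NamedTuple):
    """One column line from meta.smoosh (layout assumed from the description above)."""
    column: str
    file_index: int
    start: int
    end: int

    @property
    def length(self) -> int:
        # Number of bytes the column occupies: end offset minus start offset.
        return self.end - self.start


def parse_smoosh_entry(line: str) -> SmooshEntry:
    # Split from the right so the three numeric fields are always the last three.
    column, file_index, start, end = line.strip().rsplit(",", 3)
    return SmooshEntry(column, int(file_index), int(start), int(end))


def extract_column(segment_dir: Path, entry: SmooshEntry, out_path: Path) -> int:
    # Assumed naming: smoosh file index 0 maps to "00000.smoosh".
    smoosh = segment_dir / f"{entry.file_index:05d}.smoosh"
    with smoosh.open("rb") as src:
        src.seek(entry.start)
        data = src.read(entry.length)
    out_path.write_bytes(data)
    return len(data)


entry = parse_smoosh_entry("diffUrl,0,3316799,5665555")
print(entry.length)  # 2348756, the count passed to dd above
```

Calling `extract_column(Path("."), entry, Path("diffUrl"))` from inside the unzipped segment directory would then write the same bytes as the dd command.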

What I'm after is a binary image of a specific sketch that exhibits the problem — I think once we have that it should be easier to find and fix it.

If it is possible to reproduce this on some test data that would be great too.

@leerho
Contributor

leerho commented Apr 21, 2020

Do we have a stack trace that shows where in the sketch code this occurs?

@leerho
Contributor

leerho commented Apr 21, 2020

It would also be helpful to know what kind of operation triggers it: update, merge, getResult(), getEstimate(), etc.

@leerho
Contributor

leerho commented Apr 21, 2020

The exception thrown is "java.util.concurrent.ExecutionException". Our sketches are not thread-safe with the single exception of our Theta Sketch, which has concurrent configuration options. So if there is more than one thread touching the HLL sketch, all bets are off.

I can't explain why it doesn't fail in Druid 0.16, unless your threading model changed between 0.16 and 0.18.

If concurrency is the problem, the simplest & fastest way to fix this would be to put a synchronized wrapper around the sketch.
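
As a language-neutral illustration of the synchronized-wrapper idea, here is a minimal Python sketch. `ToySketch` and `SynchronizedSketch` are hypothetical stand-ins, not DataSketches or Druid classes; in Java the equivalent would be synchronized methods delegating to the real sketch.

```python
import threading


class ToySketch:
    """Hypothetical stand-in for a sketch that is not thread-safe."""

    def __init__(self):
        self._items = set()

    def update(self, value):
        self._items.add(value)

    def estimate(self):
        return len(self._items)


class SynchronizedSketch:
    """Serializes every call on the wrapped sketch behind a single lock."""

    def __init__(self, sketch):
        self._sketch = sketch
        self._lock = threading.Lock()

    def update(self, value):
        with self._lock:
            self._sketch.update(value)

    def estimate(self):
        with self._lock:
            return self._sketch.estimate()


shared = SynchronizedSketch(ToySketch())


def worker(offset):
    # Each worker feeds 1000 distinct values into the shared sketch.
    for i in range(1000):
        shared.update(offset + i)


threads = [threading.Thread(target=worker, args=(t * 1000,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared.estimate())  # 4000
```

The trade-off of this approach is that all threads contend on one lock, so it is simple and fast to add but limits parallelism on a hot sketch.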

@scrawfor
Contributor Author

@gianm I'll extract it tomorrow. Thanks for the instructions.

@leerho It's a merge aggregator.

{
    "type" : "HLLSketchMerge",
    "name" : "unique_views_hll",
    "fieldName" : "unique_views_hll"
}

Here is a stack trace:

2020-04-21T16:42:03,398 ERROR [processing-3] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2 - Exception with one of the sequences!
org.apache.datasketches.SketchesArgumentException: Incomming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set.
	at org.apache.datasketches.hll.Union.<init>(Union.java:100) ~[datasketches-java-1.2.0-incubating.jar:?]
	at org.apache.datasketches.hll.Union.writableWrap(Union.java:140) ~[datasketches-java-1.2.0-incubating.jar:?]
	at org.apache.druid.query.aggregation.datasketches.hll.HllSketchMergeBufferAggregator.aggregate(HllSketchMergeBufferAggregator.java:113) ~[?:?]
	at org.apache.druid.query.aggregation.AggregatorAdapters.aggregateBuffered(AggregatorAdapters.java:164) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.AbstractBufferHashGrouper.aggregate(AbstractBufferHashGrouper.java:161) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.SpillingGrouper.aggregate(SpillingGrouper.java:168) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.ConcurrentGrouper.aggregate(ConcurrentGrouper.java:267) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.Grouper.aggregate(Grouper.java:85) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.RowBasedGrouperHelper.lambda$createGrouperAccumulatorPair$2(RowBasedGrouperHelper.java:330) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.ConcatSequence.lambda$accumulate$0(ConcatSequence.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.FilteringAccumulator.accumulate(FilteringAccumulator.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.FilteredSequence.accumulate(FilteredSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.ConcatSequence.accumulate(ConcatSequence.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$1.accumulate(SpecificSegmentQueryRunner.java:87) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:171) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.access$100(SpecificSegmentQueryRunner.java:44) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$2.wrap(SpecificSegmentQueryRunner.java:153) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:74) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:246) [druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:233) [druid-processing-0.18.0.jar:0.18.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]
	at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:247) [druid-processing-0.18.0.jar:0.18.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
2020-04-21T16:42:03,425 WARN [qtp1446521801-86[groupBy_[analytics-data-primary]_be057876-fdd5-4085-be7f-9c17456e11ca]] org.apache.druid.server.QueryLifecycle - Exception while processing queryId [be057876-fdd5-4085-be7f-9c17456e11ca] (java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.datasketches.SketchesArgumentException: Incomming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set.)
2020-04-21T16:42:03,400 ERROR [processing-4] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2 - Exception with one of the sequences!
org.apache.datasketches.SketchesArgumentException: Incomming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set.
	[52 stack frames identical to the processing-3 trace above omitted]
2020-04-21T16:42:03,447 ERROR [processing-4] com.google.common.util.concurrent.Futures$CombinedFuture - input future failed.
java.lang.RuntimeException: org.apache.datasketches.SketchesArgumentException: Incomming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set.
	at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:253) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:233) ~[druid-processing-0.18.0.jar:0.18.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]
	at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:247) [druid-processing-0.18.0.jar:0.18.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: org.apache.datasketches.SketchesArgumentException: Incomming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set.
	at org.apache.datasketches.hll.Union.<init>(Union.java:100) ~[datasketches-java-1.2.0-incubating.jar:?]
	at org.apache.datasketches.hll.Union.writableWrap(Union.java:140) ~[datasketches-java-1.2.0-incubating.jar:?]
	at org.apache.druid.query.aggregation.datasketches.hll.HllSketchMergeBufferAggregator.aggregate(HllSketchMergeBufferAggregator.java:113) ~[?:?]
	at org.apache.druid.query.aggregation.AggregatorAdapters.aggregateBuffered(AggregatorAdapters.java:164) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.AbstractBufferHashGrouper.aggregate(AbstractBufferHashGrouper.java:161) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.SpillingGrouper.aggregate(SpillingGrouper.java:168) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.ConcurrentGrouper.aggregate(ConcurrentGrouper.java:267) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.Grouper.aggregate(Grouper.java:85) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.RowBasedGrouperHelper.lambda$createGrouperAccumulatorPair$2(RowBasedGrouperHelper.java:330) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.ConcatSequence.lambda$accumulate$0(ConcatSequence.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.FilteringAccumulator.accumulate(FilteringAccumulator.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.FilteredSequence.accumulate(FilteredSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.ConcatSequence.accumulate(ConcatSequence.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$1.accumulate(SpecificSegmentQueryRunner.java:87) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:171) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.access$100(SpecificSegmentQueryRunner.java:44) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$2.wrap(SpecificSegmentQueryRunner.java:153) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:74) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:246) ~[druid-processing-0.18.0.jar:0.18.0]
	... 6 more

@leerho
Contributor

leerho commented Apr 22, 2020

Thanks for the stack trace. It is very helpful. I think I may have a clue what may be happening.

The specific exception thrown by the HLL union operator is:

org.apache.datasketches.SketchesArgumentException: Incoming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set.

This occurs in only one place, the union.update(HllSketch input) method. It detects that a special flag is set in the input sketch, a flag that is only ever set during a union operation. Thus the incoming sketch had to be an image of a union operator rather than that of a streaming updatable sketch or a streaming compact sketch.

This flag is used to detect that the internal state of the union data structure is not finalized. It is only finalized when you call

HllSketch out = union.getResult(TgtHllType type);

Using a serialized union operator as input to a union merge operation is not legal and it never has been. The example code also illustrates the use of union.getResult(type), and that example code has been there for several years.

It is only with version 1.2.0 that this special case is properly detected.
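
The contract described above can be pictured with a toy model. This is a Python illustration only: `ToyUnion`, its dict-based "images", and the `rebuild_flag` field are hypothetical stand-ins and do not reflect the DataSketches internals or API. It only shows the rule that a union accepts finalized results and rejects the raw image of another union.

```python
class ToyUnion:
    """Toy model of the rule above. NOT the DataSketches implementation."""

    def __init__(self):
        self._values = set()
        # Stands in for the Rebuild_CurMin_Num_KxQ flag (assumption for illustration).
        self._dirty = False

    def update(self, image):
        # Reject inputs that still carry the "not finalized" flag.
        if image["rebuild_flag"]:
            raise ValueError("incoming sketch is an unfinalized union image")
        self._values |= image["values"]
        self._dirty = True

    def raw_image(self):
        # Copying the union's internal state directly keeps the flag as-is.
        return {"values": set(self._values), "rebuild_flag": self._dirty}

    def get_result(self):
        # Finalizing yields an image with the flag cleared, so it is a legal input.
        return {"values": set(self._values), "rebuild_flag": False}


a = ToyUnion()
a.update({"values": {1, 2}, "rebuild_flag": False})

b = ToyUnion()
b.update(a.get_result())          # legal: finalized result
try:
    b.update(a.raw_image())       # illegal: raw image of a union
except ValueError as err:
    print("rejected:", err)
```

In this toy, the fix on the caller's side is always the same: pass the output of `get_result()` onward, never the union's own state.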

There still might be a problem in the HLL code, but I would appreciate it if you could check your usage code and see if my hunch is correct.

Lee.

@leerho
Contributor

leerho commented Apr 22, 2020

Also, please check that these sketches are only touched by one thread at a time. The "concurrent" exception that was thrown makes me nervous :)

@scrawfor
Contributor Author

@gianm I pulled out all sensitive info and re-indexed our data, so I'm attaching the full segment. The offending metric is unique_views_hll.
hll_segment.zip

Also, I did some tests with other query types and found timeseries and topN queries were successful.

@leerho I'll have to leave that to others more familiar with the druid extension to comment on, but looking at the code it does seem like a lock is acquired.

@gianm
Contributor

gianm commented Apr 22, 2020

@scrawfor Thanks for the upload. Do you have an example of a query that exhibits the problem?

@scrawfor
Contributor Author

> @scrawfor Thanks for the upload. Do you have an example of a query that exhibits the problem?

Sure.

{
  "dataSource" : "hlltest",
  "queryType" : "groupBy",
  "intervals" : [ "2020-04-06/2020-04-07" ],
  "granularity" : "ALL",
  "aggregations" : [{
    "type" : "HLLSketchMerge",
    "name" : "unique_views_hll",
    "fieldName" : "unique_views_hll"
  }],
  "limitSpec" : {
    "type" : "default",
    "limit" : 50000,
    "columns" : [ {
      "dimension" : "unique_views_hll",
      "direction" : "descending",
      "dimensionOrder" : "alphanumeric"
    } ]
  },
  "dimensions" : []
}

@clintropolis
Member

Thanks for the segment and example query. I can reproduce this issue in the debugger with a groupBy query, and can confirm that it doesn't seem to affect timeseries queries. That suggests the issue is likely in HllSketchMergeBufferAggregator or in how it is being used (timeseries uses HllSketchMergeAggregator).

> The "concurrent" exception that was thrown makes me nervous :)

This is an unfortunate presentation issue. That is not the actual exception on the historical, but a side effect of the broker's parallel merging catching an error on an individual query. I'll see if I can improve this in the future to make it less confusing.

@scrawfor
Contributor Author

@clintropolis Thanks for confirming. I did find that the HllSketchMergeAggregator seemed to be affected as well.

I tried to reissue the query this evening, but it's using HllSketchMergeBufferAggregator instead.

2020-04-22T14:38:27,220 ERROR [qtp1446521801-81[groupBy_[analytics-data-daily-dev]_b5a02c60-3786-499a-a275-8e8bfae81e0e]] org.apache.druid.server.QueryResource - Exception handling request: {class=org.apache.druid.server.QueryResource, exceptionType=class org.apache.datasketches.SketchesArgumentException, exceptionMessage=Incomming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set., query={"queryType":"groupBy","dataSource":{"type":"table","name":"analytics-data-daily-dev"},"intervals":{"type":"LegacySegmentSpec","intervals":["2020-04-01T00:00:00.000Z/2020-04-14T00:00:00.000Z"]},"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"dimensions":[],"aggregations":[{"type":"HLLSketchMerge","name":"unique_views_hll","fieldName":"unique_views_hll","lgK":12,"tgtHllType":"HLL_4","round":false}],"postAggregations":[{"type":"HLLSketchToString","name":"unique_views_hll_sketch","field":{"type":"fieldAccess","name":"unique_views_hll","fieldName":"unique_views_hll"}}],"having":null,"limitSpec":{"type":"default","columns":[{"dimension":"unique_views_hll","direction":"descending","dimensionOrder":{"type":"alphanumeric"}}],"limit":50000},"context":{"groupByStrategy":"v1","queryId":"b5a02c60-3786-499a-a275-8e8bfae81e0e"},"descending":false}, peer=10.89.92.231} (org.apache.datasketches.SketchesArgumentException: Incomming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set.)
2020-04-22T14:38:27,221 ERROR [processing-4] org.apache.druid.query.GroupByMergedQueryRunner - Exception with one of the sequences!
java.lang.NullPointerException: null
	at org.apache.druid.query.aggregation.datasketches.hll.HllSketchMergeAggregator.aggregate(HllSketchMergeAggregator.java:63) ~[?:?]
	at org.apache.druid.segment.incremental.OnheapIncrementalIndex.doAggregate(OnheapIncrementalIndex.java:252) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.segment.incremental.OnheapIncrementalIndex.addToFacts(OnheapIncrementalIndex.java:162) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:614) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:608) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.GroupByQueryHelper$3.accumulate(GroupByQueryHelper.java:155) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.GroupByQueryHelper$3.accumulate(GroupByQueryHelper.java:139) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.ConcatSequence.lambda$accumulate$0(ConcatSequence.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.FilteringAccumulator.accumulate(FilteringAccumulator.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.FilteredSequence.accumulate(FilteredSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.ConcatSequence.accumulate(ConcatSequence.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$1.accumulate(SpecificSegmentQueryRunner.java:87) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:171) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.access$100(SpecificSegmentQueryRunner.java:44) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$2.wrap(SpecificSegmentQueryRunner.java:153) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:74) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.GroupByMergedQueryRunner$1$1.call(GroupByMergedQueryRunner.java:121) [druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.GroupByMergedQueryRunner$1$1.call(GroupByMergedQueryRunner.java:111) [druid-processing-0.18.0.jar:0.18.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]
	at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:247) [druid-processing-0.18.0.jar:0.18.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
2020-04-22T14:38:27,221 ERROR [processing-3] org.apache.druid.query.GroupByMergedQueryRunner - Exception with one of the sequences!
java.lang.NullPointerException: null
	at org.apache.druid.query.aggregation.datasketches.hll.HllSketchMergeAggregatorFactory.factorize(HllSketchMergeAggregatorFactory.java:89) ~[?:?]
	at org.apache.druid.segment.incremental.OnheapIncrementalIndex.factorizeAggs(OnheapIncrementalIndex.java:233) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.segment.incremental.OnheapIncrementalIndex.addToFacts(OnheapIncrementalIndex.java:165) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:614) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:608) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.GroupByQueryHelper$3.accumulate(GroupByQueryHelper.java:155) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.groupby.GroupByQueryHelper$3.accumulate(GroupByQueryHelper.java:139) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.ConcatSequence.lambda$accumulate$0(ConcatSequence.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.FilteringAccumulator.accumulate(FilteringAccumulator.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.FilteredSequence.accumulate(FilteredSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.ConcatSequence.accumulate(ConcatSequence.java:41) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$1.accumulate(SpecificSegmentQueryRunner.java:87) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:171) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.access$100(SpecificSegmentQueryRunner.java:44) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$2.wrap(SpecificSegmentQueryRunner.java:153) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:74) ~[druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.18.0.jar:0.18.0]
	at org.apache.druid.query.GroupByMergedQueryRunner$1$1.call(GroupByMergedQueryRunner.java:121) [druid-processing-0.18.0.jar:0.18.0]
	at org.apache.druid.query.GroupByMergedQueryRunner$1$1.call(GroupByMergedQueryRunner.java:111) [druid-processing-0.18.0.jar:0.18.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]
	at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:247) [druid-processing-0.18.0.jar:0.18.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]

@AlexanderSaydakov
Contributor

How was that segment with sketches created?

@scrawfor
Contributor Author

scrawfor commented Apr 22, 2020

@AlexanderSaydakov Using the native batch indexer with a local Firehose. The sketch was built over a string column.

Here's the metric spec.

{
  "type": "HLLSketchBuild",
  "name": "unique_views_hll",
  "fieldName": "view_id",
  "lgK": 14,
  "tgtHllType": "HLL_4",
  "round": false
}

@AlexanderSaydakov
Contributor

Using what version of Druid?

@scrawfor
Contributor Author

0.16.0

@clintropolis
Member

The exception occurs on this line, on what I think is the 7th call to aggregate using the example query. That line wraps the memory location it gets from the union it is building for the aggregation, though I guess the previous aggregate operation could have left it in this state? I had some other stuff to do, so I haven't dug in enough yet to determine whether the issue is with the value stored in the column (pointing to the build aggregator being the issue) or whether it occurs in the aggregation itself at query time.

@leerho
Contributor

leerho commented Apr 23, 2020

We have a fix!

@clintropolis @scrawfor @gianm @AlexanderSaydakov
I want to thank all of you for your help! This was truly a team effort!

With clues from @clintropolis and @scrawfor, @AlexanderSaydakov was able to reproduce the bug using his knowledge of how the aggregator works. From that I was able to locate the bug, which was my fault: I put in a check for a flag where there did not need to be one, so it was throwing an unnecessary exception.

We will be going over this part of the code carefully, adding unit tests and preparing for a new release. Due to the dual 72-hour release cycles, this will take a week or so.

Thank you for your patience!

Lee.

@leerho
Contributor

leerho commented Apr 23, 2020

Is there a schema file somewhere that describes the layout of the hll_segment.zip file that @scrawfor attached to this issue? I'd like to write a parser so that in the future, if we need to pull out just the sketches, I can do it more efficiently. I was able to find some of the sketches by hand, but it is a lot of work :)

@clintropolis
Member

clintropolis commented Apr 23, 2020

> Is there a schema file somewhere that describes the layout of the hll_segment.zip file that @scrawfor attached to this issue? I'd like to write a parser so that in the future, if we need to pull out just the sketches, I can do it more efficiently. I was able to find some of the sketches by hand, but it is a lot of work :)

@gianm described in this comment how to get the raw column out of the segment by finding the position information in meta.smoosh. You can also extract base64-serialized versions of the column with the dump-segment tool if you want to easily look at values for individual rows.

From the druid package directory:

$ java -classpath "lib/*" -Ddruid.extensions.loadList="[\"druid-datasketches\"]" org.apache.druid.cli.Main tools dump-segment -d /Users/clint/workspace/data/druid/localStorage/hll_segment/ --dump rows --column unique_views_hll

which will spit out something like this:

{"unique_views_hll":"AwEHDggYAAGwAAAAAK6iCQFzJwQDzXwGBTMWCgeFZARCyewJDVSOBw8Q6QQRuVwO2jliD4GTxAnxybMEFnw+CRj2xw4Z9/MEGovwBxyq+godenYKHl8CBx9N7AaPkw0KIzRKEQ8VoATGX+QGJnkMBydvIAXWU8YNKqXbBCuKUgYu5Y0LMOZHBzLeBQczolUHJ66KBjVFJA427xcFGB5sB+tuxQs5gRAbYG1uFD13/RA+TOQF+EZ8FIe4PxBCoDwEKhhEF27WWgc9CJYGSFPOD0ld1A3NfH8JS/3JB0ynuAu6ugsETuvmCk/ByAutozQHUVk8FkiEUQdUcOoIVRcHCFZKLgRXXloFNSM8DqxXSwZbS8IEXNjhDF3WcxJeRQYIYGuxB2EK6wdjnhESZNyBBGeCsghoY0UHSCCMBWwN7gRuu5sKb9xvCXES+wpy6RALcx69BXSE1At1ZcgEvF0GBrPFjQx5L7YFeia2Bkwu+A2AfysHgDD6C4GBoQSCmcQLRZ43BO+VRROFOvMHh6nKBrPqiQWLPqQHjcHlBo5JHQSPwzgErnF6DpF0vAWSLWcLaQ4UC5ckBAyYzLcHm+LIBZ3OQAuBHaIFoEwADKGVjASibboM7VpAB6SJAAWlDrwHptSdCaf2UxAdjHAFgSqBG6YGjgiul1EIM19xBLAZqgqyZUEMs2H7EbTFIRC2l1YOuIUZJLlyjw26y6cINoV1CbxJvQi9qn0HvpsMB78T3Qa8Bu4MxgD9Bch6GwjvbC8Hy/MyB5E7aAXN0FEFzmVGCc8vIwfSby4G1ENTD4JTYwTW2XcH18g/BNhTUBr/bVUF2tK0BtxzuAbd9OYR3/THBuCZsQrirXUHrpsqBeUdggnmZX8E5zomDurPIQXr6l0H7Vs3Be5otAY2ua4E8aijCvJQzgf0gtAW+CxWBPrluhH7eqkKrsTVBHKM9wQ="}
{"unique_views_hll":"AgEHDgMIBQDERQoE+BSGB6dz6ATE/HoHZs5tIA=="}
{"unique_views_hll":"AgEHDgMIBQCPrFwWNt2mCkmuNQZFWZgYxJpXDg=="}
....
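For anyone who wants to look inside these rows, here is a minimal Python sketch that decodes the base64 values and prints the first few header bytes. The function names are made up for illustration, and the field labels in `describe_header` are an assumption based on my reading of the DataSketches HLL preamble, not something confirmed in this thread. For the short row above, byte 3 decodes to 14, which at least agrees with the `lgK` in the HLLSketchBuild spec.

```python
import base64
import json


def decode_sketch_rows(lines, column):
    """Return the raw sketch bytes for each dump-segment JSON row that has `column`."""
    sketches = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and elisions such as "...."
        value = json.loads(line).get(column)
        if value is not None:
            sketches.append(base64.b64decode(value))
    return sketches


def describe_header(raw):
    """Label the leading bytes of a serialized sketch.

    ASSUMPTION: the byte offsets and names below follow my understanding of
    the DataSketches HLL preamble; verify against the library before relying
    on them.
    """
    return {
        "preamble_ints": raw[0],
        "ser_ver": raw[1],
        "family_id": raw[2],
        "lg_k": raw[3],
        "flags": raw[5],
        "mode_byte": raw[7],
    }


# One of the rows printed by dump-segment above.
sample = [
    '{"unique_views_hll":"AgEHDgMIBQDERQoE+BSGB6dz6ATE/HoHZs5tIA=="}',
    "....",
]
for raw in decode_sketch_rows(sample, "unique_views_hll"):
    print(len(raw), describe_header(raw))
```

To inspect a sketch properly, the decoded bytes would still need to be handed to the DataSketches library itself. This only gets them out of the JSON.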

@clintropolis
Member

I ran some tests, and the issue appears to go away after downgrading datasketches-java to 1.1.0-incubating. Since we have an unrelated major regression with stream ingestion and are doing a 0.18.1 release ASAP, I have opened #9751 as a precaution; it makes this change in case our critical fix release is ready to go before your new version is released. @leerho / @AlexanderSaydakov can you think of any reason not to do this?

@AlexanderSaydakov
Contributor

AlexanderSaydakov commented Apr 23, 2020

I don't see why not. I looked at our release notes for 1.2.0, and there is nothing that might affect Druid, just no HLL union speed improvement for now.

@leerho
Contributor

leerho commented Apr 24, 2020

@clintropolis @gianm
Thanks for pointing me to the dump segment tool. I got it to work :)

@suneet-s
Contributor

apache/datasketches-java#308 - Looks like this is the PR with the fix.

@leerho
Contributor

leerho commented Apr 30, 2020

Folks,
One of the learnings from this debugging exercise is that it would have been really useful to be able to quickly examine the sketches in the hll_segment.zip that @scrawfor posted in this issue.

As a result, I have developed a small tool that takes the output of the dump-segment tool and extracts the sketches as binary files. This allows us to easily examine the details of individual sketches with methods already available in the DataSketches library.

Hopefully, this will make debugging issues involving sketches in Druid much easier and faster.
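To make the idea concrete, here is a rough Python sketch of that kind of extraction. It is not Lee's tool. It assumes the input is the line-per-row JSON that `--dump rows --column <name>` prints, and the function name and output file naming are hypothetical.

```python
import base64
import json
from pathlib import Path


def extract_sketches(dump_lines, column, out_dir):
    """Write each base64 value found under `column` to its own .bin file.

    Returns the list of paths written. Files are named after the column and
    the zero-based line number of the row in the dump output (a naming
    scheme chosen here purely for illustration).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row_number, line in enumerate(dump_lines):
        line = line.strip()
        if not line.startswith("{"):
            continue  # ignore anything that is not a JSON row
        value = json.loads(line).get(column)
        if value is None:
            continue  # row has no value for this column
        path = out_dir / f"{column}_{row_number:06d}.bin"
        path.write_bytes(base64.b64decode(value))
        written.append(path)
    return written
```

One way to use it would be to redirect the dump-segment output to a file, then call `extract_sketches(open("rows.json"), "unique_views_hll", "sketches/")`. The resulting `.bin` files can then be loaded with whatever DataSketches tooling is at hand.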

The question is where should we put this tool so others can use it? Obviously it makes assumptions about Druid's segment structure and Druid's Dump-Segment tool. It doesn't make sense to put it in the DataSketches library as it is specific to Druid. I'd be glad to submit a PR and add it to druid/services/src/main/java/org/apache/druid/cli directory. Or perhaps it should be added to the druid/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches folder.

Please advise.

Lee.

@gianm
Contributor

gianm commented Apr 30, 2020

@leerho, would the work you did make sense as an additional option on the DumpSegment tool? If so that seems like the most natural place.

@leerho
Contributor

leerho commented May 1, 2020

@gianm

Could we set up a separate issue for this discussion? It is a bit off topic for this bug.

@gianm
Contributor

gianm commented May 2, 2020

Yes, @leerho that makes sense. Please start a new issue and at-mention me on it via @gianm.

@leerho
Contributor

leerho commented May 13, 2020

On May 7th we announced on dev@druid.apache.org a new release that fixes this issue.


7 participants