Querying a realtime datasource may throw NullPointerException while a segment is being unannounced. #12168
A simple solution is to sleep for some time after "Unannounce the segment" and before actually dropping the segment.
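To make the suggested workaround concrete, here is a minimal sketch of "unannounce, wait, then drop". It is not Druid code: `DelayedSegmentDropper`, `unannounceThenDrop`, and the grace period value are hypothetical names chosen for illustration, and the two steps are simulated by recording events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Druid's API: unannounce immediately,
// but drop the segment only after a grace period has elapsed.
final class DelayedSegmentDropper {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final List<String> events = new ArrayList<>(); // guarded by "this"
    private final long graceMillis;

    DelayedSegmentDropper(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Records the unannounce step now and schedules the drop step for later.
    ScheduledFuture<?> unannounceThenDrop(String segmentId) {
        record("unannounced " + segmentId);
        return scheduler.schedule(
            () -> record("dropped " + segmentId), graceMillis, TimeUnit.MILLISECONDS);
    }

    private synchronized void record(String event) {
        events.add(event);
    }

    synchronized List<String> events() {
        return new ArrayList<>(events);
    }

    void shutdown() {
        scheduler.shutdown();
    }
}

final class GracePeriodDemo {
    public static void main(String[] args) throws Exception {
        DelayedSegmentDropper dropper = new DelayedSegmentDropper(100);
        dropper.unannounceThenDrop("segment_1").get(); // blocks until the drop has run
        System.out.println(dropper.events());
        dropper.shutdown();
    }
}
```

A fixed delay like this only narrows the race window. A query that is still running when the grace period ends could presumably hit the same null segment.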
gianm added a commit to gianm/druid that referenced this issue Oct 26, 2023
This can happen if the segment is removed while a query is in progress. Returning empty causes the server to use ReportTimelineMissingSegmentQueryRunner, which causes the Broker to look for the segment somewhere else. Fixes apache#12168.
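The idea in this commit message can be sketched roughly as follows. This is an illustration only: `HydrantSketch`, `WalkerSketch`, and the string results are hypothetical stand-ins, not the actual `FireHydrant` or `SinkQuerySegmentWalker` signatures.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a hydrant; names and signatures are assumptions.
final class HydrantSketch {
    private final AtomicReference<String> segment;

    HydrantSketch(String segment) {
        this.segment = new AtomicReference<>(segment);
    }

    // Simulates the segment being swapped to null while a query is in progress.
    void swapToNull() {
        segment.set(null);
    }

    // Reads the reference once and returns empty instead of dereferencing null.
    Optional<String> getSegmentForQuery() {
        return Optional.ofNullable(segment.get());
    }
}

// Hypothetical stand-in for the caller that picks a query runner.
final class WalkerSketch {
    static String runnerFor(HydrantSketch hydrant) {
        return hydrant.getSegmentForQuery()
            .map(seg -> "query " + seg)
            // Empty means: report the segment as missing so the caller can retry elsewhere.
            .orElse("report missing segment");
    }
}

final class EmptyOnNullDemo {
    public static void main(String[] args) {
        HydrantSketch hydrant = new HydrantSketch("segment_1");
        System.out.println(WalkerSketch.runnerFor(hydrant));
        hydrant.swapToNull();
        System.out.println(WalkerSketch.runnerFor(hydrant));
    }
}
```

The point of the sketch is that the null check and the use of the segment happen on a single read of the reference, so a concurrent swap results in a "missing segment" report rather than an exception.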
gianm added a commit to gianm/druid that referenced this issue Oct 26, 2023
…-segment retry bug. Fixes apache#12168, by returning empty from FireHydrant when the segment is swapped to null. This causes the SinkQuerySegmentWalker to use ReportTimelineMissingSegmentQueryRunner, which causes the Broker to look for the segment somewhere else. In addition, this patch changes SinkQuerySegmentWalker to acquire references to all hydrants (subsegments of a sink) at once, and return a ReportTimelineMissingSegmentQueryRunner if *any* of them could not be acquired. I suspect, although have not confirmed, that the prior behavior could lead to segments being reported as missing even though results from some hydrants were still included.
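The "acquire all hydrants at once" part could look something like the sketch below. It assumes a simple reference-counted hydrant; `RefCountedHydrant`, `HydrantAcquirer`, and `acquireAll` are hypothetical names for illustration and do not mirror the real classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical reference-counted hydrant; not Druid's implementation.
final class RefCountedHydrant {
    private boolean closed;
    private int refs;

    // Returns a release handle, or empty if the hydrant is already closed.
    synchronized Optional<Runnable> acquire() {
        if (closed) {
            return Optional.empty();
        }
        refs++;
        Runnable releaser = this::release;
        return Optional.of(releaser);
    }

    private synchronized void release() {
        refs--;
    }

    synchronized void close() {
        closed = true;
    }

    synchronized int refCount() {
        return refs;
    }
}

final class HydrantAcquirer {
    // Returns release handles for every hydrant, or empty if any one could not be acquired.
    static Optional<List<Runnable>> acquireAll(List<RefCountedHydrant> hydrants) {
        List<Runnable> releasers = new ArrayList<>();
        for (RefCountedHydrant hydrant : hydrants) {
            Optional<Runnable> releaser = hydrant.acquire();
            if (!releaser.isPresent()) {
                // Roll back what was already acquired so no partial results are served.
                releasers.forEach(Runnable::run);
                return Optional.empty();
            }
            releasers.add(releaser.get());
        }
        return Optional.of(releasers);
    }
}

final class AcquireAllDemo {
    public static void main(String[] args) {
        RefCountedHydrant a = new RefCountedHydrant();
        RefCountedHydrant b = new RefCountedHydrant();
        b.close();
        // With one closed hydrant the whole sink is treated as missing.
        System.out.println(HydrantAcquirer.acquireAll(Arrays.asList(a, b)).isPresent());
    }
}
```

With this shape, the caller either queries every hydrant of the sink or reports the whole segment as missing, which avoids mixing partial results with a missing-segment report.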
gianm added a commit that referenced this issue Nov 20, 2023
…g-segment retry bug. (#15260)
* Fix NPE caused by realtime segment closing race, fix possible missing-segment retry bug. Fixes #12168, by returning empty from FireHydrant when the segment is swapped to null. This causes the SinkQuerySegmentWalker to use ReportTimelineMissingSegmentQueryRunner, which causes the Broker to look for the segment somewhere else. In addition, this patch changes SinkQuerySegmentWalker to acquire references to all hydrants (subsegments of a sink) at once, and return a ReportTimelineMissingSegmentQueryRunner if *any* of them could not be acquired. I suspect, although have not confirmed, that the prior behavior could lead to segments being reported as missing even though results from some hydrants were still included.
* Some more test coverage.
Affected Version
0.22.1
Description
When the Broker processes a query, it dispatches subqueries to different nodes. If a peon is processing a subquery for a segment that is being unannounced at that moment, the peon may throw a NullPointerException.
2022-01-16T00:12:42,443 INFO [[index_kafka_monitor_alert_7321a5cf7c99960_aoekboeg]-appenderator-persist] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[monitor_alert_2022-01-16T08:00:00.000Z_2022-01-16T09:00:00.000Z_2022-01-16T00:00:00.158Z_112] at path[/druid/segments/9.138.162.20:8106_indexer-executor__default_tier_2022-01-15T23:22:41.747Z_c7cd5c7591a24f4cb29aef61d58c107d0]
2022-01-16T00:12:42,467 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [1] Segments
2022-01-16T00:12:42,649 ERROR [processing-0] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2 - Exception with one of the sequences!
java.lang.NullPointerException: null
at org.apache.druid.segment.realtime.FireHydrant.getSegmentForQuery(FireHydrant.java:180) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.segment.realtime.appenderator.SinkQuerySegmentWalker.lambda$null$3(SinkQuerySegmentWalker.java:216) ~[druid-server-0.22.0.jar:0.22.0]
at com.google.common.collect.Iterators$8.transform(Iterators.java:794) ~[guava-16.0.1.jar:?]
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) ~[guava-16.0.1.jar:?]
at org.apache.druid.query.SinkQueryRunners$1.next(SinkQueryRunners.java:56) ~[druid-processing-0.22.0.jar:0.22.0]
at org.apache.druid.query.SinkQueryRunners$1.next(SinkQueryRunners.java:46) ~[druid-processing-0.22.0.jar:0.22.0]
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:646) ~[guava-16.0.1.jar:?]
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.1.jar:?]
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.1.jar:?]
at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:43) ~[guava-16.0.1.jar:?]
at com.google.common.collect.Iterators.addAll(Iterators.java:356) ~[guava-16.0.1.jar:?]
at com.google.common.collect.Lists.newArrayList(Lists.java:147) ~[guava-16.0.1.jar:?]
at com.google.common.collect.Lists.newArrayList(Lists.java:129) ~[guava-16.0.1.jar:?]
at org.apache.druid.query.ChainedExecutionQueryRunner$1.make(ChainedExecutionQueryRunner.java:92) ~[druid-processing-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:39) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:78) ~[druid-processing-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.query.spec.SpecificSegmentQueryRunner$1.accumulate(SpecificSegmentQueryRunner.java:86) ~[druid-processing-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:170) ~[druid-processing-0.22.0.jar:0.22.0]
at org.apache.druid.query.spec.SpecificSegmentQueryRunner.access$100(SpecificSegmentQueryRunner.java:43) ~[druid-processing-0.22.0.jar:0.22.0]
at org.apache.druid.query.spec.SpecificSegmentQueryRunner$2.wrap(SpecificSegmentQueryRunner.java:152) ~[druid-processing-0.22.0.jar:0.22.0]
at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.22.0.jar:0.22.0]
at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:245) [druid-processing-0.22.0.jar:0.22.0]
at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:232) [druid-processing-0.22.0.jar:0.22.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_272]
at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:247) [druid-processing-0.22.0.jar:0.22.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_272]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_272]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_272]
2022-01-16T00:12:42,683 ERROR [qtp191953464-298[groupBy_[monitor_alert]59354191-cf20-40fc-b6e9-a1a322a54a7f]] org.apache.druid.server.QueryLifecycle - Exception while processing queryId [59354191-cf20-40fc-b6e9-a1a322a54a7f] (java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException)
2022-01-16T00:12:42,802 ERROR [qtp191953464-298[groupBy[monitor_alert]_59354191-cf20-40fc-b6e9-a1a322a54a7f]] org.apache.druid.server.QueryResource - Exception handling request: {class=org.apache.druid.server.QueryResource, exceptionType=class java.lang.RuntimeException, exceptionMessage=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException, query={"queryType":"groupBy","dataSource":{"type":"table","name":"monitor_alert_to_analysis"},"intervals":{"type":"segments","segments":[{"itvl":"2022-01-16T07:00:00.000Z/2022-01-16T08:00:00.000Z","ver":"2022-01-15T23:00:17.052Z","part":1499},{"itvl":"2022-01-16T08:00:00.000Z/2022-01-16T09:00:00.000Z","ver":"2022-01-16T00:00:00.158Z","part":112},{"itvl":"2022-01-16T08:00:00.000Z/2022-01-16T09:00:00.000Z","ver":"2022-01-16T00:00:00.158Z","part":268}]},"virtualColumns":[{"type":"expression","name":"v0","expression":"(("data_time" + 28800) * 1000)","outputType":"LONG"}],"filter":{"type":"and","fields":[{"type":"selector","dimension":"app_mark","value":"895_4455_cos_53","extractionFn":null},{"type":"selector","dimension":"metric","value":"total_req","extractionFn":null},{"type":"selector","dimension":"tag12","value":"[云][COS]","extractionFn":null},{"type":"selector","dimension":"tag13","value":"[COS]","extractionFn":null},{"type":"selector","dimension":"tag14","value":"[coshttpsvr]","extractionFn":null},{"type":"bound","dimension":"v0","lower":"1642313400000","upper":"1642320600000","lowerStrict":false,"upperStrict":false,"extractionFn":null,"ordering":{"type":"numeric"}}]},"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"tag20","outputName":"d0","outputType":"STRING"}],"aggregations":[],"postAggregations":[],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"applyLimitPushDown":false,"defaultTimeout":300000,"finalize":false,"fudgeTimestamp":"-4611686018427387904","groupByOutermost":false,"groupByStrategy":"v2","maxQueuedBytes":41841,"maxScatterGatherBytes":9223372036854775807,"queryFailTime":1642292259982,"queryId":"59354191-cf20-40fc-b6e9-a1a322a54a7f","resultAsArray":true,"sqlQueryId":"12b20021-2583-4763-864a-36d27086ab51","timeout":299544},"descending":false}, peer=9.138.162.166} (java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException)