Fix thread safety issue and add cache to EmptySegmentPruner by Jackie-Jiang · Pull Request #7828 · apache/pinot

Jackie-Jiang · 2021-11-25T00:17:34Z

Currently EmptySegmentPruner might have race condition where _emptySegments can be modified and accessed at the same time. Fix it by using a concurrent set for _emptySegments.
Also add cache to the pruner so that no need to recalculate the result given the same input segments.

richardstartin · 2021-11-25T00:36:58Z

Why not use a Sets.newConcurrentHashSet() rather than copy on write? That would simplify this a lot.

codecov-commenter · 2021-11-25T00:47:33Z

Codecov Report

Merging #7828 (0b26ea9) into master (1e25962) will decrease coverage by 0.02%.
The diff coverage is 84.37%.

@@             Coverage Diff              @@
##             master    #7828      +/-   ##
============================================
- Coverage     71.67%   71.64%   -0.03%     
  Complexity     4089     4089              
============================================
  Files          1579     1579              
  Lines         80843    80851       +8     
  Branches      12010    12017       +7     
============================================
- Hits          57944    57929      -15     
- Misses        18995    19018      +23     
  Partials       3904     3904

Flag	Coverage Δ
integration1	`29.29% <31.25%> (-0.17%)`	⬇️
integration2	`27.92% <31.25%> (-0.12%)`	⬇️
unittests1	`68.66% <ø> (+<0.01%)`	⬆️
unittests2	`14.62% <84.37%> (-0.05%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...oker/routing/segmentpruner/EmptySegmentPruner.java	`91.37% <84.37%> (-8.63%)`	⬇️
...data/manager/realtime/SegmentCommitterFactory.java	`64.70% <0.00%> (-29.42%)`	⬇️
...er/api/resources/LLCSegmentCompletionHandlers.java	`53.46% <0.00%> (-8.92%)`	⬇️
...altime/ServerSegmentCompletionProtocolHandler.java	`49.52% <0.00%> (-8.58%)`	⬇️
...e/data/manager/realtime/SplitSegmentCommitter.java	`53.84% <0.00%> (-7.70%)`	⬇️
...nction/DistinctCountBitmapAggregationFunction.java	`47.66% <0.00%> (-7.26%)`	⬇️
.../pinot/core/query/scheduler/PriorityScheduler.java	`80.82% <0.00%> (-2.74%)`	⬇️
...ot/common/protocols/SegmentCompletionProtocol.java	`93.80% <0.00%> (-1.91%)`	⬇️
...r/validation/RealtimeSegmentValidationManager.java	`74.28% <0.00%> (-1.43%)`	⬇️
...a/org/apache/pinot/core/common/DataBlockCache.java	`96.21% <0.00%> (-0.76%)`	⬇️
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1e25962...0b26ea9. Read the comment docs.

Jackie-Jiang · 2021-11-25T01:10:29Z

@richardstartin I actually considered that, and decided to go with the snapshot approach because we want to optimize the query runtime method as much as possible

richardstartin · 2021-11-25T07:01:44Z

I actually considered that, and decided to go with the snapshot approach because we want to optimize the query runtime method as much as possible

I don’t think this is likely to be an optimisation unless the removeAll call is made frequently and the number of empty segments is large.

I think when an optimisation makes the code much harder to read we need 4 bits of data:

evidence that there is a problem - e.g. wall time thresholded tracing events around EmptySegmentPruner.prune, or a sampling profiler reporting lots of samples in this method
evidence that the thing being optimised causes the problem - assuming we have data that there is a problem, is it the removeAll call (which using a HashSet may optimise relative to a concurrent variant) or the copy on line 151 which is the cause?
evidence the problem is common - e.g. statistics about number of empty segments from a real cluster, do we have a metric for this?
evidence the optimisation is effective - for widely observed set sizes, does it actually make a difference to have a HashSet instead of a concurrent HashSet?

I raise this only because the code is much harder to read than if it used a threadsafe Set. If it were as readable I would be inclined to assume that the optimisation is effective.

jackjlli · 2021-11-29T20:28:48Z

...t-broker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/EmptySegmentPruner.java

+
+    // Return the cached result when the input is the same reference
+    ResultCache resultCache = _resultCache;
+    if (resultCache != null && resultCache._inputSegments == segments) {


use resultCache != null && resultCache._inputSegments.equals(segments) here?

This is part of the COW mechanism, it's checking that the result cache wasn't built by some other racing thread, it has to be the same reference and the contents are irrelevant. This is why I challenge whether the performance justifies the complexity, because it's fairly subtle.

We don't want to compare the actual content because it can be expensive

Jackie-Jiang · 2021-11-29T22:03:41Z

@richardstartin I didn't think from the readability perspective. I agree there is no clear evidence on the optimization. Changed it to use the concurrent set.

richardstartin

Even I can read it now 👍

) Currently EmptySegmentPruner might have race condition where _emptySegments can be modified and accessed at the same time. Fix it by using a concurrent set for _emptySegments. Also add cache to the pruner so that no need to recalculate the result given the same input segments.

Fix thread safety issue and add cache to EmptySegmentPruner

6b68207

Jackie-Jiang requested a review from npawar November 25, 2021 00:17

jackjlli reviewed Nov 29, 2021

View reviewed changes

Use Concurrent Set

0b26ea9

richardstartin approved these changes Nov 29, 2021

View reviewed changes

jackjlli approved these changes Nov 29, 2021

View reviewed changes

Jackie-Jiang merged commit 9ba03e4 into apache:master Nov 29, 2021

Jackie-Jiang deleted the empty_segment_pruner_cache branch November 29, 2021 23:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix thread safety issue and add cache to EmptySegmentPruner#7828

Fix thread safety issue and add cache to EmptySegmentPruner#7828
Jackie-Jiang merged 2 commits intoapache:masterfrom
Jackie-Jiang:empty_segment_pruner_cache

Jackie-Jiang commented Nov 25, 2021 •

edited

Loading

Uh oh!

richardstartin commented Nov 25, 2021

Uh oh!

codecov-commenter commented Nov 25, 2021 •

edited

Loading

Uh oh!

Jackie-Jiang commented Nov 25, 2021

Uh oh!

richardstartin commented Nov 25, 2021 •

edited

Loading

Uh oh!

jackjlli Nov 29, 2021

Uh oh!

richardstartin Nov 29, 2021

Uh oh!

Jackie-Jiang Nov 29, 2021

Uh oh!

Jackie-Jiang commented Nov 29, 2021

Uh oh!

richardstartin left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Jackie-Jiang commented Nov 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

richardstartin commented Nov 25, 2021

Uh oh!

codecov-commenter commented Nov 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Jackie-Jiang commented Nov 25, 2021

Uh oh!

richardstartin commented Nov 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jackjlli Nov 29, 2021

Choose a reason for hiding this comment

Uh oh!

richardstartin Nov 29, 2021

Choose a reason for hiding this comment

Uh oh!

Jackie-Jiang Nov 29, 2021

Choose a reason for hiding this comment

Uh oh!

Jackie-Jiang commented Nov 29, 2021

Uh oh!

richardstartin left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Jackie-Jiang commented Nov 25, 2021 •

edited

Loading

codecov-commenter commented Nov 25, 2021 •

edited

Loading

richardstartin commented Nov 25, 2021 •

edited

Loading