Avoid unnecessary cache building for cachingCost #12465

AmatyaAvadhanula · 2022-04-20T16:50:05Z

Description

CachingCostBalancerStrategy can be inefficient when there is a large number of segments in the load/drop queue.

It builds a cache, which takes O(N^2) time, and rebuilds it N times while loading N segments.

This can be avoided by directly computing and summing the pairwise costs, which takes O(N) per segment, repeated N times (O(N^2) in total instead of O(N^3)).
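A minimal sketch of the direct approach, assuming a simplified segment that carries only its interval in epoch millis and a placeholder pairwise cost (interval overlap). `LoadingCostSketch`, `Seg`, `pairCost` and `costForLoadingSegments` are hypothetical names, not Druid APIs, and the placeholder is not Druid's actual cost function. The point is only that each proposal needs one O(N) pass over the loading queue and allocates no intermediate cache.

```java
import java.util.List;

/** Hypothetical sketch: cost of a proposal against the loading queue, with no cache built. */
final class LoadingCostSketch
{
  /** Minimal stand-in for a segment: just its interval [startMillis, endMillis). */
  static final class Seg
  {
    final long startMillis;
    final long endMillis;

    Seg(long startMillis, long endMillis)
    {
      this.startMillis = startMillis;
      this.endMillis = endMillis;
    }
  }

  /** Placeholder pairwise cost: overlap length in millis (assumption, not Druid's formula). */
  static double pairCost(Seg a, Seg b)
  {
    long overlap = Math.min(a.endMillis, b.endMillis) - Math.max(a.startMillis, b.startMillis);
    return Math.max(0L, overlap);
  }

  /** O(N) per proposal: a single pass over the queue, summing pairwise costs. */
  static double costForLoadingSegments(Seg proposal, List<Seg> loadingQueue)
  {
    double total = 0;
    for (Seg queued : loadingQueue) {
      total += pairCost(proposal, queued);
    }
    return total;
  }
}
```

With N proposals against a queue of N segments, this does O(N^2) work overall and creates no per-proposal cache objects.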

Key changed/added classes in this PR

CachingCostBalancerStrategy

This PR has:

- been self-reviewed.
- using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
- added documentation for new or modified features or behaviors.
- added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- added or updated version, license, or notice information in licenses.yaml
- added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
- added integration tests.
- been tested in a test Druid cluster.


kfaraz

Thanks for trying to solve this, @AmatyaAvadhanula!

I would suggest breaking up this PR into two parts.

First, one for interning intervals. This would require a perf evaluation of how interning the interval affects both computation time and memory footprint.
Second, one for removing the cache-building step. This would require an analysis/proof that the cost computed by the new method is the same as the cost computed by the existing method using the cache chain. There should be tests around this proof too. It would also be nice to have some perf evaluation in this PR, as there would be a marked decrease in both the number of ephemeral objects created and compute time.
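One way to start on the requested tests is a randomized equivalence check. The sketch below is a stand-in under loud assumptions: `CostEquivalenceSketch`, `Iv`, `cachedCost` and `directCost` are hypothetical, the pairwise cost is a placeholder (interval overlap), and the "cache" simply groups queued segments by interval with a count, which is not how Druid's `ClusterCostCache` works. Because the placeholder cost depends only on the two intervals, both paths must agree; a real test would plug the existing cache-chain computation in place of `cachedCost`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical equivalence check between a cache-based and a direct cost computation. */
final class CostEquivalenceSketch
{
  /** Interval [start, end) in epoch millis; used as a map key, so it defines equals/hashCode. */
  static final class Iv
  {
    final long start;
    final long end;

    Iv(long start, long end)
    {
      this.start = start;
      this.end = end;
    }

    @Override
    public boolean equals(Object o)
    {
      return o instanceof Iv && ((Iv) o).start == start && ((Iv) o).end == end;
    }

    @Override
    public int hashCode()
    {
      return Long.hashCode(start) * 31 + Long.hashCode(end);
    }
  }

  /** Placeholder pairwise cost: overlap length (not Druid's cost function). */
  static double pairCost(Iv a, Iv b)
  {
    return Math.max(0L, Math.min(a.end, b.end) - Math.max(a.start, b.start));
  }

  /** Stand-in "cache": build interval -> count, then weight each distinct interval's cost. */
  static double cachedCost(Iv proposal, List<Iv> queue)
  {
    Map<Iv, Integer> counts = new HashMap<>();
    for (Iv iv : queue) {
      counts.merge(iv, 1, Integer::sum);
    }
    double total = 0;
    for (Map.Entry<Iv, Integer> e : counts.entrySet()) {
      total += e.getValue() * pairCost(proposal, e.getKey());
    }
    return total;
  }

  /** Direct path: one pass over the queue, no intermediate structure. */
  static double directCost(Iv proposal, List<Iv> queue)
  {
    double total = 0;
    for (Iv iv : queue) {
      total += pairCost(proposal, iv);
    }
    return total;
  }

  /** Returns true if both paths agree for every proposal in every random trial. */
  static boolean agree(long seed, int trials)
  {
    Random random = new Random(seed);
    for (int t = 0; t < trials; t++) {
      List<Iv> queue = new ArrayList<>();
      int n = 1 + random.nextInt(50);
      for (int i = 0; i < n; i++) {
        long start = random.nextInt(24) * 3_600_000L; // hourly starts within one day
        queue.add(new Iv(start, start + 3_600_000L * (1 + random.nextInt(6))));
      }
      for (Iv proposal : queue) {
        if (Math.abs(cachedCost(proposal, queue) - directCost(proposal, queue)) > 1e-6) {
          return false;
        }
      }
    }
    return true;
  }
}
```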

kfaraz · 2022-05-18T06:31:17Z

core/src/main/java/org/apache/druid/timeline/SegmentId.java

@@ -80,6 +79,12 @@ public final class SegmentId implements Comparable<SegmentId>
   */
  private static final Interner<String> STRING_INTERNER = Interners.newWeakInterner();

+  /**
+   * Store Intervals since creating them each time before returning is an expensive operation


Thanks for adding this!

kfaraz · 2022-05-18T06:33:20Z

core/src/main/java/org/apache/druid/timeline/SegmentId.java

-    this.intervalStartMillis = interval.getStartMillis();
-    this.intervalEndMillis = interval.getEndMillis();
-    this.intervalChronology = interval.getChronology();
+    this.interval = INTERVAL_INTERNER.intern(interval);


Can interval ever be null here?
If not, we can add Objects.requireNonNull similar to the datasource validation in the previous line.
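A minimal sketch of the reviewer's suggestion, assuming the interval must never be null: validate it with `Objects.requireNonNull` alongside the datasource, intern it once in the constructor, and have `getInterval()` return the stored reference instead of rebuilding it from start/end millis and chronology. `SegmentIdSketch`, `Iv` and the `ConcurrentHashMap`-based interner are hypothetical stand-ins. The diff interns into `INTERVAL_INTERNER`, whose definition is not shown here (the existing `STRING_INTERNER` is a Guava weak interner); unlike a weak interner, this map holds strong references and never releases entries.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical, trimmed-down SegmentId that stores an interned interval. */
final class SegmentIdSketch
{
  /** Stand-in for an immutable interval; value equality is required for interning. */
  static final class Iv
  {
    final long startMillis;
    final long endMillis;

    Iv(long startMillis, long endMillis)
    {
      this.startMillis = startMillis;
      this.endMillis = endMillis;
    }

    @Override
    public boolean equals(Object o)
    {
      return o instanceof Iv && ((Iv) o).startMillis == startMillis && ((Iv) o).endMillis == endMillis;
    }

    @Override
    public int hashCode()
    {
      return Objects.hash(startMillis, endMillis);
    }
  }

  /** Strong-reference interner: maps each value to the first equal instance seen. */
  private static final ConcurrentMap<Iv, Iv> INTERVAL_INTERNER = new ConcurrentHashMap<>();

  private final String dataSource;
  private final Iv interval;

  SegmentIdSketch(String dataSource, Iv interval)
  {
    this.dataSource = Objects.requireNonNull(dataSource, "dataSource");
    Objects.requireNonNull(interval, "interval");
    // computeIfAbsent returns the already-stored equal instance, or stores and returns this one.
    this.interval = INTERVAL_INTERNER.computeIfAbsent(interval, k -> k);
  }

  /** No allocation per call: the stored (interned) reference is returned as-is. */
  Iv getInterval()
  {
    return interval;
  }

  String getDataSource()
  {
    return dataSource;
  }
}
```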

kfaraz · 2022-05-18T06:37:24Z

server/src/main/java/org/apache/druid/server/coordinator/CachingCostBalancerStrategy.java

@@ -70,10 +70,19 @@ protected double computeCost(DataSegment proposalSegment, ServerHolder server, b
    return cost * (server.getMaxSize() / server.getAvailableSize());
  }

-  private ClusterCostCache costCacheForLoadingSegments(ServerHolder server)
+  private double costCacheForLoadingSegments(ServerHolder server, DataSegment proposalSegment)


Nit: Rename to computeCostForLoadingSegmentOnServer

imply-cheddar · 2022-06-17T11:49:06Z

Doing just the interval interning would make this mergeable; definitely do a separate PR for the caching change, as its correctness is less clear.

We should probably include the flamegraphs that led us to make this code change in this PR.

In terms of memory consumption, the fields currently stored on SegmentId are exactly the ones an Interval stores. Since the same interval tends to show up many times, interning it and reusing the same reference should actually reduce memory consumption rather than increase it, while also improving performance.
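A back-of-the-envelope sketch of that argument, under stated assumptions: intervals are represented as ISO-8601 strings purely for brevity, and `InternSavingsSketch` / `intervalCopies` are hypothetical helpers. Without interning, each of N segment ids carries its own copy of the interval fields; with interning, the ids share one instance per distinct interval (D of them). Byte sizes are deliberately not estimated, since they depend on the JVM and on Joda's `Interval` layout.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for the interning memory argument. */
final class InternSavingsSketch
{
  /**
   * Returns {N, D}: N per-id interval copies without interning,
   * D shared interval instances with interning.
   */
  static int[] intervalCopies(List<String> segmentIntervals)
  {
    Set<String> distinct = new HashSet<>(segmentIntervals);
    return new int[]{segmentIntervals.size(), distinct.size()};
  }
}
```

For instance, if every time chunk of a datasource were split into 10 partitions (a hypothetical layout), N would be 10 × D, so interning would keep one interval instance per 10 segment ids.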

AmatyaAvadhanula · 2023-06-27T04:47:43Z

Closing since #14484 deprecates cachingCost

Avoid unnecessary cache building for cachingCost

ad258f5

FrankChen021 added the Area - Segment Balancing/Coordination and Performance labels Apr 21, 2022

AmatyaAvadhanula added 2 commits May 6, 2022 15:15

Merge remote-tracking branch 'upstream/master' into feature-redundant…

679962e

…_cache_cachingCost

Store intervals in SegmentId with interning

5efc03a

kfaraz requested changes May 20, 2022


AmatyaAvadhanula closed this Jun 27, 2023

