
Avoid expensive findEntry call in segment metadata query #10892

Merged 9 commits into apache:master on Mar 9, 2021

Conversation

@abhishekagarwal87 (Contributor) commented Feb 16, 2021

This PR fixes two performance issues in the timeline conversion, observed when an interval has a large number of segments (call it N). For such an interval:

  • Reading the timeline entries from the original timeline. PartitionHolder.asImmutable is an expensive operation since it involves a deep copy of the chunks. On top of that, this deep copy is done for every segment in the interval, which amplifies the overhead to O(N^2). (See the toy sketch below.)
  • Adding the entries to the new timeline. This involves an isComplete call, which is itself O(N) for a single interval; thus writing to the new timeline is O(N^2) as well.
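
To make the first point concrete, here is a self-contained toy (not Druid code; Holder merely stands in for PartitionHolder) showing how a copy-per-lookup pattern turns N lookups into O(N^2) work:

import java.util.ArrayList;
import java.util.List;

// Toy stand-in for PartitionHolder: asImmutable() deep-copies the backing list.
class Holder {
  private final List<String> chunks;

  Holder(List<String> chunks) { this.chunks = chunks; }

  Holder asImmutable() { return new Holder(new ArrayList<>(chunks)); } // O(N) copy

  String getChunk(int i) { return chunks.get(i); }
}

public class QuadraticLookupDemo {
  public static void main(String[] args) {
    final int n = 20_000;
    final List<String> chunks = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      chunks.add("chunk-" + i);
    }
    final Holder holder = new Holder(chunks);

    long start = System.nanoTime();
    for (int i = 0; i < n; i++) {
      holder.asImmutable().getChunk(i); // old pattern: copy per lookup => O(N^2)
    }
    System.out.printf("copy-per-lookup: %d ms%n", (System.nanoTime() - start) / 1_000_000);

    start = System.nanoTime();
    for (int i = 0; i < n; i++) {
      holder.getChunk(i); // direct lookup => O(N) overall
    }
    System.out.printf("direct lookup:   %d ms%n", (System.nanoTime() - start) / 1_000_000);
  }
}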

@gianm (Contributor) commented Feb 16, 2021

The change looks like a good idea to me. How about:

  • changing all call sites
  • removing findEntry and the PartitionHolder asImmutable method
  • adding a performance test that verifies that CachingClusteredClient is able to do its work in a reasonable amount of time even when there are some crazy number of segments in a time chunk, like 100,000? (That ought to be enough for anybody.) The "reasonable" amount of time might need to be somewhat high to prevent flaky tests from being an issue. Maybe 5–15 seconds. In reality, we'd want this to happen much faster than that, but we don't want to cause flaky tests.

@gianm (Contributor) commented Feb 16, 2021

Another thing: I see that asImmutable doesn't just return an immutable copy, it also calls OvershadowableManager.copyVisible to potentially not return all of the chunks. So cutting out the call to asImmutable would also cut out the call to copyVisible. I'm not sure what consequences that might have. Have you analyzed it and determined that it's OK to cut out that call?

@maytasm (Contributor) commented Feb 16, 2021

@abhishekagarwal87 Another idea I had: in CachingClusteredClient.getQueryRunnerForSegments, we could cache the result returned from timeline.findEntry(spec.getInterval(), spec.getVersion()). We could maintain a map while iterating through the input specs and cache the returned PartitionHolder<ServerSelector> for each interval/version pair. This is useful when the Iterable<SegmentDescriptor> specs contains many duplicates of the same interval/version pair. For example, if there are a lot of segments per interval, the specs can contain pretty much a single interval/version pair throughout. (Rough sketch below.)
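
For concreteness, a rough sketch of that idea (illustrative only: it assumes Druid's Pair utility and the pre-change findEntry signature, and is not what the PR ultimately implements):

final Map<Pair<Interval, String>, PartitionHolder<ServerSelector>> cache = new HashMap<>();
for (SegmentDescriptor spec : specs) {
  // The deep copy inside findEntry still happens, but only once per distinct
  // (interval, version) pair instead of once per segment descriptor.
  final PartitionHolder<ServerSelector> holder = cache.computeIfAbsent(
      Pair.of(spec.getInterval(), spec.getVersion()),
      key -> timeline.findEntry(key.lhs, key.rhs)
  );
  // ... holder.getChunk(spec.getPartitionNumber()) ...
}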

@abhishekagarwal87 (Contributor, Author):

(Quoting @gianm's suggestions above.)

Yes. I plan to remove all findEntry calls and fix tests accordingly. I will add the perf test as well. Thanks for the tip.

@abhishekagarwal87 (Contributor, Author):

(Quoting @gianm's question above about copyVisible.)

Good catch. As far as I can tell, cutting out that call should not make a difference. copyVisible retains only the visible chunks in knownPartitionChunks; however, the getChunk call likewise returns a chunk only if it is visible. There shouldn't be a mismatch in behavior after the change.
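
As a reference point, the replacement lookup this PR moves to amounts to something like the following sketch; it returns only the visible chunk for the requested partition, which is why the copyVisible filtering is not lost:

// Post-change lookup (sketch): fetch the visible chunk directly, with no
// deep copy of the whole PartitionHolder along the way.
final PartitionChunk<ServerSelector> chunk =
    timeline.findChunk(spec.getInterval(), spec.getVersion(), spec.getPartitionNumber());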

@abhishekagarwal87 (Contributor, Author):

(Quoting @maytasm's caching idea above.)

Yeah, I thought about that too. I didn't go ahead with that approach since it would require creating a new temporary state variable. For queries with many intervals, we could end up creating many map entries with deep copies of OvershadowableManager. What do you think?

@abhishekagarwal87 (Contributor, Author):

It would seem this change alone is not enough. Adding a chunk to the timeline is also O(N^2), since each add calls PartitionHolder.isComplete, which is an O(N) operation. (See the batching sketch below.)
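
The fix that came out of this observation is a batched write. A sketch of its shape, where the PartitionChunkEntry constructor arguments are assumed from the diff further down and Iterators is Guava's:

// Batched insert (sketch): one addAll() instead of N individual add() calls.
// Each add() triggers an O(N) PartitionHolder.isComplete() check, so N adds
// cost O(N^2); batching keeps the timeline rebuild roughly linear.
timeline.addAll(
    Iterators.transform(
        segments.iterator(),
        segment -> new PartitionChunkEntry<>(
            segment.getInterval(),
            segment.getVersion(),
            segment.getShardSpec().createChunk(segment)
        )
    )
);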

@abhishekagarwal87 changed the title from "[Draft] Avoid expensive findEntry call in segment metadata query" to "Avoid expensive findEntry call in segment metadata query" on Mar 1, 2021
for (int ii = 0; ii < segmentCount; ii++) {
  segmentDescriptors.add(new SegmentDescriptor(interval, "1", ii));
}
Contributor:

For the purpose of perf testing, does it matter if the segments have:

  • different intervals
  • different versions

Contributor Author:

The bottlenecks are in PartitionHolder when it holds too many objects. If segments have different intervals and versions, the size of each PartitionHolder will be much smaller and there is unlikely to be a perf issue.

@Test(timeout = 10_000)
public void testGetQueryRunnerForSegments_singleIntervalLargeSegments()
Contributor:

Do we have perf numbers from before and after this change?

Contributor Author:

It was under a second after the change. Before the change, it ran for more than a minute; I didn't let it run to completion, so the real figure may have been much higher.

    unfilteredIterator,
    Objects::nonNull
);
// We add all the entries via batch add to avoid overhead of single add call. The call to add an entry to interval
Contributor:

I think the comment could be a little clearer in mentioning that avoiding n calls to add, and calling addAll once instead, reduces O(n)*n to O(n).

Contributor Author:

I removed the unnecessary details and just noted that addAll is much more efficient than add.

{
  lock.readLock().lock();
  try {
    for (Entry<Interval, TreeMap<VersionType, TimelineEntry>> entry : allTimelineEntries.entrySet()) {
      if (entry.getKey().equals(interval) || entry.getKey().contains(interval)) {
        TimelineEntry foundEntry = entry.getValue().get(version);
        if (foundEntry != null) {
          return foundEntry.getPartitionHolder().asImmutable();
Contributor:

Is the asImmutable method still needed?

Contributor Author:

Looks like it's not. I will delete this method as well as the class ImmutablePartitionHolder.

@maytasm (Contributor) left a comment:

LGTM. A few comments on perf improvement validation/testing.

import static org.mockito.ArgumentMatchers.any;

/**
 * Performance tests for {@link CachingClusteredClient} can be added here. There is one test for a scenario
Contributor:

What performance does this class test? Is it a scalability test? It would be nice to make it clear.

Contributor Author:

Any kind of perf test that doesn't require a real cluster.


public static class PartitionChunkEntry<VersionType, ObjectType>
{
  private final Interval interval;
Contributor:

Please add javadoc explaining what this interval is. There are two types of intervals in the timeline; see LogicalSegment.
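
For instance, a hypothetical javadoc along these lines would cover it (the wording is illustrative, not the PR's actual text):

/**
 * Interval of the time chunk this entry belongs to. Note that this may be
 * wider than the underlying segment's own interval; compare
 * {@link LogicalSegment#getInterval()} with {@link LogicalSegment#getTrueInterval()}.
 */
private final Interval interval;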

    PartitionChunk<OvershadowableInteger> actual
)
{
  SingleElementPartitionChunk<OvershadowableInteger> expectedSingle =
      (SingleElementPartitionChunk<OvershadowableInteger>) expected;
@jihoonson (Contributor) commented Mar 4, 2021:

SingleElementPartitionChunk is used only by deprecated things such as NoneShardSpec and Tranquility. We should deprecate SingleElementPartitionChunk as well and stop using it. You can use NumberedPartitionChunk instead.
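
The suggested replacement amounts to something like this sketch (NumberedPartitionChunk.make takes the chunk number, the total chunk count, and the payload; the payload variable here is a made-up placeholder):

// Sketch: a lone chunk becomes chunk 0 of 1 with NumberedPartitionChunk,
// instead of constructing a deprecated SingleElementPartitionChunk.
PartitionChunk<OvershadowableInteger> chunk =
    NumberedPartitionChunk.make(0, 1, overshadowableInteger);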

Contributor Author:

Since it's unrelated to my change, I will leave this one as it is for now, though I did mark SingleElementPartitionChunk deprecated.

@suneet-s removed the WIP label Mar 4, 2021
@jihoonson (Contributor): LGTM

@suneet-s merged commit 489f5b1 into apache:master Mar 9, 2021
harinirajendran pushed a commit to confluentinc/druid that referenced this pull request May 3, 2021
* Avoid expensive findEntry call in segment metadata query

* other places

* Remove findEntry

* Fix add cost

* Refactor a bit

* Add performance test

* Add comment

* Review comments

* intellij

(cherry picked from commit 489f5b1)
harinirajendran pushed the same cherry-pick (489f5b1) to confluentinc/druid on Aug 11 and Aug 12, 2021.
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
xvrl pushed the same cherry-pick (489f5b1) to confluentinc/druid on Aug 12, 2021.
harinirajendran added a commit to confluentinc/druid that referenced this pull request Oct 1, 2021
Zohimi added a commit to confluentinc/druid that referenced this pull request Oct 1, 2021
Revert "Avoid expensive findEntry call in segment metadata query (apache#10892)"
harinirajendran added a commit to confluentinc/druid that referenced this pull request Oct 4, 2021
harinirajendran added a commit to confluentinc/druid that referenced this pull request Oct 4, 2021
Revert "Revert "Avoid expensive findEntry call in segment metadata query (apache#10892)""
harinirajendran added a commit to confluentinc/druid that referenced this pull request Feb 23, 2022
harinirajendran added a commit to confluentinc/druid that referenced this pull request Feb 23, 2022