[pulsar-broker] Optimize message replay for large backlog consumer #3732

rdhabalia · 2019-03-02T04:41:39Z

Motivation

It addresses #3731.

Modification

Introduce ConcurrentSortedLongPairSet, which helps the broker avoid random reads across multiple managed-ledgers. It still uses ConcurrentLongPairSet to avoid object allocation for message-ids.
Introduce a <T> Set<T> items(int numberOfItems, BiFunction<Long, Long, T> longPairConverter) method in LongPairSet to avoid creating temporary LongPair objects.
This change does not affect the normal use case, where the topic has no backlog and does not have many active managed-ledgers.
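
To illustrate the idea behind the sorted set, here is a minimal sketch, not the code in this PR. It assumes the pairs are grouped by their first long (for example a ledger id) in a sorted map, so that a limited read returns entries from the lowest ledgers first. The class name SortedLongPairSetSketch and the PairConverter interface are hypothetical, and the sketch uses boxed JDK collections and returns a List (to keep the order visible) purely for brevity; the PR itself keeps ConcurrentLongPairSet to avoid that allocation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical, simplified stand-in for a sorted long-pair set; not the Pulsar class. */
final class SortedLongPairSetSketch {

    /** Hypothetical converter that receives both values as primitives. */
    interface PairConverter<T> {
        T apply(long first, long second);
    }

    // Outer map is ordered by the first value (for example a ledger id).
    // Boxed keys and values are an assumption made for brevity only.
    private final ConcurrentSkipListMap<Long, NavigableSet<Long>> byFirst =
            new ConcurrentSkipListMap<>();

    boolean add(long first, long second) {
        return byFirst.computeIfAbsent(first, k -> new ConcurrentSkipListSet<>()).add(second);
    }

    /** Returns up to numberOfItems pairs in (first, second) order. */
    <T> List<T> items(int numberOfItems, PairConverter<T> converter) {
        List<T> out = new ArrayList<>();
        for (Map.Entry<Long, NavigableSet<Long>> e : byFirst.entrySet()) {
            for (long second : e.getValue()) {
                if (out.size() >= numberOfItems) {
                    return out;
                }
                out.add(converter.apply(e.getKey(), second));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedLongPairSetSketch set = new SortedLongPairSetSketch();
        set.add(7, 3);
        set.add(2, 9);
        set.add(7, 1);
        set.add(2, 4);
        set.add(5, 0);
        List<String> firstThree = set.items(3, (a, b) -> a + ":" + b);
        System.out.println(firstThree); // prints [2:4, 2:9, 5:0]
    }
}
```

Pairs that share the same first value come out together, which is the property that lets a limited replay batch stay within as few ledgers as possible.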

Note

I performance-tested GrowablePriorityLongPairQueue, which I introduced some time back, but its insert/remove is very slow and CPU-intensive, so we can't use it in this scenario.

merlimat · 2019-03-02T06:14:19Z

...ar-common/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentLongPairSet.java

+    }
+
+    @Override
+    public <T> Set<T> items(int numberOfItems, BiFunction<Long, Long, T> longPairConverter) {


If we use some kind of specialized interface, we can avoid 2 long -> Long conversions:

interface LongPairFunction<T> { T apply(long a, long b); }

Yes, I wanted to create a custom interface to reduce the number of objects created, but I missed it. Good catch. I will fix it.
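
For reference, a small sketch of the difference discussed in this thread. The LongPairFunction interface is the one proposed in the comment above; the flat PAIRS array and the two convert methods are hypothetical and only stand in for the set's internal storage. With BiFunction<Long, Long, T> each call autoboxes both arguments (via Long.valueOf, which only caches small values), while the primitive interface passes them through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class LongPairFunctionSketch {

    /** Interface as proposed in the review comment. */
    interface LongPairFunction<T> {
        T apply(long a, long b);
    }

    // Hypothetical flat storage: [first0, second0, first1, second1, ...]
    static final long[] PAIRS = {5L, 1000L, 5L, 1001L, 6L, 0L};

    /** Each apply() call boxes both primitive longs into Long objects. */
    static <T> List<T> convertBoxed(BiFunction<Long, Long, T> f) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < PAIRS.length; i += 2) {
            out.add(f.apply(PAIRS[i], PAIRS[i + 1]));
        }
        return out;
    }

    /** No boxing: the converter receives the primitive values directly. */
    static <T> List<T> convertPrimitive(LongPairFunction<T> f) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < PAIRS.length; i += 2) {
            out.add(f.apply(PAIRS[i], PAIRS[i + 1]));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> boxed = convertBoxed((a, b) -> a + ":" + b);
        List<String> primitive = convertPrimitive((a, b) -> a + ":" + b);
        System.out.println(boxed);     // prints [5:1000, 5:1001, 6:0]
        System.out.println(primitive); // prints [5:1000, 5:1001, 6:0]
    }
}
```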

merlimat · 2019-03-02T06:15:15Z

...ar-common/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentLongPairSet.java

+
+    @Override
+    public <T> Set<T> items(int numberOfItems, BiFunction<Long, Long, T> longPairConverter) {
+        Set<T> items = new HashSet<>();


Is it necessary to return a set? Typically, the iteration methods are used to go through it inline without an intermediary collection.

Yes, this method is used by PersistentDispatcherMultipleConsumers to prepare the subset of replay messages, which it then passes to cursor.asyncReplayEntries to read the entries. So this method is not used for inline execution.

merlimat · 2019-03-02T06:21:30Z

.../java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java

@@ -288,8 +287,8 @@ public void readMoreEntries() {
                    return;
                }

-                Set<PositionImpl> messagesToReplayNow = messagesToReplay.items(messagesToRead).stream()
-                        .map(pair -> new PositionImpl(pair.first, pair.second)).collect(toSet());
+                Set<PositionImpl> messagesToReplayNow = messagesToReplay.items(messagesToRead,


Instead of keeping messagesToReplay sorted, wouldn't it be easier to just make messagesToReplayNow a SortedSet?

In this case, we don't need messagesToReplay to always be sorted; we only care when there is a redelivery.

wouldn't it be easier to just make messagesToReplayNow a SortedSet?

No, because messagesToReplayNow is a very small subset (100 msgs) of messagesToReplay (> 1M msgs). The main issue is that messagesToReplay can hold more than a million random messages, and we want to read messages from the same ledger. Sorting ONLY messagesToReplayNow will not help, because the 100 messages read from messagesToReplay may come from different ledgers, so the broker would still perform random reads across multiple ledgers.

Oh that's true.
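
The point above can be checked with a small, self-contained sketch. All numbers and names here are hypothetical: the backlog has 100 ledgers with 100 unacked entries each, and list insertion order stands in for the arbitrary iteration order of an unsorted set. Sorting only the 100-item batch leaves it spread over 100 ledgers, while taking the batch from a fully sorted set confines it to a single ledger.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReplayBatchLocalityDemo {

    // Orders positions by ledger id, then entry id.
    static final Comparator<long[]> BY_POSITION =
            Comparator.<long[]>comparingLong(p -> p[0]).thenComparingLong(p -> p[1]);

    /** Builds {ledgerId, entryId} pairs interleaved across ledgers (hypothetical data). */
    static List<long[]> backlog(int ledgers, int total) {
        List<long[]> all = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            all.add(new long[] {i % ledgers, i / ledgers});
        }
        return all;
    }

    /** Takes the batch from the unsorted set, then sorts only that batch. */
    static List<long[]> sortBatchOnly(List<long[]> all, int batchSize) {
        List<long[]> batch = new ArrayList<>(all.subList(0, batchSize));
        batch.sort(BY_POSITION);
        return batch;
    }

    /** Sorts the whole set first, then takes the batch. */
    static List<long[]> sortWholeSet(List<long[]> all, int batchSize) {
        List<long[]> sorted = new ArrayList<>(all);
        sorted.sort(BY_POSITION);
        return new ArrayList<>(sorted.subList(0, batchSize));
    }

    static long distinctLedgers(List<long[]> batch) {
        return batch.stream().mapToLong(p -> p[0]).distinct().count();
    }

    public static void main(String[] args) {
        List<long[]> all = backlog(100, 10_000);
        System.out.println(distinctLedgers(sortBatchOnly(all, 100))); // prints 100
        System.out.println(distinctLedgers(sortWholeSet(all, 100)));  // prints 1
    }
}
```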

rdhabalia · 2019-03-05T00:42:11Z

rerun java8 tests

merlimat

👍

rdhabalia · 2019-03-05T07:28:40Z

rerun java8 tests

rdhabalia · 2019-03-05T22:03:04Z

rerun java8 tests

rdhabalia · 2019-03-06T00:11:11Z

rerun java8 tests

rdhabalia · 2019-03-08T05:41:38Z

rerun integration tests

rdhabalia added the area/broker label Mar 2, 2019

rdhabalia added this to the 2.4.0 milestone Mar 2, 2019

rdhabalia self-assigned this Mar 2, 2019

rdhabalia requested review from merlimat, maskit, sijie and saandrews March 2, 2019 04:41

merlimat reviewed Mar 2, 2019

rdhabalia force-pushed the redel_opt branch from 14ebbc4 to 7f95c71 Compare March 4, 2019 22:43

merlimat approved these changes Mar 5, 2019

rdhabalia force-pushed the redel_opt branch from 7f95c71 to 400b408 Compare March 5, 2019 18:05

[pulsar-broker] Optimize redelivery message reading

5f865c2

add tests fix items arg with logPairFunction interface

rdhabalia force-pushed the redel_opt branch from 400b408 to 5f865c2 Compare March 8, 2019 00:19

sijie approved these changes Mar 8, 2019

sijie merged commit 4ce8b32 into apache:master Mar 8, 2019

codelipenghui mentioned this pull request May 17, 2019

[pulsar-broker: message-dispatching] Slow message redelivery for subscription with large backlog and random unack messages. #3731

Closed

rdhabalia deleted the redel_opt branch March 4, 2020 00:27
