
Ensure read-lock is not continuously held on a section while iterating over concurrent maps #9787

Merged: merlimat merged 2 commits from for-each into apache:master on Mar 4, 2021

Conversation

@merlimat (Contributor) commented on Mar 3, 2021

### Motivation

As discussed in #9764, the fact that we're potentially holding a read-lock while scanning through a section of the map has several implications:

  1. If the processing function takes a long time (e.g. making a blocking request to ZK that might even time out), write operations on that section of the map are stalled for that whole duration.
  2. It's deadlock-prone:
    1. If a thread tries to use the map while scanning through it, it can deadlock itself (see the sketch below).
    2. If the processing operation waits for the completion of some operation on a different thread, and that thread tries to use the same map, it can create a deadlock.
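
To make trap (1) concrete, here is a minimal sketch of the self-deadlock, assuming Pulsar's `ConcurrentOpenHashMap` API (`put`, `forEach`); the class name and keys are made up for illustration:

```java
import org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap;

public class ForEachDeadlock {
    public static void main(String[] args) {
        ConcurrentOpenHashMap<String, Integer> map = new ConcurrentOpenHashMap<>();
        map.put("a", 1);

        // Before this change, forEach() could be holding the section's read
        // lock while invoking the callback. StampedLock is non-reentrant, so
        // the put() below, which needs the write lock on the same section,
        // could block forever.
        map.forEach((key, value) -> map.put(key + "-copy", value));
    }
}
```

The deadlock only materializes when `forEach` has fallen back from its optimistic read to an actual read lock, which is why it only bites intermittently.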

Instead of holding the lock throughout the scan of the section, we should release the read lock before calling the processing function, going back into optimistic-read mode.

This will not add any overhead (in terms of volatile reads) compared to the current implementation, but it will avoid all the possible deadlock traps, since we're never holding the lock while calling user code.
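
A simplified sketch of the resulting pattern, using `java.util.concurrent.locks.StampedLock` directly; the real map keeps this logic inside its per-section class, so the names and the null check here are illustrative:

```java
import java.util.concurrent.locks.StampedLock;
import java.util.function.BiConsumer;

// Each section stores interleaved keys and values: [k0, v0, k1, v1, ...].
class Section<K, V> extends StampedLock {
    Object[] table;

    @SuppressWarnings("unchecked")
    void forEach(BiConsumer<K, V> processor) {
        Object[] table = this.table;
        for (int bucket = 0; bucket < table.length; bucket += 2) {
            long stamp = tryOptimisticRead();
            K storedKey = (K) table[bucket];
            V storedValue = (V) table[bucket + 1];

            if (!validate(stamp)) {
                // A writer raced with us: re-read this bucket under the real
                // read lock, but release it again right away.
                stamp = readLock();
                try {
                    storedKey = (K) table[bucket];
                    storedValue = (V) table[bucket + 1];
                } finally {
                    unlockRead(stamp);
                }
            }

            if (storedKey != null) {
                // No lock is held here, so the callback is free to call back
                // into the map without risking a self-deadlock.
                processor.accept(storedKey, storedValue);
            }
        }
    }
}
```

The optimistic path costs only a couple of volatile reads per bucket, and no lock is ever held across `processor.accept`, which removes both deadlock traps described above.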

@merlimat added the type/enhancement label on Mar 3, 2021
@merlimat added this to the 2.8.0 milestone on Mar 3, 2021
@merlimat self-assigned this on Mar 3, 2021
The review comment below was left on this hunk of the diff:

```java
} finally {
    if (acquiredReadLock) {
        storedKey = (K) table[bucket];
        storedValue = (V) table[bucket + 1];
        unlockRead(stamp);
    }
}
```

A reviewer (Member) commented:
try-finally for unlock? (Same comment to all unlock locations)

@merlimat (Contributor, Author) replied:
Sure, I think in all places here it should be guaranteed to not throw, but it makes sense to do it as a general practice.
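
For reference, the shape the review settled on (a generic `StampedLock` sketch, not the exact Pulsar code):

```java
import java.util.concurrent.locks.StampedLock;

class ReadUnderLock {
    private final StampedLock lock = new StampedLock();
    private volatile int value;

    int readValue() {
        long stamp = lock.readLock();
        try {
            return value; // the guarded read
        } finally {
            lock.unlockRead(stamp); // released even if the read throws
        }
    }
}
```

Even when the guarded block provably cannot throw today, the try/finally keeps the unlock robust against future edits.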

@lhotari (Member) commented on Mar 3, 2021

Nice! Can't wait to try it out.

@merlimat merged commit c86f2f7 into apache:master on Mar 4, 2021
@merlimat deleted the for-each branch on March 4, 2021 at 05:26
@eolivelli (Contributor) commented

Sorry, I came too late to the party. Btw, I believe this approach is good.

+1

mlyahmed pushed a commit to mlyahmed/pulsar that referenced this pull request on Mar 5, 2021

Ensure read-lock is not continuously held on a section while iterating over concurrent maps (apache#9787)

* Ensure read-lock is not continuously held on a section while iterating over concurrent maps

* Added try/finally
@lhotari (Member) commented on Mar 8, 2021

Btw, I happened to come across #8877, which was a fix for a deadlock. This PR would also prevent such problems.

merlimat added a commit that referenced this pull request on May 22, 2021

Ensure read-lock is not continuously held on a section while iterating over concurrent maps (#9787)
eolivelli pushed a commit to datastax/pulsar that referenced this pull request on May 24, 2021

Ensure read-lock is not continuously held on a section while iterating over concurrent maps (apache#9787)

(cherry picked from commit 4c369c9)
codelipenghui pushed a commit that referenced this pull request on May 25, 2021

### Motivation

In several places in the code, when iterating over the custom hashmaps, we make a copy of the map first. This was done every time the iteration could end up modifying the map, since a non-reentrant mutex was held during the iteration and any modification would lead to a deadlock.

Since the behavior was changed in #9787 to not hold the section mutex during the iteration, there's no more need to make a copy of the maps.
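
To illustrate with a hypothetical call site (not the actual Pulsar code; `expireWithCopy` and `expireInPlace` are made-up names, and the map API assumed is Pulsar's `ConcurrentOpenHashMap`):

```java
import org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap;
import java.util.ArrayList;
import java.util.List;

class ExpiryScan {
    // Before #9787: snapshot the matching keys first, because removing from
    // inside forEach() could deadlock on the section's non-reentrant lock.
    static void expireWithCopy(ConcurrentOpenHashMap<String, Long> timestamps, long cutoff) {
        List<String> expired = new ArrayList<>();
        timestamps.forEach((key, ts) -> {
            if (ts < cutoff) {
                expired.add(key);
            }
        });
        expired.forEach(timestamps::remove);
    }

    // After #9787: no lock is held while the callback runs, so the map can
    // be modified directly during the iteration.
    static void expireInPlace(ConcurrentOpenHashMap<String, Long> timestamps, long cutoff) {
        timestamps.forEach((key, ts) -> {
            if (ts < cutoff) {
                timestamps.remove(key);
            }
        });
    }
}
```

The second variant avoids the extra allocation and copy on every scan.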
@codelipenghui added the cherry-picked/branch-2.7 label on May 27, 2021
yangl pushed a commit to yangl/pulsar that referenced this pull request on Jun 23, 2021 (same commit message as above)
BewareMyPower added a commit to streamnative/kop that referenced this pull request on Jul 23, 2021

Fixes #618

### Motivation

See #618 (comment) for the deadlock analysis.

### Modifications
- Use `ConcurrentHashMap` instead of `ConcurrentLongHashMap`. Though this bug may already be fixed in apache/pulsar#9787, the `ConcurrentHashMap` from the Java standard library is more reliable. The possible performance enhancement brought by `ConcurrentLongHashMap` still needs to be proven.
- Use an `AtomicBoolean` as `KafkaTopicConsumerManager`'s state instead of a read-write lock, to avoid blocking in the `close()` method when it tries to acquire the write lock (see the sketch after this list).
- Run a single cursor expiry task instead of one task per channel; since #404 changed `consumerTopicManagers` to a static field, there's no reason to run a task for each connection.
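
A minimal sketch of the second modification, with hypothetical names rather than KoP's actual code: replacing a read-write lock around the closed state with an `AtomicBoolean` means `close()` never has to wait for a write lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TopicConsumerManager {
    // A compare-and-set flag replaces a write-lock acquisition, so close()
    // can never block behind in-flight readers of the state.
    private final AtomicBoolean closed = new AtomicBoolean(false);

    void close() {
        if (!closed.compareAndSet(false, true)) {
            return; // already closed by another thread
        }
        // ... release resources exactly once ...
    }

    boolean isClosed() {
        return closed.get();
    }
}
```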
BewareMyPower added a commit to BewareMyPower/kop that referenced this pull request on Jul 25, 2021 (same commit message as above)
BewareMyPower added a commit to streamnative/kop that referenced this pull request on Jul 25, 2021 (same commit message as above)
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request on Mar 18, 2022 (same commit message as the May 25, 2021 commit above)
@yws-tracy (Contributor) left a comment

Hi @merlimat @eolivelli @lhotari:
I don't quite understand why the enhanced for loop was replaced with "for (int i = 0...)" in this PR. As far as I know, for iterating over an array, the enhanced for loop is essentially equivalent to "for (int i = 0...)". Is there another purpose? Looking forward to your reply, thanks a lot.

Labels: cherry-picked/branch-2.7, release/2.7.3, type/enhancement