Fix deadlock while getting queue message count during cleanup #2476
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
#2369 introduced a deadlock bug.
jstack shows:
Thread 1:
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.startPaging(PagingStoreImpl.java:481)
at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.addSize(PagingStoreImpl.java:739)
at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.nonDurableUp(PagingStoreImpl.java:952)
at org.apache.activemq.artemis.api.core.RefCountMessage.incrementRefCount(RefCountMessage.java:50)
at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.incrementDelayDeletionCount(LargeServerMessageImpl.java:149)
locked org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl@67359741
at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl$LargeMessageDeliverer.(ServerConsumerImpl.java:1171)
at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl$LargeMessageDeliverer.(ServerConsumerImpl.java:1151)
at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.handle(ServerConsumerImpl.java:431)
locked java.lang.Object@3f31a7b3
at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:2809)
at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2196)
locked org.apache.activemq.artemis.core.server.impl.QueueImpl@6c2bd0dc
at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$1900(QueueImpl.java:105)
at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:3001)
locked org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner@79dea1f9
Thread 2:
at org.apache.activemq.artemis.core.server.impl.QueueImpl.getScheduledCount(QueueImpl.java:1085)
blocked on org.apache.activemq.artemis.core.server.impl.QueueImpl@742b7e17
at org.apache.activemq.artemis.core.server.impl.QueueImpl.getMessageCount(QueueImpl.java:1077)
at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.deliverIfNecessary(PageCursorProviderImpl.java:610)
at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.cleanup(PageCursorProviderImpl.java:365)
locked org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl@5aa5010
at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl$1.run(PageCursorProviderImpl.java:288)
The cleanup thread held pagingStore lock and requested queue lock. The largeMessageDeliver held queue lock and requested pagingStore lock. Deadlock occurred.
Put queue::getMessageCount outside of pagingstore lock to fix the bug.