[fix][client] Fix producer thread block forever on memory limit controller #21790
Conversation
Great catch @wenbingshen! This has been a long-outstanding issue.
Nice catch! Thanks
/pulsarbot rerun-failure-checks
Codecov Report: All modified and coverable lines are covered by tests ✅

@@             Coverage Diff              @@
##             master    #21790     +/-   ##
============================================
+ Coverage     73.42%    73.58%   +0.15%
+ Complexity    32795     32247     -548
============================================
  Files          1897      1858      -39
  Lines        140656    138021    -2635
  Branches      15491     15111     -380
============================================
- Hits         103282    101561    -1721
+ Misses        29297     28607     -690
+ Partials       8077      7853     -224
Maybe it would be better to modify the methods tryReserveMemory and forceReserveMemory like below?

public void forceReserveMemory(long size) {
    if (size < 0) {
        releaseMemory(-size);
        return;
    }
    // original implementation
    ...
}

public boolean tryReserveMemory(long size) {
    if (size < 0) {
        releaseMemory(-size);
        return true;
    }
    // original implementation
    ...
}
I don't think that's a good idea. It's better to add validation and throw an IllegalArgumentException if the value is negative. If that validation had existed, the API could never have been misused and this issue wouldn't have happened in the first place.
Sure
Yes, agree.
@lhotari Good idea! I think we can add validation that throws an IllegalArgumentException if the value is negative, in a separate PR.
At least, we should add a WARN log when tryReserveMemory/forceReserveMemory is called with a negative value.
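For illustration, here is a minimal sketch of what the two suggestions could look like. This is hypothetical code, not Pulsar's actual MemoryLimitController; the class name, the slf4j logger, and the simplified reservation logic are all assumptions:

import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: illustrates the two validation options discussed above
// (throw vs. WARN). Not Pulsar's actual MemoryLimitController.
class MemoryLimitValidationSketch {
    private static final Logger log = LoggerFactory.getLogger(MemoryLimitValidationSketch.class);

    private final long memoryLimit;
    private final AtomicLong currentUsage = new AtomicLong();

    MemoryLimitValidationSketch(long memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    public void forceReserveMemory(long size) {
        // Option 1 (fail fast): reject negative sizes with an
        // IllegalArgumentException, as suggested by @lhotari.
        if (size < 0) {
            throw new IllegalArgumentException("size must be >= 0, got " + size);
        }
        currentUsage.addAndGet(size);
    }

    public boolean tryReserveMemory(long size) {
        // Option 2 (tolerate but warn): log a WARN on negative sizes,
        // as suggested by @poorbarcode, then refuse the reservation.
        if (size < 0) {
            log.warn("tryReserveMemory called with negative size {}", size);
            return false;
        }
        long newUsage = currentUsage.addAndGet(size);
        if (newUsage > memoryLimit) {
            currentUsage.addAndGet(-size); // roll back the failed reservation
            return false;
        }
        return true;
    }
}

Either option makes the negative-size misuse visible instead of silently corrupting the usage counter.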
@poorbarcode I will add an assert in a follow-up later.
assert in a separate follow-up PR
@lhotari @wenbingshen @Technoboy- I have never seen the follow-up PR. To prevent new PRs from incorrectly using …
…oller (apache#21790) (cherry picked from commit 99d06b9) (cherry picked from commit 04ed338)
Motivation
When the change in this PR is rolled back, the unit test added here fails: the main producer thread gets stuck forever in condition.await() inside org.apache.pulsar.client.impl.MemoryLimitController#reserveMemory.
According to my investigation, this is related to PR #17936.
The problem appears when the producer enables the following parameters and sends asynchronously:
.compressionType(CompressionType.SNAPPY)
.blockIfQueueFull(true)
.enableBatching(true)
Also set .memoryLimit(1, SizeUnit.KILO_BYTES), which is small enough to make the problem easy to reproduce.
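For context, here is a minimal reproduction sketch using those settings. The service URL pulsar://localhost:6650 and the topic name repro-topic are assumptions for illustration, not part of the original report:

import org.apache.pulsar.client.api.CompressionType;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SizeUnit;

public class ReproSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")  // assumed local broker
                .memoryLimit(1, SizeUnit.KILO_BYTES)    // tiny limit to hit the issue quickly
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("repro-topic")                   // hypothetical topic name
                .compressionType(CompressionType.SNAPPY)
                .blockIfQueueFull(true)
                .enableBatching(true)
                .create();

        // Keep sending asynchronously; before this fix, the sending thread
        // could end up parked forever in MemoryLimitController#reserveMemory.
        byte[] payload = new byte[512];
        while (true) {
            producer.sendAsync(payload);
        }
    }
}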
When building the batch message metadata for the first time, BatchMessageContainer reserves batchAllocatedSize from the MemoryLimitController. At this point MemoryLimitController.currentUsage = (msg payload size + batchedMessageMetadataAndPayload size).
The main producer thread keeps sending data asynchronously, but the MemoryLimitController has now reached memoryLimit, so the thread blocks in condition.await(), waiting to be woken up.
When compression of the batch message completes, BatchMessageContainer calls updateAndReserveBatchAllocatedSize again. This shrinks the reservation in the MemoryLimitController down to the actual batch message size, but it does not wake up the threads that are blocked waiting for memory.
After the batch message is sent, org.apache.pulsar.client.impl.MemoryLimitController#releaseMemory is called to release the batch-size memory. However, because the usage before this release (currentUsage + size) is already smaller than memoryLimit, the wake-up condition is never met and no threads are signalled.
As a result, the main producer thread stays stuck in condition.await() forever.
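A simplified sketch of the blocking and wake-up protocol shows why the signal is skipped. The field and method names follow MemoryLimitController, but this is an illustration of the mechanism, not the actual Pulsar source:

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the block/wake-up mechanism described above.
class WakeUpSketch {
    private final long memoryLimit;
    private final AtomicLong currentUsage = new AtomicLong();
    private final ReentrantLock mutex = new ReentrantLock();
    private final Condition condition = mutex.newCondition();

    WakeUpSketch(long memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    public void reserveMemory(long size) throws InterruptedException {
        mutex.lock();
        try {
            // The producer thread parks here while the limit is exceeded.
            while (currentUsage.get() + size > memoryLimit) {
                condition.await();
            }
            currentUsage.addAndGet(size);
        } finally {
            mutex.unlock();
        }
    }

    public void releaseMemory(long size) {
        long newUsage = currentUsage.addAndGet(-size);
        long prevUsage = newUsage + size;
        // Waiters are only signalled when this release crosses the limit
        // downward. If an earlier, non-signalling adjustment already pushed
        // currentUsage below memoryLimit, then prevUsage <= memoryLimit here,
        // the signal is skipped, and blocked threads sleep forever.
        if (prevUsage > memoryLimit && newUsage <= memoryLimit) {
            mutex.lock();
            try {
                condition.signalAll();
            } finally {
                mutex.unlock();
            }
        }
    }
}

Because updateAndReserveBatchAllocatedSize shrinks currentUsage without going through this signalling path, the usage can drop below memoryLimit silently; every later releaseMemory then sees prevUsage <= memoryLimit and never calls signalAll().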
Documentation
doc
doc-required
doc-not-needed
doc-complete