Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memory limiter for the add entry request to avoid OOM #3139

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

zymap
Copy link
Member

@zymap zymap commented Mar 25, 2022


Master issue: #3231

Motivation

When there has a bookie is slow to respond the add entry request,
and the AQ is smaller than WQ, the client will hole the entry buffer
and that will cause the memory won't be released.

More context:
apache/pulsar#14861

Modifications

Add the memory limit for the client to the send add request.

@zymap zymap self-assigned this Mar 25, 2022
@zymap
Copy link
Member Author

zymap commented Mar 25, 2022

The test come with next commit

@zymap zymap changed the title Add memory limiter for the add entry request to avoid OOM [WIP] Add memory limiter for the add entry request to avoid OOM Mar 25, 2022
@dlg99
Copy link
Contributor

dlg99 commented Mar 28, 2022

@zymap Have you tried configuring backpressure for the BK client and server? I think it covers your case.

@zymap
Copy link
Member Author

zymap commented Mar 28, 2022

@dlg99 Thanks for your information. But looks like the maxAddsInProgressLimit only applied on the v3 request?

@zymap
Copy link
Member Author

zymap commented Apr 21, 2022

I tried the backpressure with V2 protocol. It doesn't work. Because it is a server limitation, when there has a server slow to respond, the other server will respond successfully, but the client will hold the entry buffer until the slowest response is returned.

Same issue: https://lists.apache.org/thread/r2yh6gxgnhzzz35o63db040wt2t6o108

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some feedback.

Please add tests as well.

@merlimat FYI

@@ -103,6 +106,7 @@ public class BookieClientImpl implements BookieClient, PerChannelBookieClientFac
private final BookieAddressResolver bookieAddressResolver;

private final long bookieErrorThresholdPerInterval;
private Optional<MemoryLimitController> memoryLimitController;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final?

@@ -378,6 +414,31 @@ public void safeRun() {
}
}

private static class WriteAndFlushCallbackImpl implements WriteAndFlushCallback {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be probably recycled

private Optional<WriteAndFlushCallback> setMemoryLimit(final long entryId, final long entrySize) throws InterruptedException {
if (getMemoryLimitController().isPresent()) {
MemoryLimitController mlc = getMemoryLimitController().get();
mlc.reserveMemory(entrySize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle this with a timeout otherwise we may block forever?

@@ -200,6 +200,10 @@ public class ClientConfiguration extends AbstractConfiguration<ClientConfigurati
protected static final String CLIENT_CONNECT_BOOKIE_UNAVAILABLE_LOG_THROTTLING =
"clientConnectBookieUnavailableLogThrottling";

// client memory limit options
protected static final String CLIENT_MEMORY_LIMIT_ENABLED = "clientMemoryLimitEnabled";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is only for writes.
We should reflect this in the name

@hangc0276
Copy link
Contributor

This issue has been resolved by #3321 , #3323 , #3324 , we can close this PR.

@hangc0276 hangc0276 closed this Jul 25, 2022
@eolivelli
Copy link
Contributor

@hangc0276 I don't think that the issue is resolved.
We still have the problem that you may have a unbounded number of pending writes

@zymap what do you think ?

@eolivelli eolivelli reopened this Sep 27, 2022
@zymap zymap marked this pull request as draft September 28, 2022 07:12
---

*Motivation*

When there has a bookie is slowly to respond the add entry request,
and the AQ is smaller than WQ, the client will hole the entry buffer
and that will cause the memory won't be released.

More context:
apache/pulsar#14861

*Modifications*

Add the memory limit for the client to send add request.
@zymap
Copy link
Member Author

zymap commented Oct 11, 2022

The test output looks like works as expected, I will try to add more tests and clean the code.

logs
2022-10-11T16:56:19,449 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$1@68] - Write state changed to false
2022-10-11T16:56:19,449 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:19,449 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 0
2022-10-11T16:56:20,400 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 1024
2022-10-11T16:56:20,400 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 2048
2022-10-11T16:56:20,400 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 3072
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 4096
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 5120
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 6144
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 7168
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 8192
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 9216
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:BookieClientMemoryCounterTest$1@68] - Write state changed to true
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:BookieClientMemoryCounterTest@78] - wait for the memory released
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 8192
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 7168
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 6144
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 5120
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 4096
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 3072
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 2048
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 1024
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$1@68] - Write state changed to false
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 0
2022-10-11T16:56:21,403 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 1024
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 2048
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 3072
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 4096
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 5120
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 6144
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 7168
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 8192
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 9216
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:BookieClientMemoryCounterTest$1@68] - Write state changed to true
2022-10-11T16:56:21,405 - INFO  - [Time-limited test:BookieClientMemoryCounterTest@78] - wait for the memory released
2022-10-11T16:56:21,454 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:21,455 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 8192

@zymap zymap changed the title [WIP] Add memory limiter for the add entry request to avoid OOM Add memory limiter for the add entry request to avoid OOM Oct 18, 2022
@zymap zymap marked this pull request as ready for review October 18, 2022 12:15
}

public long getWriteMemoryLowWaterMark() {
return getInt(WRITE_MEMORY_LOW_WATER_MARK, 64 * 1024 * 1024);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getLong?

}

public long getWriteMemoryHighWaterMark() {
return getInt(WRITE_MEMORY_HIGH_WATER_MARK, 256 * 1024 * 1024);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getLong?

*/
public class WriteWaterMark {
private static final long DEFAULT_LOW_WATER_MARK = 1610612736L; // 1.5GB
private static final long DEFAULT_HIGH_WATER_MARK = 2147483648L; // 2GB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the default lowWaterMark=64MB and HighWaterMark=256MB in ClientConfiguration, but default lowWaterMark=1.5GB and HighWaterMark=2GB in WriteWaterMark?

@Slf4j
public class WriteMemoryCounter {
private final WriteWaterMark writeWaterMark;
private AtomicLong sizeCounter = new AtomicLong(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final type?


public void decrementPendingWriteBytes(long size) {
long usage = sizeCounter.addAndGet(-size);
log.info("decrement the size to {}", usage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to debug level?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants