
Possible memory leak in follower #7744

Closed
korthout opened this issue Sep 1, 2021 · 5 comments · Fixed by #7762
Assignees: deepthidevaki
Labels: kind/bug, scope/broker, severity/high

Comments

korthout (Member) commented Sep 1, 2021

Describe the bug

Medic benchmark CW35 is showing signs of a possible memory leak.

See Process memory usage and JVM memory usage. Note that this occurs on 1 of the followers (zeebe-2, leader is zeebe-1). There have not been any role changes.

[Screenshot: process memory usage]

[Screenshot: JVM memory usage]

See Grafana medic-cw-35-01f895df7-benchmark for more details.

korthout added the kind/bug label Sep 1, 2021
korthout added this to the Build State on Followers milestone Sep 1, 2021
npepinpe added the Impact: Memory Consumption, scope/broker, and severity/high labels Sep 2, 2021
npepinpe added this to Ready in Zeebe Sep 2, 2021
deepthidevaki self-assigned this Sep 2, 2021
deepthidevaki (Contributor) commented Sep 2, 2021

Observed high direct memory use in both followers

```
2021-08-31 17:10:02.351 CEST
Exception in thread "Broker-2-zb-actors-3" java.lang.OutOfMemoryError: Direct buffer memory
    at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
    at io.camunda.zeebe.util.allocation.DirectBufferAllocator.allocate(DirectBufferAllocator.java:20)
    at io.camunda.zeebe.util.allocation.BufferAllocators.allocateDirect(BufferAllocators.java:16)
    at io.camunda.zeebe.dispatcher.DispatcherBuilder.initAllocatedBuffer(DispatcherBuilder.java:147)
    at io.camunda.zeebe.dispatcher.DispatcherBuilder.build(DispatcherBuilder.java:93)
    at io.camunda.zeebe.logstreams.impl.log.LogStreamImpl.openAppender(LogStreamImpl.java:280)
    at io.camunda.zeebe.logstreams.impl.log.LogStreamImpl.createWriter(LogStreamImpl.java:208)
    at io.camunda.zeebe.logstreams.impl.log.LogStreamImpl.lambda$newLogStreamBatchWriter$1(LogStreamImpl.java:115)
    at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73)
    at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:39)
    at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:122)
    at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:94)
    at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78)
    at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:191)
```

zeebe-1 is the leader for all partitions.

[screenshot]

The heap OOM could just be a consequence of the OOM in direct memory.
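For context: this error appears once the total of still-reachable direct allocations exceeds the limit set by -XX:MaxDirectMemorySize. A minimal sketch (hypothetical code, not Zeebe's) that reproduces the same failure mode by keeping direct buffers reachable, the way accumulated, never-released dispatcher buffers would:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (hypothetical, not Zeebe code): direct buffers that stay
// strongly reachable are never freed, so the direct-memory pool is exhausted.
// Run with a small limit, e.g. -XX:MaxDirectMemorySize=64m, to see
// "java.lang.OutOfMemoryError: Direct buffer memory" quickly.
public class DirectMemoryLeakDemo {

  // Stand-in for leaked buffers that survive each role transition.
  private static final List<ByteBuffer> leaked = new ArrayList<>();

  public static void main(String[] args) {
    while (true) {
      // 8 MiB per iteration; never released because the list keeps a reference.
      leaked.add(ByteBuffer.allocateDirect(8 * 1024 * 1024));
      System.out.println("reachable direct buffers: " + leaked.size());
    }
  }
}
```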

deepthidevaki (Contributor) commented:
Also found many StreamProcessor instances in the heap dump. Most of them are already closed (checked their state), but they are not garbage collected.

[Screenshots: heap dump showing retained StreamProcessor instances]

Cyclic references??
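A note on the cycle hypothesis: the JVM's tracing garbage collectors do reclaim cyclic garbage, so cycles between closed StreamProcessor internals alone cannot keep them alive; some object reachable from a GC root (for example a long-lived listener list) must still hold a reference. A small sketch of that retention pattern, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch with hypothetical names: a reference cycle between two closed objects
// is still collectable, but an entry in a long-lived listener registry is not.
public class RetentionDemo {

  static class Registry {
    final List<Object> listeners = new ArrayList<>(); // lives as long as the broker
  }

  static class Processor {
    Object peer;     // cyclic reference to an internal component
    boolean closed;
  }

  public static void main(String[] args) {
    final Registry registry = new Registry();

    Processor processor = new Processor();
    Processor internal = new Processor();
    processor.peer = internal;
    internal.peer = processor;   // cycle: harmless, collected once unreachable
    processor.closed = true;     // "closed" but still registered below

    registry.listeners.add(processor); // this is what actually keeps the cycle alive
    processor = null;
    internal = null;

    System.gc(); // the cycle would be reclaimed if not for the registry entry
    System.out.println("registry still retains " + registry.listeners.size() + " closed processor(s)");
  }
}
```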

deepthidevaki (Contributor) commented:

[screenshot]

deepthidevaki (Contributor) commented:
Here https://github.com/camunda-cloud/zeebe/blob/a3156320fdb56f7fb6c252b37da32a75dfebd059/broker/src/main/java/io/camunda/zeebe/broker/engine/impl/PartitionCommandSenderImpl.java#L34
we add the TopologyPartitionListener as a listener to the TopologyManager, but we never remove it when the leader role is closed.
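Roughly, the pattern looks like the sketch below (hypothetical class and method names loosely based on the linked file, not the actual Zeebe sources): the sender registers itself with the topology manager, and there is no matching removal on the leader's close path, so the topology manager keeps the whole object graph behind the sender reachable.

```java
// Hypothetical sketch of the leak pattern described above; the real classes live
// under io.camunda.zeebe.broker, and the names here are assumptions.
interface PartitionListener {}

interface TopologyManager {
  void addPartitionListener(PartitionListener listener);
  // A removal counterpart would exist, but nothing calls it on close.
}

final class PartitionCommandSenderSketch implements PartitionListener {

  PartitionCommandSenderSketch(final TopologyManager topologyManager) {
    // Registered when the leader role is installed ...
    topologyManager.addPartitionListener(this);
    // ... but never unregistered when the leader role is closed, so the
    // TopologyManager retains this sender (and everything it references) forever.
  }
}
```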

deepthidevaki (Contributor) commented:
It is not just the topology listener. StreamProcessor and LogStream are not garbage collected because several listeners are registered but never removed when a role is closed. So the leaked objects accumulate with each transition, and the broker eventually hits OOM on direct memory.
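The usual remedy for this class of leak is to make listener registration symmetric with the role lifecycle: whatever gets registered during the transition into a role is deregistered when that role closes. A hedged sketch of that shape, with hypothetical names (not the actual change in #7762):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: each role registers its listeners when it opens and removes
// them when it closes, so closed roles become unreachable and can be collected
// (releasing their direct buffers) instead of piling up across transitions.
public class RoleLifecycleSketch {

  interface Listener {}

  static class TopologyManager {
    final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener listener) { listeners.add(listener); }
    void removeListener(Listener listener) { listeners.remove(listener); }
  }

  static class LeaderRole implements Listener, AutoCloseable {
    private final TopologyManager topologyManager;

    LeaderRole(TopologyManager topologyManager) {
      this.topologyManager = topologyManager;
      topologyManager.addListener(this); // registered on transition to leader
    }

    @Override
    public void close() {
      topologyManager.removeListener(this); // symmetric removal on transition away
    }
  }

  public static void main(String[] args) throws Exception {
    TopologyManager topologyManager = new TopologyManager();
    for (int i = 0; i < 3; i++) {
      try (LeaderRole role = new LeaderRole(topologyManager)) {
        // role is active here
      }
    }
    // Without removeListener in close(), this would print 3 and grow with every transition.
    System.out.println("listeners retained after transitions: " + topologyManager.listeners.size());
  }
}
```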

This issue was closed.