Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak #7762

Merged
5 commits merged into from
Sep 6, 2021
Merged

Fix memory leak #7762

5 commits merged into from
Sep 6, 2021

Conversation

deepthidevaki
Copy link
Contributor

Description

When we install services for a role, it installs several listeners. Most of them were not removed when these services are closed. References to StreamProcessor and LogStream are kept indirectly via these listeners. Thus they are not garbage collected. This lead to OOM in direct memory and eventually OOM in heap.

This PR fixes it by unregistering listeners when closing services.
After this fix, StreamProcessor is garbage collected. Following is from a heapdump after several role transitions in a broker
image

Related issues

closes #7744

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatibility with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/0.25) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criterias of the issue
  • New tests are written to ensure backwards compatibility with further versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement

After transition, StreamProcessor and LogStream objects are not garbage collected because they are referenced from AsyncSnapshotDirector. AsyncSnapshotDirector was referenced by the registered listener.
The listener is registerd by ReplayStateMachine when started, but never removed. This is currently not a problem because logStream is closed and recreated. But this is a potential cause for memory leak in future where we want to have a long living logstream object.
After a role transition, Exporter, StreamProcessor and LogStream objects were not garbage collected because of indirect references from the listener in MessagingService.
After a role transition, StreamProcessor objects were not garbage collected due to references from this listener leading to memory leak.
@deepthidevaki
Copy link
Contributor Author

A cluster is running with this fix under namespace "dd-7744-mem-leak".

Copy link
Member

@saig0 saig0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 Thanks 🥞

@deepthidevaki
Copy link
Contributor Author

I will leave the cluster running to check if this fully fixes the issue. But will merge this PR anyway.

bors r+

@ghost
Copy link

ghost commented Sep 6, 2021

Build succeeded:

@ghost ghost merged commit 73de24c into develop Sep 6, 2021
@ghost ghost deleted the dd-7744-mem-leak branch September 6, 2021 13:00
@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2021

Backport failed for stable/1.0, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin stable/1.0
git worktree add -d .worktree/backport-7762-to-stable/1.0 origin/stable/1.0
cd .worktree/backport-7762-to-stable/1.0
git checkout -b backport-7762-to-stable/1.0
ancref=$(git merge-base aef2ecf940c0cb2ec807b0d972e448337328d036 4459ea19d146a8cb32d108fa93c0b09b1b661f7d)
git cherry-pick -x $ancref..4459ea19d146a8cb32d108fa93c0b09b1b661f7d

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2021

Backport failed for stable/1.1, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin stable/1.1
git worktree add -d .worktree/backport-7762-to-stable/1.1 origin/stable/1.1
cd .worktree/backport-7762-to-stable/1.1
git checkout -b backport-7762-to-stable/1.1
ancref=$(git merge-base aef2ecf940c0cb2ec807b0d972e448337328d036 4459ea19d146a8cb32d108fa93c0b09b1b661f7d)
git cherry-pick -x $ancref..4459ea19d146a8cb32d108fa93c0b09b1b661f7d

ghost pushed a commit that referenced this pull request Sep 7, 2021
7775: [Backport stable/1.1] Remove listeners to prevent memory leak r=deepthidevaki a=deepthidevaki

## Description

This is a backport of #7762 , but only [one commit ](https://github.com/camunda-cloud/zeebe/pull/7762/commits/4459ea19d146a8cb32d108fa93c0b09b1b661f7d)from this PR was relevant for the backport.


Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
ghost pushed a commit that referenced this pull request Sep 7, 2021
7777: [Backport stable/1.0] Remove listeners to prevent memory leak r=deepthidevaki a=deepthidevaki

## Description

Backport of #7775 

Backport of 4459ea1 from PR #7762 

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
ghost pushed a commit that referenced this pull request Sep 7, 2021
7777: [Backport stable/1.0] Remove listeners to prevent memory leak r=deepthidevaki a=deepthidevaki

## Description

Backport of #7775 

Backport of 4459ea1 from PR #7762 

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible memory leak in follower
3 participants