[Bug]: Memory leak #3183

Closed · 3 of 4 tasks
ktechmidas opened this issue Jun 18, 2023 · 5 comments · Fixed by #4240
Labels
bug Something isn't working

Comments

@ktechmidas

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Is this a UI / front end issue? Use the lemmy-ui repo.

Summary

We have a memory leak at latte.isnot.coffee, which is running 0.17.4.

First OOM kill on June 16th, when the server had 2 GB of RAM:

./syslog.1:Jun 16 10:57:02 ip-172-31-33-220 kernel: [192823.528249] systemd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
./syslog.1:Jun 16 10:57:02 ip-172-31-33-220 kernel: [192823.528294] oom_kill_process.cold+0xb/0x10
./syslog.1:Jun 16 10:57:02 ip-172-31-33-220 kernel: [192823.528302] __alloc_pages_may_oom+0x117/0x1e0
./syslog.1:Jun 16 10:57:02 ip-172-31-33-220 kernel: [192823.528544] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
./syslog.1:Jun 16 10:57:02 ip-172-31-33-220 kernel: [192823.528915] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=init.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-271703674a88a2f1391d06caf09c8f1dcff9370114d888bfbdae7fa9c8b1291f.scope,task=lemmy,pid=5860,uid=0
./syslog.1:Jun 16 10:57:02 ip-172-31-33-220 kernel: [192823.528940] Out of memory: Killed process 5860 (lemmy) total-vm:668336kB, anon-rss:591732kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1360kB oom_score_adj:0
./syslog.1:Jun 16 10:57:04 ip-172-31-33-220 systemd[1]: docker-271703674a88a2f1391d06caf09c8f1dcff9370114d888bfbdae7fa9c8b1291f.scope: A process of this unit has been killed by the OOM killer

Second one today (June 18th), now with 4 GB of RAM:

Jun 18 04:38:38 ip-172-31-33-220 kernel: [142633.318633] actix-rt|system invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
Jun 18 04:38:38 ip-172-31-33-220 kernel: [142633.318678] oom_kill_process.cold+0xb/0x10
Jun 18 04:38:38 ip-172-31-33-220 kernel: [142633.318686] __alloc_pages_may_oom+0x117/0x1e0
Jun 18 04:38:38 ip-172-31-33-220 kernel: [142633.318913] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jun 18 04:38:38 ip-172-31-33-220 kernel: [142633.319121] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=docker-271703674a88a2f1391d06caf09c8f1dcff9370114d888bfbdae7fa9c8b1291f.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-d0145aff21f6b2df6d9049d6b1e7b98b670c46ae63494abfe8c0153e09799e39.scope,task=node,pid=1349,uid=0
Jun 18 04:38:38 ip-172-31-33-220 kernel: [142633.319174] Out of memory: Killed process 1349 (node) total-vm:2700484kB, anon-rss:2004100kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:18016kB oom_score_adj:0
Jun 18 04:38:38 ip-172-31-33-220 systemd[1]: docker-d0145aff21f6b2df6d9049d6b1e7b98b670c46ae63494abfe8c0153e09799e39.scope: A process of this unit has been killed by the OOM killer.

Steps to Reproduce

  1. Leave server running for 24-48 hours
  2. Check for OOM killer messages in the logs (see the monitoring sketch below)
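One way to watch the leak develop before the OOM killer fires is to log the process's resident set size over time. A minimal Rust sketch, assuming a Linux host and parsing /proc (illustrative only, not Lemmy code):

// Minimal sketch: print this process's resident set size (RSS) every minute by
// parsing /proc/self/status (Linux only). Pointing the same parsing at
// /proc/<lemmy-pid>/status from a sidecar would show whether memory grows
// steadily in the hours before the OOM kill.
use std::{fs, thread, time::Duration};

fn rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|l| l.starts_with("VmRSS:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

fn main() {
    loop {
        match rss_kb() {
            Some(kb) => println!("VmRSS: {kb} kB"),
            None => eprintln!("could not read VmRSS"),
        }
        thread::sleep(Duration::from_secs(60));
    }
}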

Technical Details

OS: Ubuntu 22.04
The only fix is a reboot.

Version

0.17.4

Lemmy Instance URL

latte.isnot.coffee

@ktechmidas ktechmidas added the bug Something isn't working label Jun 18, 2023
@dullbananas
Collaborator

Might be fixed by #3111

@dessalines
Member

Could also be the websocket code, which kept in-memory tables for rooms and users; that code is gone in 0.18.
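For illustration, a stripped-down version of how an in-memory room/user registry can leak when entries are added on connect but never removed on disconnect (hypothetical names, not the actual 0.17 websocket code):

// Illustrative sketch: an in-memory registry of rooms and connected sessions.
// If join() runs on every connect but leave() is not reliably called on
// disconnect, both maps grow for the lifetime of the process and look exactly
// like a slow memory leak.
use std::collections::{HashMap, HashSet};

#[derive(Default)]
struct ChatServer {
    // room id -> set of connected session ids
    rooms: HashMap<i32, HashSet<u64>>,
}

impl ChatServer {
    fn join(&mut self, room: i32, session: u64) {
        self.rooms.entry(room).or_default().insert(session);
    }

    // Must always run on disconnect; dropping empty rooms keeps the outer map
    // from growing forever as well.
    fn leave(&mut self, room: i32, session: u64) {
        if let Some(sessions) = self.rooms.get_mut(&room) {
            sessions.remove(&session);
            if sessions.is_empty() {
                self.rooms.remove(&room);
            }
        }
    }
}

fn main() {
    let mut server = ChatServer::default();
    server.join(1, 42);
    server.leave(1, 42);
    assert!(server.rooms.is_empty());
}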

@Nutomic
Member

Nutomic commented Jun 22, 2023

Gonna close this, reopen if it still happens with 0.18

@Nutomic Nutomic closed this as completed Jun 22, 2023
@dessalines dessalines reopened this Dec 9, 2023
@dessalines
Member

We've also been noticing memory leak problems with lemmy_server on lemmy.ml, even on the newest 0.19.0-rc.8 releases. The only way to fix it is a manual restart, which makes this fairly critical. @dullbananas

My best guesses are:

  • The rate limiter map not freeing up keys (see the sketch after this list).
  • Possibly the federation queue, although as long as it doesn't hold any long-running open postgres connections, it should be fine.
  • Actix web threads not closing properly, perhaps when an error is hit.
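A minimal sketch of the suspected rate-limiter failure mode, assuming a per-IP bucket map that only ever gains keys (all names here are illustrative, not Lemmy's actual types):

// Hypothetical sketch: a per-IP rate-limit bucket map. Every new client IP
// inserts an entry that is never removed, so the map grows for as long as the
// process runs, even after those IPs stop sending requests.
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

struct Bucket {
    last_seen: Instant,
    tokens: u32,
}

#[derive(Default)]
struct RateLimiter {
    buckets: HashMap<IpAddr, Bucket>,
}

impl RateLimiter {
    fn check(&mut self, ip: IpAddr) -> bool {
        let bucket = self.buckets.entry(ip).or_insert(Bucket {
            last_seen: Instant::now(),
            tokens: 60,
        });
        bucket.last_seen = Instant::now();
        if bucket.tokens > 0 {
            bucket.tokens -= 1;
            true
        } else {
            false
        }
    }

    // Without a periodic sweep like this (or an LRU bound on the map), keys
    // for idle IPs are never freed.
    fn evict_stale(&mut self, max_idle: Duration) {
        let now = Instant::now();
        self.buckets
            .retain(|_, b| now.duration_since(b.last_seen) < max_idle);
    }
}

fn main() {
    let mut limiter = RateLimiter::default();
    let ip: IpAddr = "203.0.113.7".parse().unwrap();
    assert!(limiter.check(ip));
    limiter.evict_stale(Duration::from_secs(3600));
}

Bounding the map, whether by periodic eviction or an LRU cap, keeps memory flat even with many distinct client IPs.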

@dullbananas could you look to make sure it's not the rate limiter?

cc @Nutomic @phiresky

@Nutomic
Member

Nutomic commented Dec 11, 2023

@dessalines Please open a new issue; it's unlikely that it's related to this random old issue.
