OOM Counter Incrementing Incorrectly #10

Open
dmitrii-didenko opened this issue Sep 10, 2021 · 1 comment

Comments

@dmitrii-didenko

Hi!

Thank you for the project!

There seems to be one weird bug. Here is the description:

  1. I've installed MCM on the kube cluster v1.21.2 with docker runtime
  2. Port-forward one MCM container to check metrics
    k port-forward monitoring-missingcm-h257l 3001:3001
  3. Connect to a container located on the same node as the MCM pod and trigger an OOM event with the stress command:
stress --vm 1 --vm-bytes 3024M
stress: info: [389] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [389] (415) <-- worker 390 got signal 9
stress: WARN: [389] (417) now reaping child worker processes
stress: FAIL: [389] (451) failed run completed in 2s

Please note that the above command has to be run several times to reproduce the issue.

  4. Check the container_ooms metric for the above container

Expected result: the container_ooms counter should equal the number of times the stress command was executed.

Actual result: container_ooms is greater than the number of times the stress command was executed. I got the value 13 even though I ran the command only 3 times.

Additional info:
I've checked docker events on the node while reproducing the issue. The number of OOM events matches the number of stress runs.
I also checked /var/log/messages on the node. The result is as expected: the number of OOM log entries matches the number of stress runs.

Any idea what could be wrong here?

@draganm
Owner

draganm commented Oct 8, 2021

Hi, thank you for reporting this. Let's try to get to the bottom of it ...

Questions:

  • What was the count of OOMs before you ran stress for the first time?
  • If you run stress only once, by how much does the number of OOMs increase?
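
One way to answer both questions is to save the metrics output once before and once after a single stress run, then compare the two. Below is a minimal Python sketch of that comparison. It assumes the metrics are served in the Prometheus text format; the function names, the `container_id` label, and the sample values are made up for illustration and are not taken from MCM.

```python
import re

# One sample line of the Prometheus text format: "name{labels} value" or "name value".
SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?P<labels>\{.*\})?\s+(?P<value>\S+)'
)

def parse_counter(text, metric="container_ooms"):
    """Return {label_string: value} for every series of `metric` in one scrape."""
    series = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE.match(line)
        if m and m.group("name") == metric:
            series[m.group("labels") or ""] = float(m.group("value"))
    return series

def counter_delta(before, after, metric="container_ooms"):
    """Per-series increase between two scrapes; a series missing in `before` counts as 0."""
    b = parse_counter(before, metric)
    a = parse_counter(after, metric)
    return {labels: value - b.get(labels, 0.0) for labels, value in a.items()}

# Hypothetical scrapes: label name and values are illustrative only.
before = 'container_ooms{container_id="abc"} 10\n'
after = 'container_ooms{container_id="abc"} 13\n'
print(counter_delta(before, after))
```

If the delta after a single stress run is greater than 1, the extra increments come from that run itself rather than from OOMs counted before the test started.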
