Qmemman take too much memory from dom0, causing OOM (R3) #959

Closed
marmarek opened this Issue Apr 22, 2015 · 9 comments

marmarek commented Apr 22, 2015

It happened twice on my system. The first time the OOM killer killed Xorg, the second time it killed libvirtd; both are quite fatal. In both cases it happened while starting a VM.

Qmemman should not cause OOM in any VM, especially dom0. If there is not enough memory, it should simply prevent the VM from starting.

@marmarek marmarek self-assigned this Apr 22, 2015

@marmarek marmarek added this to the Release 3 milestone Apr 22, 2015

marmarek commented Apr 22, 2015

Actually it's not qmemman's fault. It's libvirtd that takes that memory. It looks like it has the "autoballoon" feature enabled, so when starting any VM, it takes memory from dom0 for it.
The feature is enabled when libvirt doesn't detect the dom0_mem parameter on the Xen command line, so we have two options:

  1. Add the dom0_mem parameter to the Xen command line
  2. Disable that feature with a patch for libvirt
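
The condition described above can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the Xen command line is assumed to be available as a plain string, and this is not libvirt's actual detection code.

```python
def has_dom0_mem(xen_cmdline: str) -> bool:
    """Return True if the Xen command line sets the dom0_mem parameter."""
    # Compare only the part before "=", so "dom0_mem=min:1024M" matches
    # but an unrelated option that merely starts with "dom0_mem" does not.
    return any(tok.split("=", 1)[0] == "dom0_mem" for tok in xen_cmdline.split())


def autoballoon_expected(xen_cmdline: str) -> bool:
    """Mirror the behaviour described in this comment (an assumption):
    autoballoon is on unless dom0_mem is present on the command line."""
    return not has_dom0_mem(xen_cmdline)
```

With a command line such as `console=none` the sketch reports that autoballoon would be expected; once `dom0_mem=min:1024M` is appended it reports that it would not.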

v6ak commented May 10, 2015

Some ideas:

  • Maybe there should be some priorities. If memory is low, the memory manager has to make a risky decision that could cause OOM in one or more VMs. But Dom0 should be a high-priority VM, since OOM in Dom0 is likely fatal.
  • Priorities for sys-net and sys-firewall are worth considering, but not essential.
  • Swap in Dom0 is a quick workaround. Swapping in Dom0 is uncomfortable, but it is better than a crash.

marmarek commented May 10, 2015

The Qubes memory manager is designed so that it never takes used memory
from any VM. If there is no free memory to distribute, it simply
refuses the operation: either the new VM will not start, or a
VM will not get the additional memory it requested.
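
The refuse-rather-than-reclaim policy described above could be pictured like this. It is a deliberately simplified sketch: `allocate` and `InsufficientFreeMemory` are hypothetical names, memory is modelled as a single number of MiB, and none of this is taken from the qmemman source.

```python
class InsufficientFreeMemory(Exception):
    """Raised when a request cannot be met from free memory alone."""


def allocate(free_mb: int, requested_mb: int) -> int:
    """Return the free memory left after granting the request.

    The request is refused outright when it exceeds free memory;
    memory already in use by some VM is never touched.
    """
    if requested_mb < 0:
        raise ValueError("requested_mb must be non-negative")
    if requested_mb > free_mb:
        raise InsufficientFreeMemory(
            f"need {requested_mb} MiB, only {free_mb} MiB free"
        )
    return free_mb - requested_mb
```

In this model a VM start that needs more than the free pool fails with an exception, which corresponds to "you will not start the new VM" in the comment.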

This bug is the result of libvirt's memory management acting in addition to qmemman.
The former should be disabled somehow.

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

v6ak commented May 14, 2015

Unfortunately, adding swap does not prevent the crash. (My experience.)

marmarek commented May 14, 2015

Try adding dom0_mem=min:1024M to the Xen command line (the GRUB_CMDLINE_XEN_DEFAULT setting in /etc/default/grub), then run grub2-mkconfig -o /boot/grub2/grub.cfg.
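
For anyone scripting this change, the edit to the GRUB defaults file might look like the sketch below. It only transforms text: it assumes the setting is a single double-quoted line, `add_xen_option` is a hypothetical helper name, and writing the file back and running grub2-mkconfig are still left to you.

```python
import re

OPTION = "dom0_mem=min:1024M"

# Matches a single-line, double-quoted GRUB_CMDLINE_XEN_DEFAULT assignment.
_XEN_LINE = re.compile(r'^(GRUB_CMDLINE_XEN_DEFAULT=")([^"]*)(")\s*$')


def add_xen_option(grub_text: str, option: str = OPTION) -> str:
    """Append `option` to GRUB_CMDLINE_XEN_DEFAULT unless its key is already set."""
    key = option.split("=", 1)[0]
    out = []
    for line in grub_text.splitlines():
        m = _XEN_LINE.match(line)
        if m:
            args = m.group(2).split()
            # Leave an existing dom0_mem=... untouched, so the edit is idempotent.
            if not any(a.split("=", 1)[0] == key for a in args):
                args.append(option)
            line = m.group(1) + " ".join(args) + m.group(3)
        out.append(line)
    trailing = "\n" if grub_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

Running the function twice on the same text gives the same result as running it once, so it is safe to reapply.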

marmarek commented Sep 2, 2015

I think adding the option at installation time is enough.

@marmarek marmarek closed this Sep 2, 2015

v6ak commented Sep 2, 2015

I am not sure about that. I used to get some crashes with that when swap was disabled. But it was OK when swap was enabled.

marmarek commented Sep 2, 2015

Maybe qmemman is simply too slow, so without a safety margin in the form of
swap it ends up in OOM...

If that's the case, the solution is simply to not disable swap, and to close
this issue.

v6ak commented Sep 9, 2015

I am not sure if this is the case. As far as I remember, qmemman does not distinguish between used RAM and used swap – it just sums up those two numbers. (Well, I am simplifying a bit, as there are also caches, but this should not make any difference…) My OOM case was:

  1. There is something in the swap.
  2. sudo swapoff -a && sudo swapon -a
  3. crash

The reason for doing so is that I have seen some reduction in memory usage after such an operation. Maybe Linux sometimes stores some garbage in swap and removes it when needed.

However, qmemman being too slow is probably more likely to cause problems for dom0 than for any other domain: if there is a lack of memory, then qmemman has to act. However, a lack of memory might slow the whole system down. (I don't know what exactly is happening; maybe the kernel clears some caches, flushes some buffers, etc. But I have seen such a slowdown even without any swap. In fact, the slowdown is somewhat different from the traditional swapping slowdown.) In this case, qmemman might be slowed down, which may prevent it from adding more memory in time. Even worse, the OOM killer might theoretically kill qmemman.

So, using some (even small) swap seems to be a good idea even though I have 16 GiB of RAM. I have my swap on an LVM partition, so I can encrypt it with a random key without referring to /dev/sdaX.
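
To make the "used RAM plus used swap" accounting concrete, here is a small sketch that computes such a sum from text in the /proc/meminfo format. It follows the recollection in this comment only; whether qmemman computes its figure this way is not confirmed here, and the function names are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def used_ram_plus_swap_kb(text: str) -> int:
    """Sum used RAM (excluding buffers and page cache) and used swap, in kB."""
    m = parse_meminfo(text)
    used_ram = m["MemTotal"] - m["MemFree"] - m.get("Buffers", 0) - m.get("Cached", 0)
    used_swap = m["SwapTotal"] - m["SwapFree"]
    return used_ram + used_swap
```

Under this kind of accounting, pages that move from swap back into RAM leave the sum unchanged, even though dom0 suddenly needs that much real memory.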

fepitre added a commit to fepitre/anaconda that referenced this issue Sep 20, 2017

anaconda: add dom0_mem=min:1024M to default xen cmdline
This will solve #959 for new installations.

Related to QubesOS/qubes-issues#959

fepitre added a commit to fepitre/anaconda that referenced this issue Sep 21, 2017

anaconda: add dom0_mem=min:1024M to default xen cmdline
This will solve #959 for new installations.

Related to QubesOS/qubes-issues#959

fepitre added a commit to fepitre/anaconda that referenced this issue Sep 22, 2017

anaconda: add dom0_mem=min:1024M to default xen cmdline
This will solve #959 for new installations.

Related to QubesOS/qubes-issues#959