
qmemman prevents creation of VMs with large amounts of RAM #1136

Closed
qubesuser opened this Issue Aug 19, 2015 · 23 comments

Launching a 16GB HVM domain on a machine with 64GB RAM (and not many other VMs running) doesn't work, while an 8GB HVM seems to work reliably.

It seems that there is some issue in qmemman that makes it fail.

Disabling qmemman temporarily works around the issue.

For anyone with the same problem, here is a reliable way to disable it temporarily:

# temporarily move the qmemman client module aside so qvm-start skips the qmemman memory request
sudo mv /usr/lib64/python2.7/site-packages/qubes/qmemman_client.py* /root
# stop the balancing daemon and manually shrink dom0 to free memory for the new VM
sudo systemctl stop qubes-qmemman
sudo xl mem-set dom0 4g
sudo xl mem-max dom0 4g
# give dom0 a moment to balloon down, then start the VM
sleep 5
qvm-start VMNAME
# restore qmemman and its client module afterwards
sudo systemctl start qubes-qmemman
sudo mv /root/qmemman_client.py* /usr/lib64/python2.7/site-packages/qubes
marmarek (Member) commented Oct 8, 2015

Can you provide an error message and/or the qmemman log from such a failure?


marmarek added this to the Release 3.1 milestone Oct 8, 2015

qubesuser commented Oct 9, 2015

Right now it's even giving me "ERROR: insufficient memory to start VM" for a 16GB HVM when dom0 has 56GB assigned (of which 38GB is completely free according to "free").

The log contains this (not totally sure whether it's related, though):

:37,956 qmemman.systemstate[2565]: dom '0' still hold more memory than have assigned (5365133312 > 3659216295)
:42,449 qmemman.systemstate[2565]: dom '0' didnt react to memory request (holds 5138825216, requested balloon down to 3640800861)

dmesg and xl dmesg show nothing relevant except "xen_balloon: Initialising balloon driver". The kernel is 3.19.8-100.fc20.x86_64 and Xen is 4.4.2-7.fc20.

Disabling memory balancing and setting dom0 memory manually to 8GB makes the error go away.

After re-enabling memory balancing and letting the dom0 size go back to 50GB, it worked once with a 16GB HVM, then failed with the same error, then worked again, then failed.

When working, it shows messages like this:

:14,306 qmemman.systemstate[1992]: dom '0' still hold more memory than have assigned (39990571008 > 35968897614)
:14,813 qmemman.systemstate[1992]: dom '0' still hold more memory than have assigned (38106079232 > 35952503502)

When it says "insufficient memory", no messages are shown.

I can't reproduce it right now, but I also managed to get qvm-start to launch the HVM only for it to end up in the ---sc--- state.

A PV AppVM with 16GB RAM also fails with "insufficient memory".

Killing and restarting qmemman doesn't seem to change things.


marmarek (Member) commented Oct 9, 2015

OK, I think I know the cause. qmemman waits a limited time for a VM to give memory back; currently it is 0.5 s, which is probably too little time for Linux to release 16GB.
Try editing that file and increasing either the number of tries or BALOON_DELAY. If that helps, we may need to introduce a smarter mechanism: either make this value configurable (easier), or some adaptive approach, like waiting as long as the VM is still giving something back (harder).
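
As a rough illustration of the tuning suggested above (a sketch only: BALOON_DELAY and the retry count are the knobs named in this thread; the concrete retry value below is an arbitrary example, not the real default):

BALOON_DELAY = 0.5                       # seconds between balloon-down checks (per the comment above)
MAX_TRIES = 60                           # raising the retry count raises the total wait
TOTAL_WAIT = MAX_TRIES * BALOON_DELAY    # total time qmemman would wait for the VM (30 s here)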


marmarek (Member) commented Oct 9, 2015

> I can't reproduce it right now, but I also managed to get qvm-start to launch the HVM only for it to end up in the ---sc--- state.

Check the stubdomain log in such a case (/var/log/xen/console/guest-VMNAME-dm.log). I guess you'll find something about video memory allocation... If that's the case, try increasing videoram in the libvirt config template (/usr/share/qubes/vm-template-hvm.xml).


qubesuser commented Oct 9, 2015

The "insufficient memory" error seems to be solved by raising MAX_TRIES in do_balloon, both for HVM and PV VMs.

It should probably check whether it is making progress in freeing memory rather than use a fixed limit (and ideally a progress bar should be shown to the user).
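
A minimal sketch of that progress-based approach (the helper callables get_memsize and request_memory are placeholders, not the real qmemman API, and the 16MB threshold is an arbitrary example):

import time

BALOON_DELAY = 0.5               # delay between checks, as discussed above
MIN_PROGRESS = 16 * 1024 * 1024  # give up only if less than ~16MB was freed in one step

def balloon_down(dom, target, get_memsize, request_memory):
    # Ask the domain to shrink to `target`, then keep waiting as long as it
    # is still returning memory, instead of using a fixed number of tries.
    request_memory(dom, target)
    prev = get_memsize(dom)
    while prev > target:
        time.sleep(BALOON_DELAY)
        cur = get_memsize(dom)
        if prev - cur < MIN_PROGRESS:
            return False             # no real progress since the last check
        prev = cur
    return True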

Right now on my machine, the ballooning rate for dom0 seems to be around 5GB/second, although I suppose this varies a lot between machines, states and configurations.

Can't reproduce the crashed HVM at the moment.

The "insufficient memory" error seems to be solved by raising MAX_TRIES in do_balloon, both for HVM and PV VMs.

It probably should be checking whether it is making progress in freeing memory rather than having a fixed limit (and ideally a progress bar should be shown to the user).

Right now on my machine, the ballooning rate for dom0 seems to be around 5GB/second, although I suppose this varies a lot between machines, states and configurations.

Can't reproduce the crashed HVM at the moment.

qubesuser commented Oct 9, 2015

There are still two problems that happen when the VM size is very close to the maximum available memory:

  1. libxenlight internal error cannot create domain for HVM
  2. It seems possible to trigger a dom0 OOM, which then kills the GUI. This might be related to the fact that sometimes the system gets into a state where the dom0 total memory reported by "free" is around 1-2GB lower than the value reported by xl list, so balloon requests are off by 1-2GB, causing the OOM.

qubesuser commented Oct 9, 2015

So basically there seems to be an issue where an HVM with 52GB RAM fails to start, either with the libxenlight internal error or with the stubdomain simply not appearing in xl list.

However, if I hack qmemman to add an extra 512MB to the memory request, then it starts (but 256MB is not enough).

Stubdomain is 44 MB on "xl list".

Once it is running, "xl list" reports 3443 MB of memory for dom0, but "free" reports only 2176664 kB (~2.1GB) of total memory.

When the large VM is stopped, the dom0 memory reported by "xl list" and "free" match more closely again.

So basically:

  1. It seems an additional ~1% of RAM is required to start HVMs for some reason (internal per-memory-page structures?) - rough arithmetic below
  2. Something odd is happening with dom0 RAM: either memory balancing is getting the values out of sync, or there is some "hidden" memory allocated in dom0
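
Rough arithmetic for the ~1% estimate above (an illustration only, not a measurement):

# ~1% of a 52GB HVM is about 532MB, which matches the observation that an
# extra 512MB in the memory request lets the VM start while 256MB does not.
vm_mb = 52 * 1024
print(vm_mb * 0.01)   # -> 532.48 MB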

marmarek (Member) commented Oct 9, 2015

On Fri, Oct 09, 2015 at 04:04:23AM -0700, qubesuser wrote:

> Still two problems that happen when the VM size is very close to the maximum possible available memory:
>
> 1. libxenlight internal error cannot create domain for HVM

Check /var/log/libvirt/libxl/VMNAME.log. And probably also stubdom log
mentioned earlier.

> 2. It seems possible to trigger dom0 OOM which then kills the GUI. This might be related to the fact that sometimes the system gets into a state where dom0 total memory reported by "free" is around 1-2GB lower than the value reported by xl list, meaning balloon requests are off by 1-2GB, causing OOM.

Take a look at #959. Do you have dom0_mem in Xen cmdline (xl info)?
Also if you have a lot of physical memory, IMHO it makes sense to add
dom0_mem=max:XXX where XXX is for example 4096M.
Maybe we should always set it, regardless of memory size (on smaller systems dom0 simply will not reach that limit).



marmarek (Member) commented Oct 9, 2015

On Fri, Oct 09, 2015 at 04:19:37AM -0700, qubesuser wrote:

> So basically:
>
> 1. It seems an additional ~1% RAM is required to start HVMs for some reason (internal structures?)

The logs mentioned earlier should also show how much memory is needed for what.

> 2. Something odd is happening with dom0 RAM, either memory balancing is getting the values out of sync, or there is some "hidden" memory allocated in dom0

Maybe something related to page tables or other in-kernel memory "metadata"?



qubesuser commented Oct 9, 2015

I think the problem regarding the xenlight internal error is that Xen has memory overhead for VMs, but Qubes is ignoring that.

At https://wiki.openstack.org/wiki/XenServer/Overhead there is a table that shows that for instance a 122GB VM requires an extra 1GB.

For small VMs, the extra few MBs that Qubes leaves free are enough, but that's not the case for large VMs.

OpenStack uses the following code to estimate the overhead; Qubes should probably imitate it (from nova/virt/xenapi/driver.py):

OVERHEAD_BASE = 3
OVERHEAD_PER_MB = 0.00781
OVERHEAD_PER_VCPU = 1.5

overhead = ((memory_mb * OVERHEAD_PER_MB) + (vcpus * OVERHEAD_PER_VCPU) + OVERHEAD_BASE)
overhead = math.ceil(overhead)

BTW, I have patched qmemman properly to check for progress; I should submit a pull request soon.
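
For illustration, here is the quoted estimate wrapped into a small self-contained function (a sketch of how it could be reused, not the actual nova or Qubes code):

import math

# Constants copied from the nova snippet quoted above (results in MB).
OVERHEAD_BASE = 3
OVERHEAD_PER_MB = 0.00781
OVERHEAD_PER_VCPU = 1.5

def estimate_xen_overhead_mb(memory_mb, vcpus):
    # Estimate per-VM Xen overhead in MB, the way OpenStack does for XenServer.
    overhead = memory_mb * OVERHEAD_PER_MB + vcpus * OVERHEAD_PER_VCPU + OVERHEAD_BASE
    return int(math.ceil(overhead))

# Example: a 16GB, 4-vCPU HVM needs roughly an extra 137MB on top of its RAM.
print(estimate_xen_overhead_mb(16 * 1024, 4))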


qubesuser commented Oct 9, 2015

Pull request at marmarek/qubes-core-admin#6

I think I won't attempt to fix the missing overhead calculation myself because it looks like qmemman might need to become aware of it, and I'm not quite sure how qmemman balancing works exactly.

So, to sum up:

  1. "Insufficient memory" => pull request marmarek/qubes-core-admin#6
  2. xenlight internal error => need to fix Qubes to account for overhead as described above
  3. ---sc--- state => cannot reproduce, might be due to missing overhead calculation as well
  4. Mysterious dom0 "xl list"/free gap and dom0 OOM => no idea, might be related to overhead or maybe to balancing not happening when it should?

marmarek (Member) commented Oct 9, 2015

On Fri, Oct 09, 2015 at 04:57:28AM -0700, qubesuser wrote:

> Pull request at marmarek/qubes-core-admin#6
>
> I think I won't attempt to fix the missing overhead calculation myself because it looks like qmemman might need to become aware of it, and I'm not quite sure how qmemman balancing works exactly.

I think it would be enough to just include the overhead during VM startup. qmemman already adds some margin to the number reported by a VM ("CACHE_FACTOR", 1.3), so even if there is a linear Xen overhead of about 1%, it would already be handled (the VM will simply receive a smaller margin).

This doesn't solve anything about the "xl list"/free gap, but I think those are two independent things - one about Xen overhead, the other about Linux kernel overhead.

> So, to sum up:
>
> 1. "Insufficient memory" => pull request marmarek/qubes-core-admin#6

Thanks!

> 2. xenlight internal error => need to fix Qubes to account for overhead as described above

I'll add requesting 1% more memory than the VM has assigned.

> 4. Mysterious dom0 "xl list"/free gap and dom0 OOM => no idea, might be related to overhead or maybe to balancing not happening properly?

This may be fixed by limiting dom0 to 4GB. Even if there is some Linux kernel overhead linear in maxmem, it would be greatly limited this way.
Take a look at this commit: QubesOS/qubes-core-admin@bf21730



qubesuser commented Oct 9, 2015

I added a pull request to try to tackle the memory overhead at marmarek/qubes-core-admin#7 on top of the code in the previous pull request.

It tries to minimize the impact on the codebase by dividing Xen total and free memory by the overhead, instead of multiplying everything else by it.

I guess it should be enough, since transferring memory between VMs basically doesn't change the overhead, but I'm not completely sure.
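
A rough sketch of that idea (the function name and the fixed 1% factor are illustrative assumptions; the actual pull request may compute the overhead differently):

XEN_OVERHEAD_FACTOR = 1.01   # ~1% per-VM Xen overhead, per the discussion above

def usable_xen_memory(xen_total_kb, xen_free_kb):
    # Deflate Xen's reported totals once, instead of inflating every memory
    # request elsewhere in qmemman.
    return (int(xen_total_kb / XEN_OVERHEAD_FACTOR),
            int(xen_free_kb / XEN_OVERHEAD_FACTOR))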

The memory gap issue makes this hard to test, since qmemman tries to OOM dom0; I'm trying to figure out what's happening there now.


marmarek (Member) commented Oct 9, 2015

On Fri, Oct 09, 2015 at 10:54:28AM -0700, qubesuser wrote:

> I added a pull request to try to tackle the memory overhead at marmarek/qubes-core-admin#7 on top of the code in the previous pull request.

Thanks :)

> It tries to minimize the impact on the codebase by dividing Xen total and free memory by the overhead, instead of multiplying everything else by it.
>
> I guess it should be enough, since transferring memory between VMs basically doesn't change the overhead, but I'm not completely sure.

The question is whether that overhead is taken from VM memory, or added
above it. In the former case, your approach is ok, otherwise no qmemman
modification is required at all. Since all that is about 1% of RAM
(read: not that much), IMO it's safer to apply the change.

> The memory gap issue makes this hard to test, since qmemman tries to OOM dom0; I'm trying to figure out what's happening there now.

Have you tried limiting dom0 memory with the dom0_mem=max:4096M Xen parameter?



qubesuser commented Oct 9, 2015

So the problem seems to be that the kernel has something like 1/64 of RAM + ~256MB of overhead that is not counted as "normal" memory, and qmemman doesn't account for it when computing prefmem for dom0; it just multiplies used memory by 1.3 and adds a fixed 350MB.

This means that on machines with 32-64GB of RAM or more and no dom0 memory limit, qmemman will consistently OOM dom0 when it asks dom0 to balloon down as far as possible.

PV VMs also have the same problem, although it seems that less RAM is reserved there so the issue is less visible.

The fundamental fix is that meminfo-writer should determine the amount of overhead RAM, and that amount should be added to MemTotal in qmemman's calculations (either by having meminfo-writer add it to MemTotal directly, or by adding a new MemOverhead/MemReserved field that qmemman adds in).

Also, the Linux kernel should be fixed so that it doesn't unnecessarily keep metadata around (I assume "struct page" structs) for memory that has not been plugged in. It seems quite possible that the kernel already supports this (since doing otherwise is not that smart) but is not properly configured, or maybe the Xen developers never implemented it properly.

Setting dom0 maxmem obviously mitigates the issue, but it shouldn't be required for things to work.


marmarek (Member) commented Oct 9, 2015

> So the problem seems to be that the kernel has something like 1/64 of RAM + ~256MB of overhead that is not counted as "normal" memory, and qmemman doesn't account for it when computing prefmem for dom0; it just multiplies used memory by 1.3 and adds a fixed 350MB.

1.3 is greater than 1/64, so theoretically it shouldn't be a problem. Am I missing something?

> The fundamental fix is that meminfo-writer should determine the amount of overhead RAM, and that amount should be added to MemTotal in qmemman's calculations (either by having meminfo-writer add it to MemTotal directly, or by adding a new MemOverhead/MemReserved field that qmemman adds in).

Since this looks like a Linux-specific thing, it should be handled by meminfo-writer. But since I don't know the exact formula, I don't want to introduce an estimation that could negatively affect users with smaller systems (the majority of them).

> Also, the Linux kernel should be fixed so that it doesn't unnecessarily keep metadata around (I assume "struct page" structs) for memory that has not been plugged in. It seems quite possible that the kernel already supports this (since doing otherwise is not that smart) but is not properly configured, or maybe the Xen developers never implemented it properly.

I think it is supported as memory hotplug (CONFIG_XEN_BALLOON_MEMORY_HOTPLUG), but the last time I tested it (AFAIR around 3.14) I got some crashes.
Since on most desktop/laptop systems - the Qubes OS target use case - the impact isn't that big (total RAM <= 16GB), I haven't investigated it further.

> Setting dom0 maxmem obviously mitigates the issue, but it shouldn't be required for things to work.

This is the configuration recommended by Xen best practices - actually it is recommended to set a fixed memory size for dom0. And there is also an explanation of why, along the same lines as what we've found here.


qubesuser commented Oct 9, 2015

> 1.3 is greater than 1/64, so theoretically it shouldn't be a problem. Am I missing something?

It's 1/64 of the maximum possible dom0 RAM, e.g. 1GB for 64GB of RAM without a dom0_mem max set, plus around 300MB on my machine.

The 1.3 factor is applied to the used RAM instead, so in a typical scenario where dom0 is using 1GB the margin will be 300MB.

And then the 300MB margin + 350MB bonus = 650MB < ~1.3GB of overhead, which results in either OOM or swapping if qmemman balloons dom0 down exactly to the computed minimum.
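
To make that concrete, a small worked example using the figures quoted in this thread (a 64GB machine with no dom0_mem limit; these are the thread's estimates, not measured values):

MB = 1
GB = 1024 * MB

max_dom0_ram = 64 * GB
kernel_overhead = max_dom0_ram // 64 + 300 * MB    # ~1/64 of maxmem + ~300MB -> ~1324MB
dom0_used = 1 * GB
qmemman_margin = int(dom0_used * 0.3) + 350 * MB   # 1.3-factor margin + fixed bonus -> ~657MB

# The margin qmemman leaves is roughly half of the hidden kernel overhead,
# so ballooning dom0 down to its computed minimum can push it into OOM.
print(qmemman_margin, "<", kernel_overhead)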

> total RAM <= 16GB

Laptops are indeed usually <=16GB, with some 32GB models, but desktops/workstations specifically built for VM-based workloads like running Qubes are more likely to have 32/64 GB.

> it should be handled by meminfo-writer

Yes; the problem is, I can't find a good way to get that info at the moment.

In theory one just needs to get the domain size from Xen and subtract it from the total memory in /proc/meminfo, but the problem is that this is racy if ballooning is happening at the same time.


marmarek (Member) commented Oct 9, 2015

On Fri, Oct 09, 2015 at 02:18:24PM -0700, qubesuser wrote:

> It's 1/64 of the maximum possible dom0 RAM,

Ah, indeed.

> total RAM <= 16GB
> Laptops are indeed usually <=16GB, with some 32GB models, but desktops/workstations specifically built for VM-based workloads like running Qubes are more likely to have 32/64 GB.

Still, setting dom0 max mem to 4GB seems to be a sensible move.

> it should be handled by meminfo-writer
>
> Yes; the problem is, I can't find a good way to get that info at the moment.

Looking at /proc/meminfo I see a DirectMap4k entry, which seems to be exactly maxmem. Not sure if that helps.

> In theory one just needs to get the domain size from Xen and subtract it from the total memory in /proc/meminfo, but the problem is that this is racy if ballooning is happening at the same time.

The race can at least be detected, by reading the ballooning target before and after such an operation. Also not sure if that helps.



qubesuser commented Oct 9, 2015

I think I found a way:

  1. Read /sys/devices/system/xen_memory/xen_memory0/info/current_kb
  2. Read /proc/meminfo
  3. Read again /sys/devices/system/xen_memory/xen_memory0/info/current_kb. If the value changed, retry at step 2

The value at /sys/devices/system/xen_memory/xen_memory0/info/current_kb is the "xl list" value and can then be exported by meminfo-writer as "DomTotal" and used by qmemman instead of MemTotal if available.
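
A minimal sketch of that read sequence (the function names are illustrative, not the actual meminfo-writer code; it assumes the sysfs path above exists, i.e. a Xen balloon-enabled kernel):

XEN_CURRENT_KB = "/sys/devices/system/xen_memory/xen_memory0/info/current_kb"

def read_first_line(path):
    with open(path) as f:
        return f.readline().strip()

def read_dom_total_and_meminfo():
    # Read current_kb and /proc/meminfo consistently, retrying if ballooning
    # changed the domain size between the two reads.
    while True:
        before = read_first_line(XEN_CURRENT_KB)
        with open("/proc/meminfo") as f:
            meminfo = f.read()
        after = read_first_line(XEN_CURRENT_KB)
        if before == after:                # no balloon activity between the reads
            return int(after), meminfo     # DomTotal in kB, plus the raw meminfo text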


marmarek (Member) commented Oct 9, 2015

On Fri, Oct 09, 2015 at 02:53:14PM -0700, qubesuser wrote:

> I think I found a way:
>
> 1. Read /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> 2. Read /proc/meminfo
> 3. Read again /sys/devices/system/xen_memory/xen_memory0/info/current_kb. If the value changed, retry at step 2
>
> The value at /sys/devices/system/xen_memory/xen_memory0/info/current_kb is the "xl list" value and can then be exported by meminfo-writer as "DomTotal" and used by qmemman instead of MemTotal if available.

Generally I think meminfo-writer should export just one number: used
memory (instead of current dict). This should include all the in-VM
overhead - #1312

Anyway, in its current shape it should be added to "MemTotal". No new field.



qubesuser commented Oct 9, 2015

Pull request for meminfo-writer at marmarek/qubes-linux-utils#1

Haven't done much testing yet on all the patches combined.


qubesuser commented Oct 9, 2015

It seems to work (after fixing marmarek/qubes-core-admin#7).

There's a final minor issue: on my machine, "xl list" reports that the stubdomain takes 44 MB, but the QubesHVm code only reserves 32 MB, which should thus probably be raised.

It would also probably be good to build a system that does automated stress tests of VM creation/destruction (I'm not planning to do it myself).

I also added a commit to marmarek/qubes-linux-utils#1 to use 64-bit integers in meminfo-writer, so that it works on machines with more than 2TB of RAM (of course other changes might be needed, since I assume no one has ever tested that).


marmarek (Member) commented Oct 10, 2015

On Fri, Oct 09, 2015 at 04:34:14PM -0700, qubesuser wrote:

> It seems to work (after fixing marmarek/qubes-core-admin#7).

Thanks!

> There's a final minor issue: on my machine, "xl list" reports that the stubdomain takes 44 MB, but the QubesHVm code only reserves 32 MB, which should thus probably be raised.

That was changed recently and indeed not all the places were updated.
Will do.



marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Oct 11, 2015

Merge remote-tracking branch 'origin/pr/6'
* origin/pr/6:
  Support large VMs by removing the fixed balloon iteration limit

QubesOS/qubes-issues#1136

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Oct 11, 2015

Merge remote-tracking branch 'origin/pr/7' into HEAD
* origin/pr/7:
  Properly account for Xen memory overhead to fix large VMs

QubesOS/qubes-issues#1136

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Oct 11, 2015

Update stubdom memory allocation to 44MB
Since 677a79b "hvm: change default graphics to std vga ('xen')",
stubdomain uses 44MB RAM

QubesOS/qubes-issues#1136

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Oct 11, 2015

Update stubdom memory allocation to 44MB
Since 677a79b "hvm: change default graphics to std vga ('xen')",
stubdomain uses 44MB RAM

QubesOS/qubes-issues#1136

(cherry picked from commit 820e5c0)

marmarek added a commit to marmarek/qubes-installer-qubes-os that referenced this issue Oct 11, 2015

anaconda: limit dom0 maxmem to 4GB to limit its overhead on big systems
Linux kernel have some memory overhead depending on maxmem. Dom0 isn't
meant to use that much memory (most should be assigned to AppVMs), so on
big systems this will be pure waste.

QubesOS/qubes-issues#1136
Fixes QubesOS/qubes-issues#1313

fepitre added a commit to fepitre/anaconda that referenced this issue (Sep 20-22, 2017; the same commit was referenced multiple times)

anaconda: limit dom0 maxmem to 4GB to limit its overhead on big systems
Linux kernel have some memory overhead depending on maxmem. Dom0 isn't meant to use that much memory (most should be assigned to AppVMs), so on big systems this will be pure waste.

QubesOS/qubes-issues#1136
Fixes QubesOS/qubes-issues#1313
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment