Slow VM startup because of ballooning out unused memory pages #4736

Closed
marmarek opened this issue Jan 18, 2019 · 5 comments
@marmarek
Member

Qubes OS version:

R4.0

Affected component(s):

kernel


Steps to reproduce the behavior:

  1. Configure the memory property to be much smaller than maxmem. The default 400 / 4000 will do.
  2. Start the VM.
  3. Observe the kernel startup log.

Expected behavior:

The kernel starts up almost instantaneously.

Actual behavior:

There is a delay, even before starting initramfs:

[    0.532934] xenbus_probe_frontend: Device with no driver: device/vbd/51712
[    0.532946] xenbus_probe_frontend: Device with no driver: device/vbd/51728
[    0.532958] xenbus_probe_frontend: Device with no driver: device/vbd/51744
[    0.532969] xenbus_probe_frontend: Device with no driver: device/vbd/51760
[    0.532981] xenbus_probe_frontend: Device with no driver: device/vif/0
[    0.532998]   Magic number: 1:252:3141
[    0.533048] hctosys: unable to open rtc device (rtc0)
[    0.535819] Freeing unused kernel memory: 2544K
[    1.119026] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x29dc0f4e9d8, max_idle_ns: 440795260823 ns
[    2.517036] random: crng init done
[    5.612059] Write protecting the kernel read-only data: 18432k
[    5.613360] Freeing unused kernel memory: 2024K
[    5.613946] Freeing unused kernel memory: 324K
[    5.613960] rodata_test: all tests were successful
[    5.618932] Invalid max_queues (4), will use default max: 2.
[    5.755270] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[    5.765339]  xvda: xvda1 xvda2 xvda3

This is especially bad on DispVM start, which should be as fast as possible.

General notes:

This issue was initially discovered when debugging Qubes running in KVM (on OpenQA), where the effect was much worse, enough to cause a startup timeout (60s). The problem and solution are discussed here: https://markmail.org/thread/jlj4cxz5e33ile43

Applying this fix needs to be done carefully. There are two parts:

  1. Kernel parameter disabling scrubbing ballooned out pages.
  2. Initramfs module re-enabling it after initial balloon-down.

Applying only the first point but not the second could have bad security implications (leaking the VM's memory into Xen or potentially other VMs). So this needs to be done in a way that guarantees either both points are applied or neither is. Applying only the second one is harmless (for example when the kernel is too old to support it).
There are two cases:

  • kernel managed by dom0
  • kernel managed by VM

In both cases the same entity controls both the kernel command line and the initramfs, so it shouldn't be that hard to handle dependencies between those two actions.
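
The both-or-none rule above could be sketched roughly like this. It is only an illustration in Python, not the actual Qubes implementation: `build_kernelopts`, `reenable_scrubbing` and the `initramfs_reenables_scrub` flag are hypothetical names, and the sysfs path is an assumption based on the upstream Linux ABI documentation. In a real initramfs the re-enable step would more likely be a small shell hook.

```python
# Assumed sysfs knob for runtime control of balloon page scrubbing.
SCRUB_SYSFS = "/sys/devices/system/xen_memory/xen_memory0/scrub_pages"
SCRUB_OFF = "xen_scrub_pages=0"


def build_kernelopts(base_opts, initramfs_reenables_scrub):
    """Part 1: disable scrubbing on the command line, but only when part 2 exists.

    Any xen_scrub_pages= setting already present is dropped first, so the
    decision is made in exactly one place.
    """
    opts = [o for o in base_opts.split() if not o.startswith("xen_scrub_pages=")]
    if initramfs_reenables_scrub:
        opts.append(SCRUB_OFF)
    return " ".join(opts)


def reenable_scrubbing(sysfs_path=SCRUB_SYSFS):
    """Part 2: turn scrubbing back on after the initial balloon-down."""
    with open(sysfs_path, "w") as knob:
        knob.write("1")
```

With this shape, an initramfs without the re-enable step never gets `xen_scrub_pages=0`, which is the safe default; the harmless direction (re-enable without the kernel parameter) remains possible.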

@marmarek marmarek added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. C: kernel P: minor Priority: minor. The lowest priority, below "default." labels Jan 18, 2019
@marmarek marmarek added this to the Release 4.0 updates milestone Jan 18, 2019
@marmarek marmarek self-assigned this Jan 18, 2019
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Feb 25, 2019
If kernel package ships default-kernelopts-common.txt file, use that
instead of hardcoded Linux-specific options.
For Linux kernel it may include xen_scrub_pages=0 option, but only if
initrd shipped with this kernel re-enable this option later.

QubesOS/qubes-issues#4839
QubesOS/qubes-issues#4736
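
The fallback described in this commit message might look roughly like the sketch below. This is not the code from qubes-core-admin: `default_kernelopts`, the `kernel_dir` argument and the `HARDCODED_LINUX_OPTS` placeholder value are assumptions made for illustration; only the file name `default-kernelopts-common.txt` comes from the commit message.

```python
from pathlib import Path

# Placeholder standing in for the Linux-specific options hardcoded in core-admin.
HARDCODED_LINUX_OPTS = "root=PLACEHOLDER ro"


def default_kernelopts(kernel_dir):
    """Prefer the options file shipped by the kernel package, else fall back."""
    opts_file = Path(kernel_dir) / "default-kernelopts-common.txt"
    if opts_file.is_file():
        # The kernel package knows whether its initrd re-enables scrubbing,
        # so it alone decides whether xen_scrub_pages=0 is listed here.
        return opts_file.read_text().strip()
    return HARDCODED_LINUX_OPTS
```

Because the options file and the initrd ship in the same package, they are updated together, which is what keeps the two parts of the fix in sync.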
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Feb 27, 2019
If kernel package ships default-kernelopts-common.txt file, use that
instead of hardcoded Linux-specific options.
For Linux kernel it may include xen_scrub_pages=0 option, but only if
initrd shipped with this kernel re-enable this option later.

QubesOS/qubes-issues#4839
QubesOS/qubes-issues#4736
marmarek added a commit to marmarek/qubes-linux-kernel that referenced this issue Mar 15, 2019
…tions

Default kernel options like root= or plymouth.enable are specific to the
kernel package (and initrd bundled with it). Start migrating away from
built-in defaults in core-admin by adding a file in kernel package
containing those options.

Also, if new enough initramfs is included, add xen_scrub_pages=0 which
will speed up the domain start.

QubesOS/qubes-issues#4839
QubesOS/qubes-issues#4736
marmarek added a commit to QubesOS/qubes-linux-kernel that referenced this issue Mar 19, 2019
…tions

Default kernel options like root= or plymouth.enable are specific to the
kernel package (and initrd bundled with it). Start migrating away from
built-in defaults in core-admin by adding a file in kernel package
containing those options.

Also, if new enough initramfs is included, add xen_scrub_pages=0 which
will speed up the domain start.

QubesOS/qubes-issues#4839
QubesOS/qubes-issues#4736

(cherry picked from commit 9cfa9a9)
@andrewdavidwong
Member

I've noticed that VM startup and shutdown recently became extremely slow on my system: from <5s before to over 30s now, in some cases even longer (#2963 (comment)). I wonder if this could be the cause.

@marmarek
Member Author

Check /var/log/xen/guest-VMNAME.log to see what takes the most time.
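
One way to spot the slow step in such a log is to look for the largest jumps between consecutive kernel timestamps. The helper below is a minimal sketch; `largest_gaps` is a hypothetical name, and it assumes the log lines carry printk-style `[seconds.micros]` prefixes like the excerpt in the issue description. Lines without such a prefix are ignored.

```python
import re

# Matches lines such as "[    0.535819] Freeing unused kernel memory: 2544K".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def largest_gaps(log_text, top=3):
    """Return up to `top` (gap_seconds, message_before, message_after) tuples,
    ordered from the largest timestamp gap to the smallest."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    gaps = [
        (later[0] - earlier[0], earlier[1], later[1])
        for earlier, later in zip(entries, entries[1:])
    ]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]
```

For example, `largest_gaps(open("/var/log/xen/guest-VMNAME.log").read())` would list the three longest pauses together with the messages on either side of each.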

@marmarek
Member Author

(I've got a log from @andrewdavidwong)
It doesn't look related to this very issue, as early kernel start finished quite quickly (under 1s), so I'll provide full analysis in #2963.

@andrewdavidwong
Member

> (I've got a log from @andrewdavidwong)
> It doesn't look related to this very issue, as early kernel start finished quite quickly (under 1s), so I'll provide full analysis in #2963.

Ok, thanks. I'm not sure if my problem is the same as #2963. If not, let me know, and I can create a separate issue (so as not to hijack that one).

@marmarek
Member Author

This is done.
