Linux 5.2.x breaks VM start - qubes-db fails to start #5199

Closed
marmarek opened this issue Jul 28, 2019 · 10 comments
Labels
C: kernel P: major Priority: major. Between "default" and "critical" in severity. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@marmarek
Member

marmarek commented Jul 28, 2019

Qubes OS version
4.0

Affected component(s) or functionality
Linux / vchan

Brief summary
After updating to Linux 5.2.x, the qubes-db service fails to start with the error "gnttab: error: mmap failed: No such device or address".

To Reproduce
Steps to reproduce the behavior:

  1. Install kernel-latest-qubes-vm-5.2.3
  2. Switch a VM to that kernel, make sure virt_mode is set to pvh
  3. Try to start the VM

Expected behavior
VM starts normally

Actual behavior
The VM either crashes on start (reported on IRC by @xaki23), or only qubes-db fails to start, leaving the VM half-configured.
Logs from service start, with xen_gntdev debugging enabled:

Jul 28 17:18:49 localhost systemd[1]: Started Cleanup of Temporary Directories.
Jul 28 18:02:35 localhost systemd[1]: Starting Qubes DB agent...
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: priv 00000000689318be
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: priv 00000000689318be, add 1
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: map 0+1 at 725a8e4ad000 (pgoff 0)
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: map 0+1
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: priv 00000000689318be, add 1
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: map 1+1 at 725a8e47b000 (pgoff 1)
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: map 1+1
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: priv 00000000689318be, del 4096+1
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: unmap 1+1 [0+1]
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: unmap handle=257 st=0
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: priv 00000000689318be, offset for vaddr 725a8e4ad000
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: gntdev_vma_close 000000007cc48ca0
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: priv 00000000689318be, del 0+1
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: unmap 0+1 [0+1]
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: unmap handle=256 st=0
Jul 28 18:02:35 localhost kernel: xen:xen_gntdev: priv 00000000689318be
Jul 28 18:02:35 localhost qubesdb-daemon[843]: gnttab: error: mmap failed: No such device or address
Jul 28 18:02:35 localhost qubesdb-daemon[843]: FATAL: vchan initialization failed
Jul 28 18:02:35 localhost systemd[1]: qubes-db.service: Main process exited, code=exited, status=1/FAILURE
Jul 28 18:02:35 localhost systemd[1]: qubes-db.service: Failed with result 'exit-code'.
Jul 28 18:02:35 localhost systemd[1]: Failed to start Qubes DB agent.

strace fragment:

[pid   857] openat(AT_FDCWD, "/dev/xen/evtchn", O_RDWR|O_CLOEXEC) = 8
[pid   857] ioctl(8, IOCTL_EVTCHN_BIND_INTERDOMAIN, 0x7ffd3407b778) = 32
[pid   857] write(8, " \0\0\0", 4)      = 4
[pid   857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid   857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) = 0x777f1211b000
[pid   857] ioctl(7, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, 0x7ffd3407b710) = 0
[pid   857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid   857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0x1000) = -1 ENXIO (No such device or address)
[pid   857] write(2, "gnttab: ", 8)     = 8
[pid   857] write(2, "error: ", 7)      = 7
[pid   857] write(2, "mmap failed", 11) = 11
[pid   857] write(2, ": No such device or address", 27) = 27
[pid   857] write(2, "\n", 1)           = 1
[pid   857] ioctl(7, IOCTL_GNTDEV_UNMAP_GRANT_REF, 0x7ffd3407b710) = 0
[pid   857] ioctl(7, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR, 0x7ffd3407b750) = 0
[pid   857] munmap(0x777f1211b000, 4096) = 0
[pid   857] ioctl(7, IOCTL_GNTDEV_UNMAP_GRANT_REF, 0x7ffd3407b740) = 0
[pid   857] close(8)                    = 0
[pid   857] close(7)                    = 0
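
For readers following the trace: the first mmap() uses offset 0 and the second uses offset 0x1000, which the kernel log above reports as pgoff 0 and pgoff 1. Below is a minimal userspace sketch of how that byte offset turns into a page offset and is then used as a lookup key for the mapping. It is not the driver code. The names model_map, model_pgoff and model_find_map are hypothetical, and the 4 KiB page size is an assumption taken from the 4096-byte mappings in the trace.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages, as in the trace */

/* Hypothetical stand-in for one grant mapping tracked per open gntdev fd. */
struct model_map {
    unsigned long index; /* first page index assigned to this mapping */
    unsigned long count; /* number of pages in this mapping */
};

/* Convert the byte offset passed to mmap() into a page offset (vm_pgoff). */
unsigned long model_pgoff(unsigned long mmap_offset)
{
    return mmap_offset >> MODEL_PAGE_SHIFT;
}

/*
 * Look up the mapping whose index and page count match the request.
 * Returns NULL when no mapping matches. The page offset acts purely as
 * a key here, not as a position inside the mapping's page array.
 */
const struct model_map *model_find_map(const struct model_map *maps, size_t n,
                                       unsigned long pgoff,
                                       unsigned long pages)
{
    assert(maps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (maps[i].index == pgoff && maps[i].count == pages)
            return &maps[i];
    }
    return NULL;
}
```

With two one-page mappings at indexes 0 and 1, model_find_map(maps, 2, model_pgoff(0x1000), 1) returns the second entry, which corresponds to the "map 1+1 ... (pgoff 1)" line in the kernel log.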

cc @m-v-b

@marmarek marmarek added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels Jul 28, 2019
@marmarek marmarek added this to the Release 4.0 updates milestone Jul 28, 2019
@marmarek
Member Author

As @m-v-b found, it's most likely caused by https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=df9bde015a72ffd978e39a750662c7cf579b1715

I suspect it's about vma->vm_pgoff usage: previously it was used only for finding the map index, but now vm_map_pages also uses it as the starting offset in map->pages.
If I understand this code correctly, the gntdev driver abuses the mmap offset argument as a grant index (multiplied by page size) instead of a real offset. I think the proper solution is to revert this commit.
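
To make the suspected failure concrete, here is a small userspace model of the range check as described above. It is a sketch under assumptions, not kernel code: the model_* functions are hypothetical, and they only mirror the idea that vm_map_pages() starts at vma->vm_pgoff in the page array while vm_map_pages_zero() always starts at 0.

```c
#include <assert.h>
#include <errno.h>

/*
 * Model of the range check: map `vma_pages` pages out of an array that
 * holds `num` pages, starting at array index `offset`.
 * Returns 0 on success or -ENXIO when the request does not fit.
 */
int model_map_range(unsigned long num, unsigned long vma_pages,
                    unsigned long offset)
{
    assert(vma_pages > 0); /* mmap() rejects zero-length requests earlier */

    /* The starting index lies beyond the end of the page array. */
    if (offset >= num)
        return -ENXIO;

    /* The requested size exceeds what is left after the starting index. */
    if (vma_pages > num - offset)
        return -ENXIO;

    return 0;
}

/* Models vm_map_pages(): the starting index is taken from vm_pgoff. */
int model_vm_map_pages(unsigned long num, unsigned long vma_pages,
                       unsigned long vm_pgoff)
{
    return model_map_range(num, vma_pages, vm_pgoff);
}

/* Models vm_map_pages_zero(): vm_pgoff is ignored, mapping starts at 0. */
int model_vm_map_pages_zero(unsigned long num, unsigned long vma_pages,
                            unsigned long vm_pgoff)
{
    (void)vm_pgoff;
    return model_map_range(num, vma_pages, 0);
}
```

For the second grant in the strace (one page, vm_pgoff = 1, map->count = 1), model_vm_map_pages(1, 1, 1) returns -ENXIO, matching the failing mmap(), while model_vm_map_pages_zero(1, 1, 1) returns 0. The first grant (vm_pgoff = 0) succeeds under both models.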

@m-v-b

m-v-b commented Jul 28, 2019

Hello again,

My kernel build has finally finished successfully. I carried out a few basic boot-up tests with AppVMs, and I confirm that reverting the commit at https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=df9bde015a72ffd978e39a750662c7cf579b1715 resolves this issue.

Please note that I did not use the kernel I built with dom0 yet. I will test that in a moment.

Addendum: dom0 works as expected as well, after reverting the aforementioned commit.

@andrewdavidwong andrewdavidwong added C: kernel P: major Priority: major. Between "default" and "critical" in severity. and removed P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels Jul 28, 2019
@marmarek
Member Author

Relevant discussion: https://lore.kernel.org/lkml/CAFqt6zZN+6r6wYJY+f15JAjj8dY+o30w_+EWH9Vy2kUXCKSBog@mail.gmail.com/

@m-v-b could you test replacing vm_map_pages with vm_map_pages_zero instead of the revert?

@m-v-b

m-v-b commented Jul 30, 2019

@marmarek, of course, I do not mind helping with that. Please note that I need about 12 to 16 hours before I can report back with the test results, mostly due to my employment/work. If such a time frame would be too late, please let me know.

@marmarek
Member Author

I'm testing it already.

@marmarek
Member Author

It works.

@marmarek
Member Author

marmarek commented Aug 5, 2019

Fix queued in 5.2-stable already.

@DemiMarie

@marmarek would it be possible to push out a new kernel-latest-qubes-vm package that includes the fix?

@marmarek
Member Author

marmarek commented Aug 6, 2019

As soon as it is released (5.2.7), which should happen this week.

Whissi pushed a commit to Whissi/linux-stable that referenced this issue Aug 6, 2019
commit 8d1502f upstream.

'commit df9bde0 ("xen/gntdev.c: convert to use vm_map_pages()")'
breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages()
will:
 - use map->pages starting at vma->vm_pgoff instead of 0
 - verify map->count against vma_pages()+vma->vm_pgoff instead of just
   vma_pages().

In practice, this breaks using a single gntdev FD for mapping multiple
grants.

relevant strace output:
[pid   857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid   857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) =
0x777f1211b000
[pid   857] ioctl(7, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, 0x7ffd3407b710) = 0
[pid   857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid   857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7,
0x1000) = -1 ENXIO (No such device or address)

details here:
QubesOS/qubes-issues#5199

The reason is -> ( copying Marek's word from discussion)

vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
basically using this parameter for "which grant reference to map".
map struct returned by gntdev_find_map_index() describes just the pages
to be mapped. Specifically map->pages[0] should be mapped at
vma->vm_start, not vma->vm_start+vma->vm_pgoff*PAGE_SIZE.

When trying to map grant with index (aka vma->vm_pgoff) > 1,
__vm_map_pages() will refuse to map it because it will expect map->count
to be at least vma_pages(vma)+vma->vm_pgoff, while it is exactly
vma_pages(vma).

Converting vm_map_pages() to use vm_map_pages_zero() will fix the
problem.

Marek has tested and confirmed the same.

Cc: stable@vger.kernel.org # v5.2+
Fixes: df9bde0 ("xen/gntdev.c: convert to use vm_map_pages()")

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@marmarek
Member Author

marmarek commented Aug 8, 2019

kernel-latest-qubes-vm-5.2.7 is in current-testing: QubesOS/updates-status#1248

@marmarek marmarek closed this as completed Aug 8, 2019