AppVM GUI crash: U2MFN_GET_MFN_FOR_PAGE: get_user_pages failed #2617

Open
pdinoto opened this Issue Feb 2, 2017 · 21 comments

pdinoto commented Feb 2, 2017

Qubes OS version (e.g., R3.2):

R3.2

Affected TemplateVMs (e.g., fedora-23, if applicable):

debian-9


Expected behavior:

Graphic applications work as usual.

Actual behavior:

AppVMs based on this template work fine until a specific graphical action triggers a crash of the GUI component in the AppVM, closing all of its windows.

Steps to reproduce the behavior:

Create a new AppVM based on a debian-9 template (template specs follow)

General notes:

At the time of the crash, many messages like

U2MFN_GET_MFN_FOR_PAGE: get_user_pages failed, ret=0xfffffffffffffff2

are shown on the console.

Attempting to open new windows, for example by running gnome-terminal, results in a brief display of the new window and of all the others that were open, and then they all disappear at the same time.

The applications keep running, and I can shut down the AppVM from the console.

So far, graphical actions that seem to trigger the crash are:

  • Resizing a terminal window fast: just before the crash, there are color lines and artifacts on the screen.
  • Opening an existing spreadsheet in LibreOffice (the file opens, a partial draw of its contents is shown, then it disappears and all other windows crash).

Related issues:

Maybe this comment is related?

pdinoto commented Feb 2, 2017

The debian-9 template was created by dist-upgrade-ing a plain and working debian-8 template.

Qubes repositories enabled are:

# Main qubes updates repository
deb [arch=amd64] http://deb.qubes-os.org/r3.2/vm stretch main
#deb-src http://deb.qubes-os.org/r3.2/vm stretch main

# Qubes updates candidates repository
deb [arch=amd64] http://deb.qubes-os.org/r3.2/vm stretch-testing main
#deb-src http://deb.qubes-os.org/r3.2/vm stretch-testing main

# Qubes security updates testing repository
deb [arch=amd64] http://deb.qubes-os.org/r3.2/vm stretch-securitytesting main
#deb-src http://deb.qubes-os.org/r3.2/vm stretch-securitytesting main

Member

adrelanos commented Feb 11, 2017

sys-whonix (stretch-based) crashes randomly when using konsole (with rather regular, non-fancy output).

emdete commented Feb 15, 2017

I see this with my debian-9 based VMs as well, mostly on startup of programs. It happens, for example, on startup of rxvt and gvim. It seems to depend on how many and which other programs are already running in other VMs. It is currently my showstopper for using Qubes. #2455 mentions log files, but I don't see anything other than the given message. The VM does not crash and can still be accessed via virsh.

Member

marmarek commented Feb 19, 2017

@HW42 any idea?

Member

marmarek commented Feb 19, 2017

Copying my comment from #2455:

0xfffffffffffffff2 is -14, i.e. EFAULT, returned from the get_user_pages call, which suggests that the window composition buffer is no longer in memory, or maybe even that getting its address failed. Check the logs from gui-agent (they should be in journalctl inside the VM) and the X server logs (~/.local/share/xorg/Xorg.0.log). If there is nothing specific there, try enabling debug mode in the VM settings.
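
For readers who want to check how the ret= value in the log maps to an error name, here is a minimal Python sketch. It assumes the value is a 64-bit two's-complement number holding a negative Linux errno; the helper name decode_kernel_ret is made up for this illustration and is not part of any Qubes tool.

```python
import errno

def decode_kernel_ret(raw, bits=64):
    """Interpret an unsigned hex return value as a signed kernel error code.

    Hypothetical helper for illustration only.
    """
    # Values with the top bit set are negative in two's complement.
    signed = raw - (1 << bits) if raw >= 1 << (bits - 1) else raw
    if signed >= 0:
        return signed, "no error"
    # errno.errorcode maps positive errno numbers to their symbolic names.
    return signed, errno.errorcode.get(-signed, "unknown")

signed, name = decode_kernel_ret(0xfffffffffffffff2)
print(signed, name)
```

On a Linux system this should report -14 and EFAULT, matching the explanation above.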

pdinoto commented Feb 20, 2017

My current setup can be crashed this way quite predictably.
I will try to capture logs.

Maybe it does not provide any useful information, but once the VM's GUI had crashed, I tried logging out of dom0 and logging back in (which in my view would provide a new and clean X.org session). You can see all VM windows reappear, and then all those from the crashed VM get closed.

Contributor

jpouellet commented Feb 20, 2017

I tried logging out of dom0 and logging back, and you can see all VM windows reappear, and then all those from the crashed VM gets closed.

FWIW, this is the behavior I always observe for all VMs with logging out/in of dom0, regardless of crashed state.

pdinoto commented Feb 20, 2017

I tried logging out of dom0 and logging back, and you can see all VM windows reappear, and then all those from the crashed VM gets closed.

FWIW, this is the behavior I always observe for all VMs with logging out/in of dom0, regardless of crashed state.

Weird: I am used to fixing pulseaudio issues in dom0 (weird state after docking my notebook) by logging out and back in, without losing any work being done in the VMs, as all windows reappear once I log in; I just lose their screen positions, as they all come back in the same XFCE workspace.

This appear/disappear behavior only happens with these crashed VMs.

HW42 commented Feb 22, 2017

I already tried to reproduce this a few days ago.

I tried it today again. But even if I carefully try to replicate the circumstances @emdete described on IRC I'm not able to reproduce it.

My blind guess is

a) The provided address is invalid, and this does not affect a normal X server (for example, something might try to map a NULL pointer for a short moment).
b) It's a special address which u2mfn can't map to a page, for example something allocated via a device file.

Given the unreproducibility b) is not very likely. And of course it's quite likely something completely different.

HW42 commented Feb 22, 2017

FWIW: I also tried other cases like what @jpouellet describes, unfortunately without success.

joyfulmantis commented Mar 7, 2017

I can reproduce this issue very reliably. The prime culprits are emacs (25) and gnome-terminal, although I vaguely remember other applications causing the crash too. Both emacs and gnome-terminal size (and resize) their windows differently from other applications: if you slowly drag the window edge, a little bubble on top of the window tells you its width and height in lines. They also usually can't resize by individual pixels, so even if you slowly drag the window across your full screen, you may find that it fills most of the screen but leaves a strip smaller than a line's worth of pixels that the window will not grow into. This is not the case, however, when they are maximized by clicking the maximize button or by dragging the window to the top of the screen on standard Linux desktops.

The prime way for me to reproduce it is to have one of those windows open, either emacs or gnome-terminal, and resize it (by accident or on purpose) by pulling it against the top bar (maximizing it) or against one of the sides (tiling it to half or a quarter of the screen). The GUI almost always crashes for me in this scenario.

pdinoto commented Mar 21, 2017

Well, after experiencing this issue consistently but unable to capture any significant log, there are two things that may provide some pointers:

  • The issue disappears if I enable the option "Hide content of windows..." for both "[ ] When moving" and "[ ] When resizing" under "Window Manager" in XFCE.
  • On one occasion I saw an error from xrandr, about a file not found IIRC. I noticed that resizing triggers lots of calls to xset and maybe xrandr, and there seems to be a race condition that makes qubes-gui-agent crash. I could not copy the contents of the logs in that case, I am afraid.

Contributor

jpouellet commented Mar 21, 2017

Could not copy the content the logs in that case, I am afraid.

Perhaps you already know this and it is not the issue, but if the data you want is indeed in the logs and you just cannot retrieve it because the GUI has crashed and the logs do not persist across VM reboots, note that you can still get a console in the VM with:

[user@dom0 ~]$ sudo xl console your-vm-name

and log in as root with no password, and use qvm-copy-to-vm or similar to extract the relevant log files.

HW42 commented Mar 22, 2017

I finally found a reliable way to reproduce it :]

Will debug this further later today.

Member

unman commented Mar 22, 2017

HW42 commented Mar 22, 2017

Don't be coy - how do you reproduce it?

I'm sorry, I didn't have the time yet to write it up, and I wanted to avoid duplicated work.

How I can reproduce it:

One time preparation:

  • create AppVM based on Debian stretch or sid
  • start evince
  • open /usr/share/doc/fontconfig/fontconfig-user.pdf.gz

Try to trigger the crash:

  • shut down the VM if it is already running
  • qvm-run -a test evince
  • click on the pdf in the recently used grid
  • repeat if gui-agent has not crashed

This triggers the crash for me most of the time in the first or second try ("worst" case 9 tries so far).

Given that the bug is time critical (see below) I would not be surprised if this does not work for you.

It seems that when opening the pdf there are two configure events. One with the old size and very shortly after it one with the new window size. When processing the first configure event the pointer for the window memory sometimes points to no longer mapped memory. Therefore mlock/u2mfn returns an error. I do not know yet what's the cause and if it's a bug in qubes-gui-agent, Xorg, or gtk.

HW42 referenced this issue in QubesOS/qubes-gui-agent-linux Mar 23, 2017

Merged

xf86-input-mfndev: don't access windows in input thread #12

HW42 commented Mar 23, 2017

This seems to be a bug in our code. Newer X servers have a separate thread for input processing. So when we access the window object to get the memory pages with the image data, we have a race condition with the main thread if the client changes the Pixmap.

QubesOS/qubes-gui-agent-linux@5ea68d2 should fix this.
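
As a rough illustration of the general idea behind "don't access windows in input thread", the sketch below uses Python threads as stand-ins. It is not the actual C code from the commit: the Window class, the queue-based hand-off, and all names are assumptions made for this example. The input thread only records which window needs attention, and the main thread, which is also the only one that replaces pixmaps, does the lookup.

```python
import queue
import threading

class Window:
    """Hypothetical stand-in for an X window whose backing pixmap can change."""
    def __init__(self, pixmap):
        self.pixmap = pixmap

def input_thread(requests, window_id, count):
    # The input thread never dereferences window state; it only records
    # which window needs its buffer looked up.
    for _ in range(count):
        requests.put(window_id)
    requests.put(None)  # sentinel: no more requests

def main_loop(requests, windows):
    # Only the main thread reads or replaces pixmaps, so a lookup cannot
    # observe a pixmap that is being swapped out concurrently.
    seen = []
    while True:
        window_id = requests.get()
        if window_id is None:
            break
        seen.append(windows[window_id].pixmap)
        # Simulate the client resizing, which replaces the pixmap.
        windows[window_id].pixmap = "pixmap-%d" % len(seen)
    return seen

windows = {1: Window("pixmap-0")}
requests = queue.Queue()
worker = threading.Thread(target=input_thread, args=(requests, 1, 3))
worker.start()
seen = main_loop(requests, windows)
worker.join()
print(seen)
```

Because every pixmap access happens on one thread, each lookup sees either the old or the new pixmap, never a stale pointer to memory that has already been unmapped.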

HW42 commented Mar 24, 2017

Marek just uploaded a new version of the gui-agent (version 3.2.15, xserver-xorg-input-qubes in Debian and qubes-gui-vm in Fedora) which includes my patch to the testing repositories.

Please test whether
a) this actually resolves the issue for you,
b) it doesn't cause any new problems.

Thanks.

pdinoto commented Mar 25, 2017

Thanks, @jpouellet. In that case, I was unable to transfer the logs because the issue makes the U2MFN_GET_MFN_FOR_PAGE error appear several times per second in the Xorg.0 log, which, if you are not fast enough, makes the VM unresponsive as /tmp fills up quickly.

Great, @HW42! I will check for the update and test it.

Member

andrewdavidwong commented Mar 25, 2017

Marek just uploaded a new version of the gui-agent (version 3.2.15, xserver-xorg-input-qubes in Debian and qubes-gui-vm in Fedora) which includes my patch to the testing repositories.

Please test if,
a) this actually resolves this issue for you,
b) don't cause any other new problems.

Thanks.

Possible new problem:
QubesOS/updates-status#18 (comment)

HW42 referenced this issue in QubesOS/qubes-gui-agent-linux Apr 1, 2017

Merged

xf86-input-mfndev: don't use QueueWorkProc() in input thread #13

Member

marmarek commented Apr 1, 2017

Updated package: QubesOS/updates-status#20
