[3.2 current-testing] Screen rendering freezes after few seconds, some actions can refresh it #3829
Comments
v6ak
changed the title from
[3.2 xurrent-testing] Rendering of almost any window freezes after a moment
to
[3.2 current-testing] Rendering of almost any window freezes after a moment
Apr 16, 2018
What kernel version do you run in dom0? What GPU?
v6ak
commented
Apr 16, 2018
Any kernel messages in dom0? Or X server log?
v6ak
Apr 16, 2018
dmesg
- There are many audit-related messages, the most interesting ones look like this:
[10237.696697] kauditd_printk_skb: 9 callbacks suppressed
For the rest, I have used grep -v audit.
- There are many messages like this:
[17537.426417] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[17537.426434] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
[17537.426437] pcieport 0000:00:1c.0: device [8086:9d14] error status/mask=00000001/00002000
[17537.426439] pcieport 0000:00:1c.0: [ 0] Receiver Error (First)
- I have also seen this:
[12556.055938] pcieport 0000:00:1c.0: AER: Multiple Corrected error received: id=00e0
[12556.055972] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Transmitter ID)
[12556.055975] pcieport 0000:00:1c.0: device [8086:9d14] error status/mask=00001001/00002000
[12556.055976] pcieport 0000:00:1c.0: [ 0] Receiver Error (First)
[12556.055977] pcieport 0000:00:1c.0: [12] Replay Timer Timeout
[ 38.837317] pciback 0000:01:00.0: Driver tried to write to a read-only configuration space field at offset 0x110, size 4. This may be harmless, but if you have problems with your
device:
1) see permissive attribute in sysfs
2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.
[ 39.360523] pciback 0000:01:00.0: enabling device (0000 -> 0002)
[ 39.360783] xen: registering gsi 16 triggering 0 polarity 1
[ 39.360792] Already setup the GSI :16
- Many messages like this:
[ 49.026321] xen-blkback: backend/vbd/2/51712: using 2 queues, protocol 1 (x86_64-abi) persistent grants
- Tried grepping for bug, error and warn:
$ grep -i warn /tmp/dmesg.txt
[ 1.043464] Warning: Processor Platform Limit not supported.
[ 1.051299] i8042: Warning: Keylock active
$ grep -i error /tmp/dmesg.txt
[ 1.062335] RAS: Correctable Errors collector initialized.
(plus many AERs as reported above)
$ grep -i bug /tmp/dmesg.txt
[ 12.821958] tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x201] vs fed40080 f80
[ 12.821967] tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x201] vs fed40080 f80
- Of course, there are many boot-time-related messages, but I see nothing interesting there.
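The filtering described above (dropping audit noise, then looking at what repeats) can be sketched in Python. This is only an illustration, not what was actually run: the function name summarize_dmesg is hypothetical, and the sample lines are copied from the excerpts above.

```python
import re
from collections import Counter

# Matches lines such as:
# [17537.426417] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
AER_RE = re.compile(r"pcieport (\S+): AER: (?:Multiple )?Corrected error received")


def summarize_dmesg(lines):
    """Drop audit-related lines and count AER corrected-error reports per port."""
    # Same effect as `grep -v audit`: keep only lines without the substring.
    kept = [ln for ln in lines if "audit" not in ln]
    counts = Counter()
    for ln in kept:
        match = AER_RE.search(ln)
        if match:
            counts[match.group(1)] += 1
    return kept, counts


sample = [
    "[10237.696697] kauditd_printk_skb: 9 callbacks suppressed",
    "[17537.426417] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0",
    "[12556.055938] pcieport 0000:00:1c.0: AER: Multiple Corrected error received: id=00e0",
    "[   49.026321] xen-blkback: backend/vbd/2/51712: using 2 queues, protocol 1 (x86_64-abi) persistent grants",
]
kept, counts = summarize_dmesg(sample)
print(len(kept), dict(counts))
```

Counting per port makes it easy to see whether all AER reports come from a single device, as they do in the excerpts above.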
Xorg
For the Xorg server, I have tried to compare Xorg.0.log{,.old}. I have removed the timestamps and sorted all the unique lines in order to get a good idea of what has changed without much noise: https://gist.github.com/v6ak/b67ce6f501c74f7e617b4a12b38820cb
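A minimal sketch of that comparison method (strip timestamps, deduplicate, sort, then look at what appears on only one side). It is not the script behind the gist: the helper names are hypothetical, the sample log lines are invented for illustration, and the timestamp pattern assumes the usual "[    12.345] " prefix.

```python
import re

# Assumed timestamp prefix format, e.g. "[    10.001] ".
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def unique_lines(text):
    """Return the set of non-empty lines with the timestamp prefix removed."""
    return {
        TIMESTAMP_RE.sub("", ln).rstrip()
        for ln in text.splitlines()
        if ln.strip()
    }


def compare_logs(old_text, new_text):
    """Return (lines only in old, lines only in new), each sorted."""
    old, new = unique_lines(old_text), unique_lines(new_text)
    return sorted(old - new), sorted(new - old)


# Invented sample content, not taken from the real logs.
old_log = '[    10.001] (II) LoadModule: "glx"\n[    10.250] (II) example: old-only line\n'
new_log = '[    11.734] (II) LoadModule: "glx"\n[    11.900] (II) example: new-only line\n'

only_old, only_new = compare_logs(old_log, new_log)
print(only_old)
print(only_new)
```

Deduplicating loses ordering and repeat counts, which is the price paid for the reduced noise.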
marmarek
Apr 16, 2018
Member
What is connected to 0000:00:1c.0? You can find out using lspci -t: note the bus number next to it, then go back to plain lspci and see which device sits on that bus. For example on my system:
[marmarek@dom0 ~]$ lspci -t
-[0000:00]-+-00.0
(...)
+-1c.0-[02]----00.0
(...)
[marmarek@dom0 ~]$ lspci
(...)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
(...)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
But I guess it's unrelated to graphics, which probably is at 0000:00:02.0.
Can you compare kernel boot messages from pre-upgrade and the current one?
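The two-step lookup described above (find the bus number behind a root port in lspci -t, then find devices on that bus in lspci) could be automated roughly as follows. This is a sketch under assumptions: the function name devices_behind is hypothetical, the regular expression only covers the tree layout shown in the example above, and the sample text is taken from that example.

```python
import re

# Matches a bridge entry such as "+-1c.0-[02]" or "\-1c.0-[02-03]" in `lspci -t` output.
BRIDGE_RE = re.compile(r"[+\\]-([0-9a-f]{2}\.[0-7])-\[([0-9a-f]{2})(?:-[0-9a-f]{2})?\]")


def devices_behind(port, tree_text, lspci_text):
    """Return the lspci lines for devices on the bus behind the given root port."""
    for match in BRIDGE_RE.finditer(tree_text):
        if match.group(1) == port:
            bus = match.group(2)
            return [ln for ln in lspci_text.splitlines() if ln.startswith(bus + ":")]
    return []


tree_text = "-[0000:00]-+-00.0\n           +-1c.0-[02]----00.0\n"
lspci_text = (
    "00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)\n"
    "02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)\n"
)

found = devices_behind("1c.0", tree_text, lspci_text)
print(found)
```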
v6ak
Apr 16, 2018
# lspci | grep 1c
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
But I guess it's unrelated to graphics, which probably is at 0000:00:02.0.
You are right.
Can you compare kernel boot messages from pre-upgrade and the current one?
Is there an easy way to do it without modifying /boot/efi/EFI/qubes/xen.cfg? With legacy BIOS, I was able to choose another GRUB option. With UEFI boot, it automatically chooses the default one. I would prefer not to change the config file, to avoid a stupid mistake that would make the system non-bootable.
marmarek
Apr 16, 2018
Member
You need to choose a different xen.efi file, not a kernel. If your UEFI firmware allows you to choose a boot entry, you can add an additional one using efibootmgr. Something like this:
efibootmgr -v -c -d /dev/sda -p 1 -L "Qubes old" -l '\EFI\qubes\xen-old.efi'
BTW, this way you can also boot an older kernel: provide the xen.cfg section name as an extra argument to efibootmgr.
HW42
Apr 16, 2018
FWIW: I can't observe this on an R3.2, updated to testing, on a similar CPU (i7-7600U). (Although I haven't yet tried whether using UEFI instead of BIOS mode changes anything.)
This has happened after updating to qubes-dom0-current-testing (and reboot), because I wanted the fix for issue #3711.
What packages have been upgraded? (Take a look at /var/log/dnf.log*)
v6ak
Apr 16, 2018
@marmarek I am not sure if I got you right. I am not even sure whether you want me to compare Linux kernels or Xens. Both were upgraded, and I currently guess it is a Linux kernel issue rather than a Xen issue.
I have tried the following command, but it boots 4.14.18-1, despite manually choosing the 4.9 option on boot:
# efibootmgr -v -c -d /dev/sda -p 1 -L "Qubes old 4.9.56-21" -l /EFI/qubes/xen.efi 4.9.56-21.pvops.qubes.x86_64
(Hmm, I have used forward slashes instead of backslashes… But efibootmgr -v shows backslashes anyway. The only difference (except the label) between those two items is the parameter with the kernel identifier.)
Note that I don't see an older xen.efi; it seems the older Xen is not kept there.
@HW42 This is probably the part you have asked about:
Apr 15 22:11:30 DEBUG Completion plugin: Generating completion cache...
Apr 15 22:11:30 DEBUG --> Starting dependency resolution
Apr 15 22:11:30 DEBUG ---> Package kernel.x86_64 1000:4.14.18-1.pvops.qubes will be installed
Apr 15 22:11:30 DEBUG ---> Package kernel-qubes-vm.x86_64 1000:4.14.18-1.pvops.qubes will be installed
Apr 15 22:11:30 DEBUG ---> Package qubes-gpg-split-dom0.x86_64 2.0.28-1.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package qubes-gpg-split-dom0.x86_64 2.0.30-1.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package qubes-usb-proxy-dom0.noarch 1.0.16-1.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package qubes-usb-proxy-dom0.noarch 1.0.17-1.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-runtime.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-runtime.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-libs.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-libs.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-hvm.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-hvm.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-hypervisor.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-hypervisor.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-licenses.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-licenses.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package kernel.x86_64 1000:4.9.35-20.pvops.qubes will be erased
Apr 15 22:11:30 DEBUG --> Finished dependency resolution
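To answer "what packages have been upgraded?" from a log like the one above without reading every line, the dependency-resolution lines can be grouped per package. The following is a sketch only: the function name summarize_transaction is hypothetical, the pattern is derived from the lines quoted above rather than from any documented dnf log format, and the sample is a subset of those lines.

```python
import re
from collections import defaultdict

# Derived from lines such as:
# ---> Package xen.x86_64 2001:4.6.6-37.fc23 will be upgraded
PKG_RE = re.compile(
    r"---> Package (\S+) (\S+) will be (installed|upgraded|an upgrade|erased)"
)


def summarize_transaction(lines):
    """Map each package name to {action: version} as reported by the resolver."""
    result = defaultdict(dict)
    for ln in lines:
        match = PKG_RE.search(ln)
        if match:
            name, version, action = match.groups()
            result[name][action] = version
    return dict(result)


sample = [
    "Apr 15 22:11:30 DEBUG ---> Package kernel.x86_64 1000:4.14.18-1.pvops.qubes will be installed",
    "Apr 15 22:11:30 DEBUG ---> Package xen.x86_64 2001:4.6.6-37.fc23 will be upgraded",
    "Apr 15 22:11:30 DEBUG ---> Package xen.x86_64 2001:4.6.6-38.fc23 will be an upgrade",
    "Apr 15 22:11:30 DEBUG ---> Package kernel.x86_64 1000:4.9.35-20.pvops.qubes will be erased",
]
changes = summarize_transaction(sample)
for name, actions in sorted(changes.items()):
    print(name, actions)
```

In this sample, the kernel shows up as one version installed and another erased, while Xen shows an in-place upgrade, which matches the observation that both were changed by the update.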
marmarek
Apr 16, 2018
Member
I have tried the following command, but it boots 4.14.18-1, despite manually choosing the 4.9 option on boot:
Try adding "placeholder 4.9.56-21.pvops.qubes.x86_64" instead; in some cases the first argument is ignored... Also, I don't remember what encoding is expected (with -u or not).
andrewdavidwong
added
bug
C: kernel
labels
Apr 17, 2018
andrewdavidwong
added this to the Release 3.2 updates milestone
Apr 17, 2018
v6ak
Apr 17, 2018
I was unable to do it using various variants of the efibootmgr command. I have created plenty of boot entries (with and without -u, with and without placeholder, with two arguments or with one argument ("placeholder 4.9.56-21.pvops.qubes.x86_64")). Every one of them boots the new kernel. So I have finally modified /boot/efi/EFI/qubes/xen.cfg and compared the dmesg outputs. I have noticed that my original dmesg capture is a bit truncated, so I have captured both of them afresh. I have created the comparison using two methods: one deduplicates (so it is considerably shorter) and one preserves order: https://gist.github.com/v6ak/1b334e677c05be395d83c2d484c6857a
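The order-preserving variant of such a comparison can be sketched with Python's difflib after stripping the dmesg timestamps, which would otherwise make every line differ. This is an illustration, not the method used for the gist: the function name ordered_diff is hypothetical and the sample lines are invented.

```python
import difflib
import re

# Assumed dmesg timestamp prefix, e.g. "[   38.837317] ".
TS_RE = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def ordered_diff(old_lines, new_lines):
    """Unified diff of two dmesg captures with timestamps removed, order kept."""
    old = [TS_RE.sub("", ln) for ln in old_lines]
    new = [TS_RE.sub("", ln) for ln in new_lines]
    return list(
        difflib.unified_diff(old, new, "old-kernel", "new-kernel", lineterm="")
    )


# Invented sample content, not taken from the real captures.
old_dmesg = [
    "[    1.000000] example: common line",
    "[    1.500000] example: old-only line",
]
new_dmesg = [
    "[    1.100000] example: common line",
    "[    1.700000] example: new-only line",
]

diff = ordered_diff(old_dmesg, new_dmesg)
print("\n".join(diff))
```

Unlike the deduplicating method, this keeps repeated messages and their position in the boot sequence, at the cost of a much longer output.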
With the older kernel, it works well.
I have also noticed some interesting behavior with the new kernel and two monitors: the left monitor was primary and external (HDMI), the right one was secondary and internal. Without the hack, everything worked OK on the primary monitor, but the secondary monitor was buggy as described. I have tried to move a window so that it spans both of them; the result is more or less what one would expect: the left part of the window was rendered properly, the right part was buggy. I emphasize: the laptop has just one GPU. Absence of a dedicated GPU (and absence of troubles related to them) was among my selection criteria.
Should I dig deeper into the multi-monitor behavior?
Maybe it stops rendering on some screens, but some actions (window movement, rendering of a new (or previously hidden) window* etc.) can refresh it. So, some applications (like XFCE terminal if run in dom0) can work well, while some others cannot.
*) In this case, “window” includes menus.
marmarek
Apr 17, 2018
Member
I'm looking into this diff and can't find anything obvious...
- AER messages probably appear now because AER wasn't supported/enabled in the old kernel at all
- The 4.14 kernel has (some of the) Spectre mitigations; if the problem were purely about performance, I'd say it's related, but in this case I'm not so sure
- there is a message about unknown 'preliminary_hw_support' option, which was renamed to 'alpha_support'; but it is also probably not a problem, because what was a preliminary support in 4.9, is already "stable" in 4.14.
Do you have iommu=no-igfx option for Xen set (options= in xen.cfg)? I've seen similar problem fixed by adding it (but in that case, there were multiple error messages in dmesg...).
Is your screen rotated (I've seen something about it in your log)? If so, it might be similar to #3558 - it's a long shot, as the hardware is very different, but maybe... Try removing the rotation setting.
Today we've uploaded an updated 4.14 kernel (4.14.34), you may want to try it. But be careful not to remove 4.9 in the process.
v6ak
Apr 18, 2018
The 4.14 kernel has (some of the) Spectre mitigations; if the problem were purely about performance, I'd say it's related, but in this case I'm not so sure
It definitely isn't purely about performance. If it was, my workaround would make it rather worse.
Do you have iommu=no-igfx option for Xen set (options= in xen.cfg)?
No, I haven't.
Is your screen rotated (I've seen something about it in your log)? If so, it might be similar to #3558 - it's a long shot, as the hardware is very different, but maybe... Try removing the rotation setting.
Well, no. I sometimes rotate the screen, but not this time.
Today we've uploaded an updated 4.14 kernel (4.14.34), you may want to try it. But be careful not to remove 4.9 in the process.
I'll try this first. If it does not help, I'll try iommu=no-igfx. In any case, I'll report the result here.
v6ak
Apr 18, 2018
- New kernel behaves the same.
- I have tried noretpoline (maybe the performance hit has caused previously unreachable race conditions), same behavior.
- I have tried i915.alpha_support (with noretpoline), same behavior.
- Option iommu=no-igfx causes a boot loop (the laptop reboots quickly, before Xen/Linux prints any output).
For all my experiments (except iommu=no-igfx), I have recorded dmesg. If you believe there is something I should look for, I can.
v6ak
Apr 19, 2018
I have some new findings, which might make it easier to reproduce:
Qubes 3.2, 4.14.34-1.pvops.qubes.x86_64 with noretpoline and i915.alpha_support=1:
- Works well with both Kwin and Xfwm4 if compositor is enabled.
- If I disable the compositor, it starts being buggy, regardless of whether I use Xfwm or Kwin.
- By default, compositor seems to be enabled in both, so this might be the reason why I am the first one that reports the issue.
- It really looks like per-screen issue rather than per-window issue. Window movement/etc. is something that forces redraw.
Qubes 4, 4.14.18-1.pvops.qubes.x86_64
- Although I had troubles with this exact kernel version with 3.2, I had no troubles with Qubes 4, trying both Xfwm and Kwin, both without compositor and with compositor.
- Maybe some other component version (Xorg?) also influences the issue.
I ran Qubes 4 on the same hardware. I am aware of a few differences that are IMHO not likely to have an impact there:
- R3.2 boots from SSD, R4 boots from USB drive (and without sys-usb).
- I had some troubles with getting R4 to boot. No UEFI boot entry was created automatically. I had to create one using UEFI GUI. I had to choose xen-(version).efi rather than just xen.efi.
v6ak
changed the title from
[3.2 current-testing] Rendering of almost any window freezes after a moment
to
[3.2 current-testing] Screen rendering freezes after few seconds, some actions can refresh it
Apr 19, 2018
HW42
Apr 19, 2018
If I disable the compositor, it starts being buggy, regardless of whether I use Xfwm or Kwin.
Good find. By disabling the "compositor" I can reproduce the problem.
v6ak commented Apr 16, 2018 (edited Apr 19, 2018)
Qubes OS version:
R3.2
Affected component(s):
GUI
Steps to reproduce the behavior:
Try to interact with UI of some VM or Qubes Manager. (Some dom0 apps do work correctly, not sure why.)
Expected behavior:
UI is rendered.
Actual behavior:
After a short while (matter of seconds at most), the UI stops rendering. It accepts events, but it does not render the changes within the window.
When I switch to another window and back (using alt+tab), it temporarily fixes the issue – for a few seconds or so.
General notes:
Crazy workaround: run
while true; do notify-send x; sleep 1; done
in dom0. At least in xfce, it causes notification balloons to appear, which fixes rendering. Well, the UI has some minor hiccups (fractions of a second).
This has happened after updating to qubes-dom0-current-testing (and reboot), because I wanted the fix for issue #3711.
EDIT: Summary of further findings: