[3.2 current-testing] Screen rendering freezes after few seconds, some actions can refresh it #3829

Open
v6ak opened this Issue Apr 16, 2018 · 16 comments

v6ak commented Apr 16, 2018

Qubes OS version:

R3.2

Affected component(s):

GUI


Steps to reproduce the behavior:

Try to interact with UI of some VM or Qubes Manager. (Some dom0 apps do work correctly, not sure why.)

Expected behavior:

UI is rendered.

Actual behavior:

After a short while (a matter of seconds at most), the UI stops rendering. It still accepts events, but it does not render the resulting changes within the window.

When I switch to another window and back (using alt+tab), it temporarily fixes the issue – for a few seconds or so.

General notes:

Crazy workaround: run while true; do notify-send x; sleep 1; done in dom0. At least in Xfce, this makes notification balloons appear, which fixes the rendering. The UI still has some minor hiccups (fractions of a second).
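The workaround can also be written out as a small script. This is only a sketch: the refresh_workaround helper and its count/interval parameters are hypothetical names introduced here so that the loop is bounded, whereas the one-liner above runs until interrupted.

```sh
# Hypothetical helper (not part of Qubes): run a command COUNT times,
# sleeping INTERVAL seconds after each run.
refresh_workaround() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"                 # e.g. notify-send x, which forces a redraw in Xfce
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example (in dom0): one notification per second for about ten minutes
# refresh_workaround 600 1 notify-send x
```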

This has happened after updating to qubes-dom0-current-testing (and reboot), because I wanted the fix for issue #3711.

EDIT: Summary of further findings:

  • It is the screen rendering that freezes, not the window.
  • It does not affect all the screens in multiscreen setup. (Not researched further.)
  • Related to kernel version.
  • Related to compositor settings in Kwin/Xfwm: With compositor enabled, it works well.
  • Bug appears regardless of choice of Kwin/Xfwm (provided that both have compositor disabled).
  • Unable to reproduce it with R4.

@v6ak v6ak changed the title from [3.2 xurrent-testing] Rendering of almost any window freezes after a moment to [3.2 current-testing] Rendering of almost any window freezes after a moment Apr 16, 2018

Member

marmarek commented Apr 16, 2018

What kernel version do you run in dom0? What GPU?
cc @HW42

v6ak commented Apr 16, 2018

$ uname -a
Linux dom0 4.14.18-1.pvops.qubes.x86_64 #1 SMP Thu Feb 8 19:37:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

GPU is Intel® HD Graphics 620 (in i7 7500U). The laptop has no dedicated GPU.

Member

marmarek commented Apr 16, 2018

Any kernel messages in dom0? Or X server log?

v6ak commented Apr 16, 2018

dmesg

  1. There are many audit-related messages, the most interesting ones look like this: [10237.696697] kauditd_printk_skb: 9 callbacks suppressed. For the rest, I have used grep -v audit.
  2. There are many messages like this:

[17537.426417] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[17537.426434] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
[17537.426437] pcieport 0000:00:1c.0: device [8086:9d14] error status/mask=00000001/00002000
[17537.426439] pcieport 0000:00:1c.0: [ 0] Receiver Error (First)

  3. I have also seen this:

[12556.055938] pcieport 0000:00:1c.0: AER: Multiple Corrected error received: id=00e0
[12556.055972] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Transmitter ID)
[12556.055975] pcieport 0000:00:1c.0: device [8086:9d14] error status/mask=00001001/00002000
[12556.055976] pcieport 0000:00:1c.0: [ 0] Receiver Error (First)
[12556.055977] pcieport 0000:00:1c.0: [12] Replay Timer Timeout

[ 38.837317] pciback 0000:01:00.0: Driver tried to write to a read-only configuration space field at offset 0x110, size 4. This may be harmless, but if you have problems with your device:
1) see permissive attribute in sysfs
2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.
[ 39.360523] pciback 0000:01:00.0: enabling device (0000 -> 0002)
[ 39.360783] xen: registering gsi 16 triggering 0 polarity 1
[ 39.360792] Already setup the GSI :16

  4. Many messages like this:

[ 49.026321] xen-blkback: backend/vbd/2/51712: using 2 queues, protocol 1 (x86_64-abi) persistent grants

  5. Tried grepping for bug, error and warn:

$ grep -i warn /tmp/dmesg.txt
[ 1.043464] Warning: Processor Platform Limit not supported.
[ 1.051299] i8042: Warning: Keylock active
$ grep -i error /tmp/dmesg.txt
[ 1.062335] RAS: Correctable Errors collector initialized.
(plus many AERs as reported above)
$ grep -i bug /tmp/dmesg.txt
[ 12.821958] tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x201] vs fed40080 f80
[ 12.821967] tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x201] vs fed40080 f80

  6. Of course, there are many boot-time-related messages, but I see nothing interesting there.
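Since the AER messages are so numerous, it may help to tally them per device rather than read them one by one. A minimal sketch, assuming the dmesg output was saved to /tmp/dmesg.txt as in the grep commands above; aer_corrected_counts is a hypothetical helper name, and the pattern only matches the two "Corrected error received" message forms quoted in this comment.

```sh
# Hypothetical helper: count AER "Corrected error received" headers per PCI
# device in dmesg output read from stdin. Prints "<device> <count>" per line.
aer_corrected_counts() {
    grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]: AER: (Multiple )?Corrected error received' \
        | sed 's/: AER.*//' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}

# Example:
# aer_corrected_counts < /tmp/dmesg.txt
```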

Xorg

For Xorg server, I have tried to compare Xorg.0.log{,.old}. I have removed times and sorted all the unique lines in order to get a good idea what has changed without much noise: https://gist.github.com/v6ak/b67ce6f501c74f7e617b4a12b38820cb
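The comparison described here (drop the timestamps, keep the sorted unique lines, then diff) can be sketched roughly as follows. This is an illustration, not the exact commands that produced the gist: normalize_log is a hypothetical helper name, the /var/log paths and /tmp output files are assumptions, and the timestamp pattern assumes the usual "[  12.345]" prefix.

```sh
# Hypothetical helper: strip a leading "[  12.345] " timestamp from each line
# on stdin, then print the unique lines in sorted order.
normalize_log() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] ?//' | LC_ALL=C sort -u
}

# Example (paths are assumptions):
# normalize_log < /var/log/Xorg.0.log.old > /tmp/xorg-old.txt
# normalize_log < /var/log/Xorg.0.log     > /tmp/xorg-new.txt
# diff /tmp/xorg-old.txt /tmp/xorg-new.txt
```

Sorting loses the original order of the messages, so this view is best for spotting lines that appear in only one of the two logs.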

Member

marmarek commented Apr 16, 2018

What is connected to 0000:00:1c.0? You can find out with lspci -t: note the bus number shown next to 1c.0, then go back to the plain lspci output and see which device sits on that bus. For example on my system:

[marmarek@dom0 ~]$ lspci -t
-[0000:00]-+-00.0
(...)
           +-1c.0-[02]----00.0
(...)
[marmarek@dom0 ~]$ lspci
(...)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
(...)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)

But I guess it's unrelated to graphics, which probably is at 0000:00:02.0.

Can you compare kernel boot messages from pre-upgrade and the current one?
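The two-step lookup above can be sketched as a small filter. It assumes the lspci -t line format shown in the example (bridge address followed by the secondary bus in square brackets); bus_behind_bridge is a hypothetical helper name.

```sh
# Hypothetical helper: read `lspci -t` output on stdin and print the secondary
# bus number found in brackets right after the given bridge (e.g. "1c.0").
bus_behind_bridge() {
    devfn=$1
    while IFS= read -r line; do
        case $line in
            *"-$devfn-["*)
                rest=${line#*"-$devfn-["}   # text following "-1c.0-["
                printf '%.2s\n' "$rest"     # first two hex digits: the bus
                ;;
        esac
    done
}

# Example: list the devices on the bus behind bridge 1c.0
# lspci -s "$(lspci -t | bus_behind_bridge 1c.0):"
```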

v6ak commented Apr 16, 2018

# lspci | grep 1c
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)

But I guess it's unrelated to graphics, which probably is at 0000:00:02.0.

You are right.

Can you compare kernel boot messages from pre-upgrade and the current one?

Is there an easy way to do it without modifying /boot/efi/EFI/qubes/xen.cfg? With legacy BIOS, I was able to choose another GRUB option. With UEFI boot, it automatically chooses the default one. I would prefer not to change the config file, to avoid a stupid mistake that could make the system unbootable.

Member

marmarek commented Apr 16, 2018

You need to choose a different xen.efi file, not a different kernel. If your UEFI firmware allows you to choose a boot entry, you can add an additional one using efibootmgr. Something like this:

efibootmgr -v -c -d /dev/sda -p 1 -L "Qubes old" -l '\EFI\qubes\xen-old.efi'

BTW, this way you can also boot an older kernel: provide the xen.cfg section name as an extra argument to efibootmgr.

HW42 commented Apr 16, 2018

FWIW: I can't observe this on R3.2, updated to testing, on a similar CPU (i7-7600U). (Although I haven't yet tried whether using UEFI instead of BIOS mode changes anything.)

@v6ak:

This has happened after updating to qubes-dom0-current-testing (and reboot), because I wanted the fix for issue #3711.

What packages have been upgraded? (Take a look at /var/log/dnf.log*)

v6ak commented Apr 16, 2018

@marmarek I am not sure if I got you right. I am not even sure whether you want me to compare Linux kernels or Xen versions. Both were upgraded, and my current guess is that it is a Linux kernel issue rather than a Xen issue.

I have tried the following command, but it boots 4.14.18-1, despite manually choosing the 4.9 option on boot:

# efibootmgr -v -c -d /dev/sda -p 1 -L "Qubes old 4.9.56-21" -l /EFI/qubes/xen.efi 4.9.56-21.pvops.qubes.x86_64

(Hmm, I have used forward slashes instead of backslashes… But efibootmgr -v shows backslashes anyway. The only difference (except label) between those two items is the parameter with kernel identifier.)

Note that I don't see older xen.efi, it seems older Xen is not kept there.

@HW42 This is probably the part you have asked about:

Apr 15 22:11:30 DEBUG Completion plugin: Generating completion cache...
Apr 15 22:11:30 DEBUG --> Starting dependency resolution
Apr 15 22:11:30 DEBUG ---> Package kernel.x86_64 1000:4.14.18-1.pvops.qubes will be installed
Apr 15 22:11:30 DEBUG ---> Package kernel-qubes-vm.x86_64 1000:4.14.18-1.pvops.qubes will be installed
Apr 15 22:11:30 DEBUG ---> Package qubes-gpg-split-dom0.x86_64 2.0.28-1.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package qubes-gpg-split-dom0.x86_64 2.0.30-1.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package qubes-usb-proxy-dom0.noarch 1.0.16-1.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package qubes-usb-proxy-dom0.noarch 1.0.17-1.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-runtime.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-runtime.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-libs.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-libs.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-hvm.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-hvm.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-hypervisor.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-hypervisor.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package xen-licenses.x86_64 2001:4.6.6-37.fc23 will be upgraded
Apr 15 22:11:30 DEBUG ---> Package xen-licenses.x86_64 2001:4.6.6-38.fc23 will be an upgrade
Apr 15 22:11:30 DEBUG ---> Package kernel.x86_64 1000:4.9.35-20.pvops.qubes will be erased
Apr 15 22:11:30 DEBUG --> Finished dependency resolution
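To make a log excerpt like this easier to scan, the package lines can be reduced to name, version and action. A rough sketch, assuming the dnf.log line layout shown above; dnf_changes is a hypothetical helper name.

```sh
# Hypothetical helper: from dnf.log lines on stdin, keep only the
# "---> Package NAME VERSION will be ACTION" entries and print
# "NAME VERSION ACTION". In these logs "upgraded" marks the old version
# and "an upgrade" marks the new one.
dnf_changes() {
    awk '$5 == "--->" && $6 == "Package" {
        action = $11
        for (i = 12; i <= NF; i++) action = action " " $i
        print $7, $8, action
    }'
}

# Example:
# cat /var/log/dnf.log* | dnf_changes
```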

Member

marmarek commented Apr 16, 2018

I have tried the following command, but it boots 4.14.18-1, despite manually choosing the 4.9 option on boot:

Try adding "placeholder 4.9.56-21.pvops.qubes.x86_64" instead; in some cases the first argument is ignored... Also, I don't remember what encoding is expected (with -u or not).

v6ak commented Apr 17, 2018

I was unable to do it using various variants of the efibootmgr command. I have created plenty of boot entries (with and without -u, with and without placeholder, with two arguments or with one argument ("placeholder 4.9.56-21.pvops.qubes.x86_64")). Every one of them boots the new kernel. So I have finally modified /boot/efi/EFI/qubes/xen.cfg and compared the dmesg outputs. I have noticed that my original dmesg capture is a bit truncated, so I have captured both of them afresh. I have created the comparison using two methods, one deduplicates (so it is considerably shorter) and one preserves order: https://gist.github.com/v6ak/1b334e677c05be395d83c2d484c6857a

With the older kernel, it works well.

I have also noticed some interesting behavior with the new kernel and two monitors: The left monitor was primary and external (HDMI), the right one was secondary and internal. Without the hack, everything worked OK on the primary monitor, but the secondary monitor was buggy as described. I have tried to move a window so that it spans both of them, and the result was roughly as expected: The left part of the window was rendered properly, the right part was buggy. I emphasize: The laptop has just one GPU. Absence of a dedicated GPU (and absence of the troubles related to them) was among my selection criteria.

Should I dig deeper into the multimonitor behavior?

Maybe it stops rendering on some screens, but some actions (window movement, rendering of a new (or previously hidden) window* etc.) can refresh it. So, some applications (like XFCE terminal if run in dom0) can work well, while some others cannot.

*) In this case, “window” includes menus.

Member

marmarek commented Apr 17, 2018

I'm looking into this diff and can't find anything obvious...

  • AER messages probably appear now because AER wasn't supported/enabled in the old kernel at all
  • The 4.14 kernel has (some of the) Spectre mitigations - if the problem were purely about performance, I'd say it's related, but in this case I'm not so sure
  • there is a message about an unknown 'preliminary_hw_support' option, which was renamed to 'alpha_support'; but that is probably not a problem either, because what was preliminary support in 4.9 is already "stable" in 4.14.

Do you have the iommu=no-igfx option set for Xen (options= in xen.cfg)? I've seen a similar problem fixed by adding it (but in that case, there were multiple error messages in dmesg...).
Is your screen rotated (I've seen something about it in your log)? If so, it might be similar to #3558 - it's a long shot, as the hardware is very different, but maybe... Try removing the rotation setting.

Today we've uploaded an updated 4.14 kernel (4.14.34); you may want to try it. But be careful not to remove 4.9 in the process.

v6ak commented Apr 18, 2018

The 4.14 kernel has (some of the) Spectre mitigations - if the problem were purely about performance, I'd say it's related, but in this case I'm not so sure

It definitely isn't purely about performance. If it were, my workaround would rather make it worse.

Do you have the iommu=no-igfx option set for Xen (options= in xen.cfg)?

No, I haven't.

Is your screen rotated (I've seen something about it in your log)? If so, it might be similar to #3558 - it's a long shot, as the hardware is very different, but maybe... Try removing the rotation setting.

Well, no. I sometimes rotate the screen, but not this time.

Today we've uploaded an updated 4.14 kernel (4.14.34); you may want to try it. But be careful not to remove 4.9 in the process.

I'll try this first. If it does not help, I'll try iommu=no-igfx. In any case, I'll report the result here.

v6ak commented Apr 18, 2018

  • The new kernel (4.14.34) behaves the same.
  • I have tried noretpoline (maybe the performance hit has exposed previously unreachable race conditions), same behavior.
  • I have tried i915.alpha_support (with noretpoline), same behavior.
  • Option iommu=no-igfx causes a boot loop (the laptop reboots quickly, before Xen/Linux produce any output).

For all my experiments (except iommu=no-igfx), I have recorded dmesg. If you believe there is something I should look for, I can.

v6ak commented Apr 19, 2018

I have some new findings, which might make it easier to reproduce:

Qubes 3.2, 4.14.34-1.pvops.qubes.x86_64 with noretpoline and i915.alpha_support=1:

  • Works well with both Kwin and Xfwm4 if compositor is enabled.
  • If I disable the compositor, it starts being buggy, regardless of whether I use Xfwm or Kwin.
  • By default, compositor seems to be enabled in both, so this might be the reason why I am the first one that reports the issue.
  • It really looks like per-screen issue rather than per-window issue. Window movement/etc. is something that forces redraw.

Qubes 4, 4.14.18-1.pvops.qubes.x86_64

  • Although I had troubles with this exact kernel version with 3.2, I had no troubles with Qubes 4, trying both Xfwm and Kwin, both without compositor and with compositor.
  • Maybe some other component version (Xorg?) also influences the issue.

I ran Qubes 4 on the same hardware. I am aware of a few differences that are IMHO not likely to have an impact there:

  • R3.2 boots from SSD, R4 boots from USB drive (and without sys-usb).
  • I had some troubles with getting R4 to boot. No UEFI boot entry was created automatically. I had to create one using UEFI GUI. I had to choose xen-(version).efi rather than just xen.efi.

@v6ak v6ak changed the title from [3.2 current-testing] Rendering of almost any window freezes after a moment to [3.2 current-testing] Screen rendering freezes after few seconds, some actions can refresh it Apr 19, 2018

HW42 commented Apr 19, 2018

If I disable the compositor, it starts being buggy, regardless of whether I use Xfwm or Kwin.

Good find. By disabling the "compositor" I can reproduce the problem.
