X.org high CPU usage after kernel upgrade #2839
Comments
rtiangha
Jun 1, 2017
What is your hardware? CPU/GPU/RAM, etc.
rtiangha
Jun 2, 2017
Also, is this Xorg CPU utilization happening in dom0 or in VMs? More details needed, please.
rveldhoven
Jun 2, 2017
The CPU utilization happens in dom0. Sorry for not including all this beforehand.
I haven't checked whether it also happens in VMs; I could barely open a terminal to check it in dom0.
Shall I attach lspci -v output too?
rtiangha
Jun 2, 2017
Sure, that'd be helpful. Thanks.
rveldhoven
Jun 2, 2017
Appended lspci output here and in the issue description.
andrewdavidwong added the bug and C: kernel labels on Jun 2, 2017
andrewdavidwong added this to the Release 3.2 updates milestone on Jun 2, 2017
rtiangha
Jun 2, 2017
Thanks; this is helpful. Since your machine is super new, please try the 4.9.29 kernel that's available in current-testing first to see if the issue goes away:
sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing kernel
You could also try raising the Intel video card's video memory in the BIOS to something higher than 64MB; some users on the mailing list have reported better success with 256 or 512MB (actually, 512 seems to be the consensus, especially for newer machines that wouldn't boot).
Try those things first and report back.
rveldhoven
Jun 2, 2017
I will do this tonight; right now I'm at work.
I don't recall being able to increase the video memory. I'll try to see if it's possible somewhere, but I don't have much hope.
FWIW the current setup works really well; with some fiddling I got everything to work on this laptop. Even the wireless card, which needs kernel 4.6, was detected in sys-net after I guessed (since my dom0 kernel does not recognize it) which PCI device I needed to add to sys-net.
rveldhoven
Jun 3, 2017
Sadly, no success.
The 4.9 kernel has exactly the same issue as the 4.4.67 one.
I've compared the Xorg logs from the working 4.4 kernel and the 4.9 kernel, and they are identical; the kernel command line was identical too.
I do see several stack traces in dmesg output that are not present in the output from the working 4.4 kernel. I've attached them below (also getting files out of dom0 is a PITA).
I've also attached the complete dmesg output.
I was unable to find an option in my bios to change the amount of available video memory.
dmesg_output_kernel_4.9.txt
dmesg_output_kernel_4.9_stacktraces_only.txt
rtiangha
Jun 4, 2017
The W+X warnings can be ignored; it's a known issue with Xen and the debug option for that will disappear with the next kernel release.
It's strange that this X stuff is happening in dom0 and other Kaby Lake/Skylake users haven't reported the same symptoms (but it could be that they just haven't noticed it or there just aren't enough Kaby Lake Qubes users since the platform is so new). Would you be willing to attach an Xorg log? The easiest way to do that would be to use the qvm-copy-to-vm command in dom0; that should work fine.
Also, if you could attach the dom0 kernel options from the EFI config file, that'd be helpful as a sanity check (in legacy systems with GRUB, the line is GRUB_CMDLINE_LINUX, but I don't have a system that works with EFI so I don't know what the equivalent is called there or where it's located). It'd be the same place where you'd put stuff like nomodeset i915.modeset=0 and nouveau.modeset=0 to make it persist between reboots.
Finally, this is just a shot in the dark, but have you checked with DELL for any BIOS updates? I've seen weirder things get fixed simply by updating the BIOS. If there is an update, it might be worth trying.
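For the legacy GRUB case mentioned above, here is a minimal Python sketch for pulling the kernel options out of a GRUB defaults file so they can be pasted into a report. It assumes the options sit on a single GRUB_CMDLINE_LINUX="..." line; the function name and the sample file content are hypothetical, and the EFI config file is not covered.

```python
import shlex


def grub_cmdline_options(grub_defaults_text):
    """Return the options from the GRUB_CMDLINE_LINUX line as a list of strings.

    Assumes the whole value is on one line; returns [] if the line is absent.
    """
    for line in grub_defaults_text.splitlines():
        line = line.strip()
        # The trailing "=" keeps GRUB_CMDLINE_LINUX_DEFAULT from matching.
        if line.startswith("GRUB_CMDLINE_LINUX="):
            value = line.split("=", 1)[1]
            # shlex.split strips the shell quoting; the quoted value comes
            # back as one token, so split it again on whitespace.
            unquoted = " ".join(shlex.split(value))
            return unquoted.split()
    return []


# Hypothetical file content, for illustration only.
sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="rhgb quiet nomodeset i915.modeset=0"\n'
options = grub_cmdline_options(sample)
```

Checking whether "nomodeset" is in the returned list is then a quick sanity check on what the kernel is actually booted with.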
rveldhoven
Jun 4, 2017
Here are the logfiles. The 4.4 logfile is from the kernel that performs normally. Also attached is my xen.cfg, which is where the kernel command line can be specified.
Note that the 4.4.67 kernel (the one that also has performance issues) does not have the 'nomodeset ...' options. That's because I tried to see if the issue would go away if I left those out (it didn't). After that I didn't bother to put them back, because I was staying on 4.4.14 anyway.
I'm hesitant to apply BIOS updates. I'm convinced I can recover from any problem except a failed BIOS upgrade (since this thing is new, and one of those 'everything-is-soldered-on' types of laptops) or a hardware failure. I'll see if I can find any success stories, and if it looks safe I will apply any available updates.
rtiangha
Jun 4, 2017
Thanks. I'll look at these later when I have more time.
I only ask about possible BIOS updates because with the Intel stuff, that's how Video BIOS updates are also distributed. Like I implied above, if this were a general Kaby Lake or Skylake problem, there should be more reports of it, but yours is the only one thus far. So it might be something with your system, and maybe it could be with the firmware that was shipped with it that got fixed later on since the platform is still so new. Then again, it could also be that the sample size is still small. But like I said, it's just a shot in the dark.
rtiangha
Jun 4, 2017
Hmm, the logs look normal, at least on the surface.
Can you do what @v6ak suggested in #976 and post the results to the following questions using both the working kernels and non-working kernels in dom0:
- Is GPU acceleration used? With the new laptop, I have no GPU acceleration on Qubes with any kernel I have tried. You can verify it via glxinfo | grep renderer.
- Is a specific GPU kernel module used? If you have an Intel GPU, try sudo rmmod i915. If you are using the module, it should fail to remove the module with an appropriate error message. If it proceeds, the output is handled by some other (probably generic) module. My experience: Kernel 4.4 does not use it with new laptop, kernel 4.8 and 4.9 does.
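To make the first check less error-prone when comparing kernels, the renderer line can be classified automatically. This is a rough Python sketch, not an official tool: the function names are made up for illustration, and the list of software renderer names (llvmpipe, softpipe, swrast) is an assumption that is probably not exhaustive.

```python
import re

# Renderer names that suggest Mesa software rasterization instead of the GPU.
# Assumption: this list is illustrative, not exhaustive.
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")


def renderer_string(glxinfo_output):
    """Return the value of the 'OpenGL renderer string' line, or None."""
    match = re.search(
        r"^\s*OpenGL renderer string:\s*(.+)$", glxinfo_output, re.MULTILINE
    )
    return match.group(1).strip() if match else None


def is_software_rendering(glxinfo_output):
    """Return True if the reported renderer is a known software rasterizer."""
    renderer = renderer_string(glxinfo_output)
    if renderer is None:
        raise ValueError("no 'OpenGL renderer string' line found")
    return any(name in renderer.lower() for name in SOFTWARE_RENDERERS)
```

Feed it the full text printed by glxinfo in dom0 under each kernel; a True result under both kernels would mean GPU acceleration is not in use under either one.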
rtiangha
Jun 4, 2017
Regarding BIOS updates, you might want to update yours regardless. Looks like there was an update released on May 12 to address that nasty Intel ME exploit that was recently disclosed and Dell marks this one as URGENT (which it really is for those who are affected):
rveldhoven
Jun 5, 2017
I've installed the bios update, no dice. The issue is unchanged. (EDIT: I managed to find a working configuration, more info below.)
The results of the checks suggested in your second comment since my last one are identical on both kernels and are included below.
On the working 4.4.14 kernel:
glxinfo | grep renderer
GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer,
GLX_MESA_query_renderer, GLX_OML_swap_method, GLX_SGIS_multisample,
OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.7, 256 bits)
rmmod i915 -> no output
On the non-working 4.9 kernel:
glxinfo | grep renderer
GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer,
GLX_MESA_query_renderer, GLX_OML_swap_method, GLX_SGIS_multisample,
OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.7, 256 bits)
rmmod i915 -> no output
EDIT
Since @v6ak mentioned in their second point that kernels 4.8 and up try to use i915, I decided to remove the nomodeset-related options from my 4.9 kernel command line. The 4.9 kernel is now just as responsive as the 4.4.14 one.
Xorg CPU utilization is back to normal. Everything still seems to be working (USB, PCI passthrough, wireless).
I'm still available for testing if you want to track down the source of this issue, because this feels more like a mitigation to me than a fix, seeing as the command lines for 4.4.14, 4.4.67 and 4.9 were at one point identical.
Anyway, this is my current kernel commandline for the 4.9 kernel that works for the Dell Precision 5520:
root=/dev/mapper/qubes_dom0-root rd.luks.uuid=luks-554a03c8-9599-4c1a-8741-75962d289c74 rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap rhgb quiet
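For anyone applying the same mitigation, the edit amounts to dropping the three modesetting workaround options named in this thread from the kernel command line. A minimal Python sketch of that step follows; the function name and the example command line are hypothetical, and it only transforms a string rather than touching xen.cfg or any other boot config.

```python
# The modesetting workaround options named in this thread.
MODESET_OPTIONS = {"nomodeset", "i915.modeset=0", "nouveau.modeset=0"}


def strip_modeset_options(cmdline):
    """Return cmdline with the modesetting workaround options removed.

    All other options are kept in their original order.
    """
    kept = [opt for opt in cmdline.split() if opt not in MODESET_OPTIONS]
    return " ".join(kept)


# Hypothetical command line, for illustration only.
before = "root=/dev/mapper/qubes_dom0-root rhgb quiet nomodeset i915.modeset=0 nouveau.modeset=0"
after = strip_modeset_options(before)
```

The result still has to be written back to the boot config by hand, and keeping a boot entry for a known-good kernel is a sensible fallback in case the display fails to come up.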
rtiangha
Jun 6, 2017
The only thing left that I can think of to try is to boot with i915.preliminary_hw_support=1 and i915.preliminary_hw_support=0 kernel options and see if either makes a difference. It's supposed to be set to i915.preliminary_hw_support=1 by default in the kernel (i.e. no command line modification needed) in order to work around this:
https://www.phoronix.com/scan.php?page=news_item&px=intel-skl-prelim-support
That is the only meaningful change to the Intel video driver in 4.9 and perhaps a couple of the later 4.4 kernels.
rtiangha
Jun 6, 2017
But if it works for you now, then I think you're good. nomodeset isn't really supposed to be set by default in the first place; it's only needed for those who have modesetting issues before X loads.
mig5
Jun 18, 2017
Hi,
"It's strange that this X stuff is happening in dom0 and other Kaby Lake/Skylake users haven't reported the same symptoms (but it could be that they just haven't noticed it or there just aren't enough Kaby Lake Qubes users since the platform is so new)"
I can reproduce this on a ThinkPad T450s. I was wondering WTH was going on, suspecting hardware failure due to the jump in CPU/IO :)
HCL attached.
Qubes-HCL-LENOVO-20BX0025AU-20170619-090246.txt
I am struggling to upgrade to the 4.9 kernel from qubes-dom0-current-testing: I get a conflict error that I've never seen before. Any ideas how to resolve this one? I'm very keen to see if it fixes it for me too; the freeze-ups occur every few minutes and make it very hard to work.
[miguel@dom0 ~]$ sudo /usr/bin/qubes-dom0-update --enablerepo=qubes-dom0-current-testing --clean kernel
ClockVM not started, exiting!
Using sys-firewall as UpdateVM to download updates for Dom0; this may take some time...
Running command on VM: 'sys-firewall'...
Running command on VM: 'sys-firewall'...
Cleaning repos: fedora qubes-dom0-current qubes-dom0-current-testing
: qubes-templates-itl updates
Cleaning up Everything
fedora/metalink | 3.1 kB 00:00
fedora | 3.8 kB 00:00
fedora/primary_db | 18 MB 00:11
qubes-dom0-current | 3.6 kB 00:00
qubes-dom0-current/primary_db | 529 kB 00:07
qubes-dom0-current-testing | 3.6 kB 00:00
qubes-dom0-current-testing/primary_db | 932 kB 00:08
qubes-templates-itl | 2.9 kB 00:00
qubes-templates-itl/primary_db | 5.6 kB 00:00
updates/metalink | 2.9 kB 00:00
updates | 4.7 kB 00:00
updates/primary_db | 9.0 MB 00:05
--> Running transaction check
---> Package kernel.x86_64 1000:4.9.31-17.pvops.qubes will be installed
--> Finished Dependency Resolution
kernel-4.9.31-17.pvops.qubes.x86_64.rpm | 40 MB 08:13
Redirecting to '/usr/bin/dnf --exclude= install kernel' (see 'man yum2dnf')
Qubes OS Repository for Dom0 32 MB/s | 62 kB 00:00
Package kernel-1000:4.4.14-11.pvops.qubes.x86_64 is already installed, skipping.
Package kernel-1000:4.4.67-12.pvops.qubes.x86_64 is already installed, skipping.
Dependencies resolved.
==========================================================================================================================================================
Package Arch Version Repository Size
==========================================================================================================================================================
Skipping packages with conflicts:
(add '--best --allowerasing' to command line to force their upgrade):
kernel x86_64 1000:4.9.31-17.pvops.qubes qubes-dom0-cached 40 M
Transaction Summary
==========================================================================================================================================================
Skip 1 Package
Nothing to do.
Complete!
Selenog
Dec 14, 2017
I seem to be having the same issues on a 4.0-R3.0 and 4.0-R2.0 (which was upgraded to the R3). Let me know what info could be useful to provide (at work atm).
rveldhoven commented Jun 1, 2017 (edited Jun 2, 2017)
Qubes OS version (e.g., R3.2): 3.2
Affected TemplateVMs (e.g., fedora-23, if applicable): dom0
Expected behavior:
X.org CPU utilization to remain the same when upgrading from kernel 4.4.14-11 to 4.4.67-12.
The system to remain responsive and sleek.
Actual behavior:
X.org CPU utilization went from a low/high of 1.7%/5% to a low/high of 67%/70%.
The system is very sluggish: the cursor stutters and lags when moving, and window drawing/moving/resizing is visibly slow.
Steps to reproduce the behavior:
/
General notes:
It might be relevant to know that I had issues with X during the installation too. I had to run with the following kernel command line to get the graphical installer to work: 'nomodeset i915.modeset=0 nouveau.modeset=0'. Without these options the graphical installer would crash and revert to the text-based installer.
These settings have carried over into my current configuration. Both kernels included these settings and produced the aforementioned results.
System info:
Device: Dell Precision 5520
Memory: 32768 MB DDR4 SDRAM installed (2 slots of 16 GB)
CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Core Count: 4
L2: 1MB
L3: 8MB
HT: Yes
64-bit: Yes
Video: Intel(R) Kaby Lake Graphics
Video memory: 64 MB
And
Video: NVIDIA Quadro M1200
Video memory: 4096 MB GDDR5
(I think this one is not loaded, since nouveau is blacklisted.)
Storage:
M2 PCIe SSD-0 1 TB
Related issues:
#976
Attached files:
lspci_output.txt