KDE frozen on first boot after install / OpenGL causes unrelated applications to crash in dom0? #1680
Comments
marmarek
Jan 23, 2016
Member
This doesn't look good:
Jan 22 23:55:59 dom0 kernel: [ 312.741783] systemd[10818]: segfault at 5ec6 ip 0000000000005ec6 sp 00007ffe10b52368 error 14 in systemd[557233f05000+10d000]
Jan 22 23:55:59 dom0 kernel: [ 312.742478] BUG: Bad rss-counter state mm:ffff880005e3db00 idx:1 val:30
Jan 22 23:55:59 dom0 kernel: systemd[10818]: segfault at 5ec6 ip 0000000000005ec6 sp 00007ffe10b52368 error 14 in systemd[557233f05000+10d000]
Jan 22 23:55:59 dom0 kernel: BUG: Bad rss-counter state mm:ffff880005e3db00 idx:1 val:30
Jan 22 23:55:59 dom0 kernel: [ 312.747785] BUG: Bad rss-counter state mm:ffff880005e3b800 idx:1 val:4
Jan 22 23:55:59 dom0 kernel: BUG: Bad rss-counter state mm:ffff880005e3b800 idx:1 val:4
Jan 22 23:55:59 dom0 systemd: /usr/lib/systemd/system-generators/systemd-rc-local-generator terminated by signal SEGV.
I'm not sure what the cause is: it may be a kernel bug, but it may also be a Xen bug or even a hardware (memory?) problem.
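For anyone triaging a similar log, here is a minimal sketch of how the two kinds of events quoted above could be pulled out of a syslog or dmesg dump. The regular expressions are modelled only on the lines in this comment, and the function name `summarize_kernel_log` is hypothetical; other kernel versions may word these messages differently.

```python
import re
from typing import Dict, Set, Tuple

# Patterns modelled on the log lines quoted above (an assumption, not a kernel-defined format).
RSS_BUG = re.compile(
    r"BUG: Bad rss-counter state mm:(?P<mm>[0-9a-f]+) idx:(?P<idx>\d+) val:(?P<val>-?\d+)"
)
SEGFAULT = re.compile(r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)")


def summarize_kernel_log(log: str) -> Dict[str, Set[Tuple]]:
    """Collect distinct rss-counter and segfault events from syslog/dmesg text."""
    rss_events: Set[Tuple] = set()
    segfaults: Set[Tuple] = set()
    for line in log.splitlines():
        rss = RSS_BUG.search(line)
        if rss:
            # (mm address, counter index, leaked value)
            rss_events.add((rss["mm"], int(rss["idx"]), int(rss["val"])))
            continue
        seg = SEGFAULT.search(line)
        if seg:
            # (process name, pid, faulting address)
            segfaults.add((seg["proc"], int(seg["pid"]), seg["addr"]))
    return {"rss_events": rss_events, "segfaults": segfaults}
```

Using sets collapses the copies that syslog records both with and without the `[ 312.74...]` timestamp, at the cost of also merging genuinely repeated identical events.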
> As soon as I logged in there was a black screen with just one KDE button on lower left and the entire GUI frozen.
Disabling or enabling compositing may (or may not) help; the shortcut is Alt+Shift+F12 by default. In that system state, it will probably take some patience before the switch actually happens...
> Back to XFCE again, and 'Qubes VM Manager' didn't want to start, tried it several times from dom0 console too.
qubes-manager has code to prevent it from running in multiple instances. This probably means you already have one running somewhere (maybe hanging or something). If you kill that instance and still have the problem, you will probably get an error message on the console when starting the process.
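To illustrate the kind of single-instance guard described here, the sketch below uses an advisory file lock. This is a generic technique and not qubes-manager's actual code, which this thread does not show; the function name and the lock file path are hypothetical.

```python
import fcntl
import os
from typing import Optional


def acquire_single_instance_lock(path: str) -> Optional[int]:
    """Try to take an exclusive lock on `path`.

    Returns an open file descriptor on success, or None if another
    holder (for example a hung earlier instance) already owns the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Record our PID so a user can find and kill a stale holder.
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return fd
```

With a guard like this, a second launch exits quietly while the first instance holds the lock, which would match the symptom of the manager "not wanting to start". The lock is released when the holding process exits or closes the descriptor.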
> Qubes HCL Files are copied to: 'dom0' Qubes-HCL-ASUSTeK_COMPUTER_INC.-M5A99FX_PRO_R2.0-20160123-180117.yml - HCL Info
That file would be useful.
It's better to write to the qubes-users mailing list about problems related to specific hardware: there are more people there, and somebody might have had a similar problem.
edwintorok
Jan 23, 2016
Qubes-HCL-ASUSTeK_COMPUTER_INC.-M5A99FX_PRO_R2.0-20160123-180117.yml
Interestingly I don't have any 'bad rss counter' entries this afternoon in my dmesg, but when I installed it last night it happened quite a lot.
edwintorok
Jan 23, 2016
Logging in to KDE triggers a series of these 'BUG: Bad rss-counter' errors, and pretty much everything segfaults eventually, until the system reboots on its own. Even sudo ls -l crashed with a message about bash memory corruption in libc: https://gist.github.com/edwintorok/179fcb8090f49d8a0163.
Wouldn't be surprised if this is related to OpenGL. AFAIK Xfce doesn't use it and KDE would, so I launched glxgears under Xfce, and did a watch 'dmesg -T|tail'. The crashes started happening soon enough: https://gist.github.com/edwintorok/bee143ef33428084e450.
I stopped glxgears and the crashes stopped happening. So does OpenGL (or its kernel part) corrupt other applications' or the kernel's memory when run under Xen? Could IOMMU be used to somehow limit/diagnose that?
(FWIW OpenGL works fine on Debian jessie or Debian jessie+backports; the only similar trouble I had with this video card were some GPU hangs long ago with older versions of the drivers, but those would always result in a quite lengthy GPU hang message in dmesg).
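The manual test described in this comment (start glxgears, watch the tail of dmesg) could be automated roughly as follows. This is only a sketch: it assumes `dmesg -T` is readable by the current user and that the workload is on the PATH, and the function names are hypothetical.

```python
import re
import subprocess
import time
from typing import Callable, List, Optional

# Messages taken from the logs in this thread.
CORRUPTION = re.compile(r"BUG: Bad rss-counter state|segfault at")


def read_dmesg() -> str:
    """Return the kernel ring buffer with readable timestamps (may require root)."""
    result = subprocess.run(["dmesg", "-T"], capture_output=True, text=True, check=True)
    return result.stdout


def first_new_hit(baseline: str, current: str) -> Optional[str]:
    """Return the first suspicious line in `current` that is absent from `baseline`."""
    seen = set(baseline.splitlines())
    for line in current.splitlines():
        if line not in seen and CORRUPTION.search(line):
            return line
    return None


def run_until_corruption(
    workload: List[str],
    read_log: Callable[[], str] = read_dmesg,
    timeout_s: float = 600.0,
    poll_s: float = 2.0,
) -> Optional[str]:
    """Run `workload` and return the first new suspicious log line, or None."""
    baseline = read_log()
    proc = subprocess.Popen(workload)
    deadline = time.monotonic() + timeout_s
    try:
        while True:
            hit = first_new_hit(baseline, read_log())
            if hit is not None:
                return hit
            if proc.poll() is not None or time.monotonic() >= deadline:
                return None
            time.sleep(poll_s)
    finally:
        # Stop the workload so it cannot keep triggering the bug.
        if proc.poll() is None:
            proc.terminate()
            proc.wait()
```

A call such as `run_until_corruption(["glxgears"])` would then return the first new 'Bad rss-counter' or segfault line, or `None` if the timeout passes cleanly. A clean run does not prove the bug is absent, only that it did not show up in that window.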
edwintorok changed the title from "KDE frozen on first boot after install" to "KDE frozen on first boot after install / OpenGL causes unrelated applications to crash in dom0?" on Jan 23, 2016
edwintorok
Jan 23, 2016
I wasn't able to reproduce this with Debian + Xen 4.6 and glxgears.
The kernel on Debian is newer than Qubes's but Mesa is a minor version older (10.3.2 vs 10.3.3).
Also, on Debian all I've done is run it under Xen as dom0; I didn't start any other VMs (Qubes would always start a sys-net and a firewall VM).
$ uname -a
Linux debian 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-7~bpo8+1 (2016-01-19) x86_64 GNU/Linux
$ glxinfo|grep OpenGL.*string
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD RV730
OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.3.2
OpenGL core profile shading language version string: 3.30
OpenGL version string: 3.0 Mesa 10.3.2
OpenGL shading language version string: 1.30
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 10.3.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.0
$ sudo xl info
host : debian
release : 4.3.0-0.bpo.1-amd64
version : #1 SMP Debian 4.3.3-7~bpo8+1 (2016-01-19)
machine : x86_64
nr_cpus : 8
max_cpu_id : 7
nr_nodes : 1
cores_per_socket : 4
threads_per_core : 2
cpu_mhz : 4013
hw_caps : 178bf3ff:2fd3fbff:00000000:00001700:36983203:00000000:01ebbfff:00000008
virt_caps : hvm hvm_directio
total_memory : 12186
free_memory : 153
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 6
xen_extra : .0
xen_version : 4.6.0
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset :
xen_commandline : placeholder
cc_compiler : gcc (Debian 5.2.1-23) 5.2.1 20151028
cc_compile_by : waldi
cc_compile_domain : debian.org
cc_compile_date : Sun Nov 1 20:52:41 UTC 2015
xend_config_format : 4
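When comparing hosts (for example this Debian setup against a Qubes dom0), it can help to turn the `xl info` output above into a dictionary. The sketch below assumes only the `key : value` layout visible in this comment; `parse_xl_info` is a hypothetical helper, not part of the Xen tools.

```python
from typing import Dict


def parse_xl_info(text: str) -> Dict[str, str]:
    """Parse `xl info` output into a {key: value} dictionary of strings."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as hw_caps contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        info[key.strip()] = value.strip()
    return info
```

Two parsed dictionaries can then be compared key by key, e.g. `{k for k in a if a[k] != b.get(k)}`, to spot differences in `xen_version`, `xen_commandline`, or `virt_caps`.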
andrewdavidwong added the question label on Apr 6, 2016
andrewdavidwong
Apr 6, 2016
Member
I'm assuming this issue has been resolved based on the lack of recent activity. If not, please feel free to re-open it.
andrewdavidwong closed this on Apr 6, 2016
Bufil
May 31, 2016
I have the same issue.
andrewdavidwong
Jun 1, 2016
Member
Ok, re-opening since more than one person has this issue. @marmarek, how should this one be labeled?
andrewdavidwong reopened this on Jun 1, 2016
marmarek added the "C: kernel" and "bug" labels on Jun 1, 2016
marmarek
Jun 1, 2016
Member
@edwintorok @Bufil there is a pre-R3.2 test image in #1807 (comment); it has a newer dom0 kernel and X server (plus drivers). It is also possible to install kernel 4.4.x in R3.1 from the qubes-dom0-unstable repository.
If you could check whether either of those solves the problem, that would be great.
Bufil
Jun 4, 2016
Seems to be an AMD-Xen-kernel combination bug.
With the same GPU and an Intel mainboard this does not happen.
Kernel 4.4.x changes nothing.
edwintorok
Jun 5, 2016
@marmarek: good news! I installed kernel 4.4.10-9 on Qubes R3.1 and it didn't corrupt memory anymore for me:
- Installed Qubes R3.1 with Xfce (AFAICT it installed with legacy BIOS boot)
- Kernel 4.1.13-9 shows the memory corruption bug in dmesg and segfaults various applications within a few minutes of launching glxgears: dmesg from kernel 4.1.13-9
- Kernel 4.4.10-9 was able to run glxgears for 1.5 hours without any corruption messages in dmesg: dmesg from kernel 4.4.10-9
- To double-check I rebooted to 4.1.13-9; within about 2 minutes dmesg showed the corruption message and applications segfaulted
- Rebooted again to 4.4.10-9, and ran glxgears for 10 minutes: all OK!
I can't say that the corruption bug has been definitely fixed (maybe it is just harder to reproduce; @Bufil said above that it is still an issue), but I wasn't able to reproduce it anymore.
Here is also a diff of dmesg 4.1.13 and 4.4.10: diff of dmesg 4.1.13-9 vs 4.4.10-9
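A raw diff of two dmesg captures is dominated by timestamp differences, so one way to make such a comparison readable is to strip the `[ seconds.micros]` prefix first. The sketch below uses Python's standard `difflib`; it is not necessarily how the linked diff was produced, and the function names and default labels are illustrative.

```python
import difflib
import re
from typing import List

# Default dmesg prefix, e.g. "[  312.741783] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(log: str) -> List[str]:
    """Remove the leading boot-relative timestamp from every dmesg line."""
    return [TIMESTAMP.sub("", line) for line in log.splitlines()]


def diff_dmesg(old: str, new: str, old_name: str = "4.1.13-9", new_name: str = "4.4.10-9") -> List[str]:
    """Return a unified diff of two dmesg captures, ignoring timestamps."""
    return list(
        difflib.unified_diff(
            strip_timestamps(old),
            strip_timestamps(new),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )
```

Lines that differ only in timing then drop out, leaving driver messages and any 'BUG:' lines that appear under one kernel but not the other.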
If I want to test your Qubes-DVD-x86_64-20160518.iso should I perform a clean install, or can I use qubes-dom0-update to get the new X server you were referring to?
andrewdavidwong removed the question label on Jun 6, 2016
Bufil
Jun 20, 2016
I made a clean install of Qubes-R3.2-rc1.
After a few minutes, same problem.
"BUG: Bad rss-counter state ..."
marmarek
Jun 20, 2016
Member
> If I want to test your Qubes-DVD-x86_64-20160518.iso should I perform a clean install, or can I use qubes-dom0-update to get the new X server you were referring to?
Since R3.2-rc1 is out, better try this one.
edwintorok
Jun 26, 2016
@marmarek I just tried R3.2-rc1, and got the same results as Bufil: the 'BUG: Bad rss-counter' message is back, but it is much harder to reproduce:
I installed R3.2-rc1 (selected KDE+Xfce), and just as I completed the first boot setup it crashed, and I saw the 'BUG' message on my console. I had to hard reboot, because I was not able to log in on any of the consoles or even use Ctrl+Alt+Delete. Here are the logs [*]
After the reboot I tried my usual glxgears test under Xfce and nothing crashed; there was no 'BUG' message. I even logged in to Plasma and repeated the glxgears test: no crash, no BUG message.
[*]: I had to chroot into Qubes to use journalctl to retrieve the logs, since they are now binary and good old /var/log/messages is empty.
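As a possible alternative to the chroot mentioned in the footnote, `journalctl` can read a journal from another directory via its `--directory` option. The sketch below is untested against a Qubes install; it assumes the installed system is mounted at a hypothetical path such as `/mnt/qubes` and that the journal is stored persistently under `var/log/journal`.

```python
import subprocess
from pathlib import Path
from typing import List


def journalctl_command(mounted_root: str) -> List[str]:
    """Build a journalctl invocation that reads the journal of a mounted system."""
    journal_dir = Path(mounted_root) / "var" / "log" / "journal"
    return ["journalctl", f"--directory={journal_dir}", "--no-pager"]


def read_offline_journal(mounted_root: str) -> str:
    """Run journalctl against the mounted system and return the log as text."""
    result = subprocess.run(
        journalctl_command(mounted_root), capture_output=True, text=True, check=True
    )
    return result.stdout
```

The text returned by `read_offline_journal("/mnt/qubes")` can then be searched for 'BUG: Bad rss-counter' like any plain-text log. The journalctl binary on the rescue system has to understand the journal format written by the installed one.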
edwintorok commented Jan 23, 2016
I installed Qubes 3.1 RC2 on an AMD system (see below for hcl output) using legacy boot (I haven't tried installing with UEFI) using the 4.1.13-8.pvops.qubes.x86_64 kernel, with KDE+Xfce4 on LVM on an SSD (not encrypted).
On first boot I chose the defaults (create the default VMs, do NOT create usbvm), and logged in to KDE.
As soon as I logged in there was a black screen with just one KDE button on the lower left and the entire GUI frozen. I could barely move the mouse (i.e. the pointer lagged by something like 10-30s), and clicking on the KDE button did nothing. After several minutes the 'Desktop' button appeared in the upper right corner, but the system was still unusable.
The CPU fan was quite audible, so I guess it was using the CPU heavily.
I tried some keyboard shortcuts but nothing worked (Ctrl-Alt-F1, Ctrl-Alt-F2, Ctrl-Alt-Del, Ctrl-Alt-Backspace, Alt-PrintScreen-s). Usually I would've SSH-ed in from another machine, but I didn't have a chance to set that up yet (this was still first boot).
I rebooted using the physical reset button, and this time logged in to Xfce.
Everything worked fine here, so I logged out, and logged back in to KDE, which again worked fine.
Back to XFCE again, and 'Qubes VM Manager' didn't want to start, tried it several times from dom0 console too.
From 1st boot: /var/log/messages
Are there any other relevant logfiles I could provide for this?