dom0 stuck qui tray icons (and sometimes whole xfce desktop) #3440

Closed
na-- opened this Issue Jan 3, 2018 · 9 comments

na-- commented Jan 3, 2018

Qubes OS version:

R4.0 RC3

Affected TemplateVMs:

None (dom0 issue)


Steps to reproduce the behavior:

I'm not sure. Yesterday I updated everything (with qubes-dom0-current-testing enabled) and installed @kde-desktop-qubes (but I'm still using xfce, I've never even started kde in dom0 yet).

Expected behavior:

Everything works like before.

Actual behavior:

The Qubes tray icons (domains and devices) start very slowly. After rebooting, it took a few minutes for the icons to show in the tray at all, and initially they were unresponsive. After a few more minutes they became somewhat responsive (clicking on them shows the domains/devices). Rebooting again does not fix anything. Sometimes the whole desktop becomes stuck, and the only way I have managed to fix it is by killing the qui.tray.domains and qui.tray.devices processes from TTY2.

Starting the tray applets manually (by running python3 -mqui.tray.domains for example) after I've killed them produces the same results: they take a few minutes to start and populate, and they freeze on any system change (e.g. starting a VM). I've managed to reliably freeze xfce by starting the domains applet, clicking on it to show the active domains, and starting a VM after the VM list is shown.

General notes:

I've skimmed through the changes in updates-status and can't find any change that's an obvious culprit. I'd appreciate any help on where to look for the issue or which package to downgrade.

For the moment, killing and not using the tray icons (not a huge loss anyway) seems to prevent the desktop freezing, but I'm not sure if the tray icons are the root cause or if it's something else.


Related issues:

None that I could find

lunarthegrey Feb 12, 2018

Been a while since this issue was reported. Did you try upgrading packages recently? Do you still have the same issue?

na-- Feb 12, 2018

I still have the same issue, but I last updated the dom0 packages around 10 days ago. I'll close this after the next update if everything is fixed.

marmarek Feb 13, 2018

Member

I assume starting it manually does not produce any useful message?
How many VMs do you have? Is that process using some CPU, or just idling during its long startup?

na-- Feb 13, 2018

Just updated dom0, including current-testing, and at least part of the issue is still present: the system tray widgets start very slowly. The domains widget is excruciatingly slow. It needs a minute to actually show in the tray as a blank space and a few minutes more to show the qubes icon and be responsive. I played around with it for a bit and did not manage to freeze the whole xfce desktop like before, so that part may be fixed, but I'll try running the widgets for longer to check for sure.

After I killed the widgets with pkill -fc qui.tray, I ran python3 -mqui.tray.domains and timed how long it would take to start. It took roughly 3 minutes and 30 seconds... I have ~40 VMs, though of course I run only several at a time. In this particular case there were only 5 VMs running besides dom0.

I did not notice high CPU usage, but when I ran strace on the process I noticed that it regularly got stuck on a wait4 for different process ids. It would get stuck waiting for a process id for some time, then it would continue and a second later get stuck waiting for another process id. Some of the processes to which the IDs belonged were notify-send Domain sys-net is started, notify-send Domain sys-firewall is started and so on. Hope that helps with diagnosing the issue.

marmarek Feb 13, 2018

Member

Try strace -t -f -e execve python3 -mqui.tray.domains to find out what those processes are.

marmarek Feb 13, 2018

Member

On a system with ~50 VMs, and ~20 of them running, it takes about 2s to start it up. Including all the calls to notify-send to display "Domain ... is started"...

Is it the only slow thing on the system, or is it generally slow?

na-- Feb 13, 2018

The system is generally quite fast, and it seems even faster right now after the latest updates and with mostly PVH VMs... The only slow parts are the widgets. Even the newly resurrected qube manager desktop application loads in just a few seconds...

Here's a log of the strace output (and prepended timestamps on every line so it's more obvious where's the slowdown) with only 3 VMs besides dom0:

$ strace -t -f -e execve python3 -mqui.tray.domains 2>&1 | gawk '{ print strftime("[%H:%M:%S]"), $0 }'
[00:34:31] 00:34:31 execve("/usr/bin/python3", ["python3", "-mqui.tray.domains"], 0x7ffed3923158 /* 45 vars */) = 0
[00:34:31] strace: Process 32600 attached
[00:34:31] strace: Process 32601 attached
[00:34:31] strace: Process 32602 attached
[00:34:31] strace: Process 32603 attached
[00:34:31] [pid 32603] 00:34:31 execve("/usr/local/bin/notify-send", ["notify-send", "Domain sys-firewall is started"], 0x7ffd81bbb0e0 /* 45 vars */) = -1 ENOENT (No such file or directory)
[00:34:31] [pid 32603] 00:34:31 execve("/usr/bin/notify-send", ["notify-send", "Domain sys-firewall is started"], 0x7ffd81bbb0e0 /* 45 vars */) = 0
[00:34:31] strace: Process 32604 attached
[00:34:31] strace: Process 32605 attached
[00:34:47] [pid 32602] 00:34:47 +++ exited with 0 +++
[00:35:21] [pid 32605] 00:35:21 +++ exited with 0 +++
[00:35:21] [pid 32604] 00:35:21 +++ exited with 0 +++
[00:35:21] [pid 32603] 00:35:21 +++ exited with 0 +++
[00:35:21] [pid 32599] 00:35:21 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=32603, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
[00:35:22] strace: Process 32656 attached
[00:35:22] [pid 32656] 00:35:22 execve("/usr/local/bin/notify-send", ["notify-send", "Domain sys-net is started"], 0x7ffd81bbb0e0 /* 45 vars */) = -1 ENOENT (No such file or directory)
[00:35:22] [pid 32656] 00:35:22 execve("/usr/bin/notify-send", ["notify-send", "Domain sys-net is started"], 0x7ffd81bbb0e0 /* 45 vars */) = 0
[00:35:22] strace: Process 32657 attached
[00:35:22] strace: Process 32658 attached
[00:35:49] [pid 32658] 00:35:49 ????( <unfinished ...>
[00:35:49] [pid 32658] 00:35:49 +++ exited with 0 +++
[00:35:49] [pid 32657] 00:35:49 +++ exited with 0 +++
[00:35:49] [pid 32656] 00:35:49 +++ exited with 0 +++
[00:35:49] [pid 32599] 00:35:49 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=32656, si_uid=1000, si_status=0, si_utime=0, si_stime=1} ---
[00:35:49] strace: Process 32667 attached
[00:35:49] [pid 32667] 00:35:49 execve("/usr/local/bin/notify-send", ["notify-send", "Domain sys-usb is started"], 0x7ffd81bbb0e0 /* 45 vars */) = -1 ENOENT (No such file or directory)
[00:35:49] [pid 32667] 00:35:49 execve("/usr/bin/notify-send", ["notify-send", "Domain sys-usb is started"], 0x7ffd81bbb0e0 /* 45 vars */) = 0
[00:35:49] strace: Process 32668 attached
[00:35:49] strace: Process 32669 attached
[00:36:24] [pid 32669] 00:36:24 +++ exited with 0 +++
[00:36:24] [pid 32668] 00:36:24 +++ exited with 0 +++
[00:36:24] [pid 32667] 00:36:24 +++ exited with 0 +++
[00:36:24] [pid 32599] 00:36:24 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=32667, si_uid=1000, si_status=0, si_utime=0, si_stime=2} ---
[00:36:24] 
[00:36:24] (domains.py:32599): Gdk-CRITICAL **: gdk_window_thaw_toplevel_updates: assertion 'window->update_and_descendants_freeze_count > 0' failed
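
To make the stalls in a log like this easier to see, the timestamps can be paired up per process. The following is only a rough sketch, assuming the exact line format shown above (a gawk-prepended `[HH:MM:SS]` stamp, strace's `[pid N]` prefix, and `+++ exited with ... +++` lines); the function name `exec_durations` and the regular expressions are illustrative, not part of any Qubes tool.

```python
import re
from datetime import datetime

# Timestamp that gawk prepended to every line, e.g. "[00:34:31]".
STAMP = re.compile(r'^\[(\d{2}:\d{2}:\d{2})\]')
# A successful execve (result "= 0"); captures the pid and the argv text.
EXEC_OK = re.compile(r'\[pid (\d+)\].*execve\("[^"]+", \[(.*?)\].*\) = 0$')
# A process exit line; captures the pid.
EXITED = re.compile(r'\[pid (\d+)\].*\+\+\+ exited with \d+ \+\+\+')


def exec_durations(log_text):
    """Return (argv_text, seconds) for every exec'd child that later exited.

    Assumes the log does not cross midnight, since only HH:MM:SS is recorded.
    """
    started = {}  # pid -> (start time, argv text)
    results = []
    for line in log_text.splitlines():
        stamp = STAMP.match(line)
        if not stamp:
            continue
        when = datetime.strptime(stamp.group(1), "%H:%M:%S")
        match = EXEC_OK.search(line)
        if match:
            started[match.group(1)] = (when, match.group(2))
            continue
        match = EXITED.search(line)
        if match and match.group(1) in started:
            began, argv = started.pop(match.group(1))
            results.append((argv, int((when - began).total_seconds())))
    return results


# Excerpt copied from the log above.
SAMPLE = '''\
[00:34:31] [pid 32603] 00:34:31 execve("/usr/local/bin/notify-send", ["notify-send", "Domain sys-firewall is started"], 0x7ffd81bbb0e0 /* 45 vars */) = -1 ENOENT (No such file or directory)
[00:34:31] [pid 32603] 00:34:31 execve("/usr/bin/notify-send", ["notify-send", "Domain sys-firewall is started"], 0x7ffd81bbb0e0 /* 45 vars */) = 0
[00:35:21] [pid 32603] 00:35:21 +++ exited with 0 +++
[00:35:22] [pid 32656] 00:35:22 execve("/usr/bin/notify-send", ["notify-send", "Domain sys-net is started"], 0x7ffd81bbb0e0 /* 45 vars */) = 0
[00:35:49] [pid 32656] 00:35:49 +++ exited with 0 +++
'''

if __name__ == "__main__":
    for argv, seconds in exec_durations(SAMPLE):
        print(f"{seconds:>4}s  {argv}")
```

On the excerpt, this reports 50 seconds for the sys-firewall notification and 27 seconds for the sys-net one, which matches the gaps visible in the timestamps: nearly all of the startup time is spent inside notify-send.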

Coincidentally, I managed to freeze xfce during the latest widget tests, so that issue is still present as well. I think the problem there may be some kind of synchronization issue with xfce itself: perhaps it passes the click event to the widget in the main thread, and since the widget is frozen, xfce freezes as well.

marmarek Feb 13, 2018

Member

Do you see those notifications? Maybe the problem is that each call to notify-send needs to wait until some timeout, because of a crashed notification service? Check if you have xfce4-notifyd running.
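
One way to run that check from a script is to look for the daemon's command name under /proc. This is a minimal sketch, assuming a Linux /proc layout; the helper `process_running` and its `proc_root` parameter are hypothetical names introduced here for illustration.

```python
import os


def process_running(name, proc_root="/proc"):
    """Return True if any process under proc_root has `name` as its command name."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() == name:
                    return True
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return False


if __name__ == "__main__":
    state = "running" if process_running("xfce4-notifyd") else "not running"
    print("xfce4-notifyd is", state)
```

Note that the kernel truncates the name in /proc/PID/comm to 15 characters, so this exact-match approach only works for short names such as xfce4-notifyd.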

na-- Feb 13, 2018

Yeah, that was the problem... Running systemctl --user start xfce4-notifyd fixes the issue: the domains widget starts in a flash. I'm not sure why xfce4-notifyd was not running; maybe installing @kde-desktop-qubes messed something up (knotify4 was, for some reason, already running in xfce). Thank you very much for the help!
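
The strace output earlier in the thread suggests the widget waits for each notify-send child to exit, so a missing notification service stalls the whole startup. As an illustration only (this is not how the qui widgets are implemented), a caller can bound that wait with a timeout. The function name `notify`, the 2-second default, and the `command` parameter are assumptions made for this sketch.

```python
import subprocess


def notify(message, timeout_s=2.0, command=("notify-send",)):
    """Run the notification command, giving up after timeout_s seconds.

    Returns True if the command exited with status 0 in time, False otherwise.
    """
    try:
        done = subprocess.run(
            [*command, message],
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before raising.
        return False
    except OSError:
        # The command is missing or not executable.
        return False
    return done.returncode == 0


if __name__ == "__main__":
    if not notify("Domain sys-net is started"):
        print("notification was not delivered in time")
```

With a bound like this, a dead notification daemon costs at most `timeout_s` per message instead of tens of seconds, at the price of silently dropping the notification.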

na-- closed this Feb 13, 2018
