Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UI thread gets stuck on osx (at random?) #855

Closed
totaam opened this issue May 11, 2015 · 24 comments
Closed

UI thread gets stuck on osx (at random?) #855

totaam opened this issue May 11, 2015 · 24 comments

Comments

@totaam
Copy link
Collaborator

totaam commented May 11, 2015

Issue migrated from trac ticket # 855

component: platforms | priority: major | resolution: wontfix | keywords: osx

2015-05-11 17:06:16: antoine created the issue


Reported in #669#comment:40 and #1002, it looks like this:

UI thread is now blocked
UI thread is running again, resuming

The weird thing is that it isn't just the UI thread, the polling thread also gets stopped.
Funnily enough, sound keeps on playing, so not all threads are blocked!?

We should figure out why things gets stuck, and what we can do about it.
At least we probably shouldn't suspend then resume immediately: if the polling thread got stuck, we can just skip the events.

See also: #1003, #986

@totaam
Copy link
Collaborator Author

totaam commented Oct 23, 2015

2015-10-23 07:54:41: antoine changed status from new to assigned

@totaam
Copy link
Collaborator Author

totaam commented Oct 23, 2015

2015-10-23 07:54:41: antoine edited the issue description

@totaam
Copy link
Collaborator Author

totaam commented Oct 23, 2015

2015-10-23 07:59:13: antoine changed priority from minor to major

@totaam
Copy link
Collaborator Author

totaam commented Oct 23, 2015

2015-10-23 07:59:13: antoine changed status from assigned to new

@totaam
Copy link
Collaborator Author

totaam commented Oct 23, 2015

2015-10-23 07:59:13: antoine edited the issue description

@totaam
Copy link
Collaborator Author

totaam commented Oct 23, 2015

2015-10-23 07:59:13: antoine changed owner from antoine to afarr

@totaam
Copy link
Collaborator Author

totaam commented Oct 23, 2015

2015-10-23 07:59:13: antoine commented


@afarr: please provide the information requested in #1002#comment:1 here.

@totaam
Copy link
Collaborator Author

totaam commented Oct 31, 2015

2015-10-31 00:36:24: afarr changed owner from afarr to antoine

@totaam
Copy link
Collaborator Author

totaam commented Oct 31, 2015

2015-10-31 00:36:24: afarr commented


Is the gap between the "blocked" and "running again" always this low? (here just 1ms)
[[BR]]
Not always, but at least 9 times out of 10 the gap is between 0 and 23 ms.

Occasionally, when going through about 10 hours of logs with -d sound on to see if I could spot any cause for sound apparently stopping on sessions left idle overnight, I did see the UI thread is now blocked message with no indication of it resuming for 20 seconds or so, after which I see another UI thread is now blocked message followed by an immediate resumption message. (Which might be the result of overnight spinners, client trying to go to sleep, or who knows what, I suppose.)

Trying to connect with -d util, though, I see a never-ending stream of output repeating:

2015-10-30 17:02:53,207 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-10-30 17:02:53,207 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x7d84970>]
2015-10-30 17:02:53,208 UI_thread_wakeup()
2015-10-30 17:02:53,271 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,331 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,397 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,455 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,509 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,583 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,639 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,708 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-10-30 17:02:53,709 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x7d84970>]
2015-10-30 17:02:53,709 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,709 UI_thread_wakeup()
2015-10-30 17:02:53,769 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,826 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,878 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,954 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,006 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,082 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,137 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,209 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-10-30 17:02:54,209 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x7d84970>]
2015-10-30 17:02:54,210 UI_thread_wakeup()

... I'm not seeing any UI thread messages while the ... oops, typed too soon. I eventually saw a UI thread message, despite the util output:

2015-10-30 17:21:32,921 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,000 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,080 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,131 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,154 poll_UI_loop() last_UI_thread_time was 10.3 seconds ago (max 1), UI_blocked=False
2015-10-30 17:21:33,154 UI thread is now blocked
2015-10-30 17:21:33,155 UI_thread_wakeup()
2015-10-30 17:21:33,155 UI thread is running again, resuming

There was a longer than usual collection of 'info' callbacks before the UI thread was blocked, but it looks like that's included in the message (had nothing running at the time but an idle gedit window).

(Passing back to you so it doesn't get lost in the pile of my tickets.)

@totaam
Copy link
Collaborator Author

totaam commented Oct 31, 2015

2015-10-31 10:02:57: antoine changed owner from antoine to afarr

@totaam
Copy link
Collaborator Author

totaam commented Oct 31, 2015

2015-10-31 10:02:57: antoine commented


  • r11103 will log how long it actually takes to do the probing, maybe everything gets delayed on the system and not just the UI thread. This would then show up as warning looking like this:
Warning: long waiting time:
 UI thread polling waited 1.0 longer than intended (2.0 vs 1.0)
  • since 9 out of 10 are close to 1 second, maybe the machine goes into a sleep or low power mode and does not service timers as regularly as normal, if that's the case then maybe we just need to increase the polling timer, try running the client with:
XPRA_UI_THREAD_POLLING=2000 ...

(or even higher values?)
Which will increase the polling delay to 2 seconds (2000 milliseconds).


Just for the record:

  • these warnings also show up when the user activates the global menu - that's what this polling timer is for, we can't avoid this
  • I sometimes forget that the logger can use the module name to restrict logging to particular file, in this case -d xpra.platform.ui_thread_watcher is a much more restricted logging subset of -d util

@totaam
Copy link
Collaborator Author

totaam commented Nov 2, 2015

2015-11-02 23:42:33: afarr commented


Testing with the 0.16.0 r11118 osx client against a 0.16.0 r11118 fedora 21 server, with -d xpra.platform.ui_thread_watcher I'm not seeing any sign of the UI thread blocked warnings.

Ran for about 20 minutes, only thing I saw (aside from the expected UI message when opening the Session Info) was a lot of messages like:

2015-11-02 14:14:59,213 wait(0.5000) actually waited 0.5011
2015-11-02 14:14:59,213 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-11-02 14:14:59,213 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x9141bb0>]
2015-11-02 14:14:59,213 UI_thread_wakeup()

... which I'm guessing is expected.

  • Running without the debug flag, it took nearly 20 minutes before I got output...
2015-11-02 14:25:21,900 sound-sink using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2015-11-02 14:53:41,143 Warning: long timer waiting time,
2015-11-02 14:53:41,150  UI thread polling waited 9.9 seconds longer than intended (10.4 vs 0.5)
2015-11-02 14:53:41,157 UI thread is now blocked
2015-11-02 14:53:41,158 UI thread is running again, resuming
2015-11-02 14:54:02,176 server is not responding, drawing spinners over the windows
2015-11-02 14:54:02,434 server is OK again
2015-11-02 14:55:11,094 Warning: long timer waiting time,
2015-11-02 14:55:11,094  UI thread polling waited 9.8 seconds longer than intended (10.3 vs 0.5)
2015-11-02 14:55:11,094 UI thread is now blocked
2015-11-02 14:55:11,098 UI thread is running again, resuming
2015-11-02 14:56:11,303 Warning: long timer waiting time,
2015-11-02 14:56:11,303  UI thread polling waited 9.6 seconds longer than intended (10.1 vs 0.5)
2015-11-02 14:56:11,303 UI thread is now blocked
2015-11-02 14:56:11,303 UI thread is running again, resuming
2015-11-02 14:57:01,900 UI thread is now blocked
2015-11-02 14:57:01,904 UI thread is running again, resuming

It looks like there were several such messages in the wake of a spot of spinners? After a few more minutes I got a couple more.

2015-11-02 14:59:11,374 Warning: long timer waiting time,
2015-11-02 14:59:11,374  UI thread polling waited 9.8 seconds longer than intended (10.3 vs 0.5)
2015-11-02 14:59:11,374 UI thread is now blocked
2015-11-02 14:59:11,375 UI thread is running again, resuming

Looking at these numbers ("10.3 vs. 0.5"), I'm guessing that just upping the poling period to 2.0 seconds will be enough.

  • Trying, just to see what would happen, with XPRA_UI_THREAD_POLLING=3000, I saw no sign of the UI thread blocked messages after letting the session run for half an hour ... but I also had no instances of spinners - so maybe that's what's causing the polling delays at this point?

@totaam
Copy link
Collaborator Author

totaam commented Nov 3, 2015

2015-11-03 15:40:17: antoine commented


Testing with the 0.16.0 r11118 osx client against a 0.16.0 r11118 fedora 21 server, with -d xpra.platform.ui_thread_watcher I'm not seeing any sign of the UI thread blocked warnings.
[[BR]]
Has it magically fixed itself, or is it the logging that prevents it from sleeping: does it re-appear if you turn off all debugging?

@totaam
Copy link
Collaborator Author

totaam commented Nov 3, 2015

2015-11-03 19:19:03: afarr commented


Running with no debugging, the message is back again (oddly, a relief?)... but my "quick" test didn't show any sign of the messages for a good 45 minutes... in fact, again, they didn't start until I had some spinners...

2015-11-03 09:39:39,769 sound-sink using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2015-11-03 10:26:50,270 server is not responding, drawing spinners over the windows
2015-11-03 10:26:54,341 server is OK again
2015-11-03 10:43:17,456 Warning: long timer waiting time,
2015-11-03 10:43:17,456  UI thread polling waited 10.0 seconds longer than intended (10.5 vs 0.5)
2015-11-03 10:43:17,457 UI thread is now blocked
2015-11-03 10:43:17,457 UI thread is running again, resuming
2015-11-03 10:44:14,816 Warning: long timer waiting time,
2015-11-03 10:44:14,816  UI thread polling waited 7.1 seconds longer than intended (7.6 vs 0.5)
2015-11-03 10:44:14,816 UI thread is now blocked
2015-11-03 10:44:14,819 UI thread is running again, resuming
...

Once they start, they continue to repeat every 1-3 minutes, give or take. (I'll try a few more times as opportunities present themselves, see if this pattern continues.)

@totaam
Copy link
Collaborator Author

totaam commented Dec 5, 2015

2015-12-05 11:27:00: antoine commented


It would be interesting to see if having the debugging turned on helps with the sound stopping when left unattended for too long (#986?) - it might.

@totaam
Copy link
Collaborator Author

totaam commented Dec 9, 2015

2015-12-09 01:18:45: afarr commented


I actually have logs for a nine hour space of unattended video playing... sound seems to break whether or not the logs are running... and it looks like the UI thread gets blocked like this periodically throughout the whole period.

Of course, those tests were run a little while back. I'll try again with the latest, and see what I see. Since there was no sign of any impact, however, based on my attempts to grep through the logs for the UI thread blocking while searching for something to explain what was happening with the sound.

@totaam
Copy link
Collaborator Author

totaam commented Dec 11, 2015

2015-12-11 21:37:29: afarr commented


Interesting.

When I connect with -d sound on both server and client (0.16.0 r11304 osx client against 0.16.0 r11342 fedora 21 server)... I do get UI thread is now blocked messages when I leave the session unattended overnight (and the sleep handlers start kicking in, etc.) - but when I ran and left it idle playing sound during the day (periodically fondling it back awake when the screensaver tried to start), I didn't see any of the UI thread messages.

I will try to repeat with an r11206 osx client (noticed the r11304 and r11261 from your beta repo have no mp3 codecs, which is probably not related to the issue of sound stopping after several hours, #986, but I'll go ahead and run yet another test just to see... and comment in #986 as well).

@totaam
Copy link
Collaborator Author

totaam commented Jan 12, 2016

2016-01-12 19:16:48: afarr changed owner from afarr to antoine

@totaam
Copy link
Collaborator Author

totaam commented Jan 12, 2016

2016-01-12 19:16:48: afarr commented


Testing with osx 0.17.0 r11640 (with working vorbis) against a fedora 23 0.17.0 r11649 server (also with working vorbis, coincidentally)... I am still seeing UI thread messages when running semi-idle video tests with -d sound— but it takes about half an hour before they begin to appear.

There don't seem to be any ill effects, however. With vorbis, the sound lasts all night, despite approximately 8000 instances of the UI thread error over about a 15 hour session of running video while doing other things.

Seem to come as much as several times a minute after about half an hour, despite the energy saver settings set to not turn off the display until after an hour, so I don't think it's just a matter of sleep settings inducing them (I've seen them start after a spinner in many cases, not sure it hasn't been a bunch of coincidences though).

In any case, since the clients appear to always recover, and the messages don't even seem liable to be related to sound anymore... I'm inclined to close this as worksfor... us? Passing it back to you to think over (will also comment on the results of vorbis vs. mp3/wavpack in #986).

@totaam
Copy link
Collaborator Author

totaam commented Jan 13, 2016

2016-01-13 05:42:54: antoine changed status from new to closed

@totaam
Copy link
Collaborator Author

totaam commented Jan 13, 2016

2016-01-13 05:42:54: antoine set resolution to wontfix

@totaam
Copy link
Collaborator Author

totaam commented Jan 13, 2016

2016-01-13 05:42:54: antoine commented


Closing as "wontfix", we'll just ignore the warnings for now.

I am not removing them because of if we get other problems, they could be related and having these warnings nearby might be a useful clue.

@totaam totaam closed this as completed Jan 13, 2016
@totaam
Copy link
Collaborator Author

totaam commented Jun 16, 2016

2016-06-16 13:28:45: antoine commented


See also: #249#comment:15

@totaam
Copy link
Collaborator Author

totaam commented Sep 16, 2016

2016-09-16 05:37:12: antoine commented


It's also possible for OSX to sleep in between our checks for the slow UI thread, which would manifest itself with just this message:

Warning: long timer waiting time,
 UI thread polling waited 10120.2 seconds longer than intended (10120.7 vs 0.5)

As of r13749 we now also fire the "resume" callbacks in this case, even though we never fired the "fail" callbacks prior to that.
It will log an extra UI thread is running again, resuming.
This means that the display should be refreshed with a lossless update at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant