Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speaker forwarding may cause server to hang #669

Closed
totaam opened this issue Sep 4, 2014 · 49 comments
Closed

speaker forwarding may cause server to hang #669

totaam opened this issue Sep 4, 2014 · 49 comments
Labels

Comments

@totaam
Copy link
Collaborator

totaam commented Sep 4, 2014

Issue migrated from trac ticket # 669

component: sound | priority: critical | resolution: fixed

2014-09-04 10:42:32: onlyjob created the issue


Attaching/disconnecting with speaker forwarding sometimes leaves server in unresponsive state.
Here are fragments ofom server log taken with "-d sound":

Server started:

2014-09-04 19:10:39,969 xpra server version 0.14.4 (7467)
2014-09-04 19:10:39,969 running with pid 9207
E: [pulseaudio] pid.c: Daemon already running.
2014-09-04 19:10:40,012 child 'pulseaudio' with pid 9219 has terminated
2014-09-04 19:10:40,098 xpra is ready.
2014-09-04 19:10:41,965 Warning: pulseaudio has terminated. Either fix the pulseaudio command line or use --no-pulseaudio to avoid this warning.
2014-09-04 19:10:41,965  usually, only a single pulseaudio instance can be running for each user account, and one may be running already

Client attached and disconnected (1st time, all good):

2014-09-04 19:11:28,295 new_sound_buffer(<SoundSource object at 0x7f2fea99f7d0 (xpra+sound+src+SoundSource at 0x237fec0)>, 60, {'duration': 500000000L, 'timestamp': 285000
00000L}) source=<SoundSource object at 0x7f2fea99f7d0 (xpra+sound+src+SoundSource at 0x237fec0)>, suspended=False, sequence=-1
2014-09-04 19:11:28,793 new_sound_buffer(<SoundSource object at 0x7f2fea99f7d0 (xpra+sound+src+SoundSource at 0x237fec0)>, 60, {'duration': 500000000L, 'timestamp': 290000
00000L}) source=<SoundSource object at 0x7f2fea99f7d0 (xpra+sound+src+SoundSource at 0x237fec0)>, suspended=False, sequence=-1
2014-09-04 19:11:29,005 Connection lost
2014-09-04 19:11:29,006 stop_sending_sound() sound_source=<SoundSource object at 0x7f2fea99f7d0 (xpra+sound+src+SoundSource at 0x237fec0)>
2014-09-04 19:11:29,006 xpra client disconnected.
2014-09-04 19:11:29,006 stop_sending_sound_thread(())
2014-09-04 19:11:29,007 SoundPipeline.cleanup()
2014-09-04 19:11:29,007 SoundPipeline.stop()
2014-09-04 19:11:29,008 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_PLAYING, new-state=(GstState)GST_STATE_PAUSED, pending-state=(GstState)G
ST_STATE_VOID_PENDING; from wavpackenc0 at 0x2659530>
2014-09-04 19:11:29,008 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_PLAYING, new-state=(GstState)GST_STATE_PAUSED, pending-state=(GstState)G
ST_STATE_VOID_PENDING; from volume at 0x26594b0>
2014-09-04 19:11:29,008 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_PLAYING, new-state=(GstState)GST_STATE_PAUSED, pending-state=(GstState)G
ST_STATE_VOID_PENDING; from audioconvert0 at 0x2659430>
2014-09-04 19:11:29,008 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_PLAYING, new-state=(GstState)GST_STATE_PAUSED, pending-state=(GstState)G
ST_STATE_VOID_PENDING; from pulsesrc0 at 0x2659330>
2014-09-04 19:11:29,008 new-state=PAUSED
2014-09-04 19:11:29,009 SoundPipeline.stop() done
2014-09-04 19:11:29,009 SoundPipeline.cleanup() done
2014-09-04 19:11:29,009 stop_sending_sound_thread(()) done

Client attached and disconnected (2nd try, still good):

2014-09-04 19:11:47,894 new_sound_buffer(<SoundSource object at 0x7f2fea99f870 (xpra+sound+src+SoundSource at 0x7f2fa8002500)>, 60, {'duration': 500000000L, 'timestamp': 7
000000000L}) source=<SoundSource object at 0x7f2fea99f870 (xpra+sound+src+SoundSource at 0x7f2fa8002500)>, suspended=False, sequence=-1
2014-09-04 19:11:48,393 new_sound_buffer(<SoundSource object at 0x7f2fea99f870 (xpra+sound+src+SoundSource at 0x7f2fa8002500)>, 60, {'duration': 500000000L, 'timestamp': 7
500000000L}) source=<SoundSource object at 0x7f2fea99f870 (xpra+sound+src+SoundSource at 0x7f2fa8002500)>, suspended=False, sequence=-1
2014-09-04 19:11:48,841 client Protocol(SocketConnection(/home/hawk/.xpra/debstor-22)) has requested disconnection: exit on signal SIGINT
2014-09-04 19:11:48,841 Disconnecting client Protocol(SocketConnection(/home/hawk/.xpra/debstor-22)): client request
2014-09-04 19:11:48,842 stop_sending_sound() sound_source=<SoundSource object at 0x7f2fea99f870 (xpra+sound+src+SoundSource at 0x7f2fa8002500)>
2014-09-04 19:11:48,842 stop_sending_sound_thread(())
2014-09-04 19:11:48,842 SoundPipeline.cleanup()
2014-09-04 19:11:48,842 xpra client disconnected.
2014-09-04 19:11:48,843 SoundPipeline.stop()
2014-09-04 19:11:48,844 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_PLAYING, new-state=(GstState)GST_STATE_PAUSED, pending-state=(GstState)G
ST_STATE_VOID_PENDING; from wavpackenc1 at 0x7f2fa4001ec0>
2014-09-04 19:11:48,844 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_PLAYING, new-state=(GstState)GST_STATE_PAUSED, pending-state=(GstState)G
ST_STATE_VOID_PENDING; from volume at 0x7f2fa4001f40>
2014-09-04 19:11:48,844 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_PLAYING, new-state=(GstState)GST_STATE_PAUSED, pending-state=(GstState)G
ST_STATE_VOID_PENDING; from audioconvert1 at 0x2674850>
2014-09-04 19:11:48,844 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_PLAYING, new-state=(GstState)GST_STATE_PAUSED, pending-state=(GstState)G
ST_STATE_VOID_PENDING; from pulsesrc1 at 0x26748d0>
2014-09-04 19:11:48,845 new-state=PAUSED
2014-09-04 19:11:48,845 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_PLAYING, new-state=(GstState)GST_STATE_PAUSED, pending-state=(GstState)G
ST_STATE_READY; from sink at 0x26749d0>
2014-09-04 19:11:48,845 SoundPipeline.stop() done
2014-09-04 19:11:48,845 SoundPipeline.cleanup() done
2014-09-04 19:11:48,845 stop_sending_sound_thread(()) done

3rd attach, renders empty windows:

2014-09-04 19:11:58,224 New connection received: SocketConnection(/home/hawk/.xpra/debstor-22)
2014-09-04 19:12:02,211 Handshake complete; enabling connection
2014-09-04 19:12:02,215 pulseaudio id=, server=, sound decoders=['wavpack'], sound encoders=['wavpack', 'flac', 'wav', 'speex'], receive=True, send=True
2014-09-04 19:12:02,216 Python/Gtk2 Linux client version 0.14.4 connected from 'deblab' as 'hawk' ('hawk')
2014-09-04 19:12:02,216 client supplied an mmap_file: /tmp/xpra.xdWpza.mmap but we cannot find it
2014-09-04 19:12:02,216 using rgb as primary encoding, also available: h264, vp8, png, png/P, png/L, webp, rgb24, jpeg, rgb32
2014-09-04 19:12:02,216 client root window size is 4400x2560 with 1 displays:
2014-09-04 19:12:02,216   ':0.0' (1161x675 mm) workarea: 4400x2527 at 0x33
2014-09-04 19:12:02,216     DisplayPort-1 1280x1024 at 0x1536 (360x270 mm)
2014-09-04 19:12:02,217     DVI-0 1440x2560 at 1280x0 (597x336 mm)
2014-09-04 19:12:02,217     DisplayPort-0 1680x1050 at 2720x1510 (474x296 mm)
2014-09-04 19:12:02,217 best resolution matching 4400x2560 is unchanged: 5120x3200
2014-09-04 19:12:02,219 setting key repeat rate from client: 660ms delay / 40ms interval
2014-09-04 19:12:02,221 keyboard mapping already configured (skipped)
2014-09-04 19:12:02,232 pactl_output('list',) returned 0
2014-09-04 19:12:02,242 pactl_output('list',) returned 0
2014-09-04 19:12:02,252 pactl_output('list',) returned 0
2014-09-04 19:12:02,264 pactl_output('list',) returned 0
2014-09-04 19:12:02,314 sound_control(start, ('wavpack',))
2014-09-04 19:12:02,314 start_sending_sound(wavpack)
2014-09-04 19:12:02,314 using sound codec wavpack
2014-09-04 19:12:02,327 pactl_output('stat',) returned 0
2014-09-04 19:12:02,327 get_pactl_stat_line(Server String:)=/tmp/pulse-8XiBnTK3trYQ/native
2014-09-04 19:12:02,328 start sound, remote pulseaudio server=, local pulseaudio server=/tmp/pulse-8XiBnTK3trYQ/native
2014-09-04 19:12:02,328 start sound, client id=, server id=
2014-09-04 19:12:02,341 pactl_output('list',) returned 0
2014-09-04 19:12:02,341 found pulseaudio monitor devices: {'auto_null.monitor': 'Monitor of Dummy Output'}
2014-09-04 19:12:02,353 pactl_output('set-source-mute', 'auto_null.monitor', '0') returned 0
2014-09-04 19:12:02,354 set_source_mute: output=
2014-09-04 19:12:02,354 pipeline elements=['pulsesrc device=auto_null.monitor', 'audioconvert', 'volume name=volume volume=1.0', 'wavpackenc', None, 'appsink name=sink']
2014-09-04 19:12:02,354 pipeline=pulsesrc device=auto_null.monitor ! audioconvert ! volume name=volume volume=1.0 ! wavpackenc ! appsink name=sink
2014-09-04 19:12:02,357 starting sound capture using pulseaudio device: Monitor of Dummy Output
2014-09-04 19:12:02,357 start_sending_sound() sound source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>
2014-09-04 19:12:02,357 SoundPipeline.start()
2014-09-04 19:12:02,362 SoundPipeline.start() done
2014-09-04 19:12:02,363 state changed: <gst.Message GstMessageState, old-state=(GstState)GST_STATE_NULL, new-state=(GstState)GST_STATE_READY, pending-state=(GstState)GST_S
TATE_VOID_PENDING; from sink at 0x26749d0>
2014-09-04 19:12:05,188 new preroll0: 112 bytes
2014-09-04 19:12:05,189 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 112, {'duration': 500000000L, 'timestamp': 
0L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:12:05,189 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 112, {'duration': 500000000L, 'timestamp': 
0L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:12:05,688 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 5
00000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:12:06,186 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 1
000000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:12:06,685 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 1
500000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1

After disconnect server keeps logging:

2014-09-04 19:12:57,173 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 5
2000000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:12:57,682 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 5
2500000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:12:58,181 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 5
3000000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:12:58,680 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 5
3500000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:12:59,178 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 5
4000000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:12:59,677 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 5
4500000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:13:00,176 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 5
5000000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1
2014-09-04 19:13:00,674 new_sound_buffer(<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, 60, {'duration': 500000000L, 'timestamp': 5
5500000000L}) source=<SoundSource object at 0x7f2fe822c6e0 (xpra+sound+src+SoundSource at 0x7f2fdc0021a0)>, suspended=False, sequence=-1

At this point server do not respond to new connections so I terminate it with "kill -9".

Workaround: attach with --no-speaker.

This is a very old issue which I can reproduce easily with 0.14.4.

@totaam
Copy link
Collaborator Author

totaam commented Sep 5, 2014

Can you check if maybe we're leaking something every time we start + stop sound? Maybe run it without limits, and / or even as root to try?
This would cause us to hit some limit eventually (file descriptors?), which could explain why some things still run (new_sound_buffer) but others become unresponsive.

@totaam
Copy link
Collaborator Author

totaam commented Sep 23, 2014

See also: #663, #400#comment:8.

@totaam
Copy link
Collaborator Author

totaam commented Sep 23, 2014

2014-09-23 14:34:29: totaam uploaded file clipboard-loop.txt (86.2 KiB)

backtrace from a deep clipboard loop

@totaam
Copy link
Collaborator Author

totaam commented Sep 23, 2014

Well, well, maybe I can reproduce this after all.
I was investigating something completely different and I think I managed to reproduce it semi reliably with:

  • linux laptop with normal cinnamon session, start shadow:
xpra shadow :0 --bind-tcp=0.0.0.0:PORT
  • connect to it from a ms windows xp vm:
xpra_cmd.exe attach tcp:HOST:PORT

The trick here was to also have synergy syncing the clipboard between the laptop and the host running the xp vm, as well as virtualbox doing more clipboard synchronization between the host and the xp vm, then finally we have xpra doing clipboard sync between the laptop and xp... full circle.

At one point, I saw 90+ kworker threads in my process list - they eventually cleared themselves.

But I also managed to lockup the laptop's display, requiring a reboot (killing X11 would have done it too I suppose).

Interestingly, killing a stuck server made it spit out:

2014-09-23 20:30:43,823 xpra client disconnected.
2014-09-23 20:30:43,827 loop nesting too deep: 10
2014-09-23 20:30:43,827 you may have a clipboard forwarding loop, disabling the clipboard
2014-09-23 20:30:43,828 New connection received: SocketConnection(('192.168.42.103', 10000) - ('192.168.42.195', 1316))
2014-09-23 20:30:43,829 New connection received: SocketConnection(/home/antoine/.xpra/laptop-0)
2014-09-23 20:30:43,830 New connection received: SocketConnection(/home/antoine/.xpra/laptop-0)
2014-09-23 20:30:43,831 New connection received: SocketConnection(/home/antoine/.xpra/laptop-0)
2014-09-23 20:30:43,831 New connection received: SocketConnection(/home/antoine/.xpra/laptop-0)
2014-09-23 20:30:43,832 New connection received: SocketConnection(/home/antoine/.xpra/laptop-0)
2014-09-23 20:30:43,833 New connection received: SocketConnection(/home/antoine/.xpra/laptop-0)
2014-09-23 20:30:43,849 client Protocol(SocketConnection(('192.168.42.103', 10000) - ('192.168.42.195', 1315))) has requested disconnection: exit on signal SIGINT
2014-09-23 20:30:43,849 Disconnecting client Protocol(SocketConnection(('192.168.42.103', 10000) - ('192.168.42.195', 1315))): client request
2014-09-23 20:30:43,850 internal error: read connection SocketConnection(('192.168.42.103', 10000) - ('192.168.42.195', 1316)) reset: [Errno 104] Connection reset by peer
2014-09-23 20:30:43,856 Handshake complete; enabling connection
2014-09-23 20:30:43,857 Python/Gtk2 Microsoft Windows XP client version 0.15.0 connected from 'XP-Pro-Dev-VBOX' as 'XPPro' ('XPPro')
2014-09-23 20:30:43,857 using h264 as primary encoding, also available: vp8, png, png/P, png/L, webp, rgb24, jpeg, rgb32
2014-09-23 20:30:43,857 client root window size is 1575x1177
2014-09-23 20:30:43,858 shadow server: setting default keymap translation
2014-09-23 20:30:43,907 Connection lost
2014-09-23 20:30:43,907 Connection lost
2014-09-23 20:30:43,907 Connection lost
2014-09-23 20:30:43,907 Connection lost
2014-09-23 20:30:43,907 Connection lost
2014-09-23 20:30:43,907 Connection lost

@onlyjob: can you reproduce your problem with clipboard disabled? (I cannot so far)

@totaam
Copy link
Collaborator Author

totaam commented Sep 23, 2014

I've now connected well over a hundred times (scripted detach, with attach in a while true loop). No problem as long as the clipboard is disabled.
I think that sound just makes it worse as it relies on the main loop to be running to get the data out of gstreamer + pulseaudio in time. But maybe there are other ways of getting the main loop stuck than the clipboard?

Maybe I made things worse with r7751? As it will cause a timeout on the main loop waiting for the reply to the message that causes us to stop the clipboard. I don't think so, as the main loop should still be running, just nested. But maybe this nesting doesn't play well with gstreamer?
Worst case scenario: we have to rewrite the ugly clipboard nested loop code, or move the sound code to its own process (using multiprocessing).

@onlyjob: if you can reproduce this problem with the clipboard disabled, please provide a gdb backtrace.

@totaam
Copy link
Collaborator Author

totaam commented Oct 2, 2014

2014-10-02 18:14:29: onlyjob commented


That's right, I can't reproduce when attaching with --no-clipboard.

@totaam
Copy link
Collaborator Author

totaam commented Oct 17, 2014

I have finally managed to reproduce this one. I had to kill -9 the server and use xpra start --use-display to recover the session.

It took a laptop running the Fedora 21 alpha as server, and just my regular Fedora 20 client. Just a shame I am going away (hence the laptop upgrade)... this will have to wait until I get back.

@totaam
Copy link
Collaborator Author

totaam commented Oct 27, 2014

I have an idea: until we rewrite the clipboard using native calls (which would be nice as it would get rid of the ugly nested loop hack - but this is going to take a long time), we can move the sound code to its own process using multiprocessing - which means it will get its own main loop, independent of the clipboard code. It is Python 2.6+ only, which is fine for v0.15+.
We can push sound data using a multiprocessing queue, and to get the events back out maybe we have to keep a thread around, just waiting for the out queue.
It also means that this process will be less subjected to latency issues found in the main process, so less likely to get overruns.

@totaam
Copy link
Collaborator Author

totaam commented Nov 19, 2014

2014-11-19 04:04:30: onlyjob commented


I love this idea. :) Debian dropped Python-2.6 and 2.7 is in Wheezy for a while so I believe it is worth abandoning support for 2.6 in order to benefit from multiprocessing.

@totaam
Copy link
Collaborator Author

totaam commented Nov 19, 2014

multiprocessing is available since python 2.6

Unfortunately, it just doesn't work with pygst, which relies on the gobject main loop and we cannot run two copies of the main loop...

The only solution that I can see would be to use a distinct subprocess that we spawn and talk to using pipes: we could use its stdin to send it messages (set_volume and stop? anything else? stop can be done with sigint) and get the compressed sound data from its stdout? Better yet would be to use mmap and re-use our video mmap code.

@totaam
Copy link
Collaborator Author

totaam commented Feb 16, 2015

2015-02-16 07:08:28: antoine uploaded file sound-separate-process.patch (11.6 KiB)

moves the sound capture / play to a separate process - hackish

@totaam
Copy link
Collaborator Author

totaam commented Feb 16, 2015

The patch above is very hackish and would need cleaning up. It only handles sound capture via a separate process, not playback.

It doesn't handle any message exchange, we just assume the data is the compressed sound buffers.. (don't want to add the network layer there if it can be avoided)

It doesn't handle mmap transfers which would cave us a lot of unnecessary events and process context switching, if only we had an easy way of signalling the parent that some data is available without writing to the pipe. Can't see how.

That said, it works very well and start + stop are very reliable.

@totaam
Copy link
Collaborator Author

totaam commented Mar 10, 2015

Things left to do:

  • cleanups
  • get list of supported codecs by execing the helper? (and allow python3 to be used..)
  • hangle get_info with timer + receiver caching
  • binary files on win32 need special coding?
  • use child reaper?
  • implement speaker side
  • remove / replace / fix the win32 executables?

@totaam
Copy link
Collaborator Author

totaam commented Mar 17, 2015

Latest status: the win32 client seems to suffer from clock drift. But maybe this isn't a gstreamer pipeline clock drift but rather a system clock problem?? (added the "time" metadata attribute to see how this changes over time)

Notes:

  • we forward the volume calls using ints, so we use the 0 to 100 range instead of 0.0 to 1.0 (which is a little confusing.. maybe we should just expose 0 to 100 only?)
  • we send the "info" data every second, and use this to pass back values like "volume", "codec description", etc..
  • the "time" metadata attribute is only used for debugging

Latest TODO list:

  • underruns: seems that the pipeline stops in some cases if we get too many underruns... (need a reliable way of dealing with this)
  • got a sound codec change not supported! (from flac to wavpack) - not sure why! (disabled wav to get going - will need fixing properly)
  • verify start / stop from tray, from xpra control, etc
  • verify error handling (ie: killing sound process) - looks OK

Later:

  • support gstreamer 1.0 via python3 with a python2 client (implies checking for codec availability probing via exec)
  • also split the wrapper into a superclass and a subclass for each of sink and src
  • prioritize sound packets in the network layer so we can jump the queue
  • maybe handle wav + lz4 via network layer (keep track of which container formats are actually compressed in the CODEC_OPTIONS data, then compress those that aren't in the network thread of the sound process) - because lz4 is fast! (but only lz4 or lzo, we don't want to use zlib on sound data)
  • support mmap transfers: move the mmap code to the protocol layer rather than having it grafted on top
  • pass-through to client (for mmap or UDP..)
  • sound improvements - better queue handling, refactoring and cleanups #400: encoder options, stereo vs 5.1, quality tuning
  • volume control somewhere

@totaam
Copy link
Collaborator Author

totaam commented Mar 17, 2015

Finally applied in r8786. See long commit message for details.
(only very minor differences since v11 patch)

Some new environment variables (which should not be needed - I hope):

  • XPRA_SOUND_INFO_TIME allows us to specify how often we export the pipeline info from the subprocess back to the main xpra process, default is 1000 (every second)
  • XPRA_SOUND_SUBPROCESS_DEBUG can be used to enable debugging flags in the subprocess exclusively (ie: XPRA_SOUND_SUBPROCESS_DEBUG=sound,win32,util xpra attach ..)
  • XPRA_SOUND_DEBUG=1 enables verbose debugging in hot paths (which is why it isn't a standard -d sound debugging option)

@afarr: what needs testing:

  • start and stop sound (we should not be leaving any zombies behind on Linux), should work reliably on all platforms
  • killing the sound process should also trigger the change in status so the sound can be restarted from the tray (you can find the process using ps -ef | grep _sound, or as of r8787 the pid will be shown in session-info features)
  • underruns and overruns (see update to sound improvements - better queue handling, refactoring and cleanups #400)

Please re-assign to me so that I can try to close the remaining issues (as per comment:13 - though many are likely to be re-scheduled unless critical)

@totaam
Copy link
Collaborator Author

totaam commented Mar 18, 2015

@afarr: as of r8793, we can tune the minimum queue time (the amount of data we buffer before we start feeding the sound card) as well as the overall buffer size.

ie:

XPRA_SOUND_QUEUE_TIME=300 XPRA_SOUND_QUEUE_MINTIME=100 xpra attach -d sound

shows:

pipeline elements=['appsrc name=src max-bytes=32768 emit-signals=0 block=0 is-live=0 stream-type=stream format=4', \
    'mp3parse', 'mad', 'audioconvert', 'audioresample', 'volume name=volume volume=1.0', \
    'queue name=queue min-threshold-time=100000000 max-size-buffers=0 max-size-bytes=0 max-size-time=300000000 leaky=2', \
    'autoaudiosink async=True qos=True sync=False']

I don't think we want to reduce the buffer size too much, or we will get more overruns. But maybe a little? (better to get overruns than having the sound go too far out of sync)
To see the level of the sound buffer, you can

  • look at session-info (updated every second)
  • run with -d sound (which is better because it will also show underruns and overruns as events)

We should be able to reduce the min queue size. The default was to use half the buffer size, which was 225ms. r8794 reduces this to just a quarter, 112ms. This should be enough so that we don't get underruns (which can cause some crackling), without getting too much out of sync with the picture.

Note: a great idea that Lonnie has had: we should be able to see what range of values the queue buffer is using and self tune it.
ie: if the queue has always between 200ms and 250ms worth of sound data in it, then we don't need such a large buffer: having just 150ms total and 50ms minimum would be enough to have 50ms safety on each side. Then we also need to take the video latency into account to try to sync things up... but at least we then know the range (or target?) we can aim for.

@totaam
Copy link
Collaborator Author

totaam commented Mar 20, 2015

r8805 makes the code much more generic and robust.
This shouldn't affect things (and I've tested it on multiple platforms, and it adds a unit test which should help us catch regressions in the future), but I have to mention it.

PS: the idea of prioritizing sound packets was a bad one, they are already queued up ahead of the pixel data, and there is very little else anyway - so no changes needed in that area.

See also: #790#comment:20 which seems to have a way of causing sound restarts.

@totaam
Copy link
Collaborator Author

totaam commented Mar 23, 2015

  • some minor updates and cleanups in r8820
  • improved OSX support: ensure we don't create a dock icon for the sound process (sounds easy? it should be, but it was not): done in r8822. More info on this:
  • using the same wrapper for the xpra binary is big no-no, we have to use one of those really horrible OSX nested applications hacks (called Xpra_NoDock.app and found in Xpra.app/Contents)
  • instantiating gtkosx_application, just to access paths information, loads up the dock - no idea why
  • execing another process (like checking for pulseaudio) or probing the display server to read a property also does..

Can you please test on various OSX versions and let me know how it goes?
I am seeing lots of overruns on my 10.5 VM, but not on my 10.6 one.
We can always tune the queue size if we find that it overruns too much (as suggested in comment:15): I think it is better to have sound that is out-of-sync by up to a second rather than having restarts every 2 seconds.. Testing will tell.

@totaam
Copy link
Collaborator Author

totaam commented Mar 23, 2015

If you have problems with stuttering sound and underruns, see #400#comment:15

@totaam
Copy link
Collaborator Author

totaam commented Mar 30, 2015

Looks like something broke py3k (#640) support along the way, yay.
I had to commit some minor fixes in r8864.
It does spawn the new sound process, and the pipeline looks correct, but nothing happens:

2015-03-30 15:37:54,409 initialized CODECS:
2015-03-30 15:37:54,409 * wavpack : ('wavpackenc', None, 'wavpackdec', 'wavpackparse')
2015-03-30 15:37:54,409 * wav : ('wavenc', None, None, 'wavparse')
2015-03-30 15:37:54,409 * flac : ('flacenc', 'oggmux', 'flacdec', 'oggdemux')
2015-03-30 15:37:54,409 * speex : ('speexenc', 'oggmux', 'speexdec', 'oggdemux')
2015-03-30 15:37:54,412 is_pa_installed() pactl_bin=
2015-03-30 15:37:54,412 SoundSink(..) found matching codecs ['flac']
2015-03-30 15:37:54,413 pipeline elements=['appsrc name=src max-bytes=32768 emit-signals=0 block=0 is-live=0 stream-type=stream format=4', \
    'oggdemux', 'flacdec', 'audioconvert', 'audioresample', 'volume name=volume volume=1.0', \
    'queue name=queue min-threshold-time=112000000 max-size-buffers=0 max-size-bytes=0 max-size-time=450000000 leaky=2', \
    'autoaudiosink sync=False async=True qos=True']
2015-03-30 15:37:54,413 pipeline=appsrc name=src max-bytes=32768 emit-signals=0 block=0 is-live=0 stream-type=stream format=4 ! \
    oggdemux ! flacdec ! audioconvert ! audioresample ! volume name=volume volume=1.0 ! \
    queue name=queue min-threshold-time=112000000 max-size-buffers=0 max-size-bytes=0 max-size-time=450000000 leaky=2 ! \
    autoaudiosink sync=False async=True qos=True
2015-03-30 15:37:54,422 SoundPipeline.start()
2015-03-30 15:37:54,427 sound sink queue underrun: level=0
2015-03-30 15:37:54,428 SoundPipeline.start() done
2015-03-30 15:37:54,429 state changed: <GstMessage at 0x1728440>
2015-03-30 15:37:54,429 state changed: <GstMessage at 0x17284c0>
2015-03-30 15:37:54,429 state changed: <GstMessage at 0x1728540>
2015-03-30 15:37:54,429 state changed: <GstMessage at 0x17285c0>
2015-03-30 15:37:54,429 state changed: <GstMessage at 0x1728640>
2015-03-30 15:37:54,430 state changed: <GstMessage at 0x17286c0>
2015-03-30 15:37:54,430 state changed: <GstMessage at 0x1728740>
2015-03-30 15:37:54,430 state changed: <GstMessage at 0x17283c0>
2015-03-30 15:37:54,430 new-state=READY
2015-03-30 15:37:54,430 stream status: <GstMessage at 0x17561c0>
2015-03-30 15:37:54,431 state changed: <GstMessage at 0x1756240>
2015-03-30 15:37:54,431 state changed: <GstMessage at 0x17562c0>
2015-03-30 15:37:54,431 state changed: <GstMessage at 0x1756340>
2015-03-30 15:37:54,431 stream status: <GstMessage at 0x1756640>
2015-03-30 15:37:54,431 state changed: <GstMessage at 0x17563c0>
2015-03-30 15:37:54,431 state changed: <GstMessage at 0x1756440>
2015-03-30 15:37:54,431 state changed: <GstMessage at 0x17564c0>
2015-03-30 15:37:54,431 stream status: <GstMessage at 0x1756540>
2015-03-30 15:37:54,431 state changed: <GstMessage at 0x17565c0>
2015-03-30 15:37:54,432 stream status: <GstMessage at 0x7f5f60002480>
2015-03-30 15:37:54,432 sound_sink_state_changed(sink_subprocess_wrapper(<subprocess.Popen object at 0x7f58d4145f10>), b'ready') \
    on_sink_ready=<function UIXpraClient.start_receiving_sound.<locals>.sink_ready at 0x7f58d47cbef0>

But this won't hold up the release, the GTK3 port is not in a great state anyway.

@totaam
Copy link
Collaborator Author

totaam commented Mar 30, 2015

As of r8870, we lower the min queue time to just 50ms by default.

@totaam
Copy link
Collaborator Author

totaam commented Mar 31, 2015

@onlyjob: thinking of releasing 0.15.x quite soon, now would be a good time to shout if you still have problems... or wait another 6 months!

@totaam
Copy link
Collaborator Author

totaam commented Mar 31, 2015

2015-03-31 11:46:51: onlyjob commented


I would be glad to test but I may not be able to do it for few weeks...
Too many things on my plate at the moment... Give at least a week or two and hopefully I'll be back with you with a feedback. :)

@totaam
Copy link
Collaborator Author

totaam commented Apr 1, 2015

2015-04-01 00:47:37: afarr commented


Some testing with win/osx 0.15.0 r8872 clients against a fedora 20 0.15.0 r8872 server, turning the speakers on/off with the tray/menu controls works reliably.

Playing with the XPRA_SOUND_QUEUE_TIME & XPRA_SOUND_QUEUE_MINTIME I discovered that, with the settings you mentioned in the example above (TIME=300 & MINTIME=100) I was able to cause some speaker mischief by mousing quickly over assorted link widgets (youtube right side tile menu of suggested videos was particularly useful)... the lower the value the TIME was set to, the easier it was to not only cause overruns, but to then disable the speakers entirely (both osx and window clients).

-d sound logging, just FYI, looks like this:

...

2015-03-31 15:16:11,540 sound sink: pushed 417 bytes, new buffer level: 444ms
2015-03-31 15:16:11,541 add_data(522 bytes, {'duration': 26122449, 'timestamp': 14315102040L, 'caps': 'audio/mpeg, mpegversion=(int)1, mpegaudioversion=(int)1, layer=(int)3, channels=(int)2, rate=(int)44100', 'sequence': 2, 'time': 1427840171322L}) queue_state=pushing, src=/GstPipeline:pipeline0/GstAppSrc:src (__main__.GstAppSrc)
2015-03-31 15:16:11,541 sound sink: pushed 522 bytes, new buffer level: 470ms
2015-03-31 15:16:11,541 sound sink queue overrun: level=470
2015-03-31 15:16:11,542 re-starting speaker because of overrun
2015-03-31 15:16:11,542 stop_receiving_sound() sound sink=sink_subprocess_wrapper(<subprocess.Popen object at 0x7dbda90>)
2015-03-31 15:16:11,542 add_data(835 bytes, {'duration': 26122449, 'timestamp': 14341224489L, 'caps': 'audio/mpeg, mpegversion=(int)1, mpegaudioversion=(int)1, layer=(int)3, channels=(int)2, rate=(int)44100', 'sequence': 2, 'time': 1427840171354L}) queue_state=overrun, src=/GstPipeline:pipeline0/GstAppSrc:src (__main__.GstAppSrc)
2015-03-31 15:16:11,543 dropping new data because of overrun: 100%
2015-03-31 15:16:11,543 ignoring sound data with old sequence number 2
2015-03-31 15:16:11,543 internal error: write connection TwoFileConnection() reset: [Errno 32] Broken pipe
2015-03-31 15:16:11,543 the sound sink process has stopped (())
2015-03-31 15:16:11,543 stop_receiving_sound() sound sink=None
2015-03-31 15:16:11,543 SoundPipeline.stop()
2015-03-31 15:16:11,544 speaker is now disabled - dropping packet
2015-03-31 15:16:11,545 speaker is now disabled - dropping packet
2015-03-31 15:16:11,550 speaker is now disabled - dropping packet
2015-03-31 15:16:11,551 speaker is now disabled - dropping packet
2015-03-31 15:16:11,556 speaker is now disabled - dropping packet
2015-03-31 15:16:11,556 speaker is now disabled - dropping packet
2015-03-31 15:16:11,559 speaker is now disabled - dropping packet
2015-03-31 15:16:11,563 speaker is now disabled - dropping packet
2015-03-31 15:16:11,564 speaker is now disabled - dropping packet
2015-03-31 15:16:11,565 speaker is now disabled - dropping packet
2015-03-31 15:16:11,570 SoundPipeline.stop() done
2015-03-31 15:16:11,746 restart() sound_sink=None, codec=mp3, server_sound_sequence=True
2015-03-31 15:16:11,746 send_new_sound_sequence() sequence=3
2015-03-31 15:16:11,746 start_receiving_sound() sound sink=None
2015-03-31 15:16:11,746 start_sound_sink(mp3)
2015-03-31 15:16:11,747 starting mp3 sound sink
2015-03-31 15:16:11,747 start_receiving_sound(mp3)
2015-03-31 15:16:11,761 mp3 sound sink started
2015-03-31 15:16:11,912 ignoring sound data with old sequence number 2
2015-03-31 15:16:11,914 ignoring sound data with old sequence number 2
2015-03-31 15:16:11,914 ignoring sound data with old sequence number 2
2015-03-31 15:16:11,915 ignoring sound data with old sequence number 2
2015-03-31 15:16:11,915 ignoring sound data with old sequence number 2
2015-03-31 15:16:11,915 ignoring sound data with old sequence number 2
2015-03-31 15:16:11,916 ignoring sound data with old sequence number 2

...

Eventually the speakers don't restart without using a tray/menu link.

2015-03-31 15:16:12,012 sound_sink_state_changed(sink_subprocess_wrapper(<subprocess.Popen object at 0x7dbddd0>), ready) on_sink_ready=<function sink_ready at 0xd4d9cb0>
2015-03-31 15:16:12,012 sink_ready(()) codec=mp3
2015-03-31 15:16:12,017 server sent end-of-stream, closing sound pipeline
2015-03-31 15:16:12,018 stop_receiving_sound() sound sink=sink_subprocess_wrapper(<subprocess.Popen object at 0x7dbddd0>)
2015-03-31 15:16:12,019 SoundPipeline.stop()
2015-03-31 15:16:12,020 SoundPipeline.stop() done
2015-03-31 15:16:12,292 the sound sink process has stopped (())
2015-03-31 15:16:12,293 stop_receiving_sound() sound sink=None
2015-03-31 15:16:12,297 speaker is now disabled - dropping packet

With low TIME settings starting sound in a second tab (at the same time as some first tab) will also cause stuttering and make it even easier to disable speakers.

Testing and tweaking, it looks like XPRA_SOUND_QUEUE_TIME=500 and the now default XPRA_SOUND_QUEUE_MINTIME=50 makes it nearly impossible to disable the speakers with this trick.

Seems very stable otherwise.

@totaam
Copy link
Collaborator Author

totaam commented Apr 3, 2015

Some minor improvements and fixes in r8905, r8906, r8907, r8908.
Note: this may actually make things "worse" in the sense that the overruns are now more likely to cause the sound to stop and not restart. I needed to fix that properly anyway.
I still cannot figure out why we get the autorelease warnings (see #533#comment:71) - maybe because we tear down the pipeline without giving it enough time to do it cleanly?

@totaam
Copy link
Collaborator Author

totaam commented Apr 7, 2015

more minor updates and refactorings: r8951, r8952, r8953.
r8954 should also make the sound stop "cleaner" / more graceful: we try to close the pipeline properly (stopping it then destroying the objects) before killing the process.

@totaam
Copy link
Collaborator Author

totaam commented Apr 7, 2015

r8956 makes it easier to test overruns without changing the queue settings, just start the client with:

XPRA_FAKE_OVERRUN=N xpra attach tcp:192.168.42.100:10000 

Where N is the number of seconds to wait before faking an overrun which restarts the sound pipeline.

Using this patch and printing the call stack from stop_receiving_sound, I can see that we call it from the sound_sink_overrun callback as we should, but also from sound_process_stopped and we end up killing the new process and telling the server to stop sending sound data.. r8955 fixes that.

I guess we could also restart the sound when the process dies, but this should never happen, so the log warning message should be enough for now.

I am also using the same XPRA_FAKE_OVERRUN=2 env var to trigger lots of restarts on win32, so far so good. No crashes, restarts cleanly everytime.

@afarr: looks ready for more testing. Can you break it again somehow?

@totaam
Copy link
Collaborator Author

totaam commented Apr 8, 2015

2015-04-08 02:29:45: afarr commented


Well... I couldn't get it to break, using a win32 0.15.0 8957 client against a fedora 20 0.15.0 8957 server, but setting the client with XPRA_FAKE_OVERRUN=7 I am noticing that the speaker overruns do cause around a half-second break before the speakers re-start (might just be an issue of the fake overruns in contrast with the "genuine" ones that usually have a much less noticeable "break" in the sound during the re-start process).

The speaker restarts were dependable though.

I did run into a couple of tracebacks server-side though.

2015-04-07 18:15:42,985 sound-source codec: MPEG-1 Layer 3 (MP3)
Exception in thread parse (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  File "/usr/lib64/python2.7/threading.py", line 764, in run
  File "/usr/lib64/python2.7/site-packages/xpra/net/protocol.py", line 592, in 

_read_parse_thread_loop
  File "/usr/lib64/python2.7/site-packages/xpra/net/protocol.py", line 552, in 

_internal_error
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'error'
2015-04-07 18:19:20,947 error processing callback <bound method ServerSource.new_stream of 

<xpra.server.source.ServerSource object at 0x7fe8e40255d0>> for new-stream packet
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/net/subprocess_wrapper.py", line 343, in 

process_packet
    cb(self, *all_args)
  File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 798, in new_stream
    self.sound_source.codec = codec
AttributeError: 'NoneType' object has no attribute 'codec'
2015-04-07 18:19:21,002 using Pulseaudio device 'Monitor of Null Output'
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
2015-04-07 18:19:21,406 sound-source codec: MPEG-1 Layer 3 (MP3)
2015-04-07 18:19:22,017 using Pulseaudio device 'Monitor of Null Output'
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
2015-04-07 18:19:22,279 using Pulseaudio device 'Monitor of Null Output'
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
2015-04-07 18:19:22,495 error setting up sound: a sound source already exists
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 775, in 

start_sending_sound
    assert self.sound_source is None, "a sound source already exists"
AssertionError: a sound source already exists

@totaam
Copy link
Collaborator Author

totaam commented Apr 8, 2015

The speaker restarts were dependable though.

That's the most important thing.

I am noticing that the speaker overruns do cause around a half-second break before the speakers re-start

Hopefully, there won't be too many restarts then.

We do buffer a tiny bit more than before (the min queue thing), we wait before the pipeline is actually ready before throwing data at it.. (rather than hoping for the best and filling up the queue too quickly, which is what we did before) which does mean that this delay will actually be worse over high latency links.

I am working on making the restarts faster (which we should be able to do now in a different way: we don't have to wait for the previous sound process to terminate to start a new one - as long as it does terminate eventually...). The problem is that restarts tell the server to stop and start again, we get the "end-of-stream" message and that kills the new pipeline rather the old one! Will fix.

As for the stacktraces above (thanks for posting them), they are caused by us killing the sound process before it has a chance of exiting cleanly - I will try to see if I can just raise the kill delay to allow for a more graceful exit.

Taking the ticket back to deal with those small issues.

@totaam
Copy link
Collaborator Author

totaam commented Apr 8, 2015

Lots more tweaks and subtle fixes in r8958, r8959, r8960, r8961, r8962, 8965, r8966. Some of which should be backported (started in 8963 + 8964 - more needed).

Problem is that I get deadlocks on OSX after restarts, sometimes very quickly, sometimes after 20+ restarts. Not sure what causes it, but the UI thread is deadlocked:

2015-04-08 13:37:06,710 sound-sink pushed 522 bytes, new buffer level: 0ms
2015-04-08 13:37:06,711 sound-sink add_data(417 bytes, {'duration': 26122449, 'timestamp': 574693877, 'caps': 'audio/mpeg, mpegversion=(int)1, mpegaudioversion=(int)1, layer=(int)3, channels=(int)2, rate=(int)44100', 'sequence': 38, 'time': 1428496626675L}) queue_state=underrun, src=/GstPipeline:pipeline0/GstAppSrc:src (__main__.GstAppSrc)
2015-04-08 13:37:06,711 sound-sink pushed 417 bytes, new buffer level: 26ms
2015-04-08 13:37:06,712 sound-sink queue running: level=52
2015-04-08 13:37:06,712 sound-sink queue pushing: level=52
2015-04-08 13:37:06,768 sound-sink queue underrun: level=0
2015-04-08 13:37:06,775 poll_UI_loop() last_UI_thread_time was 1.5 seconds ago (max 1000), UI_blocked=False
2015-04-08 13:37:06,775 UI thread is now blocked
2015-04-08 13:37:06,779 check_server_echo(0) last=True, server_ok=True

(..)

2015-04-08 13:37:07,902 sound-sink pushed 417 bytes, new buffer level: 470ms
2015-04-08 13:37:07,902 sound-sink queue overrun: level=470
2015-04-08 13:37:07,903 process_packet callbacks(overrun)=[(<bound method XpraClient.sound_sink_overrun of gtk2.client>, [])]
2015-04-08 13:37:07,903 re-starting speaker because of overrun
2015-04-08 13:37:07,903 stop_receiving_sound(True) sound sink=sink_subprocess_wrapper(<subprocess.Popen object at 0x19586890>)
2015-04-08 13:37:07,903 stop_receiving_sound(True) calling <bound method sink_subprocess_wrapper.cleanup of sink_subprocess_wrapper(<subprocess.Popen object at 0x19586890>)>
2015-04-08 13:37:07,903 cleanup() sending request to cleanup
2015-04-08 13:37:07.904 Xpra[98954:5bfb] *** _NSAutoreleaseNoPool(): Object 0xc900d0 of class NSCFNumber autoreleased with no pool in place - just leaking
Stack: (0x1d5df4f 0x1c6a432 0x64b09c1 0x5b5c387 0x5b5cf07 0x1ac5ff7 0x1ac21ce 0x1add099 0x1adc5ea 0x1adca35 0x1ac894d 0x1ac8d35 0x1ac8ca1 0x5bc47b4 0x1ac4f88 0x1ac2829 0x1ac21ce 0x1adce0f 0x1adc5ea 0x1adca35 0x5c9aac9 0x5bc414e 0x79083ba 0x2b1b80 0x2b431c 0x22d9cb 0x1fba95 0x2aa5ce 0x1fbb40 0x7897684 0x1ac21ce 0x1add099 0x1adb7c7 0x7890484 0x2b1b80 0x2b431c 0x2b20ab 0x2b431c 0x22d9cb 0x1fba95 0x2aea26 0x2b3a9f 0x2b431c 0x2b20ab 0x2b431c 0x22dad2 0x1fba95 0x2aea26 0x2b3a9f 0x2b3a9f 0x2b431c 0x22d9cb 0x1fba95 0x20d29e 0x1fba95 0x2aa5ce 0x2f37ec 0x412055 0x411f12)
2015-04-08 13:37:07.905 Xpra[98954:5bfb] *** _NSAutoreleaseNoPool(): Object 0x116aaae0 of class NSCFDictionary autoreleased with no pool in place - just leaking
Stack: (0x1d5df4f 0x1c6a432 0x6c01be 0x64b09e9 0x5b5c387 0x5b5cf07 0x1ac5ff7 0x1ac21ce 0x1add099 0x1adc5ea 0x1adca35 0x1ac894d 0x1ac8d35 0x1ac8ca1 0x5bc47b4 0x1ac4f88 0x1ac2829 0x1ac21ce 0x1adce0f 0x1adc5ea 0x1adca35 0x5c9aac9 0x5bc414e 0x79083ba 0x2b1b80 0x2b431c 0x22d9cb 0x1fba95 0x2aa5ce 0x1fbb40 0x7897684 0x1ac21ce 0x1add099 0x1adb7c7 0x7890484 0x2b1b80 0x2b431c 0x2b20ab 0x2b431c 0x22d9cb 0x1fba95 0x2aea26 0x2b3a9f 0x2b431c 0x2b20ab 0x2b431c 0x22dad2 0x1fba95 0x2aea26 0x2b3a9f 0x2b3a9f 0x2b431c 0x22d9cb 0x1fba95 0x20d29e 0x1fba95 0x2aa5ce 0x2f37ec 0x412055 0x411f12)
2015-04-08 13:37:07.907 Xpra[98954:5bfb] *** _NSAutoreleaseNoPool(): Object 0xc93c40 of class NSCFNumber autoreleased with no pool in place - just leaking
Stack: (0x1d5df4f 0x1c6a432 0x64b09c1 0x5b5c3a8 0x5b5cf07 0x1ac5ff7 0x1ac21ce 0x1add099 0x1adc5ea 0x1adca35 0x1ac894d 0x1ac8d35 0x1ac8ca1 0x5bc47b4 0x1ac4f88 0x1ac2829 0x1ac21ce 0x1adce0f 0x1adc5ea 0x1adca35 0x5c9aac9 0x5bc414e 0x79083ba 0x2b1b80 0x2b431c 0x2b20ab 0x2b431c 0x2b20ab 0x2b431c 0x22d9cb 0x1fba95 0x2aa5ce 0x1fbb40 0x7897684 0x1ac21ce 0x1add099 0x1adb7c7 0x7890484 0x2b1b80 0x2b431c 0x2b20ab 0x2b431c 0x22d9cb 0x1fba95 0x2aea26 0x2b3a9f 0x2b431c 0x2b20ab 0x2b431c 0x22dad2 0x1fba95 0x2aea26 0x2b3a9f 0x2b3a9f 0x2b431c 0x22d9cb 0x1fba95 0x20d29e 0x1fba95 0x2aa5ce 0x2f37ec 0x412055 0x411f12)
2015-04-08 13:37:07.907 Xpra[98954:5bfb] *** _NSAutoreleaseNoPool(): Object 0x116aab20 of class NSCFDictionary autoreleased with no pool in place - just leaking
Stack: (0x1d5df4f 0x1c6a432 0x6c01be 0x64b09e9 0x5b5c3a8 0x5b5cf07 0x1ac5ff7 0x1ac21ce 0x1add099 0x1adc5ea 0x1adca35 0x1ac894d 0x1ac8d35 0x1ac8ca1 0x5bc47b4 0x1ac4f88 0x1ac2829 0x1ac21ce 0x1adce0f 0x1adc5ea 0x1adca35 0x5c9aac9 0x5bc414e 0x79083ba 0x2b1b80 0x2b431c 0x2b20ab 0x2b431c 0x2b20ab 0x2b431c 0x22d9cb 0x1fba95 0x2aa5ce 0x1fbb40 0x7897684 0x1ac21ce 0x1add099 0x1adb7c7 0x7890484 0x2b1b80 0x2b431c 0x2b20ab 0x2b431c 0x22d9cb 0x1fba95 0x2aea26 0x2b3a9f 0x2b431c 0x2b20ab 0x2b431c 0x22dad2 0x1fba95 0x2aea26 0x2b3a9f 0x2b3a9f 0x2b431c 0x22d9cb 0x1fba95 0x20d29e 0x1fba95 0x2aa5ce 0x2f37ec 0x412055 0x411f12)
2015-04-08 13:37:07,909 stop_receiving_sound(True) done
2015-04-08 13:37:07,909 sound_sink_overrun() will restart in 0ms (server supports eos sequence: True)
...

Could be related to the auto-release stuff?

@totaam
Copy link
Collaborator Author

totaam commented Apr 8, 2015

More tweaks and fixes in r8970 + r8971 + r8972 + r8974. Restarts should be quicker now. Though we still have to spawn a sound sink process client side, wait for it, then we tell the server we're ready, the server spawns its sound source process and we eventually get some data to forward... there will always be a delay. (we could shave off some of this setup cost - but not easily, so not doing it unless we have to)

Done most of the remaining backporting of the fixes to v0.14.x in 8973.

I was struggling to figure out why OSX was being so crap and deadlocking at random... It turns out that fixing the stacktraces from comment:27 also fixes the OSX deadlocks!
Small rant: no idea why an unclean shutdown in one process manages to lockup another process' UI thread, but that's OSX for you.

r8975 gives us a lot more time to shutdown the gstreamer pipeline cleanly: we wait 250 ms in the sound process, 500 ms in the xpra client before we force kill it. (and in the meantime we can start a new process)

@Afarrr: ready for more testing, and OSX should be stable too! (I've just done 350+ overrun restarts without any issue)
I have a few minor things I would like to tidy up, but I can do this in parallel.

@totaam
Copy link
Collaborator Author

totaam commented Apr 9, 2015

2015-04-09 07:02:14: antoine uploaded file sound-wait-for-process-on-exit.patch (7.8 KiB)

on osx, we also need to wait for a clean sound subprocess exit on client exit... this patch attempts to solve that (and causes more problems instead)

@totaam
Copy link
Collaborator Author

totaam commented Apr 9, 2015

It takes AGES to trigger the bug on osx. Not sure r8976, r8977 or r8978 make any difference.

It requires a fairly large number of restarts to hit - not sure if the number of restarts is directly relevant or if it just a rare error condition which takes a lot of attempts to hit.

@totaam
Copy link
Collaborator Author

totaam commented Apr 10, 2015

More fixes and improvements, including to the child reaper (r8982) and adding hooks for it (r8984), misc + cleanups (r8983).
And FINALLY, fixed OSX signal exit in r8986, improved in r8987.

@afarr: the OSX client now exits reliably again, whether you exit from the osx global menu, or using control-C.

@totaam
Copy link
Collaborator Author

totaam commented Apr 10, 2015

I have published beta builds for all platforms with all these changes.

Caveats:

  • the auto-release warnings are still there, going to ignore those for now
  • I think OSX is still leaking something, I can still get the deadlocks after many restarts... (maybe be related to the point above)

@totaam
Copy link
Collaborator Author

totaam commented Apr 12, 2015

r8992 improves crash and error handling and more, see commit message and info below.

Here's a recap of the new env vars that can be used for testing (making it easier to test error handling):

  • XPRA_SOUND_FAKE_OVERRUN=number : emit an overrun message after number of seconds
  • XPRA_SOUND_FAKE_START_FAILURE="1" : make the gstreamer sound pipeline fail to start, the process will be listening on the pipes for commands from its parent process, but the gstreamer pipeline won't be started (simulates what can happen if the installation is broken, missing some plugins or if gstreamer gets stuck)
  • XPRA_SOUND_FAKE_EXIT=number : makes the sound process exit "cleanly" after number of seconds. The process will shutdown cleanly, stopping the sound pipeline properly and telling the parent process about it before closing the pipes and terminating.
  • XPRA_SOUND_FAKE_CRASH : exits "uncleanly" after number of seconds. This may not shutdown the sound pipeline cleanly and may cause problems on some platforms (wink, wink, osx).

We should be handling almost all of these errors gracefully now. In particular, we now tell the other end to stop sending sound when the process died - something that is still a problem in the v0.14.x branch and needs fixing.

Note: see also #835: " synchronize sound with video frames "

@totaam
Copy link
Collaborator Author

totaam commented Apr 17, 2015

2015-04-17 02:31:33: afarr commented


Testing on OSX (0.15.0 r9016) against fedora 20 (also 0.15.0 r9016).

XPRA_SOUND_FAKE_OVERRUN works as expected, and the speakers dependably re-start... but I periodically see an internal error: sound-sink internal error: write connection TwoFileConnection() reset: [Errno 32] Broken pipe, and the speaker re-start is still noticeably delayed, about a long .7 seconds... as indicated in the timestamps of the logs:

2015-04-16 17:53:33,985 re-starting speaker because of overrun
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
2015-04-16 17:53:34,489 sound-sink internal error: write connection TwoFileConnection() reset: [Errno 32] Broken pipe
2015-04-16 17:53:34,685 sound-sink using audio codec: MPEG 1 Audio, Layer 3 (MP3)

The others all seem to behave as expected, though I did notice that, after the FAKE_CRASH was induced, I was able to restart speakers... but when I then clicked on the Sound application menu item (top left menu), the UI thread blocking then induced the speakers to stop - not then restarting when the UI thread resumes... and caused a traceback server-side.

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 1804, in _process_resume
    ss.resume(ui, wd)
  File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 450, in resume
    ws.resume(window)
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_source.py", line 384, in resume
    self.send_window_icon(window)
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_source.py", line 412, in send_window_icon
    surf = window.get_property("icon")
TypeError: object of type `xpra+x11+gtk_x11+window+OverrideRedirectWindowModel' does not have property `icon'

@totaam
Copy link
Collaborator Author

totaam commented Apr 17, 2015

but I periodically see an internal error: sound-sink internal error: write connection TwoFileConnection() reset: [Errno 32] Broken pipe

That's OK. The sound process and the main client process are talking to each other over pipes, and when we shutdown the sound process, those pipes are closed causing the warning.
If we continue to get many overruns, I guess I could add some code to silence this warning.

the UI thread blocking then induced the speakers to stop - not then restarting when the UI thread resumes...

Can you provide the -d sound client log output?

and caused a traceback server-side.

Good catch. That's fixed in r9029 and will need backporting.
You need an OR window to trigger this bug (ie: usually menus are OR windows) and suspend + resume (can be triggered using the control channel on platforms other than OSX).

@totaam
Copy link
Collaborator Author

totaam commented Apr 17, 2015

2015-04-17 23:53:53: afarr commented


Hmm, looks like the UI thread was a false positive. Tested with no environment variables, and sound was fine. Re-set XPRA_SOUND_FAKE_CRASH=7 and then realized that re-starting speakers just restarted counter for the fake sound crashes (which had just been coincidentally hitting while the UI thread was blocked, apparently.

Just to make it easy to confirm, here's an excerpt for the speaker stopping:

2015-04-17 14:26:56,794 sound-sink add_data(261 bytes, {'duration': 26122449, 'timestamp': 22439183673L, 'caps': 'audio/mpeg, mpegversion=(int)1, mpegaudioversion=(int)1, layer=(int)3, channels=(int)2, rate=(int)44100', 'sequence': 0, 'time': 1429306016665L}) queue_state=pushing, src=/GstPipeline:pipeline0/GstAppSrc:src (__main__.GstAppSrc)
2015-04-17 14:26:56,795 sound-sink pushed 261 bytes, new buffer level: 0ms
2015-04-17 14:26:56,796 info_update: {'pipeline': 'appsrc name=src max-bytes=32768 emit-signals=0 block=0 is-live=0 stream-type=stream format=4 ! mp3parse ! mad ! audioconvert ! audioresample ! volume name=volume volume=1.0 ! queue name=queue min-threshold-time=45000000 max-size-buffers=0 max-size-bytes=0 max-size-time=450000000 leaky=2 ! osxaudiosink async=True qos=True sync=False', 'queue.min_time': 45, 'queue.time': 450, 'queue.state': 'pushing', 'codec_description': 'MPEG 1 Audio, Layer 3 (MP3)', 'bytes': 266380, 'pid': 10225, 'volume': 100, 'state': 'active', 'queue.overruns': 0, 'queue.used_pct': 0, 'speaker.bitrate': 99000, 'codec_mode': 'joint', 'queue.used': 0, 'codec': 'mp3', 'buffers': 861}
2015-04-17 14:26:56,824 the mp3 sound sink has stopped
2015-04-17 14:26:56,825 stop_receiving_sound(True) sound sink=sink_subprocess_wrapper(<subprocess.Popen object at 0x91cb9b0>)
2015-04-17 14:26:56,825 stop_receiving_sound(True) calling <bound method sink_subprocess_wrapper.cleanup of sink_subprocess_wrapper(<subprocess.Popen object at 0x91cb9b0>)>
2015-04-17 14:26:56,825 cleanup() sending cleanup request to sound-sink
2015-04-17 14:26:56,825 sound_sink_overrun() not the current sink, ignoring it
2015-04-17 14:26:56,825 stop_receiving_sound(True) done
2015-04-17 14:26:56,826 sound_process_stopped(sink_subprocess_wrapper(<subprocess.Popen object at 0x91cb9b0>), ()) not the current sink, ignoring it
2015-04-17 14:26:56,828 speaker is now disabled - dropping packet
2015-04-17 14:27:03,585 UI thread is now blocked
2015-04-17 14:27:05,691 UI thread is running again, resuming
2015-04-17 14:27:05,693 start_receiving_sound() sound sink=None
2015-04-17 14:27:05,693 start_sound_sink(mp3)
2015-04-17 14:27:05,693 starting mp3 sound sink
2015-04-17 14:27:05,693 start_receiving_sound(mp3)
2015-04-17 14:27:05,702 mp3 sound sink started

Will test the fix after the backporting I suppose. Otherwise, I think I'll just let some sound stream on osx for a while and see if it has any problems.

@totaam
Copy link
Collaborator Author

totaam commented Apr 20, 2015

The new wrapper was not firing the emulated signals from the UI thread, which was causing crashes on win32 with GTK3, r9079 fixes that. (and this could have caused other problems elsewhere - a nice find!)

@totaam
Copy link
Collaborator Author

totaam commented Apr 30, 2015

r9201 fixes the "microphone" forwarding (sound from client to server).

This should be the last piece of the puzzle, further improvements will go in the next release, see #835 and #849.

@totaam
Copy link
Collaborator Author

totaam commented May 1, 2015

2015-05-01 23:30:00: afarr commented


Testing with osx client 0.15.0 9202 vs. fedora 20 0.15.0 r9137, with just a pair of idle xterms, I'm seeing client-side output indicating that the UI thread is blocked/running again about once per minute.

2015-05-01 14:55:27,066 UI thread is now blocked
2015-05-01 14:55:27,067 UI thread is running again, resuming
2015-05-01 14:56:27,088 UI thread is now blocked
2015-05-01 14:56:27,089 UI thread is running again, resuming
2015-05-01 14:57:11,056 re-starting speaker because of overrun
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
2015-05-01 14:57:11,558 sound-sink internal error: write connection TwoFileConnection() reset: [Errno 32] Broken pipe
2015-05-01 14:57:13,556 sound-sink using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2015-05-01 14:57:18,915 UI thread is now blocked
2015-05-01 14:57:18,915 UI thread is running again, resuming
2015-05-01 14:58:18,920 UI thread is now blocked
2015-05-01 14:58:18,921 UI thread is running again, resuming

Connecting and waiting patiently for a minute (or doing something else then glancing back at the terminal) with just an xterm or two should reproduce.

Enabling -d sound on the server side I'm not seeing anything that looks interesting, just a lot of these over and over::

2015-05-01 15:01:27,069 info_update: {'pipeline': 'pulsesrc ! audioconvert ! volume name=volume volume=1.0 ! lamemp3enc ! appsink name=sink', 'codec_description': 'MPEG-1 Layer 3 (MP3)', 'bytes': 1012232, 'pid': 10090, 'volume': 100, 'state': 'active', 'codec': 'mp3', 'buffers': 9733}2015-05-01 15:01:27,069 sound_source_info: {'pipeline': 'pulsesrc ! audioconvert ! volume name=volume volume=1.0 ! lamemp3enc ! appsink name=sink', 'codec_description': 'MPEG-1 Layer 3 (MP3)', 'bytes': 1012232, 'pid': 10090, 'volume': 100, 'state': 'active', 'codec': 'mp3', 'time': 1430517687, 'buffers': 9733}
2015-05-01 15:01:27,078 new_sound_buffer(source_subprocess_wrapper(<subprocess.Popen object at 0x2014e90>), 104, {'duration': 26122449, 'timestamp': 254223673469, 'caps': 'audio/mpeg, mpegversion=(int)1, mpegaudioversion=(int)1, layer=(int)3, channels=(int)2, rate=(int)44100', 'time': 1430517687077}) suspended=False

Oddly enough, starting a browser and playing something that actually has sound ends the output. Closing the browser again, it takes about 5 minutes for the UI messages to resume.

@totaam
Copy link
Collaborator Author

totaam commented May 2, 2015

@afarr: is this related to sound? (ie: does this go away if you turn sound off)
If not, then this belongs in another ticket.
Is this a recent regression?
I am obviously not seeing this, I get plenty of overruns when testing in virtualbox but no UI thread issues.

@totaam
Copy link
Collaborator Author

totaam commented May 11, 2015

I have moved the UI thread issue to #855.
I think we can close this.

@totaam
Copy link
Collaborator Author

totaam commented May 12, 2015

Minor fixes (mostly for those that try py3k support):

  • r9313 (backport in 9314)
  • r9315 (backport in 9319)
  • r9316 (backport in 9318)
  • r9320 (backport in 9322)

@totaam
Copy link
Collaborator Author

totaam commented May 12, 2015

One more thing that needs doing: I can get the win32 client to get stuck on control-c...

@totaam
Copy link
Collaborator Author

totaam commented May 13, 2015

Well, that was not easy - or clean (now we can get stacktraces on shutdown...), but at least the client will exit reliably on win32 now.

For v0.15.x, we will need to apply at least: r9334, r9339, r9340
And maybe also: r9335, r9337

@totaam
Copy link
Collaborator Author

totaam commented May 13, 2015

Backports in 9347 + 9350.

@totaam totaam closed this as completed May 13, 2015
@totaam
Copy link
Collaborator Author

totaam commented May 18, 2015

Related tickets:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant