Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lockup condition #169

Closed
eroom1966 opened this issue Nov 8, 2018 · 15 comments
Closed

lockup condition #169

eroom1966 opened this issue Nov 8, 2018 · 15 comments

Comments

@eroom1966
Copy link

eroom1966 commented Nov 8, 2018

I saw another posting regarding deadlock conditions and the advice was to rebuild with debug
here are my results, after a while having issued a number of aplay commands, one command does not return, here is the tail of the logging, I could attach a debugger if that is any help ?
not quite sure what I need to attach to bluealsa or aplay ?

This is running on an RPi 3B running Jessie

Output of stdout
../../../src/asound/bluealsa-pcm.c:342: Prepared
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:214: Starting
../../../src/asound/../shared/ctl-client.c:443: Requesting PCM resume for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:110: Starting IO loop
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/../shared/ctl-client.c:468: Requesting PCM drain for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:121: IO thread paused: 4
../../../src/asound/bluealsa-pcm.c:254: Stopping
../../../src/asound/bluealsa-pcm.c:254: Stopping
../../../src/asound/bluealsa-pcm.c:318: Freeing HW
../../../src/asound/../shared/ctl-client.c:417: Closing PCM for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:272: Closing plugin
../../../src/asound/../shared/ctl-client.c:105: Connecting to socket: /var/run/bluealsa/hci0
../../../src/asound/../shared/ctl-client.c:219: Getting transport for 94:23:6E:5D:F1:78 type 1
../../../src/asound/bluealsa-pcm.c:534: Setting constraints
../../../src/asound/bluealsa-pcm.c:282: Initializing HW
../../../src/asound/../shared/ctl-client.c:375: Requesting PCM open for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:305: FIFO buffer size: 4096
../../../src/asound/bluealsa-pcm.c:311: Selected HW buffer: 6 periods x 16380 bytes <= 98284 bytes
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:342: Prepared
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:214: Starting
../../../src/asound/../shared/ctl-client.c:443: Requesting PCM resume for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:110: Starting IO loop

Output of syslog
Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:734: State transition: 2 -> 0
Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:1027: Closing PCM: 12
Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-
Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:605: Client closed connection: 11
Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:638: Received new connection: 11
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:651: New client accepted: 11
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:353: PCM requested for 94:23:6E:5D:F1:78 type 1 stream 0
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/bluez.c:1058: Signal: PropertiesChanged: org.bluez.MediaTransport1: State
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:874: New transport: 15 (MTU: R:672 W:672)
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:734: State transition: 0 -> 2
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:122: Created new IO thread: A2DP: A2DP Source
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:386: Starting IO loop: A2DP Source (SBC)
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:734: State transition: 2 -> 2
Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-

@eroom1966
Copy link
Author

OK a follow up, I ran gdb on bluealsa, and it is stuck in a poll()
(gdb) where
Cannot access memory at address 0x1
#0 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x76c42528 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
Cannot access memory at address 0x1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) info thr
Id Target Id Frame
7 Thread 0x768323f0 (LWP 678) "bactl" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81
6 Thread 0x75eff3f0 (LWP 679) "gdbus" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81
5 Thread 0x74cff3f0 (LWP 2754) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81
4 Thread 0x754ff3f0 (LWP 7528) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81
3 Thread 0x73aff3f0 (LWP 7529) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81
2 Thread 0x742ff3f0 (LWP 30925) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81

  • 1 Thread 0x76f02210 (LWP 675) "bluealsa" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81

@eroom1966
Copy link
Author

I am currently testing with the THREAD envvar workaround in README, not sure if this is related to what I am seeing

@eroom1966
Copy link
Author

eroom1966 commented Nov 9, 2018

setting
export LIBASOUND_THREAD_SAFE=0
made no difference I am afraid.

I have attached gdb to aplay, and this is where I see it locked up.
(gdb) info thr
Id Target Id Frame
2 Thread 0x76a27460 (LWP 27412) "pcm-io" 0x76db0c04 in write () at ../sysdeps/unix/syscall-template.S:81

  • 1 Thread 0x76f81000 (LWP 27408) "aplay" 0x0000db68 in ?? ()

// thread 1
(gdb) where
#0 0x0000db68 in ?? ()
#1 0x76e9f5e4 in snd_pcm_wait () from /usr/lib/arm-linux-gnueabihf/libasound.so.2
#2 0x00000002 in ?? ()

// thread 2
(gdb) where
#0 0x76db0c04 in write () at ../sysdeps/unix/syscall-template.S:81
#1 0x76a546c4 in io_thread (arg=0x1129368) at ../../../src/asound/bluealsa-pcm.c:185
#2 0x76da8e90 in start_thread (arg=0x76a27460) at pthread_create.c:311
#3 0x76d32598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

any help ?

the last messages from the debug I see are as follows
../../../src/asound/../shared/ctl-client.c:105: Connecting to socket: /var/run/bluealsa/hci0
../../../src/asound/../shared/ctl-client.c:219: Getting transport for 94:23:6E:5D:F1:78 type 1
../../../src/asound/bluealsa-pcm.c:534: Setting constraints
../../../src/asound/bluealsa-pcm.c:282: Initializing HW
../../../src/asound/../shared/ctl-client.c:375: Requesting PCM open for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:305: FIFO buffer size: 4096
../../../src/asound/bluealsa-pcm.c:311: Selected HW buffer: 6 periods x 16380 bytes <= 98284 bytes
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:342: Prepared
../../../src/asound/bluealsa-pcm.c:326: Initializing SW
../../../src/asound/bluealsa-pcm.c:214: Starting
../../../src/asound/../shared/ctl-client.c:443: Requesting PCM resume for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:110: Starting IO loop

No more messages after this point

@arkq
Copy link
Owner

arkq commented Nov 9, 2018

Hmm... strange thing... Can you describe how you are able to reproduce this issue? I'll try this on my box and hopefully it will deadlock too. If not, I will think how to guide you through debugging process.

@eroom1966
Copy link
Author

eroom1966 commented Nov 9, 2018

@arkq
sure let me explain.

My application synthesizes short messages and then plays them via aplay.
So in effect I have a library of words in .wav files, these are concatenated to make a spoken sentence using 'sox'
The resultant .wav file is then played via aplay, using the bluealsa referenced device.
these play at a non deterministic rate of maybe one message every 20 seconds.
The messages are queued, so there is never a (intentional) situation where 2 aplay commands are running concurrently, a new aplay command only runs, after the previous has exited

In order to reproduce, I leave the application running for a couple of hours, at some point I see that messages are no longer being played, and the aplay process has not returned - this is non-deterministic, and cannot be reproduced with the same inputs

There are so many variables involved here 'bluetoothd', 'blualsa' and 'aplay' - I guess any of these could be at fault ?

@eroom1966
Copy link
Author

eroom1966 commented Nov 9, 2018

Additional information, I noticed the following in the logging
../../../src/asound/../shared/ctl-client.c:105: Connecting to socket: /var/run/bluealsa/hci0
../../../src/asound/../shared/ctl-client.c:219: Getting transport for 94:23:6E:5D:F1:78 type 1
../../../src/asound/bluealsa-pcm.c:534: Setting constraints
../../../src/asound/bluealsa-pcm.c:282: Initializing HW
../../../src/asound/../shared/ctl-client.c:375: Requesting PCM open for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:287: Couldn't open PCM FIFO: Device or resource busy

the previous message output appeared to end successfully

../../../src/asound/../shared/ctl-client.c:468: Requesting PCM drain for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:121: IO thread paused: 4
../../../src/asound/bluealsa-pcm.c:254: Stopping
../../../src/asound/bluealsa-pcm.c:254: Stopping
../../../src/asound/bluealsa-pcm.c:318: Freeing HW
../../../src/asound/../shared/ctl-client.c:417: Closing PCM for 94:23:6E:5D:F1:78
../../../src/asound/bluealsa-pcm.c:272: Closing plugin

I am also seeing this which may be significant
aplay: set_params:1297: Unable to install hw params:
ACCESS: RW_INTERLEAVED
FORMAT: S16_LE
SUBFORMAT: STD
SAMPLE_BITS: 16
FRAME_BITS: 16
CHANNELS: 1
RATE: 44100
PERIOD_TIME: (85333 85334)
PERIOD_SIZE: (3763 3764)
PERIOD_BYTES: (7526 7528)
PERIODS: (5 6)
BUFFER_TIME: (511882 511883)
BUFFER_SIZE: 22574
BUFFER_BYTES: 45148
TICK_TIME: 0

@eroom1966
Copy link
Author

@arkq
A quick update, I have scattered debug messages in src/asound/bluealsa-pcm.c to see where the deadlock occurs, and it appears to block here

if ((ret = write(pcm->pcm_fd, head, len)) == -1)

so it looks like the write() syscall to the file descriptor is blocked

It is strange that when I connect a gdb to the bluealsa process, all threads are in poll()
not sure how to progress this - any ideas ?

@arkq
Copy link
Owner

arkq commented Nov 10, 2018

The src/asound/bluealsa-pcm.c file is a part of PCM ALSA plug-in, so you should attach gdb to the aplay process, then you should see that it indeed has stuck in there.

OK, so since bluealsa waits in poll() and PCM tries to write data however it fails (blocks), that's something. It might be, that bluealsa polls on wrong file descriptor. Please check exactly at which poll()s bulealsa waits in "baio" threads. Then when you locate this poll, check the content of pfds array. It is also possible that bluealsa is here, then as well check content of pfd struct.

@eroom1966
Copy link
Author

@arkq
here is the stacktrace from baio, in fact there are 4 threads in baio - is that correct ?
BTW, I think I am missing some debug symbols here, how should I configure/build with debug symbols ?

(gdb) info thr
Id Target Id Frame
7 Thread 0x7688d3f0 (LWP 654) "bactl" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
6 Thread 0x75eff3f0 (LWP 655) "gdbus" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
5 Thread 0x74cff3f0 (LWP 2158) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
4 Thread 0x754ff3f0 (LWP 3039) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
3 Thread 0x742ff3f0 (LWP 3040) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
2 Thread 0x73aff3f0 (LWP 2214) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81

  • 1 Thread 0x76f5c210 (LWP 651) "bluealsa" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81

Stack trace for each thread

Thread 2
(gdb) where
#0 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00017c98 in io_thread_write_bt (len=629, buffer=0x73102b00 "\200`\223\211[\n\274\063", fd=23) at ../../src/io.c:141
#2 io_thread_a2dp_source_sbc (arg=0x0) at ../../src/io.c:488
#3 0x76b9fe90 in start_thread (arg=0x73aff3f0) at pthread_create.c:311
#4 0x76b29598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

Thread 3
(gdb) where
#0 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x000180e8 in io_thread_sco (arg=0x1d168) at ../../src/io.c:1414
#2 0x76b9fe90 in start_thread (arg=0x742ff3f0) at pthread_create.c:311
#3 0x76b29598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

Thread 4
(gdb) where
#0 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x000197d8 in rfcomm_thread (arg=0x0) at ../../src/rfcomm.c:706
#2 0x76b9fe90 in start_thread (arg=0x754ff3f0) at pthread_create.c:311
#3 0x76b29598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

Thread 5
(gdb) where
#0 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x000197d8 in rfcomm_thread (arg=0x0) at ../../src/rfcomm.c:706
#2 0x76b9fe90 in start_thread (arg=0x74cff3f0) at pthread_create.c:311
#3 0x76b29598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

@arkq
Copy link
Owner

arkq commented Nov 10, 2018

in fact there are 4 threads in baio - is that correct?

Yes, it is correct, there are 4 baio threads. Nothing to worry about.

Thread 2 is in io_thread_write_bt (len=629, buffer=0x73102b00, fd=23) at ../../src/io.c:141, that's the problem.... This poll() waits forever, because there is no space left in the BT socket (obtained from bluez). Also I see, that you are not using current master build (lines do not align). Try using current master (however, I doubt, that it will work).

In short, the problem is with writing data to bluez transport socket. You can reproduce this, by starting playback to BT headset (or whatever), and put this headset into a microwave oven (do not turn it on :D) or any Faraday cage for microwaves. Bluealsa should stuck in this io.c:141 poll() as well.

I do not know what might cause this problem in your case... The problem might be that the BT link between host and BT device has been dropped, bluetoothd has misbehaved, or bluealsa is using bluez is a wrong way.... Try powering off BT receiver (speaker) and see whether bluealsa has moved forward or is stuck in this poll() forever.

This poll() is here to ensure that every packet is delivered to the BT device. However, it might be a good idea to terminate this poll after a while (few seconds) and indicate that the device is gone. Then, someone (bluealsa client, e.g. PCM plug-in) might try to open PCM device once more.

@eroom1966
Copy link
Author

@arkq
I am running code off the Master, but I took this a few days ago, and of course I have added in lots of debug messages which would cause the line numbers to change. I will pull the latest master and try again and post some results here
Thx

@eroom1966
Copy link
Author

eroom1966 commented Nov 12, 2018

It seems like the bluetoothd (version 5.23 RPi 3B / Jessie) gets into some bad state
if I go into bluetoothctl, I cannot power on
[bluetooth]# power on
Failed to set power on: org.bluez.Error.Failed

if I list via rfkill
$ rfkill list
0: phy0: Wireless LAN
Soft blocked: no
Hard blocked: no
1: hci0: Bluetooth
Soft blocked: no
Hard blocked: no

This is well beyond my capabilities. I tried restarting the bluetooth and bluealsa daemons - no success once the bluetooth cannot power on, the only solution is a system reboot.

any ideas anybody ?

@dagbdagb
Copy link

dagbdagb commented Nov 12, 2018 via email

@eroom1966
Copy link
Author

eroom1966 commented Nov 12, 2018

@dagbdagb
At the point where it seems to go wrong I notice the following in the logfile
I have added extra instrumentation into the bluz-alsa code, firstly the poll() does not wait forever it times out after 1 second and retries a write 5 times. The interesting thing, the resource becomes unavailable, then bluetoothd reports a 'Suspend' followed by 'Abort'

Nov 12 09:55:23 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:152: write() pfd.fd=20
Nov 12 09:55:23 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:155: write()=-1 errno=11 Resource temporarily unavailable retries=5
Nov 12 09:55:23 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:161: write()=-1 errno=EAGAIN
Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:152: write() pfd.fd=20
Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:833: PCM drained
Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-
Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:455: PCM close for 94:23:6E:5D:F1:78 type 1 stream 0
Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:905: Releasing transport: A2DP Source (SBC)
Nov 12 09:55:26 raspberrypi authpriv.notice sudo: pi : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/usr/bin/l2ping -c 1 -i hci0 94:23:6E:5D:F1:78
Nov 12 09:55:26 raspberrypi authpriv.info sudo: pam_unix(sudo:session): session opened for user root by pi(uid=0)
Nov 12 09:55:30 raspberrypi daemon.err bluetoothd[643]: Suspend: Connection timed out (110)
Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:933: Closing BT: 20
Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:1073: Exiting IO thread
Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:1031: Closing PCM: 15
Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-
Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/bluez.c:1058: Signal: PropertiesChanged: org.bluez.MediaTransport1: State
Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:605: Client closed connection: 14
Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:735: State transition: 2 -> 0
Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-
Nov 12 09:55:32 raspberrypi daemon.err bluetoothd[643]: Abort: Connection timed out (110)

at this point now hci0 is reported down, and I cannot bring it up

root@raspberrypi:~/servers# hciconfig
hci0: Type: BR/EDR Bus: UART
BD Address: B8:27:EB:1D:FA:46 ACL MTU: 1021:8 SCO MTU: 64:1
DOWN
RX bytes:20044 acl:219 sco:0 events:1959 errors:0
TX bytes:1799270 acl:3641 sco:0 commands:102 errors:0

root@raspberrypi:~/servers# hciconfig hci0 up
Can't init device hci0: Connection timed out (110)

Tried unloading/reloading the bluetooth hardware driver?

Hmm, not sure how to do that, can you advise ?

@eroom1966
Copy link
Author

Final update
I installed Debian 9/Stretch and this seems to have solved the problem, so I presume the issues were related to /usr/sbin/bluetoothd - having upgraded to 5.43, this now appears solid
closing
Many thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants