Skip to content

mosquitto_sub hits deadlock when using the -W (timeout) options #3175

@audricschiltknecht

Description

@audricschiltknecht

The mosquitto_sub client can end in a deadlock in certain scenarios when the timeout option (`-W) is set.
The client utilizes alarm() to set a signal handler that triggers once the timeout expires. However, this can result in a deadlock condition in the following case:

  • Client is displaying a log: a lock on &mosq->log_callback_mutex is taken in log__printf
  • Signal occurs, handler is called.
  • Client will call the mosquitto_disconnect_v5 function.
  • Function will at some point issue a log when calling send__disconnect
  • log__printf will try to acquire the lock on the same mutex from step 1: deadlock.

This is somewhat difficult to reproduce, but I've found the issue will investigating what we think is #2887 and a variation of #2887 (comment) seems to be the best bet to end up in the same case.

So:

  • Send lots of retained topics to the broker: for i in {0..20000}; do echo $i; mosquitto_pub -t "retained/$i" -m $i -r -q 0 ; done
  • Then try to remove all retained messages and enable debug logging: mosquitto_sub --retained-only --remove-retained -W 1 -F %t -t "retained/+" -d

Environment and version

  • Tested on mosquitto version: 2.0.18/2.0.20
  • OS: Ubuntu 24.04

Backtrace from GDB:

(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0x58816d092760) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x58816d092760, private=0) at ./nptl/lowlevellock.c:49
#2  0x00007a62076a00f1 in lll_mutex_lock_optimized (mutex=0x58816d092760) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x58816d092760) at ./nptl/pthread_mutex_lock.c:93
#4  0x00007a6207857aac in log__printf (mosq=0x58816d0925f0, priority=16, fmt=0x7a620786987f "Client %s sending DISCONNECT") at /home/audric/Projets/mosquitto/lib/logging_mosq.c:40
#5  0x00007a6207862789 in send__disconnect (mosq=0x58816d0925f0, reason_code=4 '\004', properties=0x0) at /home/audric/Projets/mosquitto/lib/send_disconnect.c:54
#6  0x00007a62078552a4 in mosquitto_disconnect_v5 (mosq=0x58816d0925f0, reason_code=4, properties=0x0) at /home/audric/Projets/mosquitto/lib/connect.c:263
#7  0x000058816b69cf17 in my_signal_handler (signum=14) at /home/audric/Projets/mosquitto/client/sub_client.c:55
#8  <signal handler called>
#9  0x00007a620771c574 in __GI___libc_write (fd=1, buf=0x58816d094130, nbytes=86) at ../sysdeps/unix/sysv/linux/write.c:26
#10 0x00007a6207693965 in _IO_new_file_write (f=0x7a62078045c0 <_IO_2_1_stdout_>, data=0x58816d094130, n=86) at ./libio/fileops.c:1181
#11 0x00007a6207692561 in new_do_write (to_do=86, data=0x58816d094130 "Client null received PUBLISH (d0, q0, r1, m0, 'test-retained/18334', ... (500 bytes))\n", fp=0x7a62078045c0 <_IO_2_1_stdout_>) at ./libio/libioP.h:1030
#12 _IO_new_do_write (fp=fp@entry=0x7a62078045c0 <_IO_2_1_stdout_>, data=0x58816d094130 "Client null received PUBLISH (d0, q0, r1, m0, 'test-retained/18334', ... (500 bytes))\n", to_do=86) at ./libio/fileops.c:426
#13 0x00007a6207692ee3 in _IO_new_file_overflow (f=0x7a62078045c0 <_IO_2_1_stdout_>, ch=10) at ./libio/fileops.c:784
#14 0x00007a6207687dca in __GI__IO_puts (str=0x58816d094590 "Client null received PUBLISH (d0, q0, r1, m0, 'test-retained/18334', ... (500 bytes))") at ./libio/ioputs.c:41
#15 0x000058816b69d459 in my_log_callback (mosq=0x58816d0925f0, obj=0x58816b6ad060 <cfg>, level=16, str=0x58816d094590 "Client null received PUBLISH (d0, q0, r1, m0, 'test-retained/18334', ... (500 bytes))") at /home/audric/Projets/mosquitto/client/sub_client.c:176
#16 0x00007a6207857bbd in log__printf (mosq=0x58816d0925f0, priority=16, fmt=0x7a6207868220 "Client %s received PUBLISH (d%d, q%d, r%d, m%d, '%s', ... (%ld bytes))") at /home/audric/Projets/mosquitto/lib/logging_mosq.c:54
#17 0x00007a6207856462 in handle__publish (mosq=0x58816d0925f0) at /home/audric/Projets/mosquitto/lib/handle_publish.c:116
#18 0x00007a620786200d in handle__packet (mosq=0x58816d0925f0) at /home/audric/Projets/mosquitto/lib/read_handle.c:51
#19 0x00007a620785f88c in packet__read (mosq=0x58816d0925f0) at /home/audric/Projets/mosquitto/lib/packet_mosq.c:566
#20 0x00007a6207858935 in mosquitto_loop_read (mosq=0x58816d0925f0, max_packets=4096) at /home/audric/Projets/mosquitto/lib/loop.c:380
#21 0x00007a6207858140 in mosquitto_loop (mosq=0x58816d0925f0, timeout=-1, max_packets=1) at /home/audric/Projets/mosquitto/lib/loop.c:149
#22 0x00007a620785855d in mosquitto_loop_forever (mosq=0x58816d0925f0, timeout=-1, max_packets=1) at /home/audric/Projets/mosquitto/lib/loop.c:257
#23 0x000058816b69de5d in main (argc=10, argv=0x7ffc9a6c2078) at /home/audric/Projets/mosquitto/client/sub_client.c:403
#24 0x00007a620762a1ca in __libc_start_call_main (main=main@entry=0x58816b69db17 <main>, argc=argc@entry=10, argv=argv@entry=0x7ffc9a6c2078) at ../sysdeps/nptl/libc_start_call_main.h:58
#25 0x00007a620762a28b in __libc_start_main_impl (main=0x58816b69db17 <main>, argc=10, argv=0x7ffc9a6c2078, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc9a6c2068) at ../csu/libc-start.c:360
#26 0x000058816b69ce05 in _start ()

Metadata

Metadata

Assignees

Labels

Component: libmosquittoStatus: CompletedNothing further to be done with this issue, it can be closed by the requestor or committer.

Type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions