You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, I found a scenario in which the mosquitto_loop_stop(), called with force=false, never returns, leaving the hosting application stuck forever: mosquitto_loop_stop() remains blocked on pthread_join() call, because the network thread does not exit even if mosquitto_disconnect() has been called prior to mosquitto_loop_stop().
Issue seen in lib 1.6.12 in threaded mode on Linux - but, by inspecting the code, I would say that this issue affects current master's head, too.
I analyzed the runtime behavior by adding logs to the lib and found out why this is happening.
Please consider this sequence:
1.. network thread is blocked (for some seconds) inside net__socket_connect(blocking=true) - called by mosquitto__reconnect() (which is in turn called by mosquitto_loop_forever()).
This happens e.g. if broker is currently not reachable anymore, so the thread is trying to reconnect - thus net__socket_connect() waits for connection establishment.
2.. at the very same time, application calls mosquitto_disconnect(), which sets client state to mosq_cs_disconnected (which is the condition checked by the network thread to break its loop and exit). A good way to call mosquitto_disconnect() at the right moment to trigger the issue is to add logs before and after the net__socket_connect() (see step above).
3.. then, network thread returns from the net__socket_connect() call, with rc != 0 (broker is still unreachable) - and so client state is reset to mosq_cs_connect_pending:
this irreparably overrides the exit request by API of step 2.
4.. as a consequence, network thread does not exit (state is not mosq_cs_disconnected anymore)
5.. application now calls mosquitto_loop_stop(force=false), which remains stuck forever on the pthread_join() call - thread is not breaking its loop.
As a real world example, steps 2+5 are a very normal application shutdown sequence, which then may cause application hang while exiting with broker currently unreachable.
Personal thought: I think that the tecnique of setting the state from API to make the thread exit is very risky, given the continuos re-set of the state performed by the network thread on itself. Indeed, I suspect that the above sequence may not be the only problematic one.
A different method would be required to be sure that the network thread always exits when requested. Since mosquitto has already a socktpair for application thread to network thread communication, an idea would be to use that to signal the exit request (e.g. by writing a "special value" byte - currently the lib writes always a zero byte value). Then, when reading from mosq->sockpairR in mosquitto_loop() and interruptible_sleep(), the network thread may check the read byte value and signal "exit from API request" condition to mosquitto_loop_forever() with a new specific rc value - to break its loop.
Alternatively, a global "thread exit requested" flag may be set by API and read by the thread, in a mutexed context.
The text was updated successfully, but these errors were encountered:
Hi Roger,
I see the fix in 2.0.15, thanks. Just an additional, probably minor, note: if the network thread is sleeping inside interruptible_sleep() (e.g. because of reconnection retry delay), mosquitto_loop_stop() will be blocked on the pthread_join() call until interruptible_sleep() returns on its own; this would cause the application to be blocked for various seconds, depending on the selected reconnection backoff interval.
Sending the usual dummy byte into the socketpair in mosquitto_disconnect_v5() would awake the network thread and let it see the exit request immediately.
Just a thought.
Hi, I found a scenario in which the mosquitto_loop_stop(), called with force=false, never returns, leaving the hosting application stuck forever: mosquitto_loop_stop() remains blocked on pthread_join() call, because the network thread does not exit even if mosquitto_disconnect() has been called prior to mosquitto_loop_stop().
Issue seen in lib 1.6.12 in threaded mode on Linux - but, by inspecting the code, I would say that this issue affects current master's head, too.
I analyzed the runtime behavior by adding logs to the lib and found out why this is happening.
Please consider this sequence:
1.. network thread is blocked (for some seconds) inside net__socket_connect(blocking=true) - called by mosquitto__reconnect() (which is in turn called by mosquitto_loop_forever()).
This happens e.g. if broker is currently not reachable anymore, so the thread is trying to reconnect - thus net__socket_connect() waits for connection establishment.
2.. at the very same time, application calls mosquitto_disconnect(), which sets client state to mosq_cs_disconnected (which is the condition checked by the network thread to break its loop and exit). A good way to call mosquitto_disconnect() at the right moment to trigger the issue is to add logs before and after the net__socket_connect() (see step above).
3.. then, network thread returns from the net__socket_connect() call, with rc != 0 (broker is still unreachable) - and so client state is reset to mosq_cs_connect_pending:
this irreparably overrides the exit request by API of step 2.
4.. as a consequence, network thread does not exit (state is not mosq_cs_disconnected anymore)
5.. application now calls mosquitto_loop_stop(force=false), which remains stuck forever on the pthread_join() call - thread is not breaking its loop.
As a real world example, steps 2+5 are a very normal application shutdown sequence, which then may cause application hang while exiting with broker currently unreachable.
Personal thought: I think that the tecnique of setting the state from API to make the thread exit is very risky, given the continuos re-set of the state performed by the network thread on itself. Indeed, I suspect that the above sequence may not be the only problematic one.
A different method would be required to be sure that the network thread always exits when requested. Since mosquitto has already a socktpair for application thread to network thread communication, an idea would be to use that to signal the exit request (e.g. by writing a "special value" byte - currently the lib writes always a zero byte value). Then, when reading from mosq->sockpairR in mosquitto_loop() and interruptible_sleep(), the network thread may check the read byte value and signal "exit from API request" condition to mosquitto_loop_forever() with a new specific rc value - to break its loop.
Alternatively, a global "thread exit requested" flag may be set by API and read by the thread, in a mutexed context.
The text was updated successfully, but these errors were encountered: