Bridge connection enters a connect-disconnect loop when incomplete QoS 2 publish, and local broker fails to persist for any reason. #57

Closed
ralight opened this Issue Mar 15, 2016 · 2 comments

@ralight
Contributor
ralight commented Mar 15, 2016

migrated from Bugzilla #467304
status UNCONFIRMED severity normal in component Mosquitto for 1.4
Reported in version 1.4 on platform PC
Assigned to: Roger Light

On 2015-05-14 02:46:43 -0400, jsaak jsaak wrote:

The bridge connection enters a connect-disconnect loop when a QoS 2 publish is incomplete and the local broker fails to persist its state for any reason.

Scenario:

  1. local mosq publishes QoS2 to remote mosq
  2. local mosq dies (fails to persist)
  3. local mosq restarts with "clean_session false"
  4. local mosq reestablishes bridge connection to remote mosq
  5. remote mosq replies with PUBREC
  6. local mosq does not find the corresponding message in the DB, gives error
  7. local mosq disconnects bridge connection
  8. goto 4.
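
The loop in steps 4 to 8 can be reproduced with a small model. This is only a sketch: `msg_store`, `store_has_mid`, `handle_pubrec_strict` and `simulate_bridge_reconnects` are hypothetical names made up for illustration, not mosquitto functions, and the store is a plain array standing in for the persisted DB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the outgoing messages that survived the restart. */
struct msg_store {
    const uint16_t *mids;  /* message ids still known to the local broker */
    size_t count;
};

/* Returns true if `mid` is still known locally. */
bool store_has_mid(const struct msg_store *store, uint16_t mid)
{
    assert(store != NULL);
    for (size_t i = 0; i < store->count; i++) {
        if (store->mids[i] == mid) {
            return true;
        }
    }
    return false;
}

/* Strict handling (step 6): an unknown mid is reported as an error. */
int handle_pubrec_strict(const struct msg_store *store, uint16_t mid)
{
    return store_has_mid(store, mid) ? 0 : 1;
}

/*
 * Models steps 4-8: after every reconnect the remote sends PUBREC for
 * `pending_mid` again. Returns how many disconnects happen within
 * `max_attempts` reconnects.
 */
int simulate_bridge_reconnects(const struct msg_store *store,
                               uint16_t pending_mid, int max_attempts)
{
    int disconnects = 0;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (handle_pubrec_strict(store, pending_mid) != 0) {
            disconnects++;  /* step 7: drop the bridge connection */
            continue;       /* step 8: goto 4 */
        }
        break;              /* PUBREC accepted, the flow can continue */
    }
    return disconnects;
}
```

With an empty store every attempt ends in a disconnect, so the count always equals `max_attempts`. Nothing in the loop changes the local state, which is why it never terminates on its own.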

My proposed solution is to change step 6:
if the mid is not found in the DB, reply anyway and log a WARNING.

--- a/mosquitto/lib/read_handle_shared.c
+++ b/mosquitto/lib/read_handle_shared.c
@@ -103,6 +103,10 @@ int _mosquitto_handle_pubrec(struct mosquitto *mosq)
     _mosquitto_log_printf(NULL, MOSQ_LOG_DEBUG, "Received PUBREC from %s (Mid: %d)", mosq->id, mid);
 
     rc = mqtt3_db_message_update(mosq, mid, mosq_md_out, mosq_ms_wait_for_pubcomp);
+    if (rc) {
+        rc = 0;
+        _mosquitto_log_printf(NULL, MOSQ_LOG_WARNING, "Received PUBREC is not in the DB, replying anyway");
+    }
 #else
     _mosquitto_log_printf(mosq, MOSQ_LOG_DEBUG, "Client %s received PUBREC (Mid: %d)", mosq->id, mid);
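
The effect of the patch can be illustrated outside the broker with a small standalone sketch. `pubrec_result` and `handle_pubrec_tolerant` are hypothetical names, the DB is modelled as an array of known mids, and it is an assumption here that "replying anyway" means a PUBREL is sent back after the lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical outcome of handling one incoming PUBREC. */
struct pubrec_result {
    int rc;            /* 0 = keep the connection, non-zero = disconnect */
    bool send_pubrel;  /* whether a PUBREL should be sent back */
    bool warned;       /* whether the unknown-mid warning was logged */
};

/*
 * Tolerant handling: an unknown mid is logged as a warning instead of
 * being returned as an error, so the connection stays up.
 */
struct pubrec_result handle_pubrec_tolerant(const uint16_t *known_mids,
                                            size_t count, uint16_t mid)
{
    struct pubrec_result res = { 0, true, false };
    bool found = false;

    assert(known_mids != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (known_mids[i] == mid) {
            found = true;
            break;
        }
    }
    if (!found) {
        /* Mirrors the patch: clear the error and carry on. */
        fprintf(stderr,
                "Warning: received PUBREC (Mid: %u) is not in the DB, replying anyway\n",
                (unsigned)mid);
        res.warned = true;
    }
    return res;
}
```

Both the known and the unknown case return `rc == 0`, which is what breaks the reconnect loop: the remote gets its reply and can finish the QoS 2 flow.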
@hmvp
hmvp commented May 19, 2016

We see the same in our production environment with the latest version.

Unfortunately this seems endemic in MQTT implementations: see also eclipse/paho.mqtt.java#27 for the same bug in the Java client.

Under the right circumstances this happens for most or all acknowledgements, so PUBCOMP and PUBACK can probably trigger the same behaviour (possibly SUBACK as well).
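
To make the point about the other acknowledgements concrete, here is a rough sketch of one decision function covering PUBACK, PUBREC and PUBCOMP. All names (`ack_type`, `ack_action`, `next_action_for_ack`) are hypothetical, and SUBACK is left out because it is only a guess above. The assumption is that an unknown mid should never be fatal for any of these packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Acknowledgement packets that refer to an earlier outgoing PUBLISH. */
enum ack_type { ACK_PUBACK, ACK_PUBREC, ACK_PUBCOMP };

/* What the receiver of the acknowledgement does next. */
enum ack_action {
    ACTION_NONE,        /* flow is finished, nothing to send */
    ACTION_SEND_PUBREL  /* continue the QoS 2 handshake */
};

/*
 * Chooses the next step for an acknowledgement. `mid_known` says whether
 * the message id was found locally. `*warn` is set when the mid was
 * unknown, so the caller can log it. The connection is kept either way.
 */
enum ack_action next_action_for_ack(enum ack_type type, bool mid_known,
                                    bool *warn)
{
    assert(warn != NULL);
    *warn = !mid_known;

    /* PUBREC gets a PUBREL even for an unknown mid so the peer can finish. */
    if (type == ACK_PUBREC) {
        return ACTION_SEND_PUBREL;
    }
    /* PUBACK and PUBCOMP end their flows: nothing more to send. */
    return ACTION_NONE;
}
```

The design choice is that a missing mid only changes the logging, never the control flow, so a lost DB entry cannot trigger a disconnect for any of the three packet types.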

@ralight ralight added a commit that referenced this issue May 19, 2016
@ralight ralight [57] Handle PUB* with unknown message id gracefully.
Allows message flow to complete where e.g. the broker didn't persist a
partially complete flow.

Thanks to jsaak jsaak and Hiram van Paassen.

Bug: #57
a187b3f
@ralight
Contributor
ralight commented May 19, 2016

Thanks for the nudge, I believe this is now fixed.

@ralight ralight closed this May 19, 2016
@ralight ralight added this to the 1.4.9 milestone May 19, 2016
@ralight ralight added a commit that referenced this issue May 19, 2016
@ralight ralight [57] File missed from previous commit.
Bug: #57
afc2c99