Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MQTT client mistakenly doesn't send PINGREQs if it has received messages from the broker #1277

Closed
lrasku opened this issue Oct 14, 2022 · 3 comments
Milestone

Comments

@lrasku
Copy link

lrasku commented Oct 14, 2022

Following the upgrade to 1.3.11 our MQTT client programs started sporadically timing out. Looking at the MQTT traces, the problem seems to be that they don't send PINGREQs when expected:

1.3.9:

20221014 163930.108 5 DeviceManagementHandler -> PINGREQ (0)
20221014 163930.611 5 DeviceManagementHandler <- PINGRESP
20221014 164010.646 5 DeviceManagementHandler <- PUBLISH msgid: 0 qos: 0 retained: 0 payload len(21): LTE 2 -100.000000 -1
20221014 164030.166 5 DeviceManagementHandler -> PINGREQ (0)
20221014 164030.669 5 DeviceManagementHandler <- PINGRESP

A new PINGREQ is sent one minute after the previous one, as expected.

1.3.11:

20221014 172953.380 5 DeviceManagementHandler -> PUBLISH msgid: 71 qos: 1 retained: 0 rc 0 payload len(806): {
    "versionMajor"
20221014 172953.399 5 DeviceManagementHandler <- PUBACK msgid: 71
20221014 173049.450 5 DeviceManagementHandler <- PUBLISH msgid: 0 qos: 0 retained: 0 payload len(19): LTE 3 -88.000000 -7
20221014 173127.488 5 DeviceManagementHandler -> DISCONNECT (0)

Over a minute passes after the first outbound PUBLISH without a PINGREQ being sent, and so we disconnect.

I dug into the code and found this change: 0465673#diff-c2fc1b46b2f0d41497e77ebac887254227809403266c24de15ff8ad51c3144a8L715

-		else if (MQTTTime_difftime(now, client->net.lastSent) >= (DIFF_TIME_TYPE)(client->keepAliveInterval * 1000) ||
-					MQTTTime_difftime(now, client->net.lastReceived) >= (DIFF_TIME_TYPE)(client->keepAliveInterval * 1000))
+		else if (MQTTTime_difftime(now, client->net.lastSent) >= (DIFF_TIME_TYPE)(client->keepAliveInterval * 1000) &&
+				 MQTTTime_difftime(now, client->net.lastReceived) >= (DIFF_TIME_TYPE)(client->keepAliveInterval * 1000))

This seems wrong to me: a new PINGREQ should (at least) be sent whenever the send interval elapses, because that's what the broker expects. If both the send and receive intervals are required to be elapsed, PINGREQ sending can be delayed by incoming messages (which appears to be happening in the above trace), which the broker doesn't expect. And indeed, reverting this change seems to fix the problem in our builds. Is this diagnosis correct?

@praxx
Copy link

praxx commented Oct 30, 2022

Ran into this same issue - commenting out the check for lastReceived also fixes the issue for me. I'm using mosquitto as a broker, and it's forcefully disconnecting the client with "Client xxxx has exceeded timeout, disconnecting.".

It's unclear to me if mosquitto is doing the wrong thing (by not resetting the timeout after sending published messages), or if it's this change (which would be correct, if published messages received by the client should reset the timeouts).

@maripo31
Copy link

Hello,
Is it a new version sheduled to correct this point ?

icraggs added a commit that referenced this issue Nov 27, 2022
@icraggs icraggs added this to the 1.3.12 milestone Nov 27, 2022
@icraggs
Copy link
Contributor

icraggs commented Nov 27, 2022

I've added a fix to the develop branch which aims to keep the fixes for large packet sending and receiving, if you would like to try it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants