Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout Exceeded Disconnect when timeout has not been exceeded #2124

Closed
ImagineZA opened this issue Mar 9, 2021 · 5 comments
Closed

Timeout Exceeded Disconnect when timeout has not been exceeded #2124

ImagineZA opened this issue Mar 9, 2021 · 5 comments

Comments

@ImagineZA
Copy link

What could cause a timeout exceeded disconnect, that isn't actually a timeout exceeded disconnect?

I have a device that acts as the client to my Mosquitto Broker.
I have no control over the client MQTT settings except for:
-How often it publishes
-The Topic it Publishes
-How often it will send a PINGREQ

I currently have it sending a PINGREQ every 15s and it publishes every minute, it also publishes if any of the alarms on the device are triggered.

Funny thing is, I get a Client mqttClient has exceeded timeout, disconnecting. error when one of the alarms set's itself back to normal. It can't be a timeout error as the PINGREQ is very aggressive and keeps that connection alive;

So I'm asking if there is anything else, that could be triggering the error and forcing the device to disconnect?

@ralight
Copy link
Contributor

ralight commented Mar 9, 2021

The only reason you will get that message is if the last time the broker received a message is more than 1.5 times the keepalive value set by the client. That's it. You say you can change how often your client sends a PINGREQ - is that actually the keepalive value you are changing? If so, then setting a lower value actually makes it easier for your client to time out. 15*1.5 = 22 seconds though, which is still a long time.

Do you have any broker logs that could confirm/deny what is happening? A log with log_type debug set would show times of all the messages sent and received which would allow us to see what is happening. I realise this may not be straightforward to sort out depending on what else you have running.

@ImagineZA
Copy link
Author

ImagineZA commented Mar 10, 2021

I figured that would be the only reason I would get a timeout error, that's why I'm so confused.
The PINGREQ setting is literally how often it will send a ping/heartbeat.
In the logs here you can see that a PINGREQ is sent every 15 seconds between two publishes:

2021-03-09T22:31:13: Received PUBLISH from 860688042114364 (d0, q1, r0, m503, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (403 bytes))
2021-03-09T22:31:13: Sending PUBACK to 860688042114364 (m503, rc0)
2021-03-09T22:31:13: Sending PUBLISH to auto-DB502FE0-E8BC-8973-B882-5DEF82910344 (d0, q0, r0, m0, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (403 bytes))
2021-03-09T22:31:13: Received PUBLISH from 860688042114364 (d0, q1, r0, m504, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (99 bytes))
2021-03-09T22:31:13: Sending PUBACK to 860688042114364 (m504, rc0)
2021-03-09T22:31:13: Sending PUBLISH to auto-DB502FE0-E8BC-8973-B882-5DEF82910344 (d0, q0, r0, m0, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (99 bytes))
2021-03-09T22:31:25: Received PINGREQ from 860688042114364
2021-03-09T22:31:25: Sending PINGRESP to 860688042114364
2021-03-09T22:31:36: Received PINGREQ from 860688042114364
2021-03-09T22:31:36: Sending PINGRESP to 860688042114364
2021-03-09T22:31:47: Received PINGREQ from 860688042114364
2021-03-09T22:31:47: Sending PINGRESP to 860688042114364
2021-03-09T22:31:58: Received PINGREQ from 860688042114364
2021-03-09T22:31:58: Sending PINGRESP to 860688042114364
2021-03-09T22:32:10: Received PINGREQ from 860688042114364
2021-03-09T22:32:10: Sending PINGRESP to 860688042114364
2021-03-09T22:32:14: Received PUBLISH from 860688042114364 (d0, q1, r0, m505, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (403 bytes))
2021-03-09T22:32:14: Sending PUBACK to 860688042114364 (m505, rc0)
2021-03-09T22:32:14: Sending PUBLISH to auto-DB502FE0-E8BC-8973-B882-5DEF82910344 (d0, q0, r0, m0, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (403 bytes))
2021-03-09T22:32:15: Received PUBLISH from 860688042114364 (d0, q1, r0, m506, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (99 bytes))
2021-03-09T22:32:15: Sending PUBACK to 860688042114364 (m506, rc0)

I let it run overnight to see if anything funny happened;
A: Here is a disconnect and reconnect from the client but the message counts continued, is that normal?

2021-03-09T22:10:13: Received PUBLISH from 860688042114364 (d0, q1, r0, m484, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (83 bytes))
2021-03-09T22:10:13: Sending PUBACK to 860688042114364 (m484, rc0)
2021-03-09T22:10:13: Sending PUBLISH to auto-DB502FE0-E8BC-8973-B882-5DEF82910344 (d0, q0, r0, m0, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (83 bytes))
2021-03-09T22:10:29: Client 860688042114364 has exceeded timeout, disconnecting.
2021-03-09T22:16:42: Received PINGREQ from auto-DB502FE0-E8BC-8973-B882-5DEF82910344
2021-03-09T22:16:42: Sending PINGRESP to auto-DB502FE0-E8BC-8973-B882-5DEF82910344
2021-03-09T22:23:42: New connection from 41.13.246.164 on port 8833.
2021-03-09T22:23:42: New client connected from 41.13.246.164 as 860688042114364 (p2, c1, k10, u'iCE_Test_CPT').
2021-03-09T22:23:42: No will message specified.
2021-03-09T22:23:42: Sending CONNACK to 860688042114364 (0, 0)
2021-03-09T22:23:43: Received SUBSCRIBE from 860688042114364
2021-03-09T22:23:43: 	860688042114364/+ (QoS 0)
2021-03-09T22:23:43: 860688042114364 0 860688042114364/+
2021-03-09T22:23:43: Sending SUBACK to 860688042114364
2021-03-09T22:23:44: Received PUBLISH from 860688042114364 (d0, q1, r0, m487, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (383 bytes))
2021-03-09T22:23:44: Sending PUBACK to 860688042114364 (m487, rc0)

B: Here was my testing earlier in the day where I trigger one of the alarms on the device:

2021-03-09T14:46:30: Received PUBLISH from iCE_Test_CPT (d0, q1, r0, m42, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (403 bytes))
2021-03-09T14:46:30: Sending PUBACK to iCE_Test_CPT (m42, rc0)
2021-03-09T14:46:30: Sending PUBLISH to auto-BE7C5852-F235-F0B2-B806-0029AAAAC74B (d0, q0, r0, m0, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (403 bytes))
2021-03-09T14:46:31: Received PUBLISH from iCE_Test_CPT (d0, q1, r0, m43, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (99 bytes))
2021-03-09T14:46:31: Sending PUBACK to iCE_Test_CPT (m43, rc0)
2021-03-09T14:46:31: Sending PUBLISH to auto-BE7C5852-F235-F0B2-B806-0029AAAAC74B (d0, q0, r0, m0, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (99 bytes))
2021-03-09T14:46:53: Received PINGREQ from iCE_Test_CPT
2021-03-09T14:46:53: Sending PINGRESP to iCE_Test_CPT
2021-03-09T14:47:00: Received PUBLISH from iCE_Test_CPT (d0, q1, r0, m44, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (81 bytes))
2021-03-09T14:47:00: Sending PUBACK to iCE_Test_CPT (m44, rc0)
2021-03-09T14:47:00: Sending PUBLISH to auto-BE7C5852-F235-F0B2-B806-0029AAAAC74B (d0, q0, r0, m0, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (81 bytes))
2021-03-09T14:47:01: Received PUBLISH from iCE_Test_CPT (d0, q1, r0, m45, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (81 bytes))
2021-03-09T14:47:01: Sending PUBACK to iCE_Test_CPT (m45, rc0)
2021-03-09T14:47:01: Sending PUBLISH to auto-BE7C5852-F235-F0B2-B806-0029AAAAC74B (d0, q0, r0, m0, 'device/Sanlam/MQTT_Room_MQTT000000PBOK01', ... (81 bytes))
2021-03-09T14:47:32: Client iCE_Test_CPT has exceeded timeout, disconnecting.

As you can see, the device "timed out" after about 30 seconds (this is when I triggered the alarm), but in the other logs you can see the device seems to run, relatively, ok without disconnecting from a timeout

To clarify: iCE_Test_CPT and 860688042114364 are the same device, I suspected a Client ID conflict issue so I changed the Client ID to rule that out

@ImagineZA
Copy link
Author

OK so I found out what the issue is:

The Device is sending data via SIM card, when the alarm is triggered the device also sends out an SMS alert.
This causing the Data signal to drop while the SIM card utilizes the SMS channel, causing the disconnect.

Weird that it comes up as a timeout on the logs though

@ralight
Copy link
Contributor

ralight commented Mar 10, 2021

2021-03-09T22:23:42: New client connected from 41.13.246.164 as 860688042114364 (p2, c1, k10, u'iCE_Test_CPT').

This is your client connecting with a keepalive value of 10 (k10). This means you are promising to communicate with the broker at least once every ten seconds. The broker gives you a grace period of keepalive*0.5, so in total you must communicate every 15 seconds or be disconnected. What you are doing is sending a PINGREQ every 15 seconds. In the best case, there are no delays and your PINGREQ comes in just before the broker disconnects the client. When your client does something else, that PINGREQ is ever so slightly delayed, and so the timeout is triggered.

In other words, you need to change your client settings so that the keepalive and PINGREQ interval match. That could be increasing the keepalive to 15, decreasing the PINGREQ interval to 10, or setting them both to something else entirely.

If you can only change the PINGREQ interval and the client itself calculates the keepalive from that, or vice versa, then you'll have to go and complain to wherever you got the client from.

@ImagineZA
Copy link
Author

Ahhhhh!
Thanks for explaining that, I was always checking the variables on the message side because that's where the disconnect was coming from, completely blew past the variables on the connect.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants