
Pingresp issues (linked from Telegraf) #430

Closed
tkerby opened this issue Jun 15, 2020 · 4 comments

Comments


tkerby commented Jun 15, 2020

I've filed this issue with the Telegraf project but also wanted to include it here as they use the Paho client. When connecting to a Mosquitto server, I'm finding that I constantly get "pingresp not received" disconnections. It looks like the issue is some sort of incompatibility between server and client, and it only occurs when data is actively being published from an IoT client.

Full details here including network traffic analysis: influxdata/telegraf#7648

Any insights appreciated

@MattBrittan
Contributor

Unfortunately it's difficult to comment without the MQTT logs (the network info helps but does not tell us what is happening internally) - these can be enabled with something like the following (assuming log and os are imported, and mqtt is the alias for github.com/eclipse/paho.mqtt.golang):

mqtt.ERROR = log.New(os.Stdout, "[ERROR] ", 0)
mqtt.CRITICAL = log.New(os.Stdout, "[CRIT] ", 0)
mqtt.WARN = log.New(os.Stdout, "[WARN]  ", 0)
mqtt.DEBUG = log.New(os.Stdout, "[DEBUG] ", 0)

Having said that, I suspect the issue may be in the code that receives messages - if this blocks then what you are seeing is expected. The default for this library is to deliver messages in order, which means the handler is called within the goroutine that handles comms; the upshot is that if the message handler blocks, the library will not process the next pingresp and will eventually disconnect. I would suggest logging the entry/exit of recvMessage and confirming that it is not blocking. Note: I have not looked at the Telegraf code in any detail.

Note: The last release of this library (1.2.0) is over a year old and there have been some significant changes to the code since then (including some to address a few potential deadlocks). Hopefully a new release is not far off, but in the meantime seeing if the issue is resolved on @master may be worthwhile.


GopherJ commented Sep 25, 2020

This is becoming urgent for me, any ideas? @tkerby @MattBrittan I'm preparing logs for @MattBrittan, but I need to run it for a long time to be able to reproduce the issue.

@MattBrittan
Contributor

This relates to v1.2.0 (released 19 Apr 2019), and v1.3.0 is now out (I have noted this on the Telegraf issue). The delay between releases is unfortunate as it means there are a lot of changes between 1.2 and 1.3, including a major rewrite of the networking code. This means it's unlikely that any effort will be dedicated to finding/fixing an issue in code that is now almost two years old (and probably already addressed).

If anyone wants to test Telegraf with v1.3.0 then I'd be interested to hear the result (but as I don't use Telegraf it's not something I'll be doing). If there are no updates on this in a month or so I'll close this issue off.

@MattBrittan
Contributor

Closing this because Telegraf has been upgraded to v1.3.0 (applying a significant number of fixes/changes).
