New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SubscribeToken never completes if WebSocket gets closed #555
Comments
Thanks for logging this and providing plenty of detail! I'm not too surprised to see this issue; its probably something that is hidden from most users (as common practice is to subscribe from the On Connect callback). I'll take a look at it within the next week or so. |
It looks like this issue is due to a combination of things (inducing issues around dealing with the large number of options the library offers); when the connection is lost we conditionally clean-up the message IDs (releasing tokens)here:
The above does not get called if we are automatically reconnecting because the session will be resumed and we don't want any conflicting message IDs allocated in the meantime. The attempt to resend happens in
So I suspect that this issue will not happen if One challenge is going to be testing this; we use Mosquitto for testing and I cannot think of a way to reliably simulate this situation. I'd note that the way this all works is a bit confusing (after reconnect all existing tokens are terminated but the messages are still sent out). While I'd like to simplify this I think it's likely to break existing usages (it's been like this for years) so suspect that just closing out the tokens is all that's really needed. |
Improve handling of SUBSCRIBE/UNSUBSCRIBE over disconnect/reconnect. If the ResumeSubs option is not set then SUBSCRIBE/UNSUBSCRIBE packets should not be stored and any tokens should be completed when the connection is lost. This should resolve #555 but I cannot think of a way of testing this.
Thinking about this further while the above is all true it is not the root cause. Subscription packets are not part of the session and the only reason this library adds them to the store is due to the non-standard I have merged PR #557 into @master; can you please see if this resolves the issue for you. I cannot think off a way of testing this with Mosquitto so currently it's fixed 'in theory'! |
@MattBrittan thank you for explaining the root cause and implementing a fix so quickly. I tested the fix this morning and my integration test is now passing 🎉 . The call to I have another separate issue, but I wondered if you could clarify the expected behaviour before I treat this as a new issue. I have another integration test that attempts to publish a message to a non-existing topic. The expectation is that after the call to Would you expect that publishing to a non-existing or unauthorised topic would return an error? The AWS MQTT broker logs this publish attempt with If you think this is a bug, I will raise a separate issue. Thanks again 😄 |
These are AWS specific things; the MQTT spec does not really offer support for this kind of thing (a topic is valid if it meets the requirements in section 4) and:
Unfortunately if it just closes the connection the client has no way to tie that into the message. That means that if the message was QOS1+ then it will be resent when the connection comes back up (and an infinite loop will result). Unfortunately I don't think there is anything we can do about that... |
Closing the connection is the signal I was expecting to receive. In |
That's quite possible - v 1.1 dates back to 2017 and much of the client has been rewritten (and heaps of options added). This change is probably the cause. If you have |
I was recently upgrading our service dependencies and upgraded this module from
v.1.1.1
->v1.3.5
and found that one of our integration tests started failing. The scenario essentially creates a client and connects to a broker. It then attempts to subscribe to a topic that does not exist. This results in the following client error being received.[ERROR] [client] Connect comms goroutine - error triggered websocket: close 1005 (no status)
. This error is read from theibound
channel after the out bound prioritypackets.SubscribePacket
is written to the connection.The main issue here is that the call to
token.Wait()
blocks forever. I tried to dig through and understand what's happening and I think the issue is that when this error gets read from theibound
channel (net.go) is gets passed to another channel and eventually read off of thecommsErrors
channel (client.go) but never gets propagated to the original SubscribeToken, thus meaning thebaseToken.complete
channel is never populated or closed.I found that if I propagated the connection error by adding
c.index[c.lastIssuedID].setError(err)
belowERROR.Println(CLI, "Connect comms goroutine - error triggered", err)
, the call totoken.Wait()
did return. I doubt this is a proper fix, but perhaps help in some way?The broker used in our integration testing is an AWS IoT Core broker, and the mqtt client is configured like so:
I found that if I upgraded from
v1.1.1
->v1.2.0
I encounter the same issue, but with a different error message. I believe various changes sincev1.2.0
have changed the error message/handling, but I only dug throughv1.3.5
as we'd like to use the latest version.I've attached the stdout logs with debug enabled.
stdout_log.txt
The AWS Broker logs the following message when attempting to subscribe to the
non-existing-topic
:The text was updated successfully, but these errors were encountered: