IoT web socket connection continuously getting Connection Error #1057

Closed
mistyleaf opened this issue Oct 5, 2018 · 11 comments
Assignees
Labels
iot Issues related to the IoT SDK pending-response Issue is pending response from the issue requestor question General question

Comments

@mistyleaf

State your question

We are connecting to AWS IoT over a WebSocket. From time to time (not reproducible at will), we see the following behavior.

We successfully establish a connection, but soon afterwards we receive an MQTT Connection Error, and we keep going around this loop. The Connection Error can happen 5 seconds after connecting, 30 seconds later, or 10 minutes later.

A few questions:

Is there any light you can shed on why this might be happening? (Bad LTE connection? Keep-alive pings failing?)

What happens if keep-alive ping fails? Would this cause a Connection Error?

Is there a way to get better error messages on why Connection Error, or a better way to diagnose the issue?

Which AWS Services are you utilizing?

AWSIoT

Provide code snippets (if applicable)

Environment (please complete the following information):

  • SDK Version: 2.6.28
  • Dependency Manager: Cocoapods
  • Swift Version : 4.0

Device Information (please complete the following information):

  • Device: iPhone X
  • iOS Version: iOS 12
  • Specific to simulators:

If you need help with understanding how to implement something in particular then we suggest that you first look into our developer guide. You can also simplify your process of creating an application, as well as the associated backend setup by using the Amplify CLI.

@scb01 scb01 self-assigned this Oct 5, 2018
@scb01
Contributor

scb01 commented Oct 5, 2018

@mistyleaf

Sorry to hear that you are running into issues. To help debug this further, I have a few questions.

@scb01 scb01 added the iot Issues related to the IoT SDK label Oct 5, 2018
@frankmuellr frankmuellr added the pending-response Issue is pending response from the issue requestor label Oct 5, 2018
@mistyleaf
Author

@cbommas

Thanks for the reply!

Once the connection is dropped, I actually call a disconnect. The reason is that upon a disconnect event, I wanted to clear the access key and the session altogether. In the past (before the reconnection issues were fixed in some of your previous SDK releases), it was the only way for me to disable the auto retries and guarantee that we didn’t get into a reconnection loop (this was draining the battery on our app).

if status == .connectionRefused || status == .connectionError || status == .protocolError || status == .unknown {
    self.iotDataManager?.disconnect()
}
if status == .disconnected {
    // On every disconnect, we clear the AWSIoTDataManager and always connect manually instead.
    if let current = self.currentAccessKey {
        log.verbose("mqtt removing previous access key \(current)")
        AWSIoTDataManager.remove(forKey: current)
    }

    if self.shouldBeConnected == true {
        // Manually reconnect in 15 seconds.
        self.dispatchManualReconnect?.cancel()
        self.dispatchManualReconnect = DispatchWorkItem {
            if self.iotDataManager?.getConnectionStatus() == .disconnected && self.shouldBeConnected == true {
                log.verbose("dispatched mqtt manual reconnect")
                self.connect(userOpal: userOpal)
            }
        }
        DispatchQueue.main.asyncAfter(deadline: DispatchTime.now() + DispatchTimeInterval.seconds(15), execute: self.dispatchManualReconnect!)
    }
}

So with this in mind: when the connection drops, we successfully disconnect and then successfully establish a new connection. But then a connection error happens 10 seconds, 30 seconds, or 10 minutes later (completely variable), and definitely before the credentials expire. As per the logic above, we then disconnect and establish a connection again, and the loop continues.

My MQTT configuration is below. However, given the logic above, its reconnect settings never take effect. It would be nice if you could add an option to disable auto-retries altogether.

let mqttConfiguration = AWSIoTMQTTConfiguration(keepAliveTimeInterval: 60.0, baseReconnectTimeInterval: 5.0, minimumConnectionTimeInterval: 20.0, maximumReconnectTimeInterval: 900.0, runLoop: RunLoop.main, runLoopMode: RunLoopMode.defaultRunLoopMode.rawValue, autoResubscribe: true, lastWillAndTestament: AWSIoTMQTTLastWillAndTestament())

As far as turning on verbose logging and sending you logs, it is hard for me to reproduce. Thus, I’m not able to send you anything helpful at the moment, but I'll see if I can catch it again.

@scb01
Contributor

scb01 commented Oct 8, 2018

@mistyleaf

  • There currently isn't an option to disable auto-retries. However, you can get almost the same behavior by setting baseReconnectTimeInterval to a larger number (say 900.0, which is the same as your maximumReconnectTimeInterval).
  • When you connect again, can you check whether you are reusing the clientID for the next connection? Since you are managing the re-connection yourself, I'd recommend that you create a new ID each time.

Please let me know how it goes after you make these changes.
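
A minimal sketch of the "new ID each time" suggestion, assuming a UUID suffix is an acceptable way to make IDs unique. The function name `makeClientID` and the `"ios-app"` prefix are hypothetical and do not come from the SDK or from this thread.

```swift
import Foundation

/// Builds a fresh MQTT client ID for every connection attempt.
/// `prefix` is a hypothetical, app-chosen label; the UUID suffix makes
/// each ID different from the one used on the previous attempt.
func makeClientID(prefix: String) -> String {
    return "\(prefix)-\(UUID().uuidString)"
}

// Two consecutive connection attempts get two different IDs.
let first = makeClientID(prefix: "ios-app")
let second = makeClientID(prefix: "ios-app")
print(first)
print(second)
```

The resulting string would then be passed as the client ID on each manual connect, instead of a stored constant.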

@mistyleaf
Author

mistyleaf commented Oct 8, 2018

@cbommas

Yes, I am reconnecting with the same clientID. Could that be causing the issue? Do you know under what conditions it may cause it to happen?

Oh, yeah, setting the baseReconnectTimeInterval to something much larger makes sense. So, if I set the baseReconnectTimeInterval to 900 (same as maxTimeInterval), and then get rid of the clear-out-the-key logic, would I still be able to manage my "manual" reconnect by dispatching another connect at 15 seconds or some time BEFORE the baseReconnectTimeInterval gets reached (and therefore keeping the same clientID)?

Thanks!

@scb01
Contributor

scb01 commented Oct 8, 2018

@mistyleaf
Also, here is some info regarding the connection states:

  • Unknown: Not really used anywhere in the SDK.
  • Connecting: When a connection is requested, the status will transition to a connecting state. From here it can go to connected, refused or error.
  • Refused: The connection has been refused by the server and never got established. This is a terminal state and the SDK will not retry.
  • Connected: This is the connected state. From here it can transition to Disconnected or ProtocolError. If the connection encounters an error, the SDK will automatically retry at a frequency based on the MQTTConfiguration options.
  • ProtocolError: A message not conforming to the MQTT protocol was received by the client. The connection will be disconnected and the SDK will not retry. The status will transition to disconnected.
  • Disconnected: The connection will transition to this state after the disconnect method has been issued.

In your check above, I'd recommend looking for error and ProtocolError. For error state, issue a disconnect and then connect. For ProtocolError, simply connect.

@scb01
Contributor

scb01 commented Oct 8, 2018

@mistyleaf
What I have observed is that when you connect with the same clientID on more than one connection, one of the connections is dropped by the server (I am not sure if it is the new connection or the old connection).

So if it is at all possible, change the clientID each time you connect. If that doesn't work for you, then you should still be good when you set the baseReconnectTimeInterval to a larger number.
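
To illustrate why a large baseReconnectTimeInterval effectively disables quick automatic retries, here is a sketch of a capped backoff. It assumes the delay doubles after each failed attempt, which this thread does not confirm for the SDK; `reconnectDelay` is a hypothetical helper, not an SDK function.

```swift
import Foundation

/// Delay before retry number `attempt` (0-based), ASSUMING the delay
/// doubles after each failed attempt and is capped at `maximum`.
func reconnectDelay(attempt: Int, base: TimeInterval, maximum: TimeInterval) -> TimeInterval {
    let doubled = base * pow(2.0, Double(attempt))
    return min(doubled, maximum)
}

// With base 5 s the first retries come quickly; with base 900 s
// (equal to the maximum) every retry waits the full 900 s.
for attempt in 0..<4 {
    let quick = reconnectDelay(attempt: attempt, base: 5.0, maximum: 900.0)
    let slow = reconnectDelay(attempt: attempt, base: 900.0, maximum: 900.0)
    print("attempt \(attempt): base 5 -> \(quick) s, base 900 -> \(slow) s")
}
```

Under that assumption, a manual reconnect dispatched after 15 seconds would always run well before the first automatic retry.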

@scb01
Contributor

scb01 commented Oct 9, 2018

One quick addition - you should also look for the refused status and connect (without needing to disconnect)
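
The handling recommended in this thread (error: disconnect, then connect; ProtocolError and refused: connect directly) can be sketched as a small decision function. `ConnectionStatus` is a local stand-in that mirrors the status names used in the code earlier in the thread, and `RecoveryAction` and `recoveryAction(for:)` are hypothetical names, not part of the SDK.

```swift
/// Local stand-in for the SDK's status enum, so the sketch runs on its own.
enum ConnectionStatus {
    case unknown, connecting, connected, disconnected
    case connectionRefused, connectionError, protocolError
}

/// What the app should do in response to a status change (hypothetical).
enum RecoveryAction {
    case none
    case connect
    case disconnectThenConnect
}

func recoveryAction(for status: ConnectionStatus) -> RecoveryAction {
    switch status {
    case .connectionError:
        // Disconnect first so the app, not the SDK, drives the next attempt.
        return .disconnectThenConnect
    case .protocolError, .connectionRefused:
        // Per the thread, the SDK does not retry from these states,
        // so connecting again is enough.
        return .connect
    case .unknown, .connecting, .connected, .disconnected:
        return .none
    }
}

print(recoveryAction(for: .connectionError))
print(recoveryAction(for: .connectionRefused))
```

Keeping the decision separate from the side effects makes the reconnection policy easy to unit test without a live connection.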

@mistyleaf
Author

@cbommas

Thanks for the additional info on the connection states! That should be really helpful in fine-tuning the reconnection logic.

Also, will try what you suggested and either change the clientID, or set the baseReconnectTimeInterval higher so I can still control retries but not have to clear out the key.

One other question… is there any way to get more info on the reasons for a Connection Error? Enabling verbose logging didn’t seem to show much additional info about this.

Thanks!

@frankmuellr frankmuellr added the question General question label Oct 15, 2018
@scb01
Contributor

scb01 commented Oct 18, 2018

@mistyleaf
I haven't been able to get further details on obtaining diagnostic information for Connection Error.
I will continue to look and will post back here when I have more info.

Just curious - did my suggestions on this thread help address the re-connection issues that you were facing? Please let me know.

@scb01
Contributor

scb01 commented Oct 27, 2018

@mistyleaf

It looks like you are good to go. I will go ahead and close this out; please feel free to reopen if you are still facing issues.

@eduprat-chwy

In my case I was running two devices at the same time with the same clientID, so one was disconnecting the other in each reconnection.
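
The effect described above can be modeled with a toy session table. MQTT 3.1.1 requires a server to disconnect the existing client when another client connects with the same client identifier; the sketch below assumes exactly that rule and is not AWS IoT's implementation. `ToyBroker` and the IDs used are hypothetical.

```swift
/// Toy model of a broker's session table (an illustration only).
final class ToyBroker {
    private var activeConnection: [String: Int] = [:]  // clientID -> connection number
    private var nextConnection = 0

    /// Registers a new connection and reports which older connection,
    /// if any, was dropped because it used the same client ID.
    func connect(clientID: String) -> (opened: Int, dropped: Int?) {
        nextConnection += 1
        let dropped = activeConnection[clientID]
        activeConnection[clientID] = nextConnection
        return (nextConnection, dropped)
    }
}

let broker = ToyBroker()
let deviceA = broker.connect(clientID: "shared-id")
let deviceB = broker.connect(clientID: "shared-id")   // takes over from device A
let deviceC = broker.connect(clientID: "unique-id")   // affects nobody
print(deviceB.dropped == deviceA.opened)
print(deviceC.dropped == nil)
```

If both devices reconnect automatically, each new connection drops the other one, which would produce the kind of repeating connect/error loop reported in this issue.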


4 participants