IoT connection issues on packet loss. #1002

Closed
steelzeh opened this issue Jul 23, 2018 · 9 comments
Labels: iot (Issues related to the IoT SDK), pending-community-response (Issue is pending response from the issue requestor)

Comments

steelzeh commented Jul 23, 2018

  • What service are you using?
    IoT

  • In what version of SDK are you facing the problem?
    2.6.25

  • Is the issue limited to Simulators / Actual Devices?
    Only tested on actual device

  • Can your problem be resolved if you bump to a higher version of SDK?
    No

  • Is this problem related to specific iOS version?
    No

  • How are you consuming the SDK? CocoaPods / Carthage / Prebuilt frameworks?
    Carthage

  • Can you give us steps to reproduce with a minimal, complete, and verifiable example? Please include any specific network conditions that might be required to reproduce the problem.

After the connection to a socket has been established, if there is packet loss for an extended period of time, the MQTT session is closed without notifying the user. This results in a disconnected socket whose status is still reported as connected.

  1. Start the IoTSampleSwift app and connect to the socket on both devices.
  2. Enable packet loss via Network Link Conditioner under iOS Settings -> Developer.
  3. Wait around 2-3 minutes with packet loss enabled.
  4. The following is then logged to the console:

MQTTSessionEventConnectionClosed: MQTT session closed.

If you call .getConnectionStatus at this point, it returns 2 = connected.
Nothing notifies the user; the connection just gets terminated.

In another case, the SDK never realises that the connection is lost, and when you disable the Network Link Conditioner, no data comes through the socket.

So sometimes the SDK keeps logging ClockTickXXX for an unknown time and the stream is never closed. Other times, after an extended period of time, the stream is closed but the reconnectTimer is never initialised: initiateReconnectTimer checks that mqttStatus != connected, and in the MQTTSessionEventConnectionClosed switch case mqttStatus is never set to anything other than connected unless userDidIssueDisconnect is true.

@cbommas

mutablealligator added the iot (Issues related to the IoT SDK) label Jul 23, 2018
scb01 self-assigned this Jul 23, 2018
Contributor

scb01 commented Jul 23, 2018

@steelzeh

Thank you for reporting this issue. I will investigate and get back to you.
A couple of questions:

  1. In the first case, where the connection error is being reported, are you seeing the connection being retried and established again?
  2. Can you share what you have set up in the MQTT configuration for retry and keep-alive?

Author

steelzeh commented Jul 23, 2018

@cbommas

After further investigation I have discovered that the first issue happens because the default KeepAliveInterval is set to 300, so it can take up to 5 minutes before the stream is closed. This can be avoided by reducing keepAlive so that packet loss is detected earlier. So this was basically a case of me being impatient while the SDK had not yet noticed the packet loss; I have reduced the keepAlive in our app to 30 so that it catches the packet loss earlier. If the connection is restored within that time it reconnects automatically; otherwise the stream is terminated. I don't know if there is a better way, or if the SDK can send a ping to check whether there is a connection.
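
To make the effect of the keep-alive value concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not SDK code: `seconds_until_close` is a hypothetical helper, and it assumes the session is closed once a full keep-alive interval passes with no traffic, checked on a one-second clock tick. The SDK's real timing may differ.

```c
#include <assert.h>

/* Hypothetical model, not SDK code.
 * Returns the number of seconds after traffic stops at which the session
 * would be closed, or -1 if the outage ends before that happens.
 * Assumption: the session is closed as soon as a full keep-alive interval
 * has elapsed without traffic, evaluated once per one-second tick. */
int seconds_until_close(int keep_alive_s, int outage_s)
{
    assert(keep_alive_s > 0 && outage_s >= 0);

    for (int tick = 1; tick <= outage_s; tick++) {
        if (tick >= keep_alive_s) {
            return tick;  /* silence lasted a full interval: close */
        }
    }
    return -1;            /* traffic resumed before the interval elapsed */
}
```

Under these assumptions, a 150-second outage goes unnoticed with a keep-alive of 300 (`seconds_until_close(300, 150)` returns -1) but closes the session after 30 seconds with a keep-alive of 30.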

The second issue happens when the keepAlive runs out and the stream is closed, but mqttStatus is still connected in the MQTTSessionEventConnectionClosed case.

I have fixed it by adding these 2 lines in the switch case:

self.mqttStatus = AWSIoTMQTTStatusConnectionError;
[self notifyConnectionStatus];

so the switch case ends up looking like this:

    case MQTTSessionEventConnectionClosed:
        AWSDDLogInfo(@"MQTTSessionEventConnectionClosed: MQTT session closed.");

        self.connectionAgeInSeconds = 0;
        if (self.connectionAgeTimer != nil ) {
            [self.connectionAgeTimer invalidate];
            self.connectionAgeTimer = nil;
        }

        self.mqttStatus = AWSIoTMQTTStatusConnectionError;
        [self notifyConnectionStatus];

        //Check if user issued a disconnect
        if (self.userDidIssueDisconnect ) {
            //Clear all session state here.
            [self.topicListeners removeAllObjects];
            self.mqttStatus = AWSIoTMQTTStatusDisconnected;
            [self notifyConnectionStatus];
        }
        else {
            //Connection was closed unexpectedly. Retry.
            self.reconnectThread = [[NSThread alloc] initWithTarget:self selector:@selector(initiateReconnectTimer:) object:nil];
            [self.reconnectThread start];
        }
        break;

If those 2 lines are missing, initiateReconnectTimer won't run because mqttStatus is still connected. I don't know if this breaks something else, as I don't have further knowledge of the SDK.
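
The interaction between the status value and the reconnect guard can be sketched in isolation. This is a simplified model in plain C (also valid Objective-C), not the SDK's implementation: `Session`, `initiate_reconnect_timer`, and `on_connection_closed` are hypothetical stand-ins, and the enum's numeric values are arbitrary. The `set_error_first` flag toggles the two added lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SDK's status values (numeric values are
 * arbitrary here and do not match the SDK's enum). */
typedef enum {
    STATUS_CONNECTED,
    STATUS_DISCONNECTED,
    STATUS_CONNECTION_ERROR
} Status;

typedef struct {
    Status status;
    bool user_did_issue_disconnect;
    bool reconnect_scheduled;
} Session;

/* Models the guard described in this thread: the reconnect timer is only
 * set up when the status is something other than connected. */
static void initiate_reconnect_timer(Session *s)
{
    if (s->status != STATUS_CONNECTED) {
        s->reconnect_scheduled = true;
    }
}

/* Models the connection-closed case. With set_error_first == false the
 * status is left untouched on an unexpected close, so the guard above
 * skips the reconnect. */
void on_connection_closed(Session *s, bool set_error_first)
{
    assert(s != NULL);

    if (set_error_first) {
        s->status = STATUS_CONNECTION_ERROR;  /* the two added lines */
    }
    if (s->user_did_issue_disconnect) {
        s->status = STATUS_DISCONNECTED;
    } else {
        initiate_reconnect_timer(s);          /* closed unexpectedly */
    }
}
```

In this model, an unexpected close with `set_error_first` set to `false` leaves `status` at `STATUS_CONNECTED` and `reconnect_scheduled` at `false`, which matches the stuck state described above. With `true`, the reconnect is scheduled.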

aat2703 commented Jul 24, 2018

@cbommas any news?

Contributor

scb01 commented Jul 24, 2018

@steelzeh @aat2703

I am looking into this and, if all works out, will target the fix for inclusion in the next rev of the SDK.
Will keep you guys posted.

Contributor

scb01 commented Jul 24, 2018

@steelzeh @aat2703

Thank you @steelzeh for your analysis. I was able to confirm this behavior and test out the fix. This is an edge case of the fix we made for #965. The fix will be included in the next rev of the SDK (the timing is currently TBD; I will post back here once I have a timeline).

I made a small change to your suggested fix. The switch now looks like this:

    case AWSMQTTSessionEventConnectionClosed:
        AWSDDLogInfo(@"MQTTSessionEventConnectionClosed: MQTT session closed.");
        
        self.connectionAgeInSeconds = 0;
        if (self.connectionAgeTimer != nil ) {
            [self.connectionAgeTimer invalidate];
            self.connectionAgeTimer = nil;
        }
            
        //Check if user issued a disconnect
        if (self.userDidIssueDisconnect ) {
            //Clear all session state here.
            [self.topicListeners removeAllObjects];
            self.mqttStatus = AWSIoTMQTTStatusDisconnected;
            [self notifyConnectionStatus];
        }
        else {
            //Connection was closed unexpectedly.

            //Notify
            self.mqttStatus = AWSIoTMQTTStatusConnectionError;
            [self notifyConnectionStatus];

            //Retry
            self.reconnectThread = [[NSThread alloc] initWithTarget:self selector:@selector(initiateReconnectTimer:) object:nil];
            [self.reconnectThread start];
        }
        break;
    case AWSMQTTSessionEventConnectionError:
        AWSDDLogError(@"MQTTSessionEventConnectionError: Received an MQTT session connection error");
        
        self.connectionAgeInSeconds = 0;
        if (self.connectionAgeTimer != nil ) {
            [self.connectionAgeTimer invalidate];
            self.connectionAgeTimer = nil;
        }
        if (self.userDidIssueDisconnect ) {
            //Clear all session state here.
            [self.topicListeners removeAllObjects];
            self.mqttStatus = AWSIoTMQTTStatusDisconnected;
            [self notifyConnectionStatus];
        }
        else {
            //Connection errored out unexpectedly.

            //Notify
            self.mqttStatus = AWSIoTMQTTStatusConnectionError;
            [self notifyConnectionStatus];

            //Retry
            self.reconnectThread = [[NSThread alloc] initWithTarget:self selector:@selector(initiateReconnectTimer:) object:nil];
            [self.reconnectThread start];
        }
        break;
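
The thread does not spell out what the small change affects. One observable difference, sketched here as a plain C model (also valid Objective-C) rather than SDK code, is the sequence of status notifications a listener would see. All names are hypothetical: `closed_notify_first` mirrors the originally suggested placement of the two lines, and `closed_notify_in_branch` mirrors the revised one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; numeric values are arbitrary. */
typedef enum {
    STATUS_CONNECTED,
    STATUS_DISCONNECTED,
    STATUS_CONNECTION_ERROR
} Status;

#define MAX_EVENTS 4

typedef struct {
    Status status;
    bool user_did_issue_disconnect;
    Status notified[MAX_EVENTS];  /* statuses delivered to the listener */
    size_t notified_count;
} Session;

/* Records the current status as one notification to the listener. */
static void notify_connection_status(Session *s)
{
    if (s->notified_count < MAX_EVENTS) {
        s->notified[s->notified_count++] = s->status;
    }
}

/* Originally suggested placement: set the error status and notify before
 * checking whether the user asked to disconnect. */
void closed_notify_first(Session *s)
{
    assert(s != NULL);
    s->status = STATUS_CONNECTION_ERROR;
    notify_connection_status(s);
    if (s->user_did_issue_disconnect) {
        s->status = STATUS_DISCONNECTED;
        notify_connection_status(s);
    }
}

/* Revised placement: report an error only when the close was unexpected. */
void closed_notify_in_branch(Session *s)
{
    assert(s != NULL);
    if (s->user_did_issue_disconnect) {
        s->status = STATUS_DISCONNECTED;
    } else {
        s->status = STATUS_CONNECTION_ERROR;
    }
    notify_connection_status(s);
}
```

In this model a user-initiated disconnect produces two notifications (error, then disconnected) with the first placement and a single disconnected notification with the revised one. An unexpected close produces one error notification in both.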

It would be great if you could give it a go on your side and let me know if you run into issues.

@steelzeh
Author

Alright, thank you @cbommas.

Contributor

scb01 commented Aug 3, 2018

@steelzeh, @aat2703

The latest rev of the SDK (2.6.26) contains this fix. Please try it out and let us know how it goes.

scb01 added the pending-community-response (Issue is pending response from the issue requestor) label Aug 3, 2018
Author

steelzeh commented Aug 4, 2018

@cbommas we were running our own forked version as we didn't want to wait 10 days for the update to be rolled out.

But it seems to have fixed our issues.

Contributor

scb01 commented Aug 4, 2018

@steelzeh

Thank you for confirming. I will go ahead and close this issue. Please reopen if you encounter further issues.

scb01 closed this as completed Aug 4, 2018