Skip to content
This repository has been archived by the owner on Jul 22, 2022. It is now read-only.

Gracefully reconnect after sas-url expires #60

Closed
Resultfactory opened this issue Aug 8, 2017 · 10 comments
Closed

Gracefully reconnect after sas-url expires #60

Resultfactory opened this issue Aug 8, 2017 · 10 comments

Comments

@Resultfactory
Copy link

Resultfactory commented Aug 8, 2017

We noticed the SAS_TOKEN_DEFAULT_LIFETIME set to an hour. This causes the system to be down for 4 minutes every hour times SAS_REFRESH_MULTIPLIER 0.8.

-> 10:55:08 DISCONNECT	69129107                // is 19.2 hours = 24 * 0.8
-> 10:55:08 CONNECT	69129377
timed out waiting for CONNACK	69361077
-> 10:59:09 DISCONNECT	69370097
-> 10:59:10 CONNECT	69371437
<- 10:59:10 CONNACK	69371457                // 4 minutes later

DEFAULT_MQTT_KEEPALIVE is set to 4 * 60
SAS_TOKEN_DEFAULT_LIFETIME is set to 3600 * 24
SAS_REFRESH_MULTIPLIER 0.8

We changed the lifetime to 24 hours, but we would like to gracefully refresh the sas token and reconnect.
Preferably without the 4 minutes downtime, caused by the DEFAULT_MQTT_KEEPALIVE?
Any suggestions?

@mamokarz
Copy link
Contributor

mamokarz commented Aug 9, 2017

Please Roy, can you investigate.

@ewertons ewertons self-assigned this Aug 29, 2017
@bgshankars
Copy link

Is there any way to override the default sas token default lifetime, what would I need to do, incase of reducing the lifetime to 30s.

@Resultfactory
Copy link
Author

@bgshankars You can change the defines: SAS_TOKEN_DEFAULT_LIFETIME or DEFAULT_SAS_TOKEN_EXPIRY_TIME_SECS
We are not sure which is used, probably the define in the transport you are using.

@Resultfactory
Copy link
Author

We did some additional digging, and it seems the refreshing of the Sas-url is not the problem. When a disconnect is done after the token lifetime expires, the socket or tls does not function correctly. The connection seems valid but the tls handshake completes too quickly. A connect message is sent over the seemingly valid connection and generates a timeout after 4 minutes. The connect after the timed out connection does succeed.

Log:

D (29626) tlsio_mbedtls AzureIoTHubTask: Handshake completed in 1050, on_underlying_io_open_complete
D (29626) tlsio_mbedtls AzureIoTHubTask: tlsio_mbedtls_send
-> 11:28:51 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 192 | USERNAME: XXX.azure-devices.net/XXX/api-version=2016-11-14&DeviceClientType=iothubclient%2f1.1.21%20(esp32) | PWD: XXXX | CLEAN: 0
D (29656) tlsio_mbedtls AzureIoTHubTask: leaving tlsio_mbedtls_open with 0
<- 11:28:51 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0


<- 11:32:49 PUBACK | PACKET_ID: 8
D (318596) tlsio_mbedtls AzureIoTHubTask: tlsio_mbedtls_send
D (318596) tlsio_mbedtls AzureIoTHubTask: tlsio_mbedtls_close
D (318596) socketio_lwip AzureIoTHubTask: Socket Closed in socketio_close
-> 11:33:40 DISCONNECT
D (318706) tlsio_mbedtls AzureIoTHubTask: tlsio_mbedtls_open
D (318766) socketio_lwip AzureIoTHubTask: Opened socketio_open:484
D (318776) tlsio_mbedtls AzureIoTHubTask: on_underlying_io_open_complete
D (318776) tlsio_mbedtls AzureIoTHubTask: on_underlying_io_open_complete
D (318776) tlsio_mbedtls AzureIoTHubTask: Handshake completed in 0, on_underlying_io_open_complete
D (318786) tlsio_mbedtls AzureIoTHubTask: tlsio_mbedtls_send
-> 11:33:40 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 192 | USERNAME: XXX.azure-devices.net/XXX/api-version=2016-11-14&DeviceClientType=iothubclient%2f1.1.21%20(esp32) | PWD: XXXX | CLEAN: 0
D (318826) tlsio_mbedtls AzureIoTHubTask: leaving tlsio_mbedtls_open with 0
I (328256) Statistics: [05-10-17 11:33:49Z] Uptime: 000.00:05:00	IPv4: 10.0.2.252	Free Heap: 66084	Temperature: 67
I (388256) Statistics: [05-10-17 11:34:49Z] Uptime: 000.00:06:00	IPv4: 10.0.2.252	Free Heap: 65828	Temperature: 66
I (448256) Statistics: [05-10-17 11:35:49Z] Uptime: 000.00:07:00	IPv4: 10.0.2.252	Free Heap: 65772	Temperature: 66
I (508256) Statistics: [05-10-17 11:36:49Z] Uptime: 000.00:08:00	IPv4: 10.0.2.252	Free Heap: 65520	Temperature: 66
Error: Time:Thu Oct  5 11:37:41 2017 File:C:/IoT/BEASTv3/components/azure_iot/iothubtransport_mqtt_common.c Func:InitializeConnection Line:1583 mqtt_client timed out waiting for CONNACK
D (559636) tlsio_mbedtls AzureIoTHubTask: tlsio_mbedtls_send
D (559646) tlsio_mbedtls AzureIoTHubTask: tlsio_mbedtls_close
D (559646) socketio_lwip AzureIoTHubTask: Socket Closed in socketio_close
-> 11:37:41 DISCONNECT

@tameraw
Copy link
Contributor

tameraw commented Oct 25, 2017

@JetstreamRoySprowl - Any update on this one?

@ewertons
Copy link
Contributor

Hi @Resultfactory ,

the SAS token refresh logic design did not intend to cause any long down time.
For MQTT, the disconnection and reconnection using a new SAS token is necessary due to the protocol constraints.
The four-minute timeout you are seeing should not be related to the SAS token refresh logic.
It does seem that the connection itself had some problem, and the reconnection packet timedout being sent, that is probably what caused the issue.

To assist better, could you:

  1. Elaborate more why you believe the SAS token logic might be influencing the down time (so we make sure we understand how you got to that conclusion, and to ensure we don't miss your point);
  2. Re-run your experiment and repro the issue, so we can check the service-side logs why the timeout happened (please send us the iothub fqdn and device ID as well)

Thanks,
Azure IoT Team

@Resultfactory
Copy link
Author

@ewertons in our last post we explain why we do not think the sas logic is at fault anymore.
We confirmed that the sas-url is generated again at that time, but we still get the 4 minute downtime.

Our issue of not being able to reconnect only occurs every 19 hours for 4 minutes.

Our scenario:

  1. Startup. We set up the ssl connection, send a CONNECT message and receive a CONNACK.
  2. After ~19 hours we receive a DISCONNECT and the ssl and underlying socket are closed. A new ssl and socket are created. The ssl handshake completes in 0ms, a CONNECT message is sent.
  3. 4 minutes later the connection is timed out.
  4. a new ssl and socket are set up, handshake completes(in more than 0ms), a CONNECT message is sent and CONNACK received.

Is there a email address we can send the iothub fqdn and deviceid to?

@Resultfactory
Copy link
Author

Resultfactory commented Nov 17, 2017

Note We don't have the 4 minute connection timeout anymore since the update. Our xio still does get destroyed and recreated, is this the expected behavior?

@JetstreamRoySprowl
Copy link

The SDK may be getting upset by this issue. I hope to have a fix for that problem in the next Arduino release.

@JetstreamRoySprowl
Copy link

The 1.0.44 Azure IoT libraries now have the fix for the retrieveoptions problem.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants