Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(azure-iot-mqtt-base): forceReconnect for disconnecting hang in mqtt-base #770

Merged
merged 10 commits into from
Apr 24, 2020

Conversation

YoDaMa
Copy link
Contributor

@YoDaMa YoDaMa commented Apr 24, 2020

Currently there is a bug where if the mqtt transport goes to update it's SAS token, aka calls a reconnect on the mqtt-base layer, while the internet is down (but TLS layer has not notified MQTT.js, and so MQTT.js has not notified mqtt-base), then mqtt-base will be hung waiting for a non-forced call to disconnect on MQTT.js. The call will be hung because the non-forced call will wait for the TLS layer to respond but the TLS socket is not aware the ethernet cable has been yanked.

When the SDK enters this corner-case, it will be indefinitely hung, since mqtt-base removes all listeners, so once the keepalive on the MQTT.js sends a ping down to the TLS layer and receives a error, it will send an error to the error handler in mqtt-base, but that error handler is indefinitely suspended and will not be listening for that error, thus will not pass that error up to the user.

Here is a snippet of some logs demonstrating this:

...
...
...
2020-04-23T20:50:00.558Z azure-iot-mqtt-base:MqttBase connected -> reconnecting (connected.updateSharedAccessSignature)
2020-04-23T20:50:00.558Z azure-iot-mqtt-base:MqttBase disconnecting mqtt client
2020-04-23T20:50:05.626Z azure-iot-mqtt-base:MqttBase mqtt client disconnected - reconnecting
2020-04-23T20:50:05.626Z azure-iot-mqtt-base:MqttBase username: .........
2020-04-23T20:50:05.626Z azure-iot-mqtt-base:MqttBase uri:      wss://..........
2020-04-23T20:50:06.275Z azure-iot-mqtt-base:MqttBase Device is connected
2020-04-23T20:50:06.275Z azure-iot-mqtt-base:MqttBase CONNACK: {"cmd":"connack", .........
2020-04-23T20:50:06.275Z azure-iot-mqtt-base:MqttBase mqtt client reconnected successfully
2020-04-23T20:50:06.276Z azure-iot-mqtt-base:MqttBase reconnecting -> connected ()
====================================================
[App] Disconnect Ethernet cable in 5 seconds.
[App] Time : 2020-04-23T20:51:01Z
====================================================
[App] Send Message : 0
[App] Time : 2020-04-23T20:51:06Z
2020-04-23T20:51:06.555Z azure-iot-common:RetryOperation Operation start time ......
2020-04-23T20:51:06.557Z azure-iot-device-mqtt:Mqtt sendEvent {"data":" ...........
2020-04-23T20:51:30.582Z azure-iot-mqtt-base:MqttBase connected -> reconnecting (connected.updateSharedAccessSignature)
2020-04-23T20:51:30.583Z azure-iot-mqtt-base:MqttBase disconnecting mqtt client
====================================================
[App] Connect Ethernet cable in 30 seconds.
[App] Time : 2020-04-23T20:58:06Z
====================================================
[App] Send Message : 1
[App] Time : 2020-04-23T20:58:36Z
2020-04-23T20:58:36.640Z azure-iot-common:RetryOperation Operation start time .........
2020-04-23T20:58:36.641Z azure-iot-device-mqtt:Mqtt sendEvent {"data":" ...........
...
...
...

To fix this corner case, we add a timeout into the mqtt-base reconnecting step, so that after 30 seconds if there is no response from MQTT.js we send a forced end, and abandon the initial attempt for a non-forced end. This is guaranteed to return even if there are issues with the TLS layer since it does not wait for the messages in the queue to be sent.

This will need to be combined with updated functionality to MQTT.js because at present if the MQTT.js client is disconnecting, then it will not invoke the callback...

There is a side effect that any messages presently in the queue will be dropped. To fix this a more significant change to the transport will need to be made to track all messages and ensure that if a message has not be ACKed then on reconnect it should be in a queue to be sent.

Copy link
Contributor

@anthonyvercolano anthonyvercolano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@YoDaMa YoDaMa merged commit 853fa05 into Azure:master Apr 24, 2020
@YoDaMa YoDaMa deleted the mqttbase_changes branch April 24, 2020 17:21
jebrando added a commit that referenced this pull request Jul 20, 2020
* fix(azure-iothub): fix signatue for getTwin, updateTwin, updateModuleTwin to use Twin

fix #673

* improvement(provisioning samples): use the results of the registration to create and open device

* test(azure-iot-device-mqtt): reformat _mqtt_test.js (#741)

* (chore) add issue templates workflow

* Delete ISSUE_TEMPLATE.md

* Update bug-report.md

* fix(azure-iot-mqtt-base): forceReconnect for disconnecting hang in mqtt-base (#770)

* release(2020-04-24): bump package versions (#771)

Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>

* refactor: update to mqtt.js v4 (#772)

* build: update to mqtt.js v4

* refactor: add debug logs

* fix: eslint issue

* chore: remove network_e2e directory - no longer used (#775)

* refactor: expose connection error in mqtt (#776)

* release(2020-05-07): bump package versions (#788)

* release(2020-05-07): bump package versions

Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>

* refactor(multipe packages): c2d states no longer using booleans (#797)

This moves the logic of checking if C2D is enabled down to the transport level, since it is not necessarily accurate to represent it at the device client level. There is not enough information at that level to make totally accurate judgements of if it is connected or not.

* chore(azure-iot-device-amqp): debug logs for c2d (#804)

* build: update to mqtt.js v4

* chore: add debug logs

* chore(azure-iot-device): updating api version to support twin arrays (#806)

* improvement(azure-iot-mqtt-base): maintain knowledge of on the wire publishes (#808)

By tracking the on the wire publishes we can reliably complete in case of disconnects or other
errors.

* feat(azure-iot-mqtt-device): modelID option (#809)

This adds the setOption for ModelID, which will enable users to use the existing device client library as a PnP library. To accomodate the lack of support in the existing service API, a switching API call has been added that will use the preview API version if the modelID is set.

* release(2020-05-28): bump package versions (#810)

Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>

* Create synctodevops.yml

* chore(azure-iot-provisioning-device): update samples to use env var (#816)

* Add CodeQL security scanning (#815)

Co-authored-by: Anthony V. Ercolano <anthonyvercolano@users.noreply.github.com>

* improvement(azure-iot-mqtt-base): add timeouts to publishes that have… (#818)

* improvement(azure-iot-mqtt-base): add timeouts to publishes that have not been PUBACKed by the service

744

* update classes and unique identifiers.

* chore: create config.yml for issue templates (#823)

* chore: create config.yml for issue templates

* chore: update technical question for msft q&a

* fix: add vanity link for IoT help

* chore: update technical question options

* fix: add https://

* fix: add https://

* fix: add https://

* fix: add https://

Co-authored-by: Anthony V. Ercolano <anthonyvercolano@users.noreply.github.com>

* feat(azure-iot-device): adds emit to connect (#819)

'connect' is emitted whenever the device transport FSM enters the "connected" state for MQTT, and "authenticated" for AMQP.

* fix(azure-iot-device): lint error on simple sample (#826)

* fix(azure-iot-device): lint error on simple sample

* fix(azure-iot-device): add sample linting to the CI

* refactor(multiple): update to typescript 3.7.5 move to dist instead of lib parameter checking and suppression (#824)

* chore: updating docs to clarify edge support for linux only

* refactor(multiple): update to typescript 3.7.5 move to dist etc (#830)

* chore: node pnp sample for summer

* chore: update prov sample to use env variables (#831)

* chore: node pnp sample for summer

* refactor(multiple): update to typescript 3.7.5 move to dist vs lib other minor (#833)

* chore: node pnp sample for summer

* chore: rename digital-twin-model-id to model-id in mqtt (#829)

* chore: node pnp sample for summer

* chore: node pnp sample for summer

* chore: initial addition of pnp simple thermostat sample (#827)

* chore: node pnp sample for summer

* release(2020-07-06): bump package versions (#838)

Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>

* chore: fixing the sample for feedback (#839)

* chore: remove writable property in reported (#842)

* chore: some changes

* chore: some changes

* chore: some changes

* Update pnpTemperatureController.js

* Updated file upload sample to reflect current SDK changes. (#834)

* fix(sample): update sample to use current SDK changes

* fix(sample): update to use latest SDK changes, add use strict

* fix(sample): update sample to use current SDK changes, fix linting issue

* Update upload_to_blob_advanced.js

* Update upload_to_blob_advanced.js

Co-authored-by: Chandler Lattin <chlattin@microsoft.com>
Co-authored-by: Anthony V. Ercolano <anthonyvercolano@users.noreply.github.com>
Co-authored-by: Yoseph Maguire <yoseph.maguire@microsoft.com>
Co-authored-by: Yoseph Maguire <yoseph.maguire@gmail.com>

* chore: align environment variable names

* Update pnpTemperatureController.js

* chore: removing all the lib folders in repo

Co-authored-by: Anthony Ercolano <toercola@microsoft.com>
Co-authored-by: Anthony V. Ercolano <anthonyvercolano@users.noreply.github.com>
Co-authored-by: Yoseph Maguire <yoseph.maguire@microsoft.com>
Co-authored-by: Elena Horton <52430760+elhorton@users.noreply.github.com>
Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>
Co-authored-by: Justin Hutchings <jhutchings1@users.noreply.github.com>
Co-authored-by: olivakar <oliva.kar@microsoft.com>
Co-authored-by: olivakar <oliva.tanusree@gmail.com>
Co-authored-by: Chandler Lattin <chandlerlattin@knights.ucf.edu>
Co-authored-by: Chandler Lattin <chlattin@microsoft.com>
Co-authored-by: Yoseph Maguire <yoseph.maguire@gmail.com>
anthonyvercolano added a commit that referenced this pull request Jul 27, 2020
* fix(azure-iothub): fix signatue for getTwin, updateTwin, updateModuleTwin to use Twin

fix #673

* improvement(provisioning samples): use the results of the registration to create and open device

* test(azure-iot-device-mqtt): reformat _mqtt_test.js (#741)

* (chore) add issue templates workflow

* Delete ISSUE_TEMPLATE.md

* Update bug-report.md

* fix(azure-iot-mqtt-base): forceReconnect for disconnecting hang in mqtt-base (#770)

* release(2020-04-24): bump package versions (#771)

Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>

* refactor: update to mqtt.js v4 (#772)

* build: update to mqtt.js v4

* refactor: add debug logs

* fix: eslint issue

* chore: remove network_e2e directory - no longer used (#775)

* refactor: expose connection error in mqtt (#776)

* release(2020-05-07): bump package versions (#788)

* release(2020-05-07): bump package versions

Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>

* refactor(multipe packages): c2d states no longer using booleans (#797)

This moves the logic of checking if C2D is enabled down to the transport level, since it is not necessarily accurate to represent it at the device client level. There is not enough information at that level to make totally accurate judgements of if it is connected or not.

* chore(azure-iot-device-amqp): debug logs for c2d (#804)

* build: update to mqtt.js v4

* chore: add debug logs

* chore(azure-iot-device): updating api version to support twin arrays (#806)

* improvement(azure-iot-mqtt-base): maintain knowledge of on the wire publishes (#808)

By tracking the on the wire publishes we can reliably complete in case of disconnects or other
errors.

* feat(azure-iot-mqtt-device): modelID option (#809)

This adds the setOption for ModelID, which will enable users to use the existing device client library as a PnP library. To accomodate the lack of support in the existing service API, a switching API call has been added that will use the preview API version if the modelID is set.

* release(2020-05-28): bump package versions (#810)

Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>

* Create synctodevops.yml

* chore(azure-iot-provisioning-device): update samples to use env var (#816)

* Add CodeQL security scanning (#815)

Co-authored-by: Anthony V. Ercolano <anthonyvercolano@users.noreply.github.com>

* improvement(azure-iot-mqtt-base): add timeouts to publishes that have… (#818)

* improvement(azure-iot-mqtt-base): add timeouts to publishes that have not been PUBACKed by the service

744

* update classes and unique identifiers.

* chore: create config.yml for issue templates (#823)

* chore: create config.yml for issue templates

* chore: update technical question for msft q&a

* fix: add vanity link for IoT help

* chore: update technical question options

* fix: add https://

* fix: add https://

* fix: add https://

* fix: add https://

Co-authored-by: Anthony V. Ercolano <anthonyvercolano@users.noreply.github.com>

* feat(azure-iot-device): adds emit to connect (#819)

'connect' is emitted whenever the device transport FSM enters the "connected" state for MQTT, and "authenticated" for AMQP.

* fix(azure-iot-device): lint error on simple sample (#826)

* fix(azure-iot-device): lint error on simple sample

* fix(azure-iot-device): add sample linting to the CI

* refactor(multiple): update to typescript 3.7.5 move to dist instead of lib parameter checking and suppression (#824)

* chore: updating docs to clarify edge support for linux only

* refactor(multiple): update to typescript 3.7.5 move to dist etc (#830)

* chore: node pnp sample for summer

* chore: update prov sample to use env variables (#831)

* chore: node pnp sample for summer

* refactor(multiple): update to typescript 3.7.5 move to dist vs lib other minor (#833)

* chore: node pnp sample for summer

* chore: rename digital-twin-model-id to model-id in mqtt (#829)

* chore: node pnp sample for summer

* chore: node pnp sample for summer

* chore: initial addition of pnp simple thermostat sample (#827)

* chore: node pnp sample for summer

* release(2020-07-06): bump package versions (#838)

Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>

* chore: fixing the sample for feedback (#839)

* chore: remove writable property in reported (#842)

* chore: some changes

* chore: some changes

* chore: some changes

* Update pnpTemperatureController.js

* Updated file upload sample to reflect current SDK changes. (#834)

* fix(sample): update sample to use current SDK changes

* fix(sample): update to use latest SDK changes, add use strict

* fix(sample): update sample to use current SDK changes, fix linting issue

* Update upload_to_blob_advanced.js

* Update upload_to_blob_advanced.js

Co-authored-by: Chandler Lattin <chlattin@microsoft.com>
Co-authored-by: Anthony V. Ercolano <anthonyvercolano@users.noreply.github.com>
Co-authored-by: Yoseph Maguire <yoseph.maguire@microsoft.com>
Co-authored-by: Yoseph Maguire <yoseph.maguire@gmail.com>

* chore: align environment variable names

* Update pnpTemperatureController.js

* chore: removing all the lib folders in repo

* chore: removing a few more lib folders

* chore: updating twin calls to the pnp api-version

* chore: adding modelId object to twin

* chore: fixing registry sample

* fix(azure-iothub): update vesion and support modelId service side

Updated to -pnp-refresh.4. Fixed the sample to print out the modelId.  Updated various dependents to
the latest branch.

Co-authored-by: Yoseph Maguire <yoseph.maguire@microsoft.com>
Co-authored-by: Elena Horton <52430760+elhorton@users.noreply.github.com>
Co-authored-by: Azure IoT Client Build <aziotclb@microsoft.com>
Co-authored-by: Jelani Brandon <jelani.brandon@microsoft.com>
Co-authored-by: Justin Hutchings <jhutchings1@users.noreply.github.com>
Co-authored-by: olivakar <oliva.kar@microsoft.com>
Co-authored-by: olivakar <oliva.tanusree@gmail.com>
Co-authored-by: Chandler Lattin <chandlerlattin@knights.ucf.edu>
Co-authored-by: Chandler Lattin <chlattin@microsoft.com>
Co-authored-by: Yoseph Maguire <yoseph.maguire@gmail.com>
Co-authored-by: Jelani Brandon <jelanibrandon@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants