Eliminating 30 sec delay when M2M acknowledgment is interrupted by disconnection (#6836) #6857

Merged 2 commits on Jan 19, 2023

@vipeller vipeller commented Jan 18, 2023

This fix is related to IcM 333616549:

  • When a large volume of M2M messages is sent to a module over MQTT, it can happen (within a few hours) that reauthentication, triggered by token expiration, occurs after an M2M message has been forwarded to the module but before the ACK has been sent back.
  • When an M2M message is sent, Edge waits up to 30 sec (configurable) for the ACK.
  • Once a connection is closed, the ACK can no longer be sent back.
  • While Edge is waiting for an ACK, its message pump (for that route/module) is blocked.
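
To make the effect of a blocked pump concrete, here is a minimal, purely illustrative Python model (not the Edge code). The function `simulate_pump` and its parameters are hypothetical names; it assumes a strictly serial pump that hands over the next message only after the previous one is ACKed or its ACK wait times out.

```python
def simulate_pump(arrivals, ack_delay, ack_timeout, lost_acks):
    """Return the time at which each message is handed to the module.

    arrivals    -- times (in seconds) at which messages reach the pump
    ack_delay   -- time a normal ACK takes to come back
    ack_timeout -- how long the pump waits when the ACK never arrives
    lost_acks   -- indexes of messages whose ACK is lost (e.g. on disconnect)
    """
    pump_free_at = 0.0
    handed_over = []
    for index, arrival in enumerate(arrivals):
        # The pump is serial: a message cannot start before the previous
        # wait has finished, nor before the message itself has arrived.
        start = max(pump_free_at, arrival)
        handed_over.append(start)
        wait = ack_timeout if index in lost_acks else ack_delay
        pump_free_at = start + wait
    return handed_over


if __name__ == "__main__":
    # 1 msg/sec, the ACK of the second message is lost.
    for t in simulate_pump([0.0, 1.0, 2.0, 3.0], 0.1, 30.0, {1}):
        print(t)
```

In this model, the message that arrives at t = 2 s is not handed over until t = 31 s, and the messages queued behind it follow in quick succession, which matches the "delay, then catch-up" pattern reported by the customer.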

The customer's problem is that, from time to time, M2M messages get delayed by 30 seconds (delivery then catches up with the messages accumulated during that 30-second period). They drive some sort of dashboard with these messages, so this behavior is not acceptable.

The root of the problem is tied to their use of the MQTT protocol. Around token expiration/reauthentication, their message pump got stuck for the reason listed above: it waited 30 sec for an ACK while the connection was closed (and then reopened a very short time later).

This fix adds a new Task to the exit conditions used while the code is waiting for the ACK. Until now, there were two conditions:

  • the ACK is received
  • a timeout occurs

The third condition is now that the device handler object gets closed (for any reason, e.g. because of reauthentication).
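
The three exit conditions above can be sketched as a "wait for whichever happens first" pattern. The following is a minimal Python `asyncio` analogy, not the actual change in the Edge codebase; `wait_for_ack`, `demo`, and the `closed` event are hypothetical names introduced only for illustration.

```python
import asyncio


async def wait_for_ack(ack: asyncio.Future, closed: asyncio.Event, timeout: float) -> str:
    """Wait until the ACK arrives, the handler is closed, or the timeout elapses."""
    # Hypothetical stand-in for the newly added task: completes when the
    # handler gets closed, for whatever reason.
    closed_task = asyncio.ensure_future(closed.wait())
    try:
        done, _pending = await asyncio.wait(
            {ack, closed_task},
            timeout=timeout,
            return_when=asyncio.FIRST_COMPLETED,
        )
    finally:
        # Do not leave the helper task running once the wait is over.
        closed_task.cancel()
    if ack in done:
        return "acked"
    if closed_task in done:
        return "closed"
    return "timeout"


async def demo(close_after=0.05, timeout=30.0) -> str:
    """Simulate a handler that is closed before any ACK arrives."""
    loop = asyncio.get_running_loop()
    ack = loop.create_future()   # never completed in this demo
    closed = asyncio.Event()
    if close_after is not None:
        loop.call_later(close_after, closed.set)
    return await wait_for_ack(ack, closed, timeout)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With only the first two conditions, the demo would sit out the full 30-second timeout; with the close signal included, the wait ends as soon as the handler is closed and the pump can move on.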

Tested by:

  • temporarily modifying Edge to trigger token expiration every 15 seconds
  • modifying Edge to fail every token check, causing the module to be disconnected every 15 seconds

Used two clients doing M2M at 1 msg/sec; the receiving client holds back the ACK for 800 ms (to increase the odds of running into a disconnection).

Kept the code running for ~30 mins; the delay no longer occurred. Also double-checked, using temporarily added logs, that the device handler was interrupted by the newly added task.

(cherry picked from commit 7a0051a)

Azure IoT Edge PR checklist:

This checklist is used to make sure that common guidelines for a pull request are followed.

General Guidelines and Best Practices

  • I have read the contribution guidelines.
  • Title of the pull request is clear and informative.
  • Description of the pull request includes a concise summary of the enhancement or bug fix.

Testing Guidelines

  • Pull request includes test coverage for the included changes.
  • Description of the pull request includes
    • concise summary of tests added/modified
    • local testing done.

@kodiakhq kodiakhq bot merged commit e32cfce into Azure:release/1.4 Jan 19, 2023