Fix: Delay frequent twin pulls on reconnect #5188
Merged
(cherry picked from commit 6dd7275)
The agent pulls the twin on reconnect so that it learns about any desired property changes that happened while the device was disconnected. In certain circumstances a device can keep connecting and disconnecting. One particular case is when two edge devices use the same identity: when one device connects, the other gets disconnected. The disconnected device then tries to connect back, which in turn disconnects the first device. This fight over the connection can result in several twin pulls per second.
This solution delays subsequent twin pulls if they happen within a certain time window. The throttling applies only on connection, so if for some reason IoT Hub sends frequent desired property changes, those are not throttled.
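To illustrate the behavior described above, here is a minimal synchronous sketch in Python. It is not the code from this PR: the class `TwinPullThrottle`, its method names, and the injected `clock` and `sleep` callables are all hypothetical, and the 30-second window is simply the value mentioned in this description.

```python
import time


class TwinPullThrottle:
    """Hypothetical helper: delays a reconnect-triggered twin pull
    when the previous pull happened within `window` seconds."""

    def __init__(self, pull_twin, window=30.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.pull_twin = pull_twin  # callable that performs the twin pull
        self.window = window        # assumed 30 s, as in the description
        self.clock = clock          # injected so the logic can be tested
        self.sleep = sleep
        self.last_pull = None       # time of the most recent pull, if any

    def on_reconnect(self):
        now = self.clock()
        recently_pulled = (
            self.last_pull is not None
            and now - self.last_pull < self.window
        )
        if recently_pulled:
            # Fixed delay: always wait the full window.
            self.sleep(self.window)
        self.pull_twin()
        self.last_pull = self.clock()

    def on_desired_properties_update(self, apply_patch, patch):
        # Updates pushed by the hub bypass the throttle entirely.
        apply_patch(patch)
```

With this shape, a device that reconnects several times per second performs at most one twin pull per window, while desired property updates pushed by the hub are applied immediately.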
This solution has a flaw: if the delay is, e.g., 30 seconds (the current value, which is hardcoded) and a new pull comes in 29 seconds after the previous one, the new pull is delayed by the full 30 seconds, giving a 59-second window. I considered calculating the window size dynamically, e.g. by subtracting the time elapsed since the previous twin pull, which would give a steadier 30-second delay window. However, weighing how rarely this protection is needed against the additional code complexity, I chose the simpler code.
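The trade-off can be made concrete with a small Python sketch comparing the two delay calculations. Both function names are hypothetical and neither is taken from the PR; `elapsed` is the number of seconds since the previous twin pull.

```python
def fixed_delay(elapsed, window=30.0):
    """Chosen approach: wait the full window if the last pull was recent."""
    return window if elapsed < window else 0.0


def remaining_delay(elapsed, window=30.0):
    """Considered alternative: wait only for the rest of the window."""
    return max(0.0, window - elapsed)


if __name__ == "__main__":
    for elapsed in (1.0, 29.0, 45.0):
        fixed_gap = elapsed + fixed_delay(elapsed)
        dynamic_gap = elapsed + remaining_delay(elapsed)
        print(f"elapsed={elapsed}s fixed gap={fixed_gap}s "
              f"dynamic gap={dynamic_gap}s")
```

For a pull requested 29 seconds after the previous one, the fixed variant spaces the pulls 59 seconds apart, while the dynamic variant would space them exactly 30 seconds apart. The dynamic variant needs no extra state, only the subtraction, so the extra complexity mentioned above would mostly come from wiring and testing it in the real agent.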