The current implementation has multiple flaws.
At the moment, the first HostDown event fires after the specified number of tries, and if the host is still down, the next HostDown event fires after the same number of tries again. So if `time_wait` is 10s and `failure_trigger_sample_size` is 2, there are 20 seconds between events. At the same time, we set the expiration of the Redis "host down" key to `time_wait` + 1, which means the key expires 9 seconds before the second HostDown event. This PR fixes that by setting the expiration time to `time_wait` * `failure_trigger_sample_size`.
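The timing gap described above can be checked with a little arithmetic. The following is only an illustrative sketch in Python, not the code from this PR; the function names are hypothetical and the values are the ones used in the example (`time_wait` of 10s, `failure_trigger_sample_size` of 2).

```python
# Illustrative arithmetic only; these helper names are hypothetical
# and do not exist in the code base.

def seconds_between_host_down_events(time_wait: int, failure_trigger_sample_size: int) -> int:
    """A HostDown event fires once every `failure_trigger_sample_size` failed checks."""
    return time_wait * failure_trigger_sample_size


def old_key_ttl(time_wait: int) -> int:
    """Expiration of the "host down" key before this change."""
    return time_wait + 1


def new_key_ttl(time_wait: int, failure_trigger_sample_size: int) -> int:
    """Expiration of the "host down" key after this change."""
    return time_wait * failure_trigger_sample_size


time_wait = 10
failure_trigger_sample_size = 2

gap = seconds_between_host_down_events(time_wait, failure_trigger_sample_size)

# Seconds during which the key is missing although the host is still down.
uncovered_before = gap - old_key_ttl(time_wait)
uncovered_after = gap - new_key_ttl(time_wait, failure_trigger_sample_size)

print(gap, uncovered_before, uncovered_after)  # 20 9 0
```

With the new expiration the key lives exactly as long as the interval between two HostDown events, so it no longer disappears while the host is still down.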
Another issue is how a host is considered to be up. At the moment, on the first successful attempt we simply re-enable the host; however, if the upstream is unstable, a single healthy attempt does not mean that it has recovered. This change makes the Host UP logic work exactly like Host Down, so it now takes the number of tries into account.
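To make the symmetric behaviour concrete, here is a minimal Python sketch of a state tracker that flips a host's state only after a configured number of consecutive checks contradict it. It is an assumption-laden illustration rather than the implementation in this PR: the `HostHealth` class and its `record` method are hypothetical, and it models only state transitions, not the repeated HostDown events that fire while a host stays down.

```python
from typing import Optional


class HostHealth:
    """Hypothetical tracker: a host changes state only after `sample_size`
    consecutive check results that contradict its current state."""

    def __init__(self, sample_size: int) -> None:
        self.sample_size = sample_size
        self.is_up = True
        self._streak = 0  # consecutive checks contradicting the current state

    def record(self, check_ok: bool) -> Optional[str]:
        """Record one check result; return "HostUp" or "HostDown" on a
        state transition, otherwise None."""
        if check_ok == self.is_up:
            # The result agrees with the current state, so any streak is broken.
            self._streak = 0
            return None

        self._streak += 1
        if self._streak < self.sample_size:
            return None

        # Enough consecutive contradicting results: flip the state.
        self.is_up = check_ok
        self._streak = 0
        return "HostUp" if check_ok else "HostDown"


health = HostHealth(sample_size=2)
results = [False, False, True, False, True, True]
events = [health.record(ok) for ok in results]
print(events)  # [None, 'HostDown', None, None, None, 'HostUp']
```

In this run the single healthy check at the third position does not bring the host back up; only the two consecutive healthy checks at the end do, which is the behaviour an unstable upstream needs.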
Fix #2036