Improve supervisor restart calculation #8261
Open
Before restarting a child, a supervisor must check if the restart limit is reached. This adds a penalty to the overall restart time, which should be kept low.
The current implementation performs this check by first traversing the list of restarts to filter out those that have expired. It then essentially traverses the resulting list a second time via `length` to check whether it is over the intensity limit. This behavior is `2*O(n)` (?), that is, two linear passes, with `n` being the number of past restarts within the period.

This PR introduces two optimizations:
- It checks whether the restart limit is reached while it is traversing the restart list to remove expired restarts, thereby eliminating the additional traversal via the call to `length`. Depending on the outcome, the restart is either allowed or disallowed. This behavior is `O(n)`.
- It sidesteps the need to perform the step above by keeping a separate counter for restarts; as long as that counter is below the intensity value, it is safe to allow the restart, add the restart to the list, and increment the counter. This behavior is `O(1)`. Only when the counter reaches the intensity limit must the actual number of restarts within the given period be calculated via the step above; if the restart is allowed, the restart list is updated and the counter is set to the corresponding value.
(Over time, this may lead to a large list of accumulated expired restarts being carried around. For this reason, the counter is limited not by the intensity value alone but by the minimum of the intensity value and a hardcoded limit, which I set to 1000 by gut feeling.)
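
To make the difference between the three strategies concrete, here is a minimal Python sketch of the logic described above. It is not the supervisor code itself, which is Erlang; `baseline_allow`, `RestartTracker`, `allow`, and `HARD_LIMIT` are hypothetical names, and the sketch assumes that a restart is disallowed once more than `intensity` restarts (including the new one) fall within `period`.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import chain
from typing import Deque, List, Tuple

HARD_LIMIT = 1000  # hardcoded cap from the description; the name is hypothetical


def baseline_allow(restarts: List[float], now: float,
                   intensity: int, period: float) -> Tuple[bool, List[float]]:
    """Two passes: filter out expired restarts, then count the result."""
    live = [now] + [t for t in restarts if now - t <= period]  # pass 1
    return len(live) <= intensity, live                        # pass 2 (the count)


@dataclass
class RestartTracker:
    intensity: int
    period: float
    # Newest restart first; may still contain expired entries.
    restarts: Deque[float] = field(default_factory=deque)
    # Number of entries in `restarts`, expired ones included.
    count: int = 0

    def allow(self, now: float) -> bool:
        limit = min(self.intensity, HARD_LIMIT)
        if self.count < limit:
            # Fast path, O(1): even if every recorded restart were still
            # within the period, one more cannot exceed the intensity.
            self.restarts.appendleft(now)
            self.count += 1
            return True
        # Slow path, O(n): drop expired entries and check the limit in the
        # same pass, stopping as soon as the limit is exceeded.
        live: Deque[float] = deque()
        for t in chain([now], self.restarts):
            if now - t <= self.period:
                live.append(t)
                if len(live) > self.intensity:
                    return False  # limit reached; state left untouched
        self.restarts = live
        self.count = len(live)  # resynchronise the counter with the pruned list
        return True
```

With `RestartTracker(intensity=3, period=10)`, restarts at times 0, 1, and 2 take the fast path; a restart at time 3 falls through to the slow path and is refused, while a restart at time 20 is accepted and prunes the list back to a single entry.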