Improve supervisor restart calculation #8261

Open
wants to merge 1 commit into
base: master

Maria-12648430 (Contributor) commented Mar 13, 2024

Before restarting a child, a supervisor must check if the restart limit is reached. This adds a penalty to the overall restart time, which should be kept low.

The current implementation performs this check by traversing the list of restarts to filter out those that have expired. It then traverses the resulting list a second time, via length, to check whether it exceeds the intensity limit. This amounts to two O(n) passes, with n being the number of past restarts within the period.
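For illustration, the two-pass check described above could look roughly like the following. This is a minimal sketch and not the actual supervisor code: the module name, the function signature, and the representation of restarts as a newest-first list of integer timestamps are all assumptions.

```erlang
-module(restart_sketch).
-export([add_restart/4]).

%% Restarts0 is assumed to be a list of integer timestamps, newest first.
%% Returns {ok, Restarts} if the restart is allowed, {terminate, Restarts}
%% if the intensity limit is exceeded.
add_restart(Restarts0, Now, Period, Intensity) ->
    Threshold = Now - Period,
    %% First traversal: keep only the restarts still within the period.
    Restarts1 = [Now | lists:takewhile(fun(T) -> T >= Threshold end,
                                       Restarts0)],
    %% Second traversal: length/1 walks the resulting list again.
    case length(Restarts1) =< Intensity of
        true -> {ok, Restarts1};
        false -> {terminate, Restarts1}
    end.
```

With an intensity of 2 and a period of 5, a restart at time 10 on top of restarts at 9 and 1 is allowed, because the restart at 1 has expired; on top of restarts at 9 and 8 it is not.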

This PR introduces two optimizations:

  • it checks whether the restart limit is reached while it is traversing the restart list in order to remove expired restarts, thereby eliminating the need for an additional traversal via the call to length. Depending on the outcome, a restart is either allowed or disallowed. This behavior is O(n).

  • it sidesteps the need to perform the step above by keeping a separate counter for restarts; as long as that counter is below the intensity value, it is safe to allow the restart, add the restart to the list, and increment the counter. This behavior is O(1).

    Only when the counter reaches the intensity limit must the actual number of restarts within the given period be calculated via the step above; if the restart is allowed, the restart list is updated and the counter is set to the corresponding value.

    (Over time, this may lead to a large list of accumulated expired restarts being carried around. For this reason, the counter is limited not by the intensity value alone but by the minimum of the intensity value and a hardcoded limit. By gut feeling, I picked 1000.)
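The two optimizations can be sketched together as follows. This is an illustration of the idea only, not the code in this PR: the module name, the map-based state, and its field names are hypothetical, and restarts are again assumed to be integer timestamps stored newest first. Only the cap of 1000 is taken from the description above.

```erlang
-module(restart_counter_sketch).
-export([new/2, add_restart/2]).

%% Hard cap on the counter; 1000 is the value named in the PR description.
-define(MAX_COUNT, 1000).

%% The state is a plain map here; the field names are hypothetical.
new(Intensity, Period) ->
    #{intensity => Intensity,
      period => Period,
      limit => min(Intensity, ?MAX_COUNT),
      count => 0,
      restarts => []}.          % timestamps, newest first

%% Fast path, O(1): the counter is below the limit, so even if no stored
%% restart has expired, the intensity cannot be exceeded.
add_restart(Now, #{count := C, limit := L, restarts := Rs} = S) when C < L ->
    {ok, S#{count := C + 1, restarts := [Now | Rs]}};
%% Slow path, O(n): prune expired restarts and check the limit in one pass.
add_restart(Now, #{intensity := I, period := P, restarts := Rs} = S) ->
    case prune([Now | Rs], Now - P, I) of
        {ok, N, Kept} -> {ok, S#{count := N, restarts := Kept}};
        terminate -> {terminate, S}
    end.

%% Walks the newest-first list until the first expired entry. Left is the
%% number of restarts still permitted; running out means the limit is hit.
prune([T | _], Threshold, 0) when T >= Threshold ->
    terminate;
prune([T | Ts], Threshold, Left) when T >= Threshold ->
    case prune(Ts, Threshold, Left - 1) of
        {ok, N, Kept} -> {ok, N + 1, [T | Kept]};
        terminate -> terminate
    end;
prune(_Expired, _Threshold, _Left) ->
    {ok, 0, []}.
```

In this sketch the slow path stops as soon as it has seen more unexpired restarts than the intensity allows, so it never walks further than necessary, and a successful slow-path check resets the counter to the number of restarts that are actually still within the period.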

Maria-12648430 marked this pull request as ready for review March 13, 2024 16:03

github-actions bot (Contributor) commented Mar 13, 2024

CT Test Results

    2 files     94 suites   34m 30s ⏱️
2 042 tests 1 994 ✅ 48 💤 0 ❌
2 351 runs  2 301 ✅ 50 💤 0 ❌

Results for commit 6b31837.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

// Erlang/OTP Github Action Bot

rickard-green added the team:PS (Assigned to OTP team PS) label Mar 18, 2024
IngelaAndin added the stalled (waiting for input by the Erlang/OTP team) label Mar 20, 2024
IngelaAndin self-assigned this Mar 20, 2024
IngelaAndin added this to the OTP-28.0 milestone Mar 20, 2024

IngelaAndin (Contributor) commented

New optimizations are way too dangerous to include so close to the release; you never know what timing bugs might be revealed, so we are postponing this for OTP-28.
