
set max dispatch workers to same as max forks #11800

Merged · kdelee merged 1 commit into devel from max_workers_same_as_max_mem_forks on Feb 24, 2022

Conversation

@kdelee (Member) commented on Feb 23, 2022

Right now, without this, we end up with a different number for max_workers than max_forks. For example, on a control node with 16 Gi of RAM,
max_mem_capacity w/ 100 MB/fork = (16*1024)/100 --> 164
max_workers = 5 * 16 --> 80

This means we would allow that control node to control up to 164 jobs, but every job after the 80th would be stuck in `waiting`, waiting for a dispatch worker to free up to run it.
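For illustration, the mismatch spelled out as a minimal Python sketch (the variable names and the 5-workers-per-GiB formula are reconstructions of the behavior described above, not AWX's exact code):

```python
# Hypothetical reconstruction of the pre-patch mismatch on a 16 GiB control node.
mem_gib = 16
mb_per_fork = 100  # assumed memory cost per fork

# Capacity: how many jobs the node is believed to be able to control.
max_mem_capacity = (mem_gib * 1024) // mb_per_fork  # 16384 // 100 -> 163 (~164 above)

# Dispatcher pool ceiling under the old formula (5 workers per GiB of RAM).
max_workers = 5 * mem_gib  # -> 80

# Jobs 81 through ~163 are admitted for control but have no worker to run them.
print(max_mem_capacity - max_workers, "jobs would sit in waiting")
```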

SUMMARY

Make max_workers == max_forks based on memory capacity, to prevent situations where we start jobs because we believe there is enough capacity to control them, but there are not enough dispatch workers available to actually do so (see the sketch below). In cases where a user decides to use the "capacity adjustment" or otherwise limit how many jobs a control node can control, we
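Conceptually, the fix ties the dispatcher pool ceiling to the same memory-derived fork capacity. A hedged sketch, with illustrative function names rather than AWX's actual API:

```python
def max_forks_from_memory(mem_mib: int, mb_per_fork: int = 100) -> int:
    """Memory-based fork capacity: one fork per `mb_per_fork` MiB of RAM."""
    return mem_mib // mb_per_fork

def max_dispatch_workers(mem_mib: int) -> int:
    # Before this change the pool used an unrelated formula (e.g. 5 per GiB),
    # which could undershoot the advertised capacity and strand jobs in `waiting`.
    # After: the pool ceiling equals the memory-based fork capacity.
    return max_forks_from_memory(mem_mib)

# On a 16 GiB node both numbers now agree (163 with integer division).
assert max_dispatch_workers(16 * 1024) == max_forks_from_memory(16 * 1024)
```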

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME
  • API
AWX VERSION

ADDITIONAL INFORMATION

@jainnikhil30 noticed jobs sitting in `waiting` for a long time when he was running many concurrent jobs, and @AlanCoding helped identify how the max number of dispatch workers factors into that.

@kdelee force-pushed the max_workers_same_as_max_mem_forks branch from 3fc5327 to e1be483 on February 23, 2022
@jainnikhil30 (Contributor) commented

+1, tested the patch; it works. Before the patch, a 16 GB node could only run 80 jobs and the rest all went into waiting. With this patch I am able to run 110 jobs, which is close to the max capacity.

@AlanCoding (Member) left a comment


I'm a little bit worried about increasing the dispatcher workers in general, because this puts us closer to... well... failure. But that should be handled by our general coefficients.

@kdelee merged commit 4bd6c2a into devel on Feb 24, 2022
@kdelee deleted the max_workers_same_as_max_mem_forks branch on February 24, 2022 at 15:53