Add ATC/worker flags to limit max build containers for workers #2928

Open
mhuangpivotal opened this Issue Dec 10, 2018 · 7 comments

mhuangpivotal (Contributor) commented Dec 10, 2018

Note: we want to refactor runtime code in #2926 before implementing this.

With the least-build-containers container placement strategy implemented in #2577, we also want a way to limit the max number of build containers on the workers.

The proposal in #2577 suggests adding --max-tasks-per-worker to the ATC:

If max-tasks-per-worker is set:

  • If the current worker has fewer than max-tasks-per-worker tasks running, dispatch the new task to it (and directly or indirectly update the priority queue).
  • If, on the other hand, the current worker has >= max-tasks-per-worker tasks running, then, since the list is a priority queue, all other workers will be in the same situation and there is no need to keep traversing the list. Do not dispatch the runnable task; wait for the next event to wake up the scheduler.

If max-tasks-per-worker is not set:

  • Take the current worker from the priority queue and dispatch the runnable task to it. This is still better than random placement because it dispatches to the matching worker with the fewest running tasks.

This option requires least-build-containers to be set; otherwise it will error.
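
For illustration, here is a rough Go sketch of that placement check; the types and the PickWorker function are made up for this example and are not actual Concourse code:

```go
package placement

import "sort"

// Worker is a hypothetical view of a registered worker and how many build
// containers it is currently running.
type Worker struct {
	Name              string
	RunningContainers int
}

// PickWorker sketches the proposed behaviour: order workers as a priority
// queue on running build containers, and only dispatch if the least-loaded
// worker is under maxTasksPerWorker (0 meaning "flag not set").
// A nil result means "do not dispatch; wait for the next scheduling event".
func PickWorker(workers []Worker, maxTasksPerWorker int) *Worker {
	if len(workers) == 0 {
		return nil
	}

	// Treat the slice as the priority queue by ordering it on running containers.
	sort.Slice(workers, func(i, j int) bool {
		return workers[i].RunningContainers < workers[j].RunningContainers
	})

	least := &workers[0]

	// Flag not set: always dispatch to the least-loaded worker.
	if maxTasksPerWorker == 0 {
		return least
	}

	// Flag set: if even the least-loaded worker is at the cap, every other
	// worker is too, so there is no need to keep traversing the queue.
	if least.RunningContainers >= maxTasksPerWorker {
		return nil
	}

	return least
}
```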

Additionally,

Add an option max-tasks-for-worker to concourse worker and modify the scheduling algorithm on the ATC so that it can take differences between workers into account, while max-tasks-per-worker on the ATC would still act as a maximum cap.

We may want to change the flag names to say build-containers instead of tasks, since the least-build-containers strategy includes get, task and put containers.
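
As a rough illustration of how a per-worker value could interact with the ATC-wide cap (using the build-containers naming suggested above), something like the following; the function and its arguments are made up for this sketch, not existing Concourse flags:

```go
package capsketch

// effectiveLimit sketches the cap the ATC would enforce for one worker:
// the worker's own value when set (non-zero), bounded by the ATC-wide cap
// (0 meaning "not set" / unlimited in this sketch).
func effectiveLimit(workerMaxBuildContainers, atcMaxBuildContainers int) int {
	switch {
	case workerMaxBuildContainers == 0:
		// Worker did not set a limit: fall back to the ATC-wide cap.
		return atcMaxBuildContainers
	case atcMaxBuildContainers == 0 || workerMaxBuildContainers < atcMaxBuildContainers:
		// Worker set a stricter limit than the ATC-wide cap.
		return workerMaxBuildContainers
	default:
		// The ATC-wide cap still acts as the maximum.
		return atcMaxBuildContainers
	}
}
```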

cirocosta (Member) commented Dec 11, 2018

Hey @mhuangpivotal ,

Do you think this is something that could be advertised by the worker at registration time?

E.g., you could have one worker "here" that sets the default max-containers for Garden (250) and another "there" that sets 10; with a per-worker setting, the ATC would schedule containers based on those values.

Wdyt?

thx!

jchesterpivotal (Contributor) commented Dec 17, 2018

Should I close PR #2707 in favour of this? I prefer this approach overall, especially since I don't have to write it.

I would still set a default value below the default Garden limit (250), since there will be a lag between hitting a capacity limit and detecting it.

vito added the triage label Jan 9, 2019

ddadlani (Contributor) commented Feb 11, 2019

@marco-m what exactly is the use case for having a varying maximum for build containers on each worker? We don't currently set container maximums; those are enforced directly by Garden. Would the fix in #3251, along with the fewest-build-containers strategy, not satisfy your use case?

marco-m commented Feb 12, 2019

@ddadlani here is my understanding:

The fix in #3251, along with fewest-build-containers, would make it possible to spread the task containers evenly across the available workers. This would make fewest-build-containers usable in prod for us and would be a great improvement :-)

On the other hand, I think that this ticket, which stems from #2577, is about controlling the load. In more detail: it would allow controlling the maximum number of task containers on a given worker. For example, as an operator, if I know that more than, say, 2 task containers kill my workers, I can set max-tasks-per-worker to 2. If the total number of runnable tasks is more than number_of_workers * max-tasks-per-worker, the Concourse scheduler will not dispatch any more tasks and will wait for the next scheduler run / next event. This would provide a rough queue.
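
To make the arithmetic concrete: with 3 workers and max-tasks-per-worker set to 2, at most 6 tasks run at once, and a 7th runnable task stays pending until the next scheduler run / next event. A back-of-the-envelope sketch of that aggregate gate (names are illustrative only, not Concourse code):

```go
package roughqueue

// shouldDispatch sketches the aggregate gate: dispatch another task only if
// the total running tasks are still below number_of_workers * max-tasks-per-worker.
func shouldDispatch(runningTasks, numWorkers, maxTasksPerWorker int) bool {
	return runningTasks < numWorkers*maxTasksPerWorker
}
```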

If it is possible to obtain the equivalent behavior with Garden (after the fix for #3251), even better!
But to be acceptable to the end users of a CI system (the developers), the following should happen: if I set a limit on Garden and the limit is hit, Concourse should skip and retry; it should not mark a given build as failed due to the Garden limit. If the build is marked failed, developers will get even more frustrated.

Does that make sense?

jchesterpivotal (Contributor) commented Feb 12, 2019

If the total number of runnable tasks is more than number_of_workers * max-tasks-per-worker, the Concourse scheduler will not dispatch any more tasks and will wait for the next scheduler run / next event. This would provide a rough queue.

I submitted a PR for the same idea, expressed as the inverse: a global max-in-flight. It was deliberately simple to enable quick adoption.

But to be acceptable to the end users of a CI system (the developers), the following should happen: if I set a limit on Garden and the limit is hit, Concourse should skip and retry; it should not mark a given build as failed due to the Garden limit. If the build is marked failed, developers will get even more frustrated.

I strongly agree. Because Concourse does not maintain a safe work buffer, it becomes necessary as a safety measure to retain a capacity buffer. The CF buildpacks team, for example, retains enough workers to handle the possibility of around 40 pipelines operating simultaneously. But this is not the common case, so average utilisation is very low.

Similarly, this behaves badly in disaster-recovery scenarios. It's not uncommon for many pipelines to fire up simultaneously when a Concourse is restored or rebuilt from scratch. This is doubly problematic because the ATC will begin loading workers as soon as they begin to report, leading to a flapping restart when workers are added progressively (as BOSH does). In DR scenarios I have found it necessary to manually pause all pipelines and then unpause them one by one, waiting for each added load to stabilise before proceeding.

I shouldn't have to do this.

cirocosta (Member) commented Feb 12, 2019

Hey @marco-m,

It sounds to me a lot like #2577, with the idea of adding an extra constraint to scheduling: a task would "reserve" a container from the pool of "available containers" that the worker has (like #2577 (comment), but with containers instead of CPU / RAM).

With a task "reserving" 1 container, the scheduler could take that into account when scheduling the task: if it finds a worker that can accommodate the container:1 reservation, it subtracts that reservation from that worker's available_containers (thus reserving the resource) for as long as the task is running there; otherwise, it leaves the task pending.
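
A minimal sketch of that bookkeeping, with made-up types (not Concourse code):

```go
package reservation

// workerCapacity is a hypothetical view of how many more containers a worker
// can accommodate.
type workerCapacity struct {
	name                string
	availableContainers int
}

// reserve subtracts a container:1 reservation from the first worker that can
// accommodate it; a nil result means the task stays pending.
func reserve(workers []*workerCapacity) *workerCapacity {
	for _, w := range workers {
		if w.availableContainers >= 1 {
			w.availableContainers--
			return w
		}
	}
	return nil
}

// release gives the reservation back once the task is no longer running there.
func release(w *workerCapacity) {
	w.availableContainers++
}
```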

By keeping track of how much work is reserved and how much capacity we have, one could also get a better metric for autoscaling, without needing to keep a large buffer of resources, as @jchesterpivotal described.

As mentioned in the comment in #2577, this could scale to other resources too, not only number of containers, but cpu & mem too.

Does that make sense?

marco-m commented Feb 12, 2019

hello @cirocosta, yes, I agree 100%.

This task is the same as #2577, as you mention. My understanding is that this task, #2928, was created by @mhuangpivotal to track a specific activity, while #2577 is considered more of a "discussion" ticket.

As you mention:

this could scale to other resources too, not only number of containers, but cpu & mem too.

Exactly!

topherbullock moved this from Icebox to Backlog in Runtime Feb 12, 2019

topherbullock moved this from Backlog to Research 🤔 in Runtime Feb 12, 2019

topherbullock moved this from Research 🤔 to Backlog in Runtime Feb 12, 2019
