
Check containers accumulate on a single worker with the 'fewest-build-containers' strategy #3251

Closed
wagdav opened this issue Feb 8, 2019 · 11 comments
Labels: bug, release/no-impact (This is an issue that never affected released versions, i.e. a regression caught prior to shipping.)
@wagdav
Contributor

wagdav commented Feb 8, 2019

Bug Report

We updated to Concourse 5.0 RC54 and now see an accumulation of check containers. The containers are placed on a single worker until they reach 256 and pipelines start to fail with the error message:

insufficient subnets remaining in the pool

Following a proposed workaround in #847 we increased the number of available subnets by providing the --network-pool=172.16.0.0/23 argument to Garden. This allowed the number of containers on a single worker to reach as high as ~350.

After a certain amount of time a new error started to appear:

* iptables: create-instance-chains: iptables: Memory allocation problem.

The number of containers on a single worker went up to ~600. This error message was also reported in #3127 .

Steps to Reproduce

Enable the "fewest-build-containers" strategy and set up pipelines with a lot of resource checks. Neither the number of build tasks nor the load generated by the tasks seems to directly influence the issue.

The number of pipelines required to trigger this issue depends on the number of available Linux workers. We have ~100 pipelines and about 20 Linux workers.

Expected Results

Check containers distributed such that they don't kill workers.

Actual Results

Check containers gravitate toward a specific worker until all its resources are exhausted.

Possible explanation

The new "fewest-build-containers" strategy only applies to task containers, as expected. It seems that check containers fall back to Concourse's default behavior, "volume locality", so they accumulate without limit on one or two workers.

A possible solution would be to use the random placement strategy for check containers. We saw this work well in 4.x with the random placement strategy: the check containers were evenly distributed among all the workers.
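
As a rough illustration only (the Worker type and function below are made up for this sketch, not Concourse's actual placement interface), random placement for check containers could be as simple as:

package placement

import (
	"errors"
	"math/rand"
)

// Worker is a stand-in for whatever descriptor the scheduler uses to pick
// a placement target; it is not Concourse's real worker type.
type Worker struct {
	Name string
}

// chooseWorkerForCheck ignores per-worker state entirely and picks a worker
// uniformly at random, so check containers spread out instead of piling up
// on whichever worker "volume locality" keeps selecting.
func chooseWorkerForCheck(workers []Worker) (Worker, error) {
	if len(workers) == 0 {
		return Worker{}, errors.New("no workers available")
	}
	return workers[rand.Intn(len(workers))], nil
}

The point is simply that the choice does not depend on any per-worker state, so no single worker can attract all the check containers.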

Additional Context

[Graph: number of containers per worker over time]

This graph shows the number of containers for all of our workers as a function of time. We can see three distinct time periods:

  • until 8:30 we run Concourse 4.2.1 with the random container placement strategy
  • between 8:30 and 12:00 we run Concourse 5.0 RC54 with the fewest-build-containers strategy. Garden limits the number of containers to 256 and "insufficient subnets remaining in the pool" error messages are generated
  • from 12:00 we still run Concourse 5.0 but allow more subnets by providing --network-pool=172.16.0.0/23. This way even more containers are placed on the worker and eventually it runs out of memory.

We can also see that the number of containers increases and decreases almost instantaneously.

Impact

Currently we are out of ideas on how to control this behavior and keep the system operational. We would be happy to provide more information that could help fix this issue.

Version Info

  • Concourse version: 5.0 RC 54
  • Deployment type (BOSH/Docker/binary): binary
  • Infrastructure/IaaS: AWS
  • Browser (if applicable):
  • Did this used to work? possibly not
@kcmannem
Member

kcmannem commented Feb 8, 2019

Thanks for the detailed report; we've noticed this at times as well.
The number of active containers is reported by each worker on a heartbeat. The information doesn't arrive just in time for the decision on where to place containers (build or check). However, the ATC does know, from the database, how many containers should be in the system. We should consider balancing based on the database's per-worker container counts (in any state). We've had some discussion about better usage of check containers in #3079.
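
As a rough sketch of that idea (the connection string, table, and column names below are assumptions, not necessarily the ATC's actual schema), the per-worker counts could be read straight from the database:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Connection string, table and column names are placeholders for this sketch.
	db, err := sql.Open("postgres", "postgres://atc:atc@localhost/atc?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Count containers per worker, in any state, as recorded in the database.
	rows, err := db.Query("SELECT worker_name, COUNT(*) FROM containers GROUP BY worker_name ORDER BY COUNT(*)")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var worker string
		var count int
		if err := rows.Scan(&worker, &count); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s: %d containers\n", worker, count)
	}
}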

@marco-m
Contributor

marco-m commented Feb 8, 2019

Hello @kcmannem, although I am not familiar with that part of the code, I don't think the fact that the ATC doesn't know the number of containers on a worker in real time is related to this problem. My reasoning is the following: it takes tens of minutes to go from 200 to 600 containers, so I would assume a rough approximation would be enough for the ATC to stop dispatching containers to a given worker. Also, if the scheduling (only for check containers) were random (as we propose in the ticket), then knowing the number of containers on each worker would not be needed. Anything would be better than what we are currently seeing :-)

@kcmannem
Member

kcmannem commented Feb 8, 2019

I was able to repro the "insufficient subnets remaining in the pool" message from Garden. Guardian is supposed to block us from continuing the request and surface this error, as shown here:
https://github.com/cloudfoundry/guardian/blob/master/gardener/gardener.go#L191-L193
But that check is done by reading the dirs listed in the Garden depot dir. When sending create requests in parallel, there's a race where some containers might not have been written to the depot yet but were already created by runc.
Here's the script to repro:

package main

import (
	"fmt"
	"os"

	"code.cloudfoundry.org/garden"
	"code.cloudfoundry.org/garden/client"
	"code.cloudfoundry.org/garden/client/connection"
)

func main() {
	gardenClient := client.New(connection.New("tcp", "127.0.0.1:7777"))
	// Fire container create requests in parallel forever to hit the race
	// in Guardian's depot-based container limit check.
	for {
		go func() {
			container, err := gardenClient.Create(garden.ContainerSpec{RootFSPath: "raw:///worker-state/volumes/live/2b1ade31-5e37-4c4a-5720-efd371a0f964/volume"})
			if err != nil {
				fmt.Println(err)
				os.Exit(1)
			}
			fmt.Println(container)
		}()
	}
}

Checking the depot dir, we can see there are more entries than the 250 limit.

root@797ab6de2967:/src# ls -l /worker-state/depot/ | wc -l
257

Also verified through the API:

~/w/concourse (issue/3052) $ ~/Downloads/gaol_darwin list | wc -l
     256

Trying to create one now returns:

~/w/concourse (issue/3052) $ ~/Downloads/gaol_darwin create -r "raw:///worker-state/volumes/live/2b1ade31-5e37-4c4a-5720-efd371a0f964/volume"
error: insufficient subnets remaining in the pool

If the check built into Guardian were working as intended, we should have been seeing:
https://github.com/cloudfoundry/guardian/blob/master/gardener/gardener.go#L460
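
For context, a rough sketch of the kind of depot-count check described above and of where the race slips in; the package, function, and constant names are illustrative, not Guardian's actual code:

package depot

import (
	"errors"
	"io/ioutil"
)

const maxContainers = 250

// checkMaxContainers mimics the idea of counting depot entries before
// creating a container. The race: a container that runc has already created
// may not have its depot directory written yet, so several parallel create
// requests can all pass this check and push the total past the limit.
func checkMaxContainers(depotDir string) error {
	entries, err := ioutil.ReadDir(depotDir)
	if err != nil {
		return err
	}
	if len(entries) >= maxContainers {
		return errors.New("max containers reached")
	}
	return nil
}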

@kcmannem
Member

kcmannem commented Feb 8, 2019

@marco-m your point about using the random strategy for check container placement is totally valid, and it's what @ddadlani and I think is best as a short-term fix. But we still need to protect ourselves from over-creating containers when Garden isn't doing it for us.

@marco-m
Contributor

marco-m commented Feb 12, 2019

As an additional data point, even if we retire the worker with the runaway containers, that worker still receives new containers from the ATC for a while before we start seeing the container count decrease.

@kcmannem kcmannem moved this from Backlog to In Flight in Runtime Feb 12, 2019
wagdav pushed a commit to Pix4D/concourse that referenced this issue Feb 13, 2019
@ddadlani
Contributor

We added a small fix to randomly place check containers on workers when using the fewest-build-containers placement strategy. Our next step will be to ensure it is integration tested, and then it should be good to go.

@wagdav
Contributor Author

wagdav commented Feb 13, 2019

I managed to reproduce the container accumulation. My hypothesis is that a momentary loss of connection between the ATC and the worker triggers this failure mode.

I made a branch where I added some code and instructions: master...Pix4D:reproduce-check-container-accumulation

In summary:

  • Create a small pipeline with a mock resource
  • Run fly check-resource in an infinite loop
  • Stop the worker, wait a bit, then start it again
  • In a short while fly check-resource returns successfully, but the number of containers increases after each check-resource request

@kcmannem
Member

@wagdav That appears to be a different, known issue: because of the worker restart, new check containers are created for the same resource config check session. This continues until the resource config check session expires. The issue has been around for a while, but was probably exacerbated by all the check containers being placed on one worker. We don't plan to fix it because we intend to do away with long-lived check containers in the near future.

The current issue is to randomize the placement of check containers, which should reduce the impact of the issue you found.

@clarafu
Contributor

clarafu commented Feb 13, 2019

@wagdav what you saw in that test was actually caused by another bug that is unrelated to fewest-build-containers. The number of containers increases after each check-resource in your test because it is unable to find any of the check containers it previously created, due to a grace time before the container's expiration time that we use when looking for an existing check container:

sq.Expr(fmt.Sprintf("expires_at > NOW() + interval '%d seconds'", int(c.expiries.GraceTime.Seconds()))),

Since your worker was just spun up in your test, its uptime is very recent and therefore the check container that gets placed onto it expires really quickly:

"NOW() + LEAST(GREATEST('%d seconds'::interval, NOW() - to_timestamp(max(start_time))), '%d seconds'::interval)",

The short expiration, combined with the grace time, probably resulted in the existing check containers not being found.
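
To make the interaction concrete, here is a small sketch of that arithmetic; the minimum, maximum, and grace-time values are placeholders for illustration, not necessarily Concourse's actual settings:

package main

import (
	"fmt"
	"time"
)

// Placeholder values for this sketch.
const (
	minExpiry = 5 * time.Minute
	maxExpiry = 1 * time.Hour
	graceTime = 5 * time.Minute
)

// expiryFor mirrors the SQL above: the expiry interval is the worker's
// uptime, clamped between the minimum and maximum.
func expiryFor(workerUptime time.Duration) time.Duration {
	d := workerUptime
	if d < minExpiry {
		d = minExpiry
	}
	if d > maxExpiry {
		d = maxExpiry
	}
	return d
}

// foundByFinder mirrors the finder condition "expires_at > NOW() + grace":
// a container created sinceCreation ago is only reused if its expiry window
// extends beyond the elapsed time plus the grace time.
func foundByFinder(workerUptimeAtCreation, sinceCreation time.Duration) bool {
	return expiryFor(workerUptimeAtCreation) > sinceCreation+graceTime
}

func main() {
	// Worker had been up for only 2 minutes when the container was created,
	// and the next check comes 4 minutes later: the container is not found,
	// so a new one gets created every time.
	fmt.Println(foundByFinder(2*time.Minute, 4*time.Minute)) // false
	// On a long-running worker the same container would still be reusable.
	fmt.Println(foundByFinder(3*time.Hour, 4*time.Minute)) // true
}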

vito added a commit that referenced this issue Feb 13, 2019
it's causing a ruckus because of #3251

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
@vito vito added the release/no-impact This is an issue that never affected released versions (i.e. a regression caught prior to shipping). label Feb 13, 2019
@vito
Member

vito commented Feb 13, 2019

This should be fixed by #3288

@vito vito closed this as completed Feb 13, 2019
Runtime automation moved this from In Flight to Done Feb 13, 2019
@wagdav
Contributor Author

wagdav commented Feb 14, 2019

Thank you all for the feedback! We will try out the fix in #3288 and we are looking forward to the short-lived check containers in #3079!

@topherbullock topherbullock moved this from Done to Accepted in Runtime Mar 12, 2019