Remove guardian flag defaults #121

Merged
merged 2 commits into from Sep 21, 2020

Conversation

Contributor

@muntac muntac commented Sep 1, 2020

Part of: concourse/concourse#5985

Having default values for the Guardian flags (i.e. the CONCOURSE_GARDEN_ flags) is problematic now that Concourse supports two runtimes. I would argue that any default values for the web and worker commands that make up a recommended Concourse setup should live in the Concourse binary, not in the BOSH release and Helm chart. That makes the recommendations easier to manage, and they no longer have to be kept in sync across three repositories.

CONCOURSE_GARDEN_ALLOW_HOST_ACCESS will have a value of "false" even without a default value being provided as it is represented as a bool in the Guardian codebase.
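As a minimal sketch of why the bool case is safe: Guardian is written in Go, where an unset bool field takes the zero value `false`. The struct and field names below are hypothetical stand-ins, not Guardian's actual flag definitions.

```go
package main

import "fmt"

// gardenFlags is a hypothetical stand-in for a flags struct; it is not
// copied from the Guardian codebase.
type gardenFlags struct {
	// AllowHostAccess is never assigned in this program, so it keeps
	// Go's zero value for bool.
	AllowHostAccess bool
}

func main() {
	var flags gardenFlags // nothing set, as if no default were supplied
	fmt.Println("allow host access:", flags.AllowHostAccess)
}
```

The same reasoning does not carry over to numeric or string flags, where the zero value (0 or "") may mean something different from the previous release default.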

The recommendation to have CONCOURSE_GARDEN_MAX_CONTAINERS at 250 was based off CloudFoundry's Diego which also uses Guardian. See this post for more info. Concourse deployments have been operated at a higher limit than 250 and would often see an error here due to Guardian's default value for the network-pool.

The binary will set CONCOURSE_GARDEN_NETWORK_POOL to a default value of 10.80.0.0/16 as of concourse/concourse#6031.
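To illustrate how the network pool bounds the container count, here is a rough sketch that counts how many per-container subnets fit in a pool. It rests on two assumptions that are not stated in this thread: that Guardian hands each container its own /30 subnet, and that Guardian's own default pool is 10.254.0.0/22. The helper name `subnetCount` is made up for the example.

```go
package main

import (
	"fmt"
	"net"
)

// subnetCount reports how many subnets with the given prefix length fit
// inside the pool CIDR. It is a hypothetical helper for illustration.
func subnetCount(pool string, subnetPrefix int) (int, error) {
	_, ipnet, err := net.ParseCIDR(pool)
	if err != nil {
		return 0, err
	}
	ones, bits := ipnet.Mask.Size()
	if subnetPrefix < ones || subnetPrefix > bits {
		return 0, fmt.Errorf("prefix /%d does not fit inside %s", subnetPrefix, pool)
	}
	// Each extra prefix bit doubles the number of subnets.
	return 1 << (subnetPrefix - ones), nil
}

func main() {
	// Assumed: 10.254.0.0/22 as Guardian's own default pool, and one /30
	// subnet per container. 10.80.0.0/16 is the value from this PR.
	for _, pool := range []string{"10.254.0.0/22", "10.80.0.0/16"} {
		n, err := subnetCount(pool, 30)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s holds %d /30 subnets\n", pool, n)
	}
}
```

Under those assumptions a /22 pool has room for 256 subnets, which lines up with a limit of about 250 containers, while a /16 pool has room for 16384.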

@xtreme-sameer-vohra
Contributor

Hey @muntac,
Is the idea to set a default for CONCOURSE_GARDEN_MAX_CONTAINERS in the binary, similar to CONCOURSE_GARDEN_NETWORK_POOL? While Concourse's use of Garden can support a limit above 250 containers, that depends on the workload and machine size, and it's important to have some guard rails that can be intentionally removed.

@muntac
Contributor Author

muntac commented Sep 8, 2020

@xtreme-sameer-vohra I was unsure of what we should consider an average case upper limit with workload and machine variability taken into consideration.

I had thought we could have it unset initially and wait for feedback from users to see around what limits they start seeing problems. I haven't been able to find any past issues describing 250 being selected as the limit due to user feedback about performance. Mostly see stuff around the insufficient subnet problem, or hitting the max container limit.

Have you seen users run into performance problems where the issue was solved by limiting the number of containers?

@xtreme-sameer-vohra
Contributor

Hey @muntac
AFAIK, you're right, there hasn't been any formal feedback gathering behind the arbitrary limit of 250. However, given that it has been the default, folks may be unintentionally relying on it, especially for large-scale clusters around upgrades.

What could be a safer approach towards encouraging folks to learn and set intentional limits for their own clusters?

CONCOURSE_GARDEN_ALLOW_HOST_ACCESS will have a value of "false"
even without a default value being provided as it is represented
as a bool in the Guardian codebase.

The recommendation to have CONCOURSE_GARDEN_MAX_CONTAINERS at 250
was based off CloudFoundry's Diego which also uses Guardian.
Concourse deployments have been operated at a much higher limit
and do not need to be held to this limit.

The binary will set CONCOURSE_GARDEN_NETWORK_POOL to a default value of
10.80.0.0/16 as of concourse/concourse#6031.

concourse/concourse#5985

Signed-off-by: Muntasir Chowdhury <mchowdhury@pivotal.io>
README.md Outdated
Comment on lines 34 to 48
**Default values**

For Concourse deployments on BOSH made using the default values of the release, this prevents the behaviour drifting from that of the binary in case the binary's default values change.

Note that the comment stating the binary's default can still become out of sync when such an update happens. The current solution is a suboptimal one. It may be improved in the future by generating a list of the default values here from the binary.

Contributor

Hey,
Found this section a bit hard to follow. If the older statement isn't clear, should we explicitly add a reference to an example like this? Or was the intention to call out that the current existence of default values in the release is sub-optimal, and that if we are making changes in an area we should attempt to clean them up?

Contributor Author

Oh good catch! So I'm going to talk about two things now - 1) the example: field and 2) the "example" of mentioning a default.

  1. I was going off the concourse-chart and forgot we use an example field in the BOSH release to mention the default. However, I'm realizing that putting something in an example doesn't really communicate to operators that this is what Concourse will default to if they don't provide a value.

  2. Putting in an example of how to mention a default is a good suggestion. I'll try to make the sentences clear and put in an example like the following:

containerd.max_containers:
    env: CONCOURSE_CONTAINERD_MAX_CONTAINERS
    description: |
      Maximum container capacity. 0 means no limit. Defaults to 250.
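
For readers wondering what "0 means no limit, defaults to 250" implies on the binary side, the following is a hedged sketch of how such a value might be resolved. It is not Concourse's actual code: `resolveMaxContainers` and `defaultMaxContainers` are hypothetical names, and only the env var name and the documented semantics come from the description above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultMaxContainers mirrors the "Defaults to 250" wording in the
// description; the constant itself is hypothetical.
const defaultMaxContainers = 250

// resolveMaxContainers is a hypothetical helper. An unset or empty value
// falls back to the default, and an explicit 0 means "no limit".
func resolveMaxContainers(raw string, set bool) (limit int, unlimited bool, err error) {
	if !set || raw == "" {
		return defaultMaxContainers, false, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, false, fmt.Errorf("invalid max containers %q: %w", raw, err)
	}
	if n < 0 {
		return 0, false, fmt.Errorf("max containers must not be negative, got %d", n)
	}
	if n == 0 {
		return 0, true, nil
	}
	return n, false, nil
}

func main() {
	raw, set := os.LookupEnv("CONCOURSE_CONTAINERD_MAX_CONTAINERS")
	limit, unlimited, err := resolveMaxContainers(raw, set)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if unlimited {
		fmt.Println("max containers: no limit")
		return
	}
	fmt.Println("max containers:", limit)
}
```

Keeping the fallback in one place like this is the point of the PR: the release only documents the default, and an operator who leaves the property empty gets whatever the binary decides.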

Signed-off-by: Muntasir Chowdhury <mchowdhury@pivotal.io>
Contributor

@xtreme-sameer-vohra xtreme-sameer-vohra left a comment

Thanks, lgtm !!
