regressions on older Linux kernels #344

Closed
jamieklassen opened this issue Jul 7, 2020 · 1 comment

@jamieklassen

jamieklassen commented Jul 7, 2020

Problem

When garden-runc-release v1.19.11 was released, it broke compatibility with a specific edge case: running inside a user namespace on a Linux 4.4 system. This is a rare scenario for Cloud Foundry, where guardian is run in the initial user namespace. However, it is not unusual for the concourse worker (and consequently guardian as well) to run in a container (= user namespace), like on Docker or k8s.
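To make the edge case concrete, here is a minimal sketch of how a script could recognize the affected configuration. The function names are hypothetical and not part of concourse or guardian. Treating exactly 4.4 as the affected version is an assumption taken from the description above, and the uid_map comparison is only a heuristic: the initial user namespace normally shows a single identity mapping in /proc/self/uid_map.

```python
import re

# Mapping normally seen in the initial user namespace:
# inside ID 0 maps to outside ID 0 for a range of 4294967295 IDs.
INITIAL_UID_MAP = (0, 0, 4294967295)


def parse_kernel_version(release):
    """Return (major, minor) from a release string such as '4.4.0-210-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_initial_user_namespace(uid_map_text):
    """Heuristic: true when uid_map holds only the identity mapping."""
    rows = [
        tuple(int(field) for field in line.split())
        for line in uid_map_text.splitlines()
        if line.strip()
    ]
    return rows == [INITIAL_UID_MAP]


def is_affected(release, uid_map_text):
    """Assumption from this issue: Linux 4.4 plus a non-initial user namespace."""
    on_4_4 = parse_kernel_version(release) == (4, 4)
    return on_4_4 and not in_initial_user_namespace(uid_map_text)
```

On a live host the inputs would come from `os.uname().release` and the contents of `/proc/self/uid_map`. Keeping the functions pure means they can be unit tested without access to a 4.4 machine.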

We didn't notice the impact this breaking change had on our users, and we actually shipped multiple subsequent releases before thoroughly investigating concourse/concourse#5711. We still don't have a compelling workaround for users who can neither upgrade from Linux 4.4 nor run in the initial user namespace.

If we plan to restore compatibility with the 4.4-user-namespace use case, the only path forward seems to be concourse/concourse#5407 since garden is not being actively maintained.

In any case, how can we prevent gaffes like this in the future?

Solution

Some subsystem of concourse -- the worker would be most relevant -- could get continuously exercised against older versions of Linux. The first idea that came to mind was:

  1. Add node pools similar to https://github.com/concourse/hush-house/blob/519ee144b103ce0678bca19b3a30ecd44dda54fd/terraform/main.tf#L77-L88, but running Linux 3.19 and 4.4. Add statefulsets for those workers to https://github.com/concourse/hush-house/tree/master/deployments/with-creds, with appropriately chosen tags such as linux-3.19 and linux-4.4.
  2. Add jobs to our pipelines which run the containerd integration tests on these workers, e.g. a containerd-linux-4.4 and a containerd-linux-3.19.
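
One risk with tagged workers is that a job could silently land on a worker whose kernel no longer matches its tag, for example after a node image upgrade. As a rough sketch, a job could start with a guard like the following. The function name and the `linux-<major>.<minor>` tag format are assumptions based on the example tags above, not an existing concourse feature.

```python
import re


def kernel_matches_tag(tag, release):
    """True when a tag like 'linux-4.4' matches a release like '4.4.0-210-generic'."""
    tag_match = re.fullmatch(r"linux-(\d+)\.(\d+)", tag)
    if tag_match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    release_match = re.match(r"(\d+)\.(\d+)", release)
    if release_match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Compare numerically so that '4.4' does not match '4.40'.
    expected = tuple(int(part) for part in tag_match.groups())
    actual = tuple(int(part) for part in release_match.groups())
    return expected == actual
```

A task would pass its own tag together with `os.uname().release` and fail fast when the two disagree, so that a green containerd-linux-4.4 build really means the tests ran on a 4.4 kernel.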

Questions

  • Does GKE provide machine images for these versions of Linux? If so, are they capable of also running kubernetes nodes? If not, do we manage the configuration for these workers elsewhere, maybe as raw google_compute_instances in terraform, provisioned similarly to https://github.com/concourse/ci/blob/master/deployments/smoke/smoke.tf? If we manage these workers outside of k8s, how do we meaningfully share config (like worker keys) between two config management systems?
  • Is there a lighter-weight way to run a test that integrates with a different version of Linux? Rather than having a long-lived concourse worker running that Linux kernel, maybe we do something more like bin-smoke, but deployed on VMs with different kernels. This would mean we don't have to think about the lifecycle of this worker and its continuous upgrades, but it also means we would not really be dogfooding the older kernel experience.

@jamieklassen changed the title from "test against older Linux kernels" to "regressions on older Linux kernels" on Jul 7, 2020

@github-actions

This issue has had no activity for a while. It has been labeled stale, and will be closed in one week.
