kubernetes: fix startup by setting systemd as cgroup driver #111590

Closed
Mic92 wants to merge 1 commit

Mic92
Member

@Mic92 Mic92 commented Feb 1, 2021

Motivation for this change

#108960 (comment)

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@Mic92 Mic92 mentioned this pull request Feb 1, 2021
10 tasks
Contributor

@thiagokokada thiagokokada left a comment

I don't know if this is difficult, but could we have a test for those cases 🤔 ?

@Mic92
Member Author

Mic92 commented Feb 1, 2021

I don't know if this is difficult, but could we have a test for those cases 🤔 ?

There is a test for kubernetes. I am sure it broke in the process.

@ghost

ghost commented Feb 1, 2021

We could add the kubernetes test to docker.passthru.tests.

@ghost ghost left a comment

I can confirm this fixes the kubernetes NixOS tests.

Member

@roberth roberth left a comment

Seems like the right fix because it also improves stability according to the docs, but does require instructions in the release notes, because of the following:

Changing the cgroup driver of a Node that has joined a cluster is strongly not recommended.
If the kubelet has created Pods using the semantics of one cgroup driver, changing the container runtime to another cgroup driver can cause errors when trying to re-create the Pod sandbox for such existing Pods. Restarting the kubelet may not solve such errors.

If you have automation that makes it feasible, replace the node with another using the updated configuration, or reinstall it using automation.

-- https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

@ghost

ghost commented Feb 2, 2021

Good point, I didn't know there was state related to the cgroup backend used. We could decide the default behavior based on the stateVersion in the module and set the system to use the old cgroup API if it's an old stateVersion.
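
A minimal sketch of the stateVersion idea as a NixOS module fragment, not what this PR actually implements. The cut-off release "21.05" is an assumption chosen only for illustration, as is passing the driver through services.kubernetes.kubelet.extraOpts; the real module might wire this up differently.

```nix
{ config, lib, ... }:

let
  # Assumption: "21.05" stands in for whichever release would switch the default.
  # Systems installed before it keep the old driver so that existing Pods
  # are not re-created under different cgroup semantics.
  isLegacy = lib.versionOlder config.system.stateVersion "21.05";

  # "cgroupfs" is the kubelet's historical default; "systemd" is the new one.
  cgroupDriver = if isLegacy then "cgroupfs" else "systemd";
in
{
  # mkDefault keeps this overridable by an explicit user setting.
  services.kubernetes.kubelet.extraOpts =
    lib.mkDefault "--cgroup-driver=${cgroupDriver}";
}
```

The container runtime would have to be switched on the same condition, since the quoted docs describe errors when the kubelet and the runtime disagree on the driver.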

@Mic92
Member Author

Mic92 commented Feb 2, 2021

Seems like the right fix because it also improves stability according to the docs, but does require instructions in the release notes, because of the following:

Changing the cgroup driver of a Node that has joined a cluster is strongly not recommended.
If the kubelet has created Pods using the semantics of one cgroup driver, changing the container runtime to another cgroup driver can cause errors when trying to re-create the Pod sandbox for such existing Pods. Restarting the kubelet may not solve such errors.

If you have automation that makes it feasible, replace the node with another using the updated configuration, or reinstall it using automation.

-- kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

Not sure when I'll get back to this. If somebody else beats me to writing the release notes, I will cherry-pick from there.

@Mic92
Member Author

Mic92 commented Feb 21, 2021

Closing this for now for someone else to pick up.
