Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubernetes: 1.15.4 -> 1.16.3 #70905

Merged
merged 2 commits into from Nov 15, 2019
Merged

kubernetes: 1.15.4 -> 1.16.3 #70905

merged 2 commits into from Nov 15, 2019

Conversation

@saschagrunert
Copy link
Member

@saschagrunert saschagrunert commented Oct 10, 2019

Motivation for this change

Updating kubernetes to the latest release.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
Notify maintainers

cc @johanot

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 10, 2019

@GrahamcOfBorg build kubernetes

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 10, 2019

@johanot this is my first time nix(os) kubernetes update, do you know if I have to adapt something else? I would not expect any breaking change from the current deployment (last famous words).

@saschagrunert saschagrunert force-pushed the saschagrunert:kubernetes-1.16 branch from 1852732 to 8484b08 Oct 10, 2019
@ofborg ofborg bot requested review from offlinehacker and johanot Oct 10, 2019
@srhb
Copy link
Contributor

@srhb srhb commented Oct 10, 2019

@saschagrunert Thank you!

Looks like this breaks something in the CNI components:

machine2# [  406.457500] kubelet[1126]: W1010 11:36:36.841528    1126 cni.go:202] Error validating CNI config &{mynet  false [0xc000fb2740] [123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 34 44 34 110 97 109 101 34 58 34 109 121 110 101 116 34 44 34 112 108 117 103 105 110 115 34 58 91 123 34 100 101 108 101 103 97 116 101 34 58 123 34 98 114 105 100 103 101 34 58 34 100 111 99 107 101 114 48 34 44 34 105 115 68 101 102 97 117 108 116 71 97 116 101 119 97 121 34 58 116 114 117 101 125 44 34 110 97 109 101 34 58 34 109 121 110 101 116 34 44 34 116 121 112 101 34 58 34 102 108 97 110 110 101 108 34 125 93 125]}: [plugin flannel does not support config version ""]

Maybe it's just related to the empty version string, but at least we know that this is not a drop-in PR.

You can run the tests yourself with nix-build nixos/tests/kubernetes -A dns -A rbac in order to debug the problem further. 😄

@srhb srhb self-requested a review Oct 10, 2019
@saschagrunert saschagrunert force-pushed the saschagrunert:kubernetes-1.16 branch from 8484b08 to 1f210a0 Oct 10, 2019
@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 10, 2019

Maybe it's just related to the empty version string, but at least we know that this is not a drop-in PR.

Yes, flannel had issues recently with missing CNI versions in the configuration, I added it as a default to the kubelet, let's see if the tests are passing now. 🤞

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 10, 2019

@GrahamcOfBorg build kubernetes

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 10, 2019

@GrahamcOfBorg test kubernetes

@johanot
Copy link
Contributor

@johanot johanot commented Oct 10, 2019

@saschagrunert The tests definitely won't run on ofborg, unfortunately. :) They're too memory-hungry.

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 10, 2019

@saschagrunert The tests definitely won't run on ofborg, unfortunately. :) They're too memory-hungry.

Ah okay thanks for the hint :)

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 10, 2019

On my machine it looks like that the error is gone, can someone verify (or not) this please? 🙃

@srhb
Copy link
Contributor

@srhb srhb commented Oct 10, 2019

I'll give it a spin tomorrow! :)

@srhb
Copy link
Contributor

@srhb srhb commented Oct 13, 2019

The specific error is gone, but it looks like the tests are still broken. Did you successfully run them, or did you only look into the version string? :)

@srhb
Copy link
Contributor

@srhb srhb commented Oct 13, 2019

Looks like something changed with respect to how the kubelet looks itself up -- it now tries to look up the unqualified hostname which fails, because the nodename is FQDN. Maybe we changed something in the hostname infrastructure in NixOS. Investigating.

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 13, 2019

I only checked if the error message disappeared, since I was not sure how long the tests should run on my machine... It might be that we have to override the host names on the Kubelet and the proxy.

@srhb
Copy link
Contributor

@srhb srhb commented Oct 13, 2019

@srhb
Copy link
Contributor

@srhb srhb commented Oct 13, 2019

OK, I think what's going on is this:

https://github.com/kubernetes/kubernetes/pull/77167/files#diff-934760ea1a4e195e49bcd10625cf4e46R146-R148

So, the previous behaviour appears to have been to fallback to listen on 127.0.0.1 when the node cannot be looked up in the API by hostname. Now, our mismatch between short/fqdn hostname (in the proxy/apiserver sides respectively) will cause the proxy to fail to start. This is a misconfiguration in our test suite -- the proxy should be set up to query for the fqdn of the node instead of the shortname, since that's how our kubelets are set up to register.

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 14, 2019

OK, I think what's going on is this:

https://github.com/kubernetes/kubernetes/pull/77167/files#diff-934760ea1a4e195e49bcd10625cf4e46R146-R148

So, the previous behaviour appears to have been to fallback to listen on 127.0.0.1 when the node cannot be looked up in the API by hostname. Now, our mismatch between short/fqdn hostname (in the proxy/apiserver sides respectively) will cause the proxy to fail to start. This is a misconfiguration in our test suite -- the proxy should be set up to query for the fqdn of the node instead of the shortname, since that's how our kubelets are set up to register.

I would generally add this hostname option to the kube-proxy, since I think this can also fail in other usage scenarios. WDYT? I added it on top of another commit that I can easily revert it.

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 14, 2019

It might be possible that this:

services.kubernetes.kubelet.hostname = with config.networking;
mkDefault (hostName + optionalString (domain != null) ".${domain}");

has to be cfg.clusterDomain instead of just domain, right?

@srhb
Copy link
Contributor

@srhb srhb commented Oct 14, 2019

[ ... ] has to be cfg.clusterDomain instead of just domain, right?

I think domain is actually correct right now, though either could work. In fact, things would be easier if we weren't using the FQDN at all right now, because of how hostname lookups work in NixOS... That's something to keep in mind for the new module.

I would generally add this hostname option to the kube-proxy, since I think this can also fail in other usage scenarios. WDYT? I added it on top of another commit that I can easily revert it.

I think this may be the best way to go right now, though I'm sad about adding yet another ad-hoc option as needs arise. We really should generalize all those command lines in time, so that module updates aren't as necessary again. Also something to keep in mind. 😄

@johanot care to weigh in on that addition?

@srhb
Copy link
Contributor

@srhb srhb commented Oct 14, 2019

With these changes and appropriate reconfiguration of the test suite, I think we're almost there. The coredns deployment needs to be updated to take into account the deprecated beta APIs, but it still doesn't come up.

@srhb
Copy link
Contributor

@srhb srhb commented Oct 14, 2019

[...] but it still doesn't come up.

Disregard that. When I spell "apps/v1" correctly, it does come up. 😊

@saschagrunert saschagrunert changed the title kubernetes: 1.15.4 -> 1.16.1 kubernetes: 1.15.4 -> 1.16.2 Oct 16, 2019
@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Oct 16, 2019

FYI, 1.16.2 has arrived (https://github.com/kubernetes/kubernetes/releases/tag/v1.16.2). The changelog is not really useful, but.. afaik, most notably: CVE-2019-11253.

Yes, I updated it to 1.16.2 and squashed my commits together.

@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Nov 7, 2019

Can someone please assist me and give it another test run? I'm not sure how to run these tests on my AWS nix machine :-/

@srhb
Copy link
Contributor

@srhb srhb commented Nov 7, 2019

@srhb
Copy link
Contributor

@srhb srhb commented Nov 7, 2019

So the problem with the node name mismatch is still there -- the test case will need something like services.kubernetes.proxy.hostname = machineName + "." + domain; -- and a release note to go with it in order to explain that people upgrading to 20.03 will need to take care to ensure that this change doesn't break their setup.

I think everything else looks okay.

You should be able to run the test commands I gave you on any non-virtualized Linux with Nix where you have access to virtualization. Let me know if you need help with that. For virtualized Linuxen, I suspect you don't have the necessary hardware access.

@saschagrunert saschagrunert force-pushed the saschagrunert:kubernetes-1.16 branch 2 times, most recently from ed6c315 to 93556ce Nov 11, 2019
@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Nov 11, 2019

Got the tests to run locally and both (rbac, dns) succeeded. PTAL :)

@saschagrunert saschagrunert changed the title kubernetes: 1.15.4 -> 1.16.2 kubernetes: 1.15.4 -> 1.16.3 Nov 14, 2019
@saschagrunert saschagrunert changed the title kubernetes: 1.15.4 -> 1.16.3 kubernetes: 1.15.4 -> 1.16.4 Nov 14, 2019
@saschagrunert saschagrunert changed the title kubernetes: 1.15.4 -> 1.16.4 kubernetes: 1.15.4 -> 1.16.3 Nov 14, 2019
@saschagrunert saschagrunert force-pushed the saschagrunert:kubernetes-1.16 branch from 93556ce to 180c140 Nov 14, 2019
@johanot johanot mentioned this pull request Nov 14, 2019
5 of 10 tasks complete
@srhb
srhb approved these changes Nov 14, 2019
@srhb
Copy link
Contributor

@srhb srhb commented Nov 14, 2019

I'm OK with this now. Since we're just adding an option I think we don't need to add any release notes currently, and hopefully we'll have some more to report on the kubernetes module come 20.03 anyway. :)

@srhb
Copy link
Contributor

@srhb srhb commented Nov 14, 2019

(Just running the tests a final time, then merging)

@srhb
Copy link
Contributor

@srhb srhb commented Nov 14, 2019

Oops, I just realized the commit message is out of date. Would you mind fixing it? And while you're at it, I prefer splitting the module and test change into a separate commit prefixed "nixos/kubernetes: ..."

Last step, promise! 😁

@saschagrunert saschagrunert force-pushed the saschagrunert:kubernetes-1.16 branch from 180c140 to 803b756 Nov 14, 2019
@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Nov 14, 2019

Oops, I just realized the commit message is out of date. Would you mind fixing it? And while you're at it, I prefer splitting the module and test change into a separate commit prefixed "nixos/kubernetes: ..."

Last step, promise!

Hm, like this? 🙃

@srhb
Copy link
Contributor

@srhb srhb commented Nov 14, 2019

Sorry, no, I meant keep the package stuff as separate as possible from any module/test stuff. :) Like so:

srhb@6c22549
srhb@a8d4321

Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
@saschagrunert saschagrunert force-pushed the saschagrunert:kubernetes-1.16 branch from 803b756 to 7f358a5 Nov 15, 2019
@saschagrunert
Copy link
Member Author

@saschagrunert saschagrunert commented Nov 15, 2019

Sure, I changed it as suggested. 👍

@srhb
Copy link
Contributor

@srhb srhb commented Nov 15, 2019

Reran the tests for good measure, all looks good, thank you for sticking with this. :)

@srhb srhb merged commit b3a4f74 into NixOS:master Nov 15, 2019
15 checks passed
15 checks passed
Evaluation Performance Report Evaluator Performance Report
Details
grahamcofborg-eval ^.^!
Details
grahamcofborg-eval-check-maintainers matching changed paths to changed attrs...
Details
grahamcofborg-eval-check-meta config.nix: checkMeta = true
Details
grahamcofborg-eval-darwin nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A darwin-tested
Details
grahamcofborg-eval-nixos nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./nixos/release-combined.nix -A tested
Details
grahamcofborg-eval-nixos-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./nixos/release.nix -A manual
Details
grahamcofborg-eval-nixos-options nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./nixos/release.nix -A options
Details
grahamcofborg-eval-nixpkgs-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A manual
Details
grahamcofborg-eval-nixpkgs-tarball nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A tarball
Details
grahamcofborg-eval-nixpkgs-unstable-jobset nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A unstable
Details
grahamcofborg-eval-package-list nix-env -qa --json --file .
Details
grahamcofborg-eval-package-list-no-aliases nix-env -qa --json --file . --arg config { allowAliases = false; }
Details
kubernetes on aarch64-linux Success
Details
kubernetes on x86_64-linux Success
Details
@saschagrunert saschagrunert deleted the saschagrunert:kubernetes-1.16 branch Nov 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.