-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.12: nodediscovery: Fix bug where CiliumInternalIP was flapping #29979
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ upstream commit 7084c17 ] [ backporter's notes: Trivial conflicts in includes and due to a new functions. ] This commit improves the debug logging of node update events by using the JSON representation instead of the Go syntax representation of the node. This makes it easier to parse the log message, as IP addresses are now printed as strings instead of byte arrays. Before: ``` level=debug msg="Received node update event from custom-resource: types.Node{Name:\"kind-worker\", Cluster:\"default\", IPAddresses:[]types.Address{types.Address{Type:\"InternalIP\", IP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xac, 0x12, 0x0, 0x3}}, types.Address{Type:\"InternalIP\", IP:net.IP{0xfc, 0x0, 0xc1, 0x11, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3}}, types.Address{Type:\"CiliumInternalIP\", IP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0x0, 0x0, 0xd2}}}, IPv4AllocCIDR:(*cidr.CIDR)(0xc000613180), IPv4SecondaryAllocCIDRs:[]*cidr.CIDR(nil), IPv6AllocCIDR:(*cidr.CIDR)(nil), IPv6SecondaryAllocCIDRs:[]*cidr.CIDR(nil), IPv4HealthIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0x0, 0x0, 0x30}, IPv6HealthIP:net.IP(nil), IPv4IngressIP:net.IP(nil), IPv6IngressIP:net.IP(nil), ClusterID:0x0, Source:\"custom-resource\", EncryptionKey:0x0, Labels:map[string]string{\"beta.kubernetes.io/arch\":\"amd64\", \"beta.kubernetes.io/os\":\"linux\", \"kubernetes.io/arch\":\"amd64\", \"kubernetes.io/hostname\":\"kind-worker2\", \"kubernetes.io/os\":\"linux\"}, Annotations:map[string]string(nil), NodeIdentity:0x0, WireguardPubKey:\"\"}" subsys=nodemanager ``` After: ``` level=debug msg="Received node update event from custom-resource" node="{\"Name\":\"kind-worker\",\"Cluster\":\"default\",\"IPAddresses\":[{\"Type\":\"InternalIP\",\"IP\":\"172.18.0.3\"},{\"Type\":\"InternalIP\",\"IP\":\"fc00:c111::3\"},{\"Type\":\"CiliumInternalIP\",\"IP\":\"10.0.1.245\"}],\"IPv4AllocCIDR\":{\"IP\":\"10.0.1.0\",\"Mask\":\"////AA==\"},\"IPv4SecondaryAllocCIDRs\":null,\"IPv6AllocCIDR\":null,\"IPv6SecondaryAllocCIDRs\":null,\"IPv4HealthIP\":\"10.0.1.120\",\"IPv6HealthIP\":\"\",\"IPv4IngressIP\":\"\",\"IPv6IngressIP\":\"\",\"ClusterID\":0,\"Source\":\"custom-resource\",\"EncryptionKey\":0,\"Labels\":{\"beta.kubernetes.io/arch\":\"amd64\",\"beta.kubernetes.io/os\":\"linux\",\"kubernetes.io/arch\":\"amd64\",\"kubernetes.io/hostname\":\"kind-worker\",\"kubernetes.io/os\":\"linux\"},\"Annotations\":null,\"NodeIdentity\":0,\"WireguardPubKey\":\"\"}" subsys=nodemanager ``` Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 263e689 ] [ backporter's notes: Various conflicts. net.IPFamilyOfString(x) doesn't exist so it had to be replaced with ParseIPSloppy(x).To4(). slices.DeleteFunc doesn't exist so it had to be defined in our own slices package as done on v1.14. mutateNodeResource's logic was significantly refactored so the new code to clean node IP addresses had to be moved before all appending of node IP addresses. ] This fixes a bug in `UpdateCiliumNodeResource` where the `CiliumInternalIP` (aka `cilium_host` IP, aka router IP) was flapping in the node manager during restoration (i.e. during cilium-agent restarts). In particular in `cluster-pool` mode, `UpdateCiliumNodeResource` is called before the `cilium_host` IP has been restored, as there are some circular dependencies: The restored IP can only be fully validated after the IPAM subsystem is ready, but that in turn can only happen if the `CiliumNode` object has been created. The `UpdateCiliumNodeResource` function however will only announce the `cilium_host` IP if it has been restored. This commit attempts to break that cycle by not overwriting any already existing `CiliumInternalIP` in the CiliumNode resource. Overall, this change is rather hacky, in particular it does not address the fact that other less crucial node information (like the health IP) also flaps. But since we want to backport this bugfix to older stable branches too, this change is intentionally kept as minimal as possible. Example node event (as observed by other nodes) before this change: ``` 2023-12-18T12:58:20.070330814Z level=debug msg="Received node update event from custom-resource" node="{\"Name\":\"kind-worker\",\"Cluster\":\"default\",\"IPAddresses\":[{\"Type\":\"InternalIP\",\"IP\":\"172.18.0.4\"},{\"Type\":\"InternalIP\",\"IP\":\"fc00:c111::4\"}],..." subsys=nodemanager 2023-12-18T12:58:20.208082226Z level=debug msg="Received node update event from custom-resource" node="{\"Name\":\"kind-worker\",\"Cluster\":\"default\",\"IPAddresses\":[{\"Type\":\"InternalIP\",\"IP\":\"172.18.0.4\"},{\"Type\":\"InternalIP\",\"IP\":\"fc00:c111::4\"},{\"Type\":\"CiliumInternalIP\",\"IP\":\"10.0.1.245\"}],..." subsys=nodemanager ``` After this change (note the `CiliumInternalIP` present in both events): ``` 2023-12-18T15:38:23.695653876Z level=debug msg="Received node update event from custom-resource" node="{\"Name\":\"kind-worker\",\"Cluster\":\"default\",\"IPAddresses\":[{\"Type\":\"CiliumInternalIP\",\"IP\":\"10.0.1.245\"},{\"Type\":\"InternalIP\",\"IP\":\"172.18.0.4\"},{\"Type\":\"InternalIP\",\"IP\":\"fc00:c111::4\"}],..." subsys=nodemanager 2023-12-18T15:38:23.838604573Z level=debug msg="Received node update event from custom-resource" node="{\"Name\":\"kind-worker\",\"Cluster\":\"default\",\"IPAddresses\":[{\"Type\":\"InternalIP\",\"IP\":\"172.18.0.4\"},{\"Type\":\"InternalIP\",\"IP\":\"fc00:c111::4\"},{\"Type\":\"CiliumInternalIP\",\"IP\":\"10.0.1.245\"}],...}" subsys=nodemanager ``` Reported-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
pchaigno
added
kind/backports
This PR provides functionality previously merged into master.
backport/1.12
This PR represents a backport for Cilium 1.12.x of a PR that was merged to main.
labels
Dec 19, 2023
/test-backport-1.12 |
julianwiedmann
approved these changes
Dec 19, 2023
gandro
approved these changes
Dec 19, 2023
maintainer-s-little-helper
bot
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Dec 20, 2023
jaredledvina
added a commit
to DataDog/cilium
that referenced
this pull request
Jan 23, 2024
…available" This reverts commit 532590f. Instead we'll pick upstreams cilium#29979
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.12
This PR represents a backport for Cilium 1.12.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Manual backport of #29964. See commits for conflict notes (same ones as for v1.13).
Once this PR is merged, you can update the PR labels via: