node: Use new asynchronous IPCache API for Manager (v2) #23208
Force-pushed from e879267 to 5857507.
Force-pushed from 5857507 to 46d09ac.
Force-pushed from 1981c89 to cbae81c.
Force-pushed from f7d2b29 to a90cfa1.
Force-pushed from a90cfa1 to 70087b4.
Since #21183, it is no longer necessary to call `To4` before invoking `IPToNetPrefix`.
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
The new name better represents what the function is. This commit has no functional impact.
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This will be useful for fetching the metadata for particular entries in the ipcache.
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This factors out logic shared by NodeUpdated() and NodeDeleted() so that it can be reused.
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to the new asynchronous IPCache API. Instead of directly performing Upserts and Deletes on the various node IPs (InternalIP, ExternalIP, HealthIPs, etc.), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels rather than by its numeric identity.

This also fixes an issue where concurrent use of the synchronous and asynchronous APIs would lead to the encryption key for the kube-apiserver node being lost (cf. #19318).

While at it, change the test to use netip types instead of net.IP.

Co-authored-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Force-pushed from f244a7f to 6a364a4.
Key diff from the latest push to resolve #23208 (comment):
- log.WithError(err2).WithFields(logrus.Fields{
- logfields.IPAddr: prefix,
- logfields.Identity: entry.identity.ID,
- }).Error("Failed to replace ipcache entry with new identity after label removal. Traffic may be disrupted.")
+ // It's plausible to pull the same information twice
+ // from different sources, for instance in etcd mode
+ // where node information is propagated both via the
+ // kvstore and via the k8s control plane. If the
+ // upsert was rejected due to source precedence, but the
+ // identity is unchanged, then we can safely ignore the
+ // error message.
+ oldID, ok := previouslyAllocatedIdentities[p]
+ if !(ok && oldID.ID == entry.identity.ID && errors.Is(err2, &ErrOverwrite{
+ ExistingSrc: oldID.Source,
+ NewSrc: entry.identity.Source,
+ })) {
+ log.WithError(err2).WithFields(logrus.Fields{
+ logfields.IPAddr: prefix,
+ logfields.Identity: entry.identity.ID,
+ }).Error("Failed to replace ipcache entry with new identity after label removal. Traffic may be disrupted.")
+ }
For some reason, GH thinks that an @cilium/cli review is needed, but I can't tell why. 🤔 Edit: And Jenkins just triggered a new run somehow 🤦. The CI was all 🟢 before this...
It's because of `pkg/logging/logfields/logfields.go`; this file is owned by the cli team.
Reviewed logfields.go. Looks innocent 😇
OK, thanks. I wonder how many of that team's members are even aware of that.
/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/2353/
If it is a flake and a GitHub issue doesn't already exist to track it, comment … Then please upload the Jenkins artifacts to that issue.
I can say I'm very aware of it because I've reviewed changes to it many times :) I have a feeling the rest of the team is quite familiar with it as well.
Approving config-related changes
/test
On top of #22086 with additional fixes.
Related: #21142