Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node: Use new asynchronous IPCache API for Manager (v2) #23208

Merged
merged 9 commits into from May 17, 2023

Conversation

christarazi
Copy link
Member

@christarazi christarazi commented Jan 20, 2023

On top of #22086 with additional fixes.

Related: #21142

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 20, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jan 20, 2023
@christarazi christarazi force-pushed the pr/christarazi/nodemanager-ipcache-v5 branch from e879267 to 5857507 Compare January 20, 2023 07:19
@christarazi

This comment was marked as outdated.

@christarazi christarazi force-pushed the pr/christarazi/nodemanager-ipcache-v5 branch from 5857507 to 46d09ac Compare January 20, 2023 17:37
@christarazi christarazi removed the kind/community-contribution This was a contribution made by a community member. label Jan 20, 2023
@christarazi christarazi force-pushed the pr/christarazi/nodemanager-ipcache-v5 branch 7 times, most recently from 1981c89 to cbae81c Compare January 23, 2023 19:59
@christarazi

This comment was marked as outdated.

1 similar comment
@christarazi

This comment was marked as outdated.

@gandro gandro force-pushed the pr/christarazi/nodemanager-ipcache-v5 branch 2 times, most recently from f7d2b29 to a90cfa1 Compare January 24, 2023 16:44
@gandro

This comment was marked as outdated.

@christarazi

This comment was marked as outdated.

@christarazi

This comment was marked as outdated.

@christarazi christarazi self-assigned this Jan 27, 2023
@gandro gandro force-pushed the pr/christarazi/nodemanager-ipcache-v5 branch from a90cfa1 to 70087b4 Compare January 30, 2023 16:28
@gandro gandro self-assigned this Jan 30, 2023
@gandro gandro added the release-note/misc This PR makes changes that have no direct user impact. label Jan 30, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 30, 2023
@gandro

This comment was marked as outdated.

@christarazi christarazi changed the title pr/christarazi/nodemanager ipcache v5 node: Use new asynchronous IPCache API for Manager (v2) Jan 31, 2023
@christarazi christarazi added kind/enhancement This would improve or streamline existing functionality. area/daemon Impacts operation of the Cilium daemon. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Jan 31, 2023
gandro and others added 6 commits May 15, 2023 22:26
Since #21183 it is no longer necessary to call `To4` before
invoking `IPToNetPrefix`.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
The new name better represents what the function is. This commit has no
functional impact.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This will be useful for fetching the metadata for particular entries in
the ipcache.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This pulls out logic that is used in both NodeUpdated() and
NodeDeleted() for proper code reuse.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous
IPCache API. Instead of directly performing Upserts and Delete on the
various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now
associate each node IP with the corresponding labels. The CEW identity
is now also determined by the node's labels, rather than its numeric
identity.

This also fixes an issue where concurrent use of the synchronous and
asynchronous API would lead to the encryption key for the kube-apiserver
node being lost (c.f. #19318).

While we are at it, change the test to use netip types instead of
net.IP.

Co-authored-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/nodemanager-ipcache-v5 branch from f244a7f to 6a364a4 Compare May 16, 2023 06:05
@christarazi
Copy link
Member Author

christarazi commented May 16, 2023

Key diff from latest push to resolve #23208 (comment) in pkg/ipcache/metadata.go. The change was accidentally lost due to an incorrect rebase 😕. Failure in the CI for kvstore tests highlighted the issue.

Details

-			log.WithError(err2).WithFields(logrus.Fields{
-				logfields.IPAddr:   prefix,
-				logfields.Identity: entry.identity.ID,
-			}).Error("Failed to replace ipcache entry with new identity after label removal. Traffic may be disrupted.")
+			// It's plausible to pull the same information twice
+			// from different sources, for instance in etcd mode
+			// where node information is propagated both via the
+			// kvstore and via the k8s control plane. If the
+			// upsert was rejected due to source precedence, but the
+			// identity is unchanged, then we can safely ignore the
+			// error message.
+			oldID, ok := previouslyAllocatedIdentities[p]
+			if !(ok && oldID.ID == entry.identity.ID && errors.Is(err2, &ErrOverwrite{
+				ExistingSrc: oldID.Source,
+				NewSrc:      entry.identity.Source,
+			})) {
+				log.WithError(err2).WithFields(logrus.Fields{
+					logfields.IPAddr:   prefix,
+					logfields.Identity: entry.identity.ID,
+				}).Error("Failed to replace ipcache entry with new identity after label removal. Traffic may be disrupted.")
+			}

@christarazi

This comment was marked as outdated.

@christarazi
Copy link
Member Author

christarazi commented May 16, 2023

For some reason, GH thinks that @cilium/cli review is needed but I can't tell why. 🤔

Edit: And Jenkins just triggered a new run somehow 🤦. The CI was all 🟢 before this...

@michi-covalent
Copy link
Contributor

it's because of pkg/logging/logfields/logfields.go, this file is owned by cli team

Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed logfields.go. looks innocent 😇

@christarazi
Copy link
Member Author

pkg/logging/logfields/logfields.go

Ok, thanks. I wonder how many of that team's members are even aware of that.

@christarazi
Copy link
Member Author

christarazi commented May 16, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathConfig Iptables Skip conntrack for pod traffic

Failure Output

FAIL: Found 1 k8s-app=cilium logs matching list of errors that must be investigated:

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/2353/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@asauber
Copy link
Member

asauber commented May 17, 2023

Ok, thanks. I wonder how many of that team's members are even aware of that.

I can say I'm very aware of it because I've reviewed changes to it many times :)

I have a feeling the rest of the team is quite familiar with it as well.

Copy link
Member

@asauber asauber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving config-related changes

@asauber
Copy link
Member

asauber commented May 17, 2023

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 17, 2023
@christarazi christarazi merged commit f2a4312 into main May 17, 2023
58 checks passed
@christarazi christarazi deleted the pr/christarazi/nodemanager-ipcache-v5 branch May 17, 2023 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants