Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node_ids: introduce GetNodeID #26155

Merged
merged 6 commits into from Jun 15, 2023

Conversation

mhofstetter
Copy link
Member

@mhofstetter mhofstetter commented Jun 13, 2023

This PR introduces the possibility to retrieve the node id for a given node IP without having to use AllocateNodeID which comes with the drawback of actually allocating a new node id if one doesn't exist yet for the given IP.

Example: Using ipcache.AllocateNodeID to lookup the node ID for a node IP during auth gc initialisation results in unintended node ID allocations if the nodeids doesn't exist yet for the cilium nodes.

By using the new method GetNodeID we remove this unintended side-effect.

Therefore, the temporarily disabled auth map GC functionality has been re-enabled!

Related to: #26073 & #25964

@mhofstetter mhofstetter added kind/bug This is a bug in the Cilium logic. release-note/misc This PR makes changes that have no direct user impact. area/servicemesh GH issues or PRs regarding servicemesh labels Jun 13, 2023
@mhofstetter
Copy link
Member Author

mhofstetter commented Jun 13, 2023

/test

several manual re-runs of ci-ginkgo succeeded without an issue! :)

@@ -77,8 +72,28 @@ func (n *linuxNodeHandler) GetNodeIP(nodeID uint16) string {
return n.nodeIPsByIDs[nodeID]
}

func (n *linuxNodeHandler) GetNodeID(nodeIP net.IP) (uint16, bool) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to go with an error instead of the bool in the "public" facing API? my thoughts were that a bool fulfills the purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree a bool will be fine as it mimics the map "interface" just fine

@@ -82,7 +83,7 @@ func containsIP(allowedIPs []net.IPNet, ipnet *net.IPNet) bool {
func newTestAgent(ctx context.Context) (*Agent, *ipcache.IPCache) {
ipCache := ipcache.NewIPCache(&ipcache.Configuration{
Context: ctx,
NodeIDHandler: &mockNodeHandler{},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea: re-using the "duplicate" from ipcache kind of reduces the blast radius (e.g. wireguard) when changing the nodeidhandler interface

@mhofstetter mhofstetter marked this pull request as ready for review June 13, 2023 13:23
@mhofstetter mhofstetter requested review from a team as code owners June 13, 2023 13:23
@mhofstetter mhofstetter added the release-blocker/1.14 This issue will prevent the release of the next version of Cilium. label Jun 13, 2023
@mhofstetter
Copy link
Member Author

/test

@mhofstetter
Copy link
Member Author

mhofstetter commented Jun 14, 2023

/test-1.26-net-next

previous run hit #26097 & #25958

Copy link
Member

@meyskens meyskens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for me

@mhofstetter
Copy link
Member Author

mhofstetter commented Jun 14, 2023

/test-1.26-net-next

previous run hit #25187

@mhofstetter
Copy link
Member Author

rebased to main

@mhofstetter
Copy link
Member Author

mhofstetter commented Jun 14, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) with L4 policy Tests NodePort with L4 Policy

Failure Output

FAIL: Request from testclient-mljxm pod to service tftp://[fd04::12]:30203/hello failed

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.25-kernel-4.19/742/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@mhofstetter
Copy link
Member Author

rebased to main after jenkins based checks have been removed there

@mhofstetter
Copy link
Member Author

/test

@mhofstetter mhofstetter requested a review from a team as a code owner June 14, 2023 20:58
@mhofstetter mhofstetter requested a review from jibi June 14, 2023 20:58
@mhofstetter
Copy link
Member Author

mhofstetter commented Jun 14, 2023

replaced linuxNodeHandler.Mutex with RWMutex and only use read-lock in case of GetNodeIP & GetNodeID

@mhofstetter
Copy link
Member Author

Added support for IPv6 in getNodeIDForIP - as discussed offline with @brb

@mhofstetter
Copy link
Member Author

mhofstetter commented Jun 15, 2023

hit some strange error in integration-test (postcheck)

rebasing to main might help 🙏 (it has helped indeed 👀 )

@mhofstetter
Copy link
Member Author

/test

Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mhofstetter mhofstetter removed the request for review from aditighag June 15, 2023 09:33
@mhofstetter
Copy link
Member Author

removed review-request from @aditighag. @jrajahalme already reviewed in behalf of team ipcache.

pkg/datapath/linux/node_ids.go Outdated Show resolved Hide resolved
pkg/datapath/linux/node_ids.go Show resolved Hide resolved
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Jun 15, 2023
@mhofstetter mhofstetter added the dont-merge/blocked Another PR must be merged before this one. label Jun 15, 2023
This commit introduces the possibility to retrieve the node id for a
given node IP without having to use `AllocateNodeID` which comes with
the drawback of actually allocating a new node id if it doesn't exist
yet.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Currently, when retrieving the node id for an IP, the local node id 0 is
only returned if the given IP matches the nodes IPv4 - but not IPv6.

This commit adds support for IPv6.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Until now, a `lock.Mutex` was securing all fields of the
linuxnodehandler.

With the introduction ready-only methods GetNodeIP & GetNodeID, it
became useful to replace it with an RWMutex and only lock it for read in
these two functions.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Using `ipcache.ALlocateNodeID` to lookup the node id for a node IP
during auth gc initialisation results in unintended node id allocations
if the nodeids aren't yet created for the cilium nodes.

By using the new method `GetNodeID` we remove this unwanted side-effect.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
This commit re-enables the node-based auth map garbage collection which
have been temporarily disabled with cilium#26073.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
@mhofstetter
Copy link
Member Author

addressed @jibi's input

@aanm aanm removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 15, 2023
@mhofstetter mhofstetter removed the dont-merge/blocked Another PR must be merged before this one. label Jun 15, 2023
@mhofstetter
Copy link
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 15, 2023
@joestringer joestringer merged commit a824c71 into cilium:main Jun 15, 2023
61 checks passed
@mhofstetter mhofstetter deleted the pr/mhofstetter/getnodeid branch June 16, 2023 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/servicemesh GH issues or PRs regarding servicemesh feature/authentication kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants