-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LRP: Fix node-local redirection exceptions #26144
Conversation
90e2997
to
6b02262
Compare
Commit 7a72d6b0c2038df448af6c8917364ff7ea6a1d4c does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
/test-1.26-net-next |
c19f5f7
to
c94624f
Compare
Commits ac33fe3b1bd7b70162b02d8a87006fb21ead34d7, c94624fbde2f7aa1112888c8fe1a5c8f66b171cf do not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
c94624f
to
51b98be
Compare
Commits ac33fe3b1bd7b70162b02d8a87006fb21ead34d7, 51b98be92ce1af6150f1d2c04bbd5305b6d21474 do not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
51b98be
to
49c78c6
Compare
Commits 56491a4, 49c78c6 do not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
@aditighag I've dropped release-blocker from this PR, it seems significant enough that it needs thorough review, and a compressed timeline will not help to get that. Feature freeze was last friday, and even if we consider that maybe this is fixing a known limitation, it does not look like it's tracking to land in time for v1.14 branching next week. If we really think that this is worthwhile backporting to v1.14 at a later time, we can first get it into a form that we're happy with, merge into main, and then make the call whether a backport makes sense or not. Either way we can get this into an upcoming v1.15 monthly snapshot release. |
Yes, agree with the assessment. |
This pull request has been automatically marked as stale because it |
This pull request has not seen any activity since it was marked stale. |
Commits 56491a4, 49c78c6 do not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
This pull request has been automatically marked as stale because it |
This pull request has not seen any activity since it was marked stale. |
When a Local Redirect Policy is applied, cilium BPF datapath translates frontend (identified by ip/port/protocol tuple) address from the policy to a node-local backend pod selected by the policy. However, in certain use cases, users may want to skip translating traffic from the backend to the frontend that gets redirected to the same backend. This behavior was enforced by default previously, and users didn't have a way to disable this behavior. Since this is a special case, introduce a new flag in the policy spec that users can explicitly set for certain use cases. Follow-up commits will implement the processing of this configuration. Signed-off-by: Aditi Ghag <aditi@cilium.io>
This is a preparatory commit to enable local redirect policies that have the `skipRedirectFromBackend` flag set in their specs. Endpoint netns cookies allocated by the Linux kernel for network namespaces are used as identifiers in the Cilium socket-LB BPF datapath in order to implement the LRP behavior in follow-up commits. Additionally, netns cookie metadata for pods are retrieved from the cni plugin process as it runs on the host, and hence has access to all pod network namespaces. Cilium agent pod on the other hand doesn't see non-root network namespaces created for other pods. Signed-off-by: Aditi Ghag <aditi@cilium.io>
This is a preparatory commit to read netns cookie set in the EndpointChangeRequest API calls. Add methods to retrieve the metadata for an endpoint. Follow-up commits will invoke these methods to realize certain local redirect policies. The netns cookie is also persisted along with the other endpoint state so that the metadata is available post cilium agent restarts. Signed-off-by: Aditi Ghag <aditi@cilium.io>
The BPF maps store certain LRP configurations where traffic from source pods destined to LRP frontends need to skip service load-balancing. The Cilium socket-LB datapath will identify such traffic by doing a lookup based on source pod identifier (netns cookie) and the destination address/port the pod is connecting to. Service LB is skipped for tuple entries found in the map. The map updates are handled in follow-up commits. Signed-off-by: Aditi Ghag <aditi@cilium.io>
Extend the `Subscriber` interface so that consumers can subscribe to events related to restored endpoints. This will be used by the local redirect policy manager in subsequent commits to get metadata for restored endpoints. Signed-off-by: Aditi Ghag <aditi@cilium.io>
Previously, the LRP manager read certain metadata for all pods, regardless of whether the pods were selected by deployed local redirect policies. Optimize this by getting metadata only for pods selected by policies. Signed-off-by: Aditi Ghag <aditi@cilium.io>
… flag Read the flag from user configured LRPs. Further processing of the flag is handled in subsequent commits. Signed-off-by: Aditi Ghag <aditi@cilium.io>
This commit handles the processing of policies set with the skipRedirectFromBackend flag, and plumbing of such policies in the relevant skip_lb{4,6} BPF maps introduced in previous commits. Such policies are translated into map entry tuples with policy selected backend pods netns cookie and the policy frontend the pods connect to. To that end, the LRP manager subscribes to endpoint related events in order to get the netns cookie metadata. It also handles different orders in which various kubernetes resources such as pods, endpoints, LRPs are deployed. Also, remove the socket lookup BPF helper checks that are not required anymore with the revised logic. Includes unit test cases to test the implementation. Signed-off-by: Aditi Ghag <aditi@cilium.io>
This commit revises the socket-LB datapath to skip load-balancing for traffic matching user configured local redirect policies with the skipRedirectFromBackend flag. Old logic: Previously, this was enforced by default for all local redirect services with the help of bpf_sk_lookup_* helpers. The intention was to identify cases where traffic from an LRP backend pod was getting redirected to itself using the socket lookup helpers. However, this logic led to cases where LB was skipped erroneously based on the unexpected behavior of the sk_lookup API implementation. Particularly, the logic intended to get established sockets matching passed ip/port tuple, but the kernel API can also return listening sockets matching just the passed port [1]. As a result, when a client pod has a process listening on the same port as deployed LRP(s), traffic from the pod destined to an LRP service frontend was not load-balanced, thereby skipping the desired node-local redirection. New logic: Skip LB only for socket tuples matching user configured local redirect policy tuples populated in the cilium_skip_lb{4,6} maps. The maps are populated by the userspace based on user deployed LRPs. The BPF cgroup hooks don't have access to the source IP address set in the BPF socket `ctx`, but it needs some kind of identifier to skip LB only for traffic from source entities selected by user configured LRPs. To that end, we use the netns cookie retrieved from the 'ctx' as this identifier. [1] https://elixir.bootlin.com/linux/v6.8/source/include/net/inet_hashtables.h#L425 Signed-off-by: Aditi Ghag <aditi@cilium.io>
This commit adds additional service map related helpers that'll be used in follow-up commits. Signed-off-by: Aditi Ghag <aditi@cilium.io>
Adds test cases to validate various cases around local redirect policy configurations where certain cases skip service translation in the socket-LB BPF datapath. Signed-off-by: Aditi Ghag <aditi@cilium.io>
Extends the alignchecker structs list with the newly added map key structs. Signed-off-by: Aditi Ghag <aditi@cilium.io>
Document the flag behavior and requirements. Update the kiam LRP spec that needs to enable this flag. Also, extend the upgrade guide with the updated behavior. Signed-off-by: Aditi Ghag <aditi@cilium.io>
ac1b2d0
to
18c746a
Compare
/test Hit - #31827. |
@joamaki PTAL |
This PR fixes cases with the local redirect policy datapath behavior with socket-LB where node-local redirection was being erroneously skipped.
Summary
Extend the local redirect policy spec with a new flag called
skipRedirectFromBackend
that allow users to skip redirection of traffic from an LRP backend to itself via the LRP frontend for certain use cases. This replaces the previous default logic in the BPF socket-LB datapath involving local redirect services LB that led to unintended error cases.LRPs with the new flag are processed by the LRP manager, and translated into entries populated in newly introduced maps that enable socket-LB datapath to skip local redirect service translation from certain pods.
Highlights of PR commits
(Check commits description for more details.)
Tested manually on a local cluster. An e2e test will be added in the cilium-cli connectivity suite as a follow-up once this dependency PR is merged.
Example LRP with the extended spec -
Fixes: #15195
Fixes: #20164