Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LRP: Fix node-local redirection exceptions #26144

Merged
merged 13 commits into from
Apr 24, 2024

Conversation

aditighag
Copy link
Member

@aditighag aditighag commented Jun 12, 2023

This PR fixes cases with the local redirect policy datapath behavior with socket-LB where node-local redirection was being erroneously skipped.

Summary

Extend the local redirect policy spec with a new flag called skipRedirectFromBackend that allow users to skip redirection of traffic from an LRP backend to itself via the LRP frontend for certain use cases. This replaces the previous default logic in the BPF socket-LB datapath involving local redirect services LB that led to unintended error cases.
LRPs with the new flag are processed by the LRP manager, and translated into entries populated in newly introduced maps that enable socket-LB datapath to skip local redirect service translation from certain pods.

Highlights of PR commits

(Check commits description for more details.)

  • [CRD] Extend LRP CRD spec with the new flag.
  • [Plugin, openapi, pkg/endpoint] Preparatory commits to get endpoint netns cookies metadata.
  • [pkg/maps] Define new BPF maps to store the translated LRP state with the new flag.
  • [pkg/redirectpolicy] Update the LRP manager to process and plumb such user configured policies.
  • [BPF] Revise BPF datapath to implement the desired LRP behavior for certain traffic
  • [BPF] Unit tests
  • [Doc] Document the flag along with requirements.
    Tested manually on a local cluster. An e2e test will be added in the cilium-cli connectivity suite as a follow-up once this dependency PR is merged.

Example LRP with the extended spec -

lrp-skip-lb (1)

Fixes: #15195
Fixes: #20164

Fix LRP error cases where node-local redirection was erroneously skipped. Extend LRP spec in order for users to explicitly skip node-local redirection from LRP selected backend pods.

@aditighag aditighag added release-note/bug This PR fixes an issue in a previous release of Cilium. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. labels Jun 12, 2023
@maintainer-s-little-helper
Copy link

Commit 7a72d6b0c2038df448af6c8917364ff7ea6a1d4c does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 14, 2023
@aditighag
Copy link
Member Author

/test-1.26-net-next

@aditighag aditighag force-pushed the pr/aditighag/lrp-skip-lb branch 2 times, most recently from c19f5f7 to c94624f Compare June 16, 2023 00:21
@maintainer-s-little-helper
Copy link

Commits ac33fe3b1bd7b70162b02d8a87006fb21ead34d7, c94624fbde2f7aa1112888c8fe1a5c8f66b171cf do not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper
Copy link

Commits ac33fe3b1bd7b70162b02d8a87006fb21ead34d7, 51b98be92ce1af6150f1d2c04bbd5305b6d21474 do not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper
Copy link

Commits 56491a4, 49c78c6 do not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@joestringer joestringer removed the release-blocker/1.14 This issue will prevent the release of the next version of Cilium. label Jun 22, 2023
@joestringer
Copy link
Member

@aditighag I've dropped release-blocker from this PR, it seems significant enough that it needs thorough review, and a compressed timeline will not help to get that. Feature freeze was last friday, and even if we consider that maybe this is fixing a known limitation, it does not look like it's tracking to land in time for v1.14 branching next week. If we really think that this is worthwhile backporting to v1.14 at a later time, we can first get it into a form that we're happy with, merge into main, and then make the call whether a backport makes sense or not. Either way we can get this into an upcoming v1.15 monthly snapshot release.

@aditighag
Copy link
Member Author

I've dropped release-blocker from this PR, it seems significant enough that it needs thorough review,

Yes, agree with the assessment.

@github-actions
Copy link

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 27, 2023
@github-actions
Copy link

This pull request has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this Aug 10, 2023
@aditighag aditighag reopened this Oct 26, 2023
@maintainer-s-little-helper
Copy link

Commits 56491a4, 49c78c6 do not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Oct 27, 2023
Copy link

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Nov 26, 2023
Copy link

This pull request has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this Dec 10, 2023
@aditighag aditighag reopened this Dec 11, 2023
@aditighag aditighag added dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Apr 22, 2024
When a Local Redirect Policy is applied, cilium BPF
datapath translates frontend (identified by ip/port/protocol tuple)
address from the policy to a node-local backend pod selected by the
policy. However, in certain use cases, users may want to skip
translating traffic from the backend to the frontend that gets
redirected to the same backend. This behavior was enforced by default
previously, and users didn't have a way to disable this behavior.
Since this is a special case, introduce a new flag in the policy spec
that users can explicitly set for certain use cases.
Follow-up commits will implement the processing of this configuration.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
This is a preparatory commit to enable local redirect
policies that have the `skipRedirectFromBackend` flag set
in their specs.
Endpoint netns cookies allocated by the Linux kernel for network
namespaces are used as identifiers in the Cilium socket-LB BPF
datapath in order to implement the LRP behavior in follow-up commits.
Additionally, netns cookie metadata for pods are retrieved from the
cni plugin process as it runs on the host, and hence has access to all
pod network namespaces. Cilium agent pod on the other hand doesn't see
non-root network namespaces created for other pods.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
This is a preparatory commit to read netns cookie
set in the EndpointChangeRequest API calls. Add
methods to retrieve the metadata for an endpoint.
Follow-up commits will invoke these methods to
realize certain local redirect policies.
The netns cookie is also persisted along with the
other endpoint state so that the metadata is available
post cilium agent restarts.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
The BPF maps store certain LRP configurations
where traffic from source pods destined to
LRP frontends need to skip service load-balancing.
The Cilium socket-LB datapath will
identify such traffic by doing a lookup based
on source pod identifier (netns cookie) and
the destination address/port the pod is connecting to.
Service LB is skipped for tuple entries found in the
map.
The map updates are handled in follow-up commits.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Extend the `Subscriber` interface so that consumers can
subscribe to events related to restored endpoints.
This will be used by the local redirect policy manager
in subsequent commits to get metadata for restored endpoints.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Previously, the LRP manager read certain metadata for
all pods, regardless of whether the pods were selected
by deployed local redirect policies.
Optimize this by getting metadata only for pods selected
by policies.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
… flag

Read the flag from user configured LRPs. Further processing of the flag
is handled in subsequent commits.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
This commit handles the processing of policies set with the
skipRedirectFromBackend flag, and plumbing of such policies
in the relevant skip_lb{4,6} BPF maps introduced in
previous commits. Such policies are translated into
map entry tuples with policy selected backend pods netns cookie
and the policy frontend the pods connect to.
To that end, the LRP manager subscribes to endpoint related events
in order to get the netns cookie metadata. It also handles different
orders in which various kubernetes resources such as pods, endpoints,
LRPs are deployed.
Also, remove the socket lookup BPF helper checks that are not required
anymore with the revised logic. 

Includes unit test cases to test the implementation.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
This commit revises the socket-LB datapath to
skip load-balancing for traffic matching user
configured local redirect policies with the
skipRedirectFromBackend flag.

Old logic:
Previously, this was enforced by default for all local redirect
services with the help of bpf_sk_lookup_* helpers. The intention was
to identify cases where traffic from an LRP backend pod was getting
redirected to itself using the socket lookup helpers. However, this
logic led to cases where LB was skipped erroneously based on the
unexpected behavior of the sk_lookup API implementation. Particularly,
the logic intended to get established sockets matching passed ip/port
tuple, but the kernel API can also return listening sockets matching
just the passed port [1]. As a result, when a client pod has a process
listening on the same port as deployed LRP(s), traffic from the pod
destined to an LRP service frontend was not load-balanced, thereby
skipping the desired node-local redirection.

New logic:
Skip LB only for socket tuples matching user configured local
redirect policy tuples populated in the cilium_skip_lb{4,6} maps.
The maps are populated by the userspace based on user deployed
LRPs.
The BPF cgroup hooks don't have access to the source
IP address set in the BPF socket `ctx`, but it needs some
kind of identifier to skip LB only for traffic from source
entities selected by user configured LRPs. To that end, we
use the netns cookie retrieved from the 'ctx' as this identifier.

[1] https://elixir.bootlin.com/linux/v6.8/source/include/net/inet_hashtables.h#L425

Signed-off-by: Aditi Ghag <aditi@cilium.io>
This commit adds additional service map
related helpers that'll be used in
follow-up commits.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Adds test cases to validate various cases
around local redirect policy configurations
where certain cases skip service translation
in the socket-LB BPF datapath.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Extends the alignchecker structs list with
the newly added map key structs.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Document the flag behavior and requirements.
Update the kiam LRP spec that needs to enable
this flag.
Also, extend the upgrade guide with the updated
behavior.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
@aditighag
Copy link
Member Author

aditighag commented Apr 23, 2024

/test

Hit - #31827.

@aditighag
Copy link
Member Author

@joamaki PTAL

@aditighag aditighag added dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. and removed dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. labels Apr 23, 2024
bpf/bpf_sock.c Show resolved Hide resolved
pkg/redirectpolicy/manager.go Show resolved Hide resolved
@aditighag aditighag removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Apr 24, 2024
@aditighag aditighag added this pull request to the merge queue Apr 24, 2024
Merged via the queue into cilium:main with commit 0356174 Apr 24, 2024
63 of 64 checks passed
@aditighag aditighag deleted the pr/aditighag/lrp-skip-lb branch April 24, 2024 14:44
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pinned These issues are not marked stale by our issue bot. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. upgrade-impact This PR has potential upgrade or downgrade impact.
Projects
None yet
7 participants