New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v1.13] Backports 2024-03-05 #31161
[v1.13] Backports 2024-03-05 #31161
Conversation
290469f
to
5c0dce0
Compare
/test-backport-1.13 |
/test-1.25-4.19 |
/test-1.18-4.19 |
/test-1.25-4.19 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @cilium/tophat I can only really defer to you as author that these changes are properly backported. Approve from that perspective.
As a committer I struggle to understand how this fits the backporting criteria. Now that we're approaching v1.13.13, users have a pretty high expectation of stability in the releases so it makes me a bit nervous to see such invasive changes. I don't see any test coverage changes in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5c0dce0
to
049d7fd
Compare
[ upstream commit 99786be ] [ backporter's notes: needed to resolve complexity issues in subsequent patches ] We first look up the destination endpoint to check for tunnel redirection, and then look it up a second time to access its sec_label and IPSec key. Make the first look-up unconditional, so that we can remove the second. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit ac63856 ] This is an alternative approach to fix cilium#21954, so that we can re-introduce the 2005 from-proxy routing rule in following patches to fix L7 proxy issues. This commit simply allows packets to WORLD as long as they are from ingress proxy. This was one of the solution suggested by Martynas, as recorded in commit message cilium/cilium@c534bb7: One fix was to extend the troublesome check https://github.com/cilium/cilium/blob/v1.12.3/bpf/bpf_host.c#L626 by allowing proxy replies to `WORLD_ID`. To tell if an skb is originated from ingress proxy, the commit extends the semantic of existing flags `TC_INDEX_F_SKIP_{INGRESS,EGRESS}_PROXY`, renames flags to clarify the changed meaning. Fixes: cilium#21954 (Reply from pod to outside is dropped when L7 ingress policy is used) Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 217ae4f ] [ backporter's notes: this is a custom backport to init.sh ] This commit re-introduced the 2005 routes that were removed by cilium@9dd6cfc (datapath: remove 2005 route table for ipv6 only) and cilium@c1a0dba (datapath: remove 2005 route table for ipv4 only). Signed-off-by: Robin Gögge <r.goegge@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ] Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 3a93b00 ] Previously we set skb->mark in from_host@cilium_host, expect the mark to remain unchanged after kernel transmits skb from cilium_host to cilium_net. The skb->mark is for instance used to transport IPsec-related information. However, as of 2023-10-19, kernel 5.10 still misses the backport patch[1] to fix a bug in skb_scrub_packet() which clears skb->mark for veth_xmit even if the veth pair is under the same netns: https://elixir.bootlin.com/linux/v5.10.198/source/include/linux/netdevice.h#L3975 To avoid hitting this issue, this patch sets metadata in skb->cb to survive skb_scrub_packet(), then to_host@cilium_net can retrieve this info and set proper mark. Only from_host bpf is setting cb, while from_lxc bpf is still using mark. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within the same name space") Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit d2f1ea0 ] With cilium#29530 in place, we now also divert proxy traffic to cilium_host when per-EP routes are enabled. But we potentially still need to deliver this traffic to a local endpoint - say for a pod-to-pod connection on the same node, with L7 proxy inbetween. In a configuration with per-EP routes but no BPF Host-Routing, l3_local_delivery() transfers the source identity to the skb->mark and redirects to bpf_lxc, where the to-container program handles the packet. If we transfer the packet with MARK_MAGIC_IDENTITY, to-container will look up the network policy and redirect to the L7 proxy *again*. Thus we need to fully restore the proxy's actual mark, so that to-container's inherit_identity_from_host() call finds the expected magic ID. It then sets the TC_INDEX_F_FROM_INGRESS_PROXY flag, and skips the redirect to L7 proxy. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 0ebe516 ] [ backporter's notes: this is a custom backport to init.sh. Only apply the change to IPv4 rules. ] With tunnel routing, traffic to remote pods already flows via cilium_host. This is sufficient for what IPsec requires. Thus currently only native routing requires the custom redirect logic for from-ingress proxy traffic. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 9fbd5a8 ] Currently the L7 proxy performs SNAT for traffic when tunnel routing is enabled, even for cluster-internal traffic. This prevents cilium_host from detecting pod-level traffic, and we thus can't apply features. Modify SupportsOriginalSourceAddr(), so that the proxy doesn't SNAT such traffic when some conditions are met. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 244a5e9 ] GKE has DROP policy for filter table, so we have to explicitly accept proxy traffic. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit d4b81c0 ] from-proxy traffic gets redirected to cilium_host. Skip the proxy paths when handle_ipv*_cont() is included by from-netdev. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit f018b20 ] Once forward traffic for an egress proxy connection has traversed through cilium_host / cilium_net, we expect IPsec-marked packets to get handled by xfrm. But this currently conflicts with an iptables rule for the proxy's transparent socket, which then over-writes the mark: -A CILIUM_PRE_mangle -m socket --transparent -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff We can avoid this by adding an extra filter to this rule, so that it doesn't match IPsec-marked packets. Signed-off-by: Zhichuan Liang<gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 5201896 ] [ backporter's notes: this is a backport to pre-cell iptables ] After forward traffic for an egress proxy onnection has traversed through cilium_host / cilium_net, we expect IPsec-marked packets to get handled by xfrm. This currently conflicts with early demux, which matches the connection's transparent socket and assigns it to the packet: ``` // https://elixir.bootlin.com/linux/v6.2/source/net/ipv4/tcp_ipv4.c#L1770 int tcp_v4_early_demux(struct sk_buff *skb) { ... sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, iph->saddr, th->source, iph->daddr, ntohs(th->dest), skb->skb_iif, inet_sdif(skb)); if (sk) { skb->sk = sk; ... } ``` It then gets dropped in ip_forward(), before reaching xfrm: ``` // https://elixir.bootlin.com/linux/v6.2/source/net/ipv4/ip_forward.c#L100 int ip_forward(struct sk_buff *skb) { ... if (unlikely(skb->sk)) goto drop; ... } ``` To avoid this we disable early-demux in a L7 + IPsec config. Note that the L7 proxy feature needs to deal with similar troubles, as the comment for inboundProxyRedirectRule() describes. Ideally we would build a similar solution for IPsec, diverting traffic with policy routing so that it doesn't get intercepted by early-demux. Signed-off-by: Zhichuan Liang<gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e96e9cd ] The from-host path already knows how to handle traffic that comes from the ingress proxy. Extend this logic to also cover traffic that originates from the egress proxy. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
This is a v1.13-only patch, the closest upstream equivalent is 217ae4f ("Re-introduce 2005 route table"). It slightly differs from the v1.14 variant 579d0a4 ("proxy: also install from-ingress-proxy rules with per-EP routing"), as for v1.13 we had only merged 4c441ab ("datapath: remove 2005 route table for ipv4 only"). Thus IPv6 is still using the 2005 rule, and we want to preserve existing behaviour as much as possible. In a config with per-EP routes, we therefore only install the 2005 rule for IPv6 when IPsec strictly requires it. Egressing traffic would usually get routed straight to eth0. Install the 2005 rule to divert the traffic into cilium_host first. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
049d7fd
to
e145638
Compare
Updated e145638 with a code comment and detailed patch description. |
/test-backport-1.13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for detailed commit message!
#1629 initially limited the IPv6 test to v1.14, as v1.13 was missing some functionality. But now that v1.13 has cilium/cilium#31161, the IPv6 path should actually work. So let's extend the test coverage. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Manual backport of
Once this PR is merged, a GitHub action will update the labels of these PRs: