Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.13] Backports 2024-03-05 #31161

Merged
merged 14 commits into from Mar 7, 2024
Merged

Conversation

@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Mar 5, 2024
@julianwiedmann
Copy link
Member Author

/test-backport-1.13

@julianwiedmann
Copy link
Member Author

/test-1.25-4.19

@julianwiedmann
Copy link
Member Author

/test-1.18-4.19

@julianwiedmann
Copy link
Member Author

/test-1.25-4.19

@julianwiedmann julianwiedmann marked this pull request as ready for review March 5, 2024 20:44
@julianwiedmann julianwiedmann requested a review from a team as a code owner March 5, 2024 20:44
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @cilium/tophat I can only really defer to you as author that these changes are properly backported. Approve from that perspective.

As a committer I struggle to understand how this fits the backporting criteria. Now that we're approaching v1.13.13, users have a pretty high expectation of stability in the releases so it makes me a bit nervous to see such invasive changes. I don't see any test coverage changes in the PR.

@julianwiedmann julianwiedmann added the release-blocker/1.13 This issue will prevent the release of the next version of Cilium. label Mar 6, 2024
@joestringer joestringer mentioned this pull request Mar 6, 2024
6 tasks
Copy link
Member

@jschwinger233 jschwinger233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! The only thing I want to say is about the separate routing rules for v4/v6. It's unique and different from v1.14 backport: 1.13 vs 1.14

It would be nice to have some comments for other maintainers.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 7, 2024
@julianwiedmann
Copy link
Member Author

Thank you! The only thing I want to say is about the separate routing rules for v4/v6. It's unique and different from v1.14 backport: 1.13 vs 1.14

It would be nice to have some comments for other maintainers.

Right, that one is subtle - thank you! I'll improve the patch description / add a comment.

@julianwiedmann julianwiedmann added dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Mar 7, 2024
julianwiedmann and others added 11 commits March 7, 2024 11:21
[ upstream commit 99786be ]

[ backporter's notes: needed to resolve complexity issues in subsequent
  patches ]

We first look up the destination endpoint to check for tunnel redirection,
and then look it up a second time to access its sec_label and IPSec key.

Make the first look-up unconditional, so that we can remove the second.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit ac63856 ]

This is an alternative approach to fix cilium#21954, so that we
can re-introduce the 2005 from-proxy routing rule in following patches
to fix L7 proxy issues.

This commit simply allows packets to WORLD as long as they are from
ingress proxy. This was one of the solution suggested by Martynas, as
recorded in commit message cilium/cilium@c534bb7:

    One fix was to extend the troublesome check
    https://github.com/cilium/cilium/blob/v1.12.3/bpf/bpf_host.c#L626 by
    allowing proxy replies to `WORLD_ID`.

To tell if an skb is originated from ingress proxy, the commit extends
the semantic of existing flags `TC_INDEX_F_SKIP_{INGRESS,EGRESS}_PROXY`,
renames flags to clarify the changed meaning.

Fixes: cilium#21954 (Reply from pod to outside is dropped when L7 ingress policy is used)

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 217ae4f ]

[ backporter's notes: this is a custom backport to init.sh ]

This commit re-introduced the 2005 routes that were removed by
cilium@9dd6cfc (datapath: remove 2005 route table for ipv6 only)
and cilium@c1a0dba (datapath: remove 2005 route table for ipv4 only).

Signed-off-by: Robin Gögge <r.goegge@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ]

Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 3a93b00 ]

Previously we set skb->mark in from_host@cilium_host, expect the mark
to remain unchanged after kernel transmits skb from cilium_host to
cilium_net. The skb->mark is for instance used to transport
IPsec-related information.

However, as of 2023-10-19, kernel 5.10 still misses the backport patch[1]
to fix a bug in skb_scrub_packet() which clears skb->mark for veth_xmit even if the
veth pair is under the same netns:
https://elixir.bootlin.com/linux/v5.10.198/source/include/linux/netdevice.h#L3975

To avoid hitting this issue, this patch sets metadata in skb->cb to
survive skb_scrub_packet(), then to_host@cilium_net can retrieve this
info and set proper mark.

Only from_host bpf is setting cb, while from_lxc bpf is still using mark.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within the same name space")

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit d2f1ea0 ]

With cilium#29530 in place, we now also
divert proxy traffic to cilium_host when per-EP routes are enabled. But we
potentially still need to deliver this traffic to a local endpoint - say
for a pod-to-pod connection on the same node, with L7 proxy inbetween.

In a configuration with per-EP routes but no BPF Host-Routing,
l3_local_delivery() transfers the source identity to the skb->mark and
redirects to bpf_lxc, where the to-container program handles the packet.

If we transfer the packet with MARK_MAGIC_IDENTITY, to-container will
look up the network policy and redirect to the L7 proxy *again*. Thus we
need to fully restore the proxy's actual mark, so that to-container's
inherit_identity_from_host() call finds the expected magic ID. It then
sets the TC_INDEX_F_FROM_INGRESS_PROXY flag, and skips the redirect to
L7 proxy.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 0ebe516 ]

[ backporter's notes: this is a custom backport to init.sh. Only apply
  the change to IPv4 rules. ]

With tunnel routing, traffic to remote pods already flows via cilium_host.
This is sufficient for what IPsec requires. Thus currently only native
routing requires the custom redirect logic for from-ingress proxy traffic.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 9fbd5a8 ]

Currently the L7 proxy performs SNAT for traffic when tunnel routing is
enabled, even for cluster-internal traffic. This prevents cilium_host from
detecting pod-level traffic, and we thus can't apply features.

Modify SupportsOriginalSourceAddr(), so that the proxy doesn't SNAT such
traffic when some conditions are met.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 244a5e9 ]

GKE has DROP policy for filter table, so we have to explicitly accept
proxy traffic.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit d4b81c0 ]

from-proxy traffic gets redirected to cilium_host. Skip the proxy paths
when handle_ipv*_cont() is included by from-netdev.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit f018b20 ]

Once forward traffic for an egress proxy connection has traversed through
cilium_host / cilium_net, we expect IPsec-marked packets to get handled
by xfrm. But this currently conflicts with an iptables rule for the
proxy's transparent socket, which then over-writes the mark:

    -A CILIUM_PRE_mangle -m socket --transparent -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff

We can avoid this by adding an extra filter to this rule, so that it
doesn't match IPsec-marked packets.

Signed-off-by: Zhichuan Liang<gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
jschwinger233 and others added 3 commits March 7, 2024 11:24
[ upstream commit 5201896 ]

[ backporter's notes: this is a backport to pre-cell iptables ]

After forward traffic for an egress proxy onnection has traversed through
cilium_host / cilium_net, we expect IPsec-marked packets to get handled
by xfrm.

This currently conflicts with early demux, which matches the connection's
transparent socket and assigns it to the packet:

```
// https://elixir.bootlin.com/linux/v6.2/source/net/ipv4/tcp_ipv4.c#L1770
int tcp_v4_early_demux(struct sk_buff *skb)
{
...
	sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
				       iph->saddr, th->source,
				       iph->daddr, ntohs(th->dest),
				       skb->skb_iif, inet_sdif(skb));
	if (sk) {
		skb->sk = sk;
...
}
```

It then gets dropped in ip_forward(), before reaching xfrm:

```
// https://elixir.bootlin.com/linux/v6.2/source/net/ipv4/ip_forward.c#L100
int ip_forward(struct sk_buff *skb)
{
...
    if (unlikely(skb->sk))
		goto drop;
...
}
```

To avoid this we disable early-demux in a L7 + IPsec config.

Note that the L7 proxy feature needs to deal with similar troubles, as the
comment for inboundProxyRedirectRule() describes. Ideally we would build
a similar solution for IPsec, diverting traffic with policy routing so that
it doesn't get intercepted by early-demux.

Signed-off-by: Zhichuan Liang<gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e96e9cd ]

The from-host path already knows how to handle traffic that comes from
the ingress proxy. Extend this logic to also cover traffic that originates
from the egress proxy.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
This is a v1.13-only patch, the closest upstream equivalent is
217ae4f ("Re-introduce 2005 route table").

It slightly differs from the v1.14 variant 579d0a4
("proxy: also install from-ingress-proxy rules with per-EP routing"), as
for v1.13 we had only merged 4c441ab
("datapath: remove 2005 route table for ipv4 only"). Thus IPv6 is still
using the 2005 rule, and we want to preserve existing behaviour as much as
possible. In a config with per-EP routes, we therefore only install the
2005 rule for IPv6 when IPsec strictly requires it.

Egressing traffic would usually get routed straight to eth0. Install the
2005 rule to divert the traffic into cilium_host first.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@julianwiedmann
Copy link
Member Author

Updated e145638 with a code comment and detailed patch description.

@julianwiedmann
Copy link
Member Author

/test-backport-1.13

Copy link
Member

@jschwinger233 jschwinger233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for detailed commit message!

@julianwiedmann julianwiedmann removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Mar 7, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 7, 2024
@julianwiedmann julianwiedmann merged commit c743b68 into cilium:v1.13 Mar 7, 2024
61 of 62 checks passed
@julianwiedmann julianwiedmann deleted the v1.13-proxy branch March 7, 2024 12:36
julianwiedmann added a commit to cilium/cilium-cli that referenced this pull request Mar 19, 2024
#1629 initially limited the IPv6
test to v1.14, as v1.13 was missing some functionality.

But now that v1.13 has cilium/cilium#31161, the
IPv6 path should actually work. So let's extend the test coverage.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.13 This issue will prevent the release of the next version of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants