Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.14] Backports 2024-03-05 #31160

Merged
merged 13 commits into from Mar 7, 2024
Merged

Conversation

[ upstream commit ac63856 ]

This is an alternative approach to fix cilium#21954, so that we
can re-introduce the 2005 from-proxy routing rule in following patches
to fix L7 proxy issues.

This commit simply allows packets to WORLD as long as they are from
ingress proxy. This was one of the solution suggested by Martynas, as
recorded in commit message cilium/cilium@c534bb7:

    One fix was to extend the troublesome check
    https://github.com/cilium/cilium/blob/v1.12.3/bpf/bpf_host.c#L626 by
    allowing proxy replies to `WORLD_ID`.

To tell if an skb is originated from ingress proxy, the commit extends
the semantic of existing flags `TC_INDEX_F_SKIP_{INGRESS,EGRESS}_PROXY`,
renames flags to clarify the changed meaning.

Fixes: cilium#21954 (Reply from pod to outside is dropped when L7 ingress policy is used)

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Mar 5, 2024
rgo3 and others added 12 commits March 5, 2024 13:44
[ upstream commit 217ae4f ]

[ backporter's notes: this is a custom backport to init.sh ]

This commit re-introduced the 2005 routes that were removed by
cilium@9dd6cfc (datapath: remove 2005 route table for ipv6 only)
and cilium@c1a0dba (datapath: remove 2005 route table for ipv4 only).

Signed-off-by: Robin Gögge <r.goegge@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ]

Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 3a93b00 ]

Previously we set skb->mark in from_host@cilium_host, expect the mark
to remain unchanged after kernel transmits skb from cilium_host to
cilium_net. The skb->mark is for instance used to transport
IPsec-related information.

However, as of 2023-10-19, kernel 5.10 still misses the backport patch[1]
to fix a bug in skb_scrub_packet() which clears skb->mark for veth_xmit even if the
veth pair is under the same netns:
https://elixir.bootlin.com/linux/v5.10.198/source/include/linux/netdevice.h#L3975

To avoid hitting this issue, this patch sets metadata in skb->cb to
survive skb_scrub_packet(), then to_host@cilium_net can retrieve this
info and set proper mark.

Only from_host bpf is setting cb, while from_lxc bpf is still using mark.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within the same name space")

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit d2f1ea0 ]

With cilium#29530 in place, we now also
divert proxy traffic to cilium_host when per-EP routes are enabled. But we
potentially still need to deliver this traffic to a local endpoint - say
for a pod-to-pod connection on the same node, with L7 proxy inbetween.

In a configuration with per-EP routes but no BPF Host-Routing,
l3_local_delivery() transfers the source identity to the skb->mark and
redirects to bpf_lxc, where the to-container program handles the packet.

If we transfer the packet with MARK_MAGIC_IDENTITY, to-container will
look up the network policy and redirect to the L7 proxy *again*. Thus we
need to fully restore the proxy's actual mark, so that to-container's
inherit_identity_from_host() call finds the expected magic ID. It then
sets the TC_INDEX_F_FROM_INGRESS_PROXY flag, and skips the redirect to
L7 proxy.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 0ebe516 ]

[ backporter's notes: this is a custom backport to init.sh ]

With tunnel routing, traffic to remote pods already flows via cilium_host.
This is sufficient for what IPsec requires. Thus currently only native
routing requires the custom redirect logic for from-ingress proxy traffic.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 9fbd5a8 ]

Currently the L7 proxy performs SNAT for traffic when tunnel routing is
enabled, even for cluster-internal traffic. This prevents cilium_host from
detecting pod-level traffic, and we thus can't apply features.

Modify SupportsOriginalSourceAddr(), so that the proxy doesn't SNAT such
traffic when some conditions are met.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 244a5e9 ]

GKE has DROP policy for filter table, so we have to explicitly accept
proxy traffic.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit d4b81c0 ]

from-proxy traffic gets redirected to cilium_host. Skip the proxy paths
when handle_ipv*_cont() is included by from-netdev.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit f018b20 ]

Once forward traffic for an egress proxy connection has traversed through
cilium_host / cilium_net, we expect IPsec-marked packets to get handled
by xfrm. But this currently conflicts with an iptables rule for the
proxy's transparent socket, which then over-writes the mark:

    -A CILIUM_PRE_mangle -m socket --transparent -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff

We can avoid this by adding an extra filter to this rule, so that it
doesn't match IPsec-marked packets.

Signed-off-by: Zhichuan Liang<gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 5201896 ]

[ backporter's notes: this is a backport to pre-cell iptables ]

After forward traffic for an egress proxy onnection has traversed through
cilium_host / cilium_net, we expect IPsec-marked packets to get handled
by xfrm.

This currently conflicts with early demux, which matches the connection's
transparent socket and assigns it to the packet:

```
// https://elixir.bootlin.com/linux/v6.2/source/net/ipv4/tcp_ipv4.c#L1770
int tcp_v4_early_demux(struct sk_buff *skb)
{
...
	sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
				       iph->saddr, th->source,
				       iph->daddr, ntohs(th->dest),
				       skb->skb_iif, inet_sdif(skb));
	if (sk) {
		skb->sk = sk;
...
}
```

It then gets dropped in ip_forward(), before reaching xfrm:

```
// https://elixir.bootlin.com/linux/v6.2/source/net/ipv4/ip_forward.c#L100
int ip_forward(struct sk_buff *skb)
{
...
    if (unlikely(skb->sk))
		goto drop;
...
}
```

To avoid this we disable early-demux in a L7 + IPsec config.

Note that the L7 proxy feature needs to deal with similar troubles, as the
comment for inboundProxyRedirectRule() describes. Ideally we would build
a similar solution for IPsec, diverting traffic with policy routing so that
it doesn't get intercepted by early-demux.

Signed-off-by: Zhichuan Liang<gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e96e9cd ]

The from-host path already knows how to handle traffic that comes from
the ingress proxy. Extend this logic to also cover traffic that originates
from the egress proxy.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
This is a v1.14-only patch, the closest upstream equivalent is
217ae4f ("Re-introduce 2005 route table").

Egressing traffic would usually get routed straight to eth0. Install the
2005 rule to divert the traffic into cilium_host first.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@julianwiedmann
Copy link
Member Author

/test-backport-1.14

@julianwiedmann julianwiedmann marked this pull request as ready for review March 5, 2024 15:29
@julianwiedmann julianwiedmann requested a review from a team as a code owner March 5, 2024 15:29
@julianwiedmann julianwiedmann added the release-blocker/1.14 This issue will prevent the release of the next version of Cilium. label Mar 6, 2024
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as #31161 (review), this seems like an unconventional application of the backporting policy with higher risk than I'd usually advocate for with no clear sign of testing to mitigate the risk. Deferring to you as the committer+author sponsor of these changes.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 6, 2024
@julianwiedmann julianwiedmann merged commit 579d0a4 into cilium:v1.14 Mar 7, 2024
59 checks passed
@julianwiedmann julianwiedmann deleted the v1.14-proxy branch March 7, 2024 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.14 This issue will prevent the release of the next version of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants