Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: Don't encrypt on path hostns -> remote pod #25440

Merged
merged 3 commits into from May 22, 2023

Conversation

pchaigno
Copy link
Member

In pod-to-pod encryption with IPsec and tunneling, Cilium currently encrypts traffic on the path hostns -> remote pod even though traffic is in plain-text on the path remote pod -> hostns. When using native routing, neither of those paths is encrypted because traffic from the hostns doesn't go through the bpf_host BPF program.

Cilium's Transparent Encryption with IPsec aims at encrypting pod-to-pod traffic. It is therefore unclear why we are encrypting traffic from the hostns. The simple fact that only one direction of the connection is encrypted begs the question of its usefulness. It's possible that this traffic was encrypted by mistake: some of this logic is necessary for node-to-node encryption with IPsec (not supported anymore) and pod-to-pod encryption may have been somewhat simplified to encrypt *-to-pod traffic.

Encrypting traffic from the hostns nevertheless creates several issues. First, this situation creates a path asymmetry between the forward and reply paths of hostns<>remote pod connections. Path asymmetry issues are well known to be a source of bugs, from of --ctstate INVALID -j DROP iptables rules to NAT issues.

Second, Gray recently uncovered a separate bug which, when combined with this encryption from hostns, can prevent Cilium from starting. That separate bug is still being investigated but it seems to cause the reload of bpf_host to depend on Cilium connecting to etcd in a clustermesh context. If this etcd is a remote pod, Cilium connects to it on hostns -> remote pod path. The bpf_host program being unloaded[1], it fails. We end up in a cyclic dependency: bpf_host requires connectivity to etcd, connectivity to etcd requires bpf_host.

This pull request therefore removes encryption with IPsec for the path hostns -> remote pod when using tunneling (already unencrypted when using native routing).

1 - More specifically, in Gray's case, the bpf_host program is already loaded, but it needs to be reloaded because the IPsec XFRM config changed. Without this reload, encryption fails.

Fix path asymmetry when using pod-to-pod encryption with IPsec and tunnel mode.

@pchaigno pchaigno added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. labels May 14, 2023
@pchaigno pchaigno requested a review from a team as a code owner May 14, 2023 21:42
@nbusseneau

This comment was marked as outdated.

@jschwinger233
Copy link
Member

jschwinger233 commented May 15, 2023

Looks good, just one point which I'm not sure: can we delete this as there seems to be no more skb with mark 0xe00 (MARK_MAGIC_ENCRYPT) reaching cilium_net ingress.

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question to clarify, mostly to understand if this affects more than just the hostns -> remote pod path.

bpf/bpf_host.c Show resolved Hide resolved
bpf/bpf_host.c Show resolved Hide resolved
bpf/lib/encap.h Show resolved Hide resolved
@pchaigno
Copy link
Member Author

Looks good, just one point which I'm not sure: can we delete this as there seems to be no more skb with mark 0xe00 (MARK_MAGIC_ENCRYPT) reaching cilium_net ingress.

I think that's correct. I'll try to remove it and rerun CI.

@pchaigno pchaigno requested a review from a team as a code owner May 16, 2023 19:58
@julianwiedmann
Copy link
Member

Looks good, just one point which I'm not sure: can we delete this as there seems to be no more skb with mark 0xe00 (MARK_MAGIC_ENCRYPT) reaching cilium_net ingress.

Agreed. We're no longer using CB_ENCRYPT_MAGIC, so the ENCRYPT_OR_PROXY_MAGIC essentially becomes CB_PROXY_MAGIC now.

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it - just for the fact that it brings us in line with the behaviour in native-routing 😁.

Thank you Paul!

bpf/bpf_host.c Show resolved Hide resolved
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for agent related changes 🚀

TL;DR. this commit removes a bit of dead code that seems to have been
intended for IPsec in native routing mode but is never actually
executed.

These code paths are only executed if going through cilium_host and
coming from the host (see !from_host check above). For remote
destinations, we only go through cilium_host if the destination is part
of a remote pod CIDR and we are running in tunneling mode. In native
routing mode, we go straight to the native device.

Example routing table for tunneling (10.0.0.0/24 is the remote pod CIDR):

    10.0.0.0/24 via 10.0.1.61 dev cilium_host src 10.0.1.61 mtu 1373 <- we follow this
    10.0.1.0/24 via 10.0.1.61 dev cilium_host src 10.0.1.61
    10.0.1.61 dev cilium_host scope link
    192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.11**

Example routing table for native routing:

    10.0.0.0/24 via 192.168.56.12 dev enp0s8 <- we follow this
    10.0.1.0/24 via 10.0.1.61 dev cilium_host src 10.0.1.61
    10.0.1.61 dev cilium_host scope link
    192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.11

Thus, this code path is only used for tunneling with IPsec. However,
IPsec in tunneling mode should already be handled by the
encap_and_redirect_with_nodeid call above in the same functions (see
info->key argument).

So why was this added? It was added in commit b76e6eb ("cilium:
ipsec, support direct routing modes") to support "direct routing modes".
I found that very suspicious because, per the above, in native routing
mode, traffic from the hostns shouldn't even go through cilium_host.

I thus tested it out. I've checked IPsec with native routing mode, with
and without endpoint routes. I can confirm that, in all those cases,
traffic from the hostns is not encrypted when going to a remote pod.

Therefore, this code is dead. I'm unsure when it died.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
In pod-to-pod encryption with IPsec and tunneling, Cilium currently
encrypts traffic on the path hostns -> remote pod even though traffic
is in plain-text on the path remote pod -> hostns. When using native
routing, neither of those paths is encrypted because traffic from the
hostns doesn't go through the bpf_host BPF program.

Cilium's Transparent Encryption with IPsec aims at encrypting pod-to-pod
traffic. It is therefore unclear why we are encrypting traffic from the
hostns. The simple fact that only one direction of the connection is
encrypted begs the question of its usefulness. It's possible that this
traffic was encrypted by mistake: some of this logic is necessary for
node-to-node encryption with IPsec (not supported anymore) and
pod-to-pod encryption may have been somewhat simplified to encrypt
*-to-pod traffic.

Encrypting traffic from the hostns nevertheless creates several issues.
First, this situation creates a path asymmetry between the forward and
reply paths of hostns<>remote pod connections. Path asymmetry issues are
well known to be a source of bugs, from of '--ctstate INVALID -j DROP'
iptables rules to NAT issues.

Second, Gray recently uncovered a separate bug which, when combined with
this encryption from hostns, can prevent Cilium from starting. That
separate bug is still being investigated but it seems to cause the
reload of bpf_host to depend on Cilium connecting to etcd in a
clustermesh context. If this etcd is a remote pod, Cilium connects to it
on hostns -> remote pod path. The bpf_host program being unloaded[1], it
fails. We end up in a cyclic dependency: bpf_host requires connectivity
to etcd, connectivity to etcd requires bpf_host.

This commit therefore removes encryption with IPsec for the path hostns
-> remote pod when using tunneling (already unencrypted when using
native routing).

1 - More specifically, in Gray's case, the bpf_host program is already
    loaded, but it needs to be reloaded because the IPsec XFRM config
    changed. Without this reload, encryption fails.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
For the similar reasons as in the previous commit, we don't want to
encrypt traffic going from a pod to the CiliumInternalIP. This is
currently the only node IP address type that is associated an encryption
key.

Since we don't encrypt traffic from the hostns to remote pods anymore
(see previous commit), encrypting traffic going to a CiliumInternalIP
(remote node) would result in a path asymmetry: traffic going to the
CiliumInternalIP would be encrypted, whereas reply traffic coming from
the CiliumInternalIP wouldn't.

This commit removes that caseand therefore ensures we never encrypt
traffic going to a node IP address.

Reported-by: Gray Lian <gray.liang@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno
Copy link
Member Author

This was all green except for Travis CI hitting the known timeout flake, but I had to rebase to fix a conflict.

@nbusseneau

This comment was marked as off-topic.

2 similar comments
@nbusseneau

This comment was marked as off-topic.

@nbusseneau

This comment was marked as outdated.

@pchaigno
Copy link
Member Author

/test-vagrant

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 22, 2023
@pchaigno pchaigno merged commit 3b3e8d0 into cilium:main May 22, 2023
58 checks passed
@julianwiedmann julianwiedmann added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. labels Jul 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.11 to Backport done to v1.11 in 1.11.18 Jul 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.10 Jul 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.10 Jul 25, 2023
@margamanterola margamanterola removed this from Backport done to v1.13 in 1.13.3 Jul 26, 2023
@margamanterola margamanterola removed this from Backport done to v1.11 in 1.11.18 Jul 26, 2023
@margamanterola margamanterola removed this from Backport done to v1.12 in 1.12.10 Jul 26, 2023
@margamanterola margamanterola added this to Backport done to v1.11 in 1.11.19 Jul 26, 2023
@margamanterola margamanterola added this to Backport done to v1.12 in 1.12.12 Jul 26, 2023
@margamanterola margamanterola added this to Backport done to v1.13 in 1.13.5 Jul 26, 2023
@margamanterola
Copy link
Member

mlh had added this to the wrong projects, because the PR had been merged back in May. Moved to the correct projects now.

jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Oct 12, 2023
This commit brings back host-to-remote-pod packets as long as they are
from proxy (with mark 0xa00).

We removed encryption on path hostns -> remote pod on
cilium#25440, because cilium aims at encrypting pod-to-pod traffic.

However, we recently realized IPsec with L7 proxy relies on
host-to-remote-pod encryption: for normal pod to pod connection, cilium
proxy will hijack the traffic on hostns, respond on behalf of dest pod
from hostns. Unconditionally removing IPsec encryption code from
bpf_host leads to encryption leak, those reply packets from proxy go out
in plain-text.

To implement the encryption for from-proxy packets only, we have to
check skb->mark at some point, but by the time we look up ipcache,
skb->mark has been cleared.

Paul suggested three possible solutions:

    1. We move this logic to earlier in the packet processing (likely in
       do_netdev). It's option A you described before. That means we
       will have two lookups into the ipcache for a large number of
       packets
    2. We don't reset the packet mark in inherit_identity_from_host such
       that we can use the value in subsequent tail called programs.
       That's another variant of option B. That could however many
       implications on the existing logic that are hard to anticipate.
       Not something we want to change in a backported bugfix.
    3. We pass the information we need via the skb->cb. Yet another
       variant of option B. There's a small risk of conflicts with other
       cb values, but that's as usual. That's however basically the same
       as what you did except you used skb->tc_index instead of skb->cb.

As none of them is satisfactory, we decided to use skb->tc_index and
flag `TC_INDEX_F_SKIP_INGRESS_PROXY` to tell if the skb is from proxy,
as long as we rename the flag `TC_INDEX_F_SKIP_{INGRESS,EGRESS}_PROXY`
to clarify that the meaning of that flag changed.

Fixes: cilium#25440 (bpf: Don't encrypt on path hostns -> remote pod)

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Dec 1, 2023
Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Dec 1, 2023
Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Dec 1, 2023
Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Dec 1, 2023
Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Dec 1, 2023
Previously #25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: #25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
pjablonski123 pushed a commit to pjablonski123/cilium that referenced this pull request Dec 15, 2023
Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
julianwiedmann pushed a commit to julianwiedmann/cilium that referenced this pull request Mar 5, 2024
[ upstream commit e78ff16 ]

Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
julianwiedmann pushed a commit to julianwiedmann/cilium that referenced this pull request Mar 5, 2024
[ upstream commit e78ff16 ]

Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
julianwiedmann pushed a commit to julianwiedmann/cilium that referenced this pull request Mar 5, 2024
[ upstream commit e78ff16 ]

Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
julianwiedmann pushed a commit to julianwiedmann/cilium that referenced this pull request Mar 5, 2024
[ upstream commit e78ff16 ]

Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
julianwiedmann pushed a commit that referenced this pull request Mar 7, 2024
[ upstream commit e78ff16 ]

Previously #25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: #25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
julianwiedmann pushed a commit to julianwiedmann/cilium that referenced this pull request Mar 7, 2024
[ upstream commit e78ff16 ]

Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
julianwiedmann pushed a commit to julianwiedmann/cilium that referenced this pull request Mar 7, 2024
[ upstream commit e78ff16 ]

Previously cilium#25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: cilium#25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
julianwiedmann pushed a commit that referenced this pull request Mar 7, 2024
[ upstream commit e78ff16 ]

Previously #25440 removed
bpf_host's logic for host-to-remote-pod packets.

However, we recently realized such host-to-remote-pod traffic can also
be pod-to-pod traffic passing through L7 proxy. This commit made
bpf_host capable of handling these host-to-remote-pod packets as long as
they are originated from L7 proxy.

Fixes: #25440

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.12 This issue affects v1.12 branch affects/v1.13 This issue affects v1.13 branch area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.11.19
Backport done to v1.11
1.12.12
Backport done to v1.12
1.13.5
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

None yet

7 participants