New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: Don't encrypt on path hostns -> remote pod
#25440
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
Looks good, just one point which I'm not sure: can we delete this as there seems to be no more skb with mark 0xe00 (MARK_MAGIC_ENCRYPT) reaching cilium_net ingress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question to clarify, mostly to understand if this affects more than just the hostns -> remote pod
path.
I think that's correct. I'll try to remove it and rerun CI. |
f76464c
to
103e277
Compare
Agreed. We're no longer using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it - just for the fact that it brings us in line with the behaviour in native-routing 😁.
Thank you Paul!
103e277
to
e1f313f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for agent related changes 🚀
TL;DR. this commit removes a bit of dead code that seems to have been intended for IPsec in native routing mode but is never actually executed. These code paths are only executed if going through cilium_host and coming from the host (see !from_host check above). For remote destinations, we only go through cilium_host if the destination is part of a remote pod CIDR and we are running in tunneling mode. In native routing mode, we go straight to the native device. Example routing table for tunneling (10.0.0.0/24 is the remote pod CIDR): 10.0.0.0/24 via 10.0.1.61 dev cilium_host src 10.0.1.61 mtu 1373 <- we follow this 10.0.1.0/24 via 10.0.1.61 dev cilium_host src 10.0.1.61 10.0.1.61 dev cilium_host scope link 192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.11** Example routing table for native routing: 10.0.0.0/24 via 192.168.56.12 dev enp0s8 <- we follow this 10.0.1.0/24 via 10.0.1.61 dev cilium_host src 10.0.1.61 10.0.1.61 dev cilium_host scope link 192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.11 Thus, this code path is only used for tunneling with IPsec. However, IPsec in tunneling mode should already be handled by the encap_and_redirect_with_nodeid call above in the same functions (see info->key argument). So why was this added? It was added in commit b76e6eb ("cilium: ipsec, support direct routing modes") to support "direct routing modes". I found that very suspicious because, per the above, in native routing mode, traffic from the hostns shouldn't even go through cilium_host. I thus tested it out. I've checked IPsec with native routing mode, with and without endpoint routes. I can confirm that, in all those cases, traffic from the hostns is not encrypted when going to a remote pod. Therefore, this code is dead. I'm unsure when it died. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
In pod-to-pod encryption with IPsec and tunneling, Cilium currently encrypts traffic on the path hostns -> remote pod even though traffic is in plain-text on the path remote pod -> hostns. When using native routing, neither of those paths is encrypted because traffic from the hostns doesn't go through the bpf_host BPF program. Cilium's Transparent Encryption with IPsec aims at encrypting pod-to-pod traffic. It is therefore unclear why we are encrypting traffic from the hostns. The simple fact that only one direction of the connection is encrypted begs the question of its usefulness. It's possible that this traffic was encrypted by mistake: some of this logic is necessary for node-to-node encryption with IPsec (not supported anymore) and pod-to-pod encryption may have been somewhat simplified to encrypt *-to-pod traffic. Encrypting traffic from the hostns nevertheless creates several issues. First, this situation creates a path asymmetry between the forward and reply paths of hostns<>remote pod connections. Path asymmetry issues are well known to be a source of bugs, from of '--ctstate INVALID -j DROP' iptables rules to NAT issues. Second, Gray recently uncovered a separate bug which, when combined with this encryption from hostns, can prevent Cilium from starting. That separate bug is still being investigated but it seems to cause the reload of bpf_host to depend on Cilium connecting to etcd in a clustermesh context. If this etcd is a remote pod, Cilium connects to it on hostns -> remote pod path. The bpf_host program being unloaded[1], it fails. We end up in a cyclic dependency: bpf_host requires connectivity to etcd, connectivity to etcd requires bpf_host. This commit therefore removes encryption with IPsec for the path hostns -> remote pod when using tunneling (already unencrypted when using native routing). 1 - More specifically, in Gray's case, the bpf_host program is already loaded, but it needs to be reloaded because the IPsec XFRM config changed. Without this reload, encryption fails. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
For the similar reasons as in the previous commit, we don't want to encrypt traffic going from a pod to the CiliumInternalIP. This is currently the only node IP address type that is associated an encryption key. Since we don't encrypt traffic from the hostns to remote pods anymore (see previous commit), encrypting traffic going to a CiliumInternalIP (remote node) would result in a path asymmetry: traffic going to the CiliumInternalIP would be encrypted, whereas reply traffic coming from the CiliumInternalIP wouldn't. This commit removes that caseand therefore ensures we never encrypt traffic going to a node IP address. Reported-by: Gray Lian <gray.liang@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
e1f313f
to
4c1ef86
Compare
This was all green except for Travis CI hitting the known timeout flake, but I had to rebase to fix a conflict. |
This comment was marked as off-topic.
This comment was marked as off-topic.
2 similar comments
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as outdated.
This comment was marked as outdated.
/test-vagrant |
mlh had added this to the wrong projects, because the PR had been merged back in May. Moved to the correct projects now. |
This commit brings back host-to-remote-pod packets as long as they are from proxy (with mark 0xa00). We removed encryption on path hostns -> remote pod on cilium#25440, because cilium aims at encrypting pod-to-pod traffic. However, we recently realized IPsec with L7 proxy relies on host-to-remote-pod encryption: for normal pod to pod connection, cilium proxy will hijack the traffic on hostns, respond on behalf of dest pod from hostns. Unconditionally removing IPsec encryption code from bpf_host leads to encryption leak, those reply packets from proxy go out in plain-text. To implement the encryption for from-proxy packets only, we have to check skb->mark at some point, but by the time we look up ipcache, skb->mark has been cleared. Paul suggested three possible solutions: 1. We move this logic to earlier in the packet processing (likely in do_netdev). It's option A you described before. That means we will have two lookups into the ipcache for a large number of packets 2. We don't reset the packet mark in inherit_identity_from_host such that we can use the value in subsequent tail called programs. That's another variant of option B. That could however many implications on the existing logic that are hard to anticipate. Not something we want to change in a backported bugfix. 3. We pass the information we need via the skb->cb. Yet another variant of option B. There's a small risk of conflicts with other cb values, but that's as usual. That's however basically the same as what you did except you used skb->tc_index instead of skb->cb. As none of them is satisfactory, we decided to use skb->tc_index and flag `TC_INDEX_F_SKIP_INGRESS_PROXY` to tell if the skb is from proxy, as long as we rename the flag `TC_INDEX_F_SKIP_{INGRESS,EGRESS}_PROXY` to clarify that the meaning of that flag changed. Fixes: cilium#25440 (bpf: Don't encrypt on path hostns -> remote pod) Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Previously #25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: #25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit e78ff16 ] Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ] Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ] Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ] Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ] Previously #25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: #25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ] Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ] Previously cilium#25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit e78ff16 ] Previously #25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they are originated from L7 proxy. Fixes: #25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
In pod-to-pod encryption with IPsec and tunneling, Cilium currently encrypts traffic on the path
hostns -> remote pod
even though traffic is in plain-text on the pathremote pod -> hostns
. When using native routing, neither of those paths is encrypted because traffic from the hostns doesn't go through the bpf_host BPF program.Cilium's Transparent Encryption with IPsec aims at encrypting pod-to-pod traffic. It is therefore unclear why we are encrypting traffic from the hostns. The simple fact that only one direction of the connection is encrypted begs the question of its usefulness. It's possible that this traffic was encrypted by mistake: some of this logic is necessary for node-to-node encryption with IPsec (not supported anymore) and pod-to-pod encryption may have been somewhat simplified to encrypt
*-to-pod
traffic.Encrypting traffic from the hostns nevertheless creates several issues. First, this situation creates a path asymmetry between the forward and reply paths of
hostns<>remote pod
connections. Path asymmetry issues are well known to be a source of bugs, from of--ctstate INVALID -j DROP
iptables rules to NAT issues.Second, Gray recently uncovered a separate bug which, when combined with this encryption from hostns, can prevent Cilium from starting. That separate bug is still being investigated but it seems to cause the reload of bpf_host to depend on Cilium connecting to etcd in a clustermesh context. If this etcd is a remote pod, Cilium connects to it on hostns -> remote pod path. The bpf_host program being unloaded[1], it fails. We end up in a cyclic dependency: bpf_host requires connectivity to etcd, connectivity to etcd requires bpf_host.
This pull request therefore removes encryption with IPsec for the path
hostns -> remote pod
when using tunneling (already unencrypted when using native routing).1 - More specifically, in Gray's case, the bpf_host program is already loaded, but it needs to be reloaded because the IPsec XFRM config changed. Without this reload, encryption fails.