New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High-scale IPcache: Chapter 5 #25601
Conversation
9e102bb
to
3c8b31f
Compare
3c8b31f
to
8df9be0
Compare
Reported-by: Gray Lian <gray.liang@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This commit moves the decapsulation of the special tunnel for high-scale IPcache mode to the native device. We can't rely on cilium_host to decapsulate because we won't be going through cilium_host when endpoint routes are enabled. As a consequence, native devices are required for Cilium to function in this mode. Therefore, Cilium agents will attempt to auto-detect the devices and fatal if they can't. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
In a subsequent commit, we will rely on the endpoints map to retrieve the security identity of local endpoints. This commit therefore populates the endpoints map with the security identity of each endpoint. The map stays the same size because we can reuse the existing padding. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
In high-scale IPcache mode, we can't rely on the IPcache to retrieve the security identity of local endpoints. When endpoint routes are disabled, we jump directly from the source to the destination BPF programs and can therefore pass the security identity via the skb->cb. However, when endpoint routes are enabled, we must go through the Linux stack. In that last case, we can rely on the endpoints map to retrieve security IDs at the destination BPF program. This commit implements that work around. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
In high-scale IPcache mode, pods are excluded from the IPcache and we therefore can't rely on the IPcache to know when to encapsulate traffic. We decide to encapsulate or not based on a user-provided set of CIDRs. Of course, we never want to encapsulate traffic going to local pods. If endpoint routes are disabled, that is already the case: we will never go through the encapsulation logic as we will directly deliver packets to their destination endpoint with a bpf_redirect. If endpoint routes are enabled however, we need to check if the destination is a local endpoint, in which case we don't want to encapsulate. We do that in this commit using the local endpoint map which contains only endpoints hosted on the node. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Previous commits fixed our support for endpoint routes with high-scale IPcache so we can now cover it in tests. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
8df9be0
to
718b8cf
Compare
/test |
if (!revalidate_data(ctx, &data, &data_end, &ip6)) | ||
return DROP_INVALID; | ||
|
||
ep = __lookup_ip6_endpoint((union v6addr *)&ip6->saddr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a stupid question: should it be ip6->daddr
?
If the traffic is from remote, local endpoint map doesn't have an entry for the saddr, right? So this lookup can only be useful for local-to-local traffic. In that case, can we perform policy check at from-container
's handle_ipv4_from_lxc
, where we can get the dst_sec_id
from local endpoint map if ipcache doesn't return an entry:
Lines 832 to 834 in 4535f13
info = lookup_ip4_remote_endpoint(ip4->daddr, cluster_id); | |
if (info && info->sec_identity) { | |
*dst_sec_identity = info->sec_identity; |
@squeed Better not trust those MLH labeling blindly. There was an open question from Gray above. |
@pchaigno argh, good catch, sorry. |
No big deal. Everything here is behind a feature flag so we won't break anything else. |
This is in the context of the high-scale ipcache feature described at cilium/design-cfps#7.
This pull request adds support for endpoint routes in high-scale ipcache mode. See commits for details.
Updates: #25243.