cilium: Add scope knob for local address to be considered host id in ipcache

In some advanced environments, there may be devices in the hostns which have a link-scoped 10.x.y.z address. By default, Cilium skips all link local addresses when populating its local ipcache, as per listLocalAddresses(). Depending on the datapath configuration, this may cause issues when a Pod wants to talk to such an address in the hostns.

For example, when routing does not go via the stack, as is the case with BPF host routing, the ipcache lookup for such addresses falls back to WORLD id instead of HOST id. The datapath then assumes that the traffic needs to be transmitted to the given device from the tc layer instead of being pushed up the local stack, as is done for HOST id traffic. If that device is, e.g., a dummy device, the traffic is blackholed.

We tested that changing the scope of such an address to global makes the traffic flow, so the culprit really is listLocalAddresses()'s logic, which unconditionally skips all addresses with addr.Scope == int(netlink.SCOPE_LINK). Allow this to be customized, given that the kernel also allows many other scope values. The agent gets a new --local-max-addr-scope param for this, so that e.g. link local scope can be included via `--local-max-addr-scope=253` or via `--local-max-addr-scope=link`. To preserve the default behavior, link scope is still excluded unless the option is set.
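The scope cutoff described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `localAddr`, `parseScope` and `filterByMaxScope` are hypothetical names, the symbolic names other than `link` are assumed, and the value 252 only stands in for "any maximum below link scope". The numeric scope constants mirror the kernel's rtnetlink values.

```go
package main

import (
	"fmt"
	"strconv"
)

// Route scope values as defined by the Linux kernel (rtnetlink).
const (
	scopeUniverse = 0 // shown as "global" by iproute2
	scopeSite     = 200
	scopeLink     = 253
	scopeHost     = 254
)

// localAddr is a hypothetical, simplified stand-in for a netlink address.
type localAddr struct {
	IP    string
	Scope int
}

// parseScope accepts either a numeric scope ("253") or a symbolic name
// ("link"). Only "link" and numeric values are confirmed by the commit
// message; the other names are assumptions made for this sketch.
func parseScope(s string) (int, error) {
	switch s {
	case "global", "universe":
		return scopeUniverse, nil
	case "site":
		return scopeSite, nil
	case "link":
		return scopeLink, nil
	case "host":
		return scopeHost, nil
	}
	n, err := strconv.Atoi(s)
	if err != nil || n < 0 || n > 255 {
		return 0, fmt.Errorf("invalid address scope %q", s)
	}
	return n, nil
}

// filterByMaxScope keeps only addresses whose scope value does not exceed
// maxScope. Higher values mean narrower scopes, so a maximum below
// scopeLink drops link-scoped addresses.
func filterByMaxScope(addrs []localAddr, maxScope int) []localAddr {
	var out []localAddr
	for _, a := range addrs {
		if a.Scope > maxScope {
			continue // scope too narrow to be treated as a host address
		}
		out = append(out, a)
	}
	return out
}

func main() {
	addrs := []localAddr{
		{IP: "192.168.178.91", Scope: scopeUniverse},
		{IP: "2.2.2.2", Scope: scopeLink},
	}
	// "252" stands in for a default that excludes link scope (assumed value).
	for _, arg := range []string{"252", "link"} {
		maxScope, err := parseScope(arg)
		if err != nil {
			panic(err)
		}
		fmt.Println(arg, filterByMaxScope(addrs, maxScope))
	}
}
```

Comparing against a maximum instead of testing for equality with the link scope is what lets a single option cover the other scope values the kernel allows.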
Example, default:

  # ./daemon/cilium-agent --identity-allocation-mode=crd \
      --enable-ipv6=true --enable-ipv4=true --disable-envoy-version-check=true \
      --tunnel=disabled --k8s-kubeconfig-path=$HOME/.kube/config \
      --kube-proxy-replacement=strict --enable-l7-proxy=false \
      --auto-direct-node-routes=true --enable-bandwidth-manager=true \
      --ipv4-native-routing-cidr=10.91.0.0/16 --ipv6-native-routing-cidr=f00d::a5b:0:0:0/96 \
      --enable-ipv4-masquerade=false --enable-ipv6-masquerade=false

  root@zh-lab-node-1:~/go/src/github.com/cilium/cilium# ./cilium/cilium bpf ipcache list | grep "identity=1 "
  2001:1620:665:0:1ac0:4dff:fe09:c164/128   identity=1 encryptkey=0 tunnelendpoint=0.0.0.0
  10.91.0.191/32                            identity=1 encryptkey=0 tunnelendpoint=0.0.0.0
  f00d::a5b:0:0:a962/128                    identity=1 encryptkey=0 tunnelendpoint=0.0.0.0
  192.168.178.91/32                         identity=1 encryptkey=0 tunnelendpoint=0.0.0.0

Example, with link scope:

  # ./daemon/cilium-agent --identity-allocation-mode=crd \
      --enable-ipv6=true --enable-ipv4=true --disable-envoy-version-check=true \
      --tunnel=disabled --k8s-kubeconfig-path=$HOME/.kube/config \
      --kube-proxy-replacement=strict --enable-l7-proxy=false \
      --auto-direct-node-routes=true --enable-bandwidth-manager=true \
      --ipv4-native-routing-cidr=10.91.0.0/16 --ipv6-native-routing-cidr=f00d::a5b:0:0:0/96 \
      --enable-ipv4-masquerade=false --enable-ipv6-masquerade=false \
      --local-max-addr-scope=link

  root@zh-lab-node-1:~/go/src/github.com/cilium/cilium# ./cilium/cilium bpf ipcache list | grep "identity=1 "
  192.168.178.91/32                         identity=1 encryptkey=0 tunnelendpoint=0.0.0.0
  2.2.2.2/32                                identity=1 encryptkey=0 tunnelendpoint=0.0.0.0
  2001:1620:665:0:1ac0:4dff:fe09:c164/128   identity=1 encryptkey=0 tunnelendpoint=0.0.0.0
  f00d::a5b:0:0:a962/128                    identity=1 encryptkey=0 tunnelendpoint=0.0.0.0
  10.91.0.191/32                            identity=1 encryptkey=0 tunnelendpoint=0.0.0.0

Where 2.2.2.2/32 is a custom address with link scope on device enp5s0:

  3: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether 18:c0:4d:09:c1:64 brd ff:ff:ff:ff:ff:ff
      inet 2.2.2.2/32 scope link enp5s0
         valid_lft forever preferred_lft forever
      inet 192.168.178.91/24 brd 192.168.178.255 scope global dynamic enp5s0
         valid_lft 592972sec preferred_lft 592972sec
      inet6 2001:1620:665:0:1ac0:4dff:fe09:c164/64 scope global dynamic mngtmpaddr noprefixroute
         valid_lft 6805sec preferred_lft 3205sec
      inet6 fe80::1ac0:4dff:fe09:c164/64 scope link
         valid_lft forever preferred_lft forever
  [...]
  8: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether fe:b4:96:9d:b4:74 brd ff:ff:ff:ff:ff:ff
      inet 10.91.0.191/32 scope link cilium_host
         valid_lft forever preferred_lft forever
      inet6 2001:1620:665:0:1ac0:4dff:fe09:c164/128 scope global
         valid_lft forever preferred_lft forever

For the start of the 1.13 cycle we can change defaults.AddressScopeMax to link. For this reason, the option --local-max-addr-scope is intentionally hidden, so that once we do the switch there is also an opt-out.

Fixes: #19427

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>