BPF host routing bypasses DNS resolution of dockerd on Kind clusters #23283
Looking into the filtered strace output of one of the threads, it looks like the thread has entered the netns of the Kind node. The missing bit in the puzzle is how the following got redirected to it:
There we go, the last missing piece of the puzzle, found with the help of pwru. The redirection happens because of the following iptables rule:
TL;DR: because BPF host routing makes the packet bypass that iptables rule, the packet is actually sent to the host netns. As there is no UDP socket listening on the translated destination in the host netns, the DNS request fails.
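One way to observe this failure mode is to list the UDP listeners in a Kind node's (host) network namespace: the DNS packet that bypassed the DNAT rule arrives there with its original destination port, for which no socket exists. A sketch, assuming the default node container name `kind-control-plane`:

```shell
# List UDP listeners in the Kind node's host network namespace.
# With BPF host routing, the un-translated DNS packet lands here and
# finds no matching UDP socket, so the resolution fails.
docker exec kind-control-plane ss -ulpn
```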
Disable the BPF masquerade for now. We can re-enable it after [1] has been fixed. [1]: #23283 Signed-off-by: Martynas Pumputis <m@lambda.lt>
Host routing breaks testing in KIND clusters, see #23283. Instead of disabling BPF masquerading, force legacy routing. This will allow running egress gateway tests in CI, which require BPF masquerading. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Enable the egress gateway in some datapath workflows. Doing this is a bit tricky, since EGW relies on BPF masquerading to function. The latter has been disabled to work around #23283. Instead, we can force legacy host routing which has a similar effect. Unfortunately, BPF masquerading doesn't work for IPv6 so enabling it breaks a bunch of testcases! The solution is to run half of the tests with BPF masq and EGW disabled, and the other half with EGW on, BPF masq on and IPv6 masq disabled. Updates #24151. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
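In Helm-values terms, the second half of the split described above might look like the following. This is a sketch, assuming current chart value names (`bpf.masquerade`, `bpf.hostLegacyRouting`, `egressGateway.enabled`, `enableIPv6Masquerade`); consult the Cilium chart for your version:

```shell
# EGW on, BPF masquerade on, IPv6 masquerade off, and legacy host
# routing forced to sidestep the bypass described in #23283.
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set bpf.masquerade=true \
  --set bpf.hostLegacyRouting=true \
  --set egressGateway.enabled=true \
  --set enableIPv6Masquerade=false
```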
We are using our Kind provisioning script to create K8s clusters when testing in the CI. Recently, we discovered that on some kernels a default DNS resolver, which is dockerd, is troublesome for the BPF host routing, which we want to test in the CI (cilium#23283). Fix this by patching the coredns configmap after creating a kind cluster to point to the 8.8.8.8 resolver. Alternative fixes (may still be applied later): * Pass a custom /etc/resolv.conf to kubelet via --resolv-conf in the Kind / kubeadm config. * Override /etc/resolv.conf of Kind nodes after creating a cluster (no race condition, as CoreDNS pods won't be started, as a CNI is not ready). * Patch Kind to allow users to specify custom DNS entries (i.e., docker run --dns="1.1.1.1,8.8.8.8"). Fixes: cilium#23283 Fixes: cilium#23330 Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
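The CoreDNS patch described above amounts to replacing the default `forward . /etc/resolv.conf` directive (which, on Kind, ultimately points at dockerd's embedded DNS) with an explicit upstream. One hedged way to do it; the exact Corefile line may differ between Kubernetes versions:

```shell
# Rewrite the CoreDNS Corefile to forward to an external resolver
# instead of the node's /etc/resolv.conf, then restart CoreDNS.
kubectl -n kube-system get configmap coredns -o yaml | \
  sed 's,forward . /etc/resolv.conf,forward . 8.8.8.8,' | \
  kubectl apply -f -
kubectl -n kube-system rollout restart deployment/coredns
```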
The cilium#23283 should have been fixed by the 3e8f697 ("contrib/kind: set custom DNS resolver for Kind nodes") commit, so we can re-enable masquerading by default and re-enable fast routing. Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
We are using our Kind provisioning script to create K8s clusters when testing in the CI. Recently, we discovered that on some kernels a default DNS resolver, which is dockerd, is troublesome for the BPF host routing, which we want to test in the CI (#23283). Fix this by patching the coredns configmap after creating a kind cluster to point to the 8.8.8.8 resolver. Alternative fixes (may still be applied later): * Pass a custom /etc/resolv.conf to kubelet via --resolv-conf in the Kind / kubeadm config. * Override /etc/resolv.conf of Kind nodes after creating a cluster (no race condition, as CoreDNS pods won't be started, as a CNI is not ready). * Patch Kind to allow users to specify custom DNS entries (i.e., docker run --dns="1.1.1.1,8.8.8.8"). Fixes: #23283 Fixes: #23330 Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
[ upstream commit 03eeda7 ] We are using our Kind provisioning script to create K8s clusters when testing in the CI. Recently, we discovered that on some kernels a default DNS resolver, which is dockerd, is troublesome for the BPF host routing, which we want to test in the CI (#23283). Fix this by patching the coredns configmap after creating a kind cluster to point to the 8.8.8.8 resolver. Alternative fixes (may still be applied later): * Pass a custom /etc/resolv.conf to kubelet via --resolv-conf in the Kind / kubeadm config. * Override /etc/resolv.conf of Kind nodes after creating a cluster (no race condition, as CoreDNS pods won't be started, as a CNI is not ready). * Patch Kind to allow users to specify custom DNS entries (i.e., docker run --dns="1.1.1.1,8.8.8.8"). Fixes: #23283 Fixes: #23330 Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 03eeda7 ] We are using our Kind provisioning script to create K8s clusters when testing in the CI. Recently, we discovered that on some kernels a default DNS resolver, which is dockerd, is troublesome for the BPF host routing, which we want to test in the CI (cilium#23283). Fix this by patching the coredns configmap after creating a kind cluster to point to the 8.8.8.8 resolver. Alternative fixes (may still be applied later): * Pass a custom /etc/resolv.conf to kubelet via --resolv-conf in the Kind / kubeadm config. * Override /etc/resolv.conf of Kind nodes after creating a cluster (no race condition, as CoreDNS pods won't be started, as a CNI is not ready). * Patch Kind to allow users to specify custom DNS entries (i.e., docker run --dns="1.1.1.1,8.8.8.8"). Fixes: cilium#23283 Fixes: cilium#23330 Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
A note for people stumbling upon this issue like I have: this has only been "fixed" when using our kind.sh script.
Currently, BPF masquerade was always disabled in the clustermesh E2E tests due to unintended interactions with Docker iptables rules breaking DNS resolution [1]. Instead, let's explicitly configure external upstream DNS servers for coredns, so that we can also enable this feature when KPR is enabled. While being there, let's also make the KPR setting explicit, instead of relying on the Cilium CLI configuration (which is based on whether the kube-proxy daemonset is present or not). [1]: #23283 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit a1089a7 ] Currently, BPF masquerade was always disabled in the clustermesh E2E tests due to unintended interactions with Docker iptables rules breaking DNS resolution [1]. Instead, let's explicitly configure external upstream DNS servers for coredns, so that we can also enable this feature when KPR is enabled. While being there, let's also make the KPR setting explicit, instead of relying on the Cilium CLI configuration (which is based on whether the kube-proxy daemonset is present or not). [1]: #23283 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit a1089a7 ] [ backporter's notes: we keep masquerade set to false on upgrade tests for 1.14 due to limitations outlined in #14350. However we still backport the rest of the changes as regular non-upgrade tests still benefit from it. ] Currently, BPF masquerade was always disabled in the clustermesh E2E tests due to unintended interactions with Docker iptables rules breaking DNS resolution [1]. Instead, let's explicitly configure external upstream DNS servers for coredns, so that we can also enable this feature when KPR is enabled. While being there, let's also make the KPR setting explicit, instead of relying on the Cilium CLI configuration (which is based on whether the kube-proxy daemonset is present or not). [1]: #23283 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Currently, BPF masquerade was always disabled in the clustermesh E2E tests due to unintended interactions with Docker iptables rules breaking DNS resolution [1]. Instead, let's explicitly configure external upstream DNS servers for coredns, so that we can also enable this feature when KPR is enabled. While being there, let's also make the KPR setting explicit, instead of relying on the Cilium CLI configuration (which is based on whether the kube-proxy daemonset is present or not). [1]: cilium#23283 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
cilium#23283 has been resolved, no need to work around it any longer. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
cilium#23283 is resolved by now. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Clarify that while the issue was closed as resolved, this actually only applies to scenarios where the kind.sh script is used. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
While debugging #23171, I've discovered that BPF masquerade is not working properly on > 5.4 kernels when running the `ci-datapath` workflow. The main symptom was failing requests to https://one.one.one.one. A closer look revealed that DNS resolution for external FQDNs was failing.

With the help of tcpdump running on the node hosting the CoreDNS pods (`kind-control-plane`), I captured one trace with iptables masquerading and another with BPF masquerading. The addresses involved:

- `10.244.1.217`: the client pod.
- `10.244.0.74`: the CoreDNS pod (`lxc61ce9e2c0db2`).
- `172.12.1.4`: the CoreDNS node IP (`eth0`).
- `172.12.1.1`: the Docker bridge IP address which connects the Kind nodes.

From the diff of the traces it quickly became clear that the culprit is not BPF masquerade itself but BPF host routing (which is automatically disabled with iptables masquerading), and that the following Docker DNS translation is bypassed with BPF host routing:
I still don't have a clue how the translation above happens.
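For readers hitting the same wall: the translation comes from dockerd's embedded DNS resolver, which listens inside each container's network namespace (behind `127.0.0.11`) and installs DNAT rules there on a dynamically chosen port. A hedged way to inspect those rules on a Kind node, assuming the default container name; the chain name and the exact port in the output vary by Docker version and container:

```shell
# Dump the NAT rules dockerd installs for its embedded DNS (127.0.0.11)
# inside the Kind node's network namespace. The DNAT target port is
# chosen dynamically per container, so the output differs per node.
docker exec kind-control-plane iptables -t nat -S DOCKER_OUTPUT
```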