test/contrib: Bump CoreDNS version to 1.8.3 #18018
Conversation
@brb I thought it was the same bug. I mean, it might be the same bug, but the fix is only available in conjunction with both PRs. Nice find. I assume we will need to backport this to all branches?
/test
Nice find!
cilium/cilium#18018 updated the tests in cilium/cilium to consistently use CoreDNS v1.8.3. Thus we no longer need to pre-pull any other CoreDNS image versions. Signed-off-by: Tobias Klauser <tobias@cilium.io>
FYI: opened cilium/packer-ci-build#301 to drop all but the CoreDNS v1.8.3 image.
Naïve question: why can't we simply use the latest version?
CI is failing:
[ upstream commit 06d9441 ] Previously, in the graceful backend termination test we switched to KPR=disabled and did not restart CoreDNS. Before the switch, CoreDNS@k8s2 -> kube-apiserver@k8s1 was handled by the socket-lb, so the outgoing packet was $CORE_DNS_IP -> $KUBE_API_SERVER_NODE_IP. The packet should have been BPF masq-ed. After the switch, the BPF masq is no longer in place, so the packets from CoreDNS are subject to iptables masquerading (they can either be dropped by the invalid rule or masqueraded to some other port). Combined with CoreDNS being unable to recover from connectivity errors [1], CoreDNS was no longer able to receive updates from the kube-apiserver, hence the NXDOMAIN errors for the new service name. To avoid such flakes, forcefully restart the DNS pods if a KPR setting change is detected. [1]: #18018 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 06d9441 ] Previously, in the graceful backend termination test we switched to KPR=disabled and did not restart CoreDNS. Before the switch, CoreDNS@k8s2 -> kube-apiserver@k8s1 was handled by the socket-lb, so the outgoing packet was $CORE_DNS_IP -> $KUBE_API_SERVER_NODE_IP. The packet should have been BPF masq-ed. After the switch, the BPF masq is no longer in place, so the packets from CoreDNS are subject to iptables masquerading (they can either be dropped by the invalid rule or masqueraded to some other port). Combined with CoreDNS being unable to recover from connectivity errors [1], CoreDNS was no longer able to receive updates from the kube-apiserver, hence the NXDOMAIN errors for the new service name. To avoid such flakes, forcefully restart the DNS pods if a KPR setting change is detected. [1]: #18018 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>
[ upstream commit 06d9441 ] Previously, in the graceful backend termination test we switched to KPR=disabled and did not restart CoreDNS. Before the switch, CoreDNS@k8s2 -> kube-apiserver@k8s1 was handled by the socket-lb, so the outgoing packet was $CORE_DNS_IP -> $KUBE_API_SERVER_NODE_IP. The packet should have been BPF masq-ed. After the switch, the BPF masq is no longer in place, so the packets from CoreDNS are subject to iptables masquerading (they can either be dropped by the invalid rule or masqueraded to some other port). Combined with CoreDNS being unable to recover from connectivity errors [1], CoreDNS was no longer able to receive updates from the kube-apiserver, hence the NXDOMAIN errors for the new service name. To avoid such flakes, forcefully restart the DNS pods if a KPR setting change is detected. [1]: #18018 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
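For context, "forcefully restart the DNS pods" from a test helper could look roughly like the snippet below. This is a sketch only, not the actual CI code; the `kube-system` namespace, the `k8s-app=kube-dns` label, and the `coredns` Deployment name are assumptions based on a standard kubeadm-style cluster.

```sh
# Sketch (assumed helper, not the actual test code): force a CoreDNS restart
# after a KPR setting change. Deleting the pods makes the Deployment recreate
# them, then wait until the rollout settles.
kubectl -n kube-system delete pod -l k8s-app=kube-dns
kubectl -n kube-system rollout status deployment/coredns --timeout=120s
```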
As reported in [1], Go's HTTP/2 client before Go 1.16 had some serious bugs which
could result in lost connections to the kube-apiserver. Worse, the client
could not recover from them.
In the case of CoreDNS, the loss of connectivity to the kube-apiserver was
not even logged. I validated this by adding the following rule on
the node which was running the CoreDNS pod (port 6443, as the socket-lb
was doing the service translation):
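The exact rule is not shown here. Purely as an illustration, a rule of roughly this shape could be used to cut the CoreDNS pod off from the kube-apiserver; the chain, match, and example pod IP are assumptions, not the rule from the PR:

```sh
# Illustrative sketch only: silently drop traffic from the CoreDNS pod to the
# kube-apiserver port on the node hosting the pod. 6443 is the kube-apiserver
# port that the socket-lb translated the service address to.
CORE_DNS_IP=10.0.1.123   # assumed example pod IP
iptables -I FORWARD -s "$CORE_DNS_IP" -p tcp --dport 6443 -j DROP
```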
After upgrading CoreDNS to a version compiled with Go >= 1.16,
the pod not only logged the errors, but was also able to recover
from them quickly. An example of such an error:
To determine the minimum version bump, I used the following:
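The command itself is not included above. One way to check which Go toolchain a given CoreDNS release was built with, sketched here under the assumption that the binary lives at /coredns inside the image, is:

```sh
# Sketch (assumed approach, not necessarily the one used here): extract the
# CoreDNS binary from the image and ask Go which toolchain built it.
id=$(docker create coredns/coredns:1.8.3)
docker cp "$id":/coredns ./coredns
docker rm "$id"
go version ./coredns   # e.g. reports go1.16.x for releases built with Go >= 1.16
```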
Hopefully, the bumped version will fix the CI flakes in which a service
domain name is still not available after 7 minutes. In other words, CoreDNS is not
able to resolve the name, which means that it hasn't received the update from
the kube-apiserver for the service.
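That symptom can be checked manually with a lookup from any pod in the cluster. A minimal example, assuming a hypothetical test pod `pod-to-a` whose image ships nslookup and a hypothetical service `echo` in the `default` namespace:

```sh
# Hypothetical check: the lookup returns NXDOMAIN when CoreDNS never received
# the Service update from the kube-apiserver.
kubectl exec pod-to-a -- nslookup echo.default.svc.cluster.local
```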
[1]: kubernetes/kubernetes#87615 (comment)
Fix #17401