kubernetes: Verify k8s-API failure behavior #1967
Recent question on slack relating to kubernetes-api failover: During an API failure CoreDNS replies with DNS errors (presumably NXDOMAIN) for kubernetes records. Increasing TTLs could help, but we need to validate the behavior of the API connection during API outages...

Need to revisit/review the k8s client lib and verify...

- how it reconnects after the k8s API comes back
- how it maintains the k8s API cache while the API is down
- how it maintains the k8s API cache during reconnect

Note: "k8s api cache" is the cache maintained by the k8s client lib, not the CoreDNS `cache` plugin.
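For reference, the TTL in question is settable per-zone in the kubernetes plugin. A minimal Corefile sketch, assuming a CoreDNS version where the plugin supports the `ttl` option (values here are illustrative, not recommendations):

```
# Hypothetical Corefile: raise the TTL on kubernetes records so that answers
# cached downstream (or by the cache plugin) can outlive a short API outage.
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        ttl 60
        fallthrough in-addr.arpa ip6.arpa
    }
    cache 60
    forward . /etc/resolv.conf
}
```

Note that this only papers over an outage for names already queried; it does not answer what the plugin returns once the client lib's cache is gone.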
[ Quoting <notifications@github.com> in "[coredns/coredns] kubernetes: Verif..." ]
> Recent question on slack relating to kubernetes-api failover: During an API
> failure CoreDNS replies with DNS errors (presumably NXDOMAIN) for kubernetes
> records. Increasing TTLs could help, but need to validate behavior of API
> connection during API outages...
NXDOMAIN... I hope not, SERVFAIL should hopefully be returned in that case.
> Need to revisit/review the k8s client lib and verify...
>
> - how it reconnects after the k8s API comes back
> - how it maintains the k8s API cache while the API is down
> - how it maintains the k8s API cache during reconnect
>
> Note: "k8s api cache" is the cache maintained by the k8s client lib, not the CoreDNS `cache` plugin.
Is this hard to test in the CI?
/Miek
TBD. I think yes. I need to figure out how to test manually first. :)
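For the manual test, a sketch of the check itself (10.96.0.10 stands in for the cluster DNS service IP, and `kubernetes.default` is the standard in-cluster service; adjust both for the cluster under test):

```
# Query a kubernetes record while the API server is unreachable and inspect
# the "status:" field of the reply: SERVFAIL is the desired failure mode;
# NXDOMAIN wrongly asserts the name does not exist and can be negatively cached.
dig @10.96.0.10 kubernetes.default.svc.cluster.local A +time=2 +tries=1
```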
Agreed - SERVFAIL would be the correct response. I don't know how we respond in this case. I'll see if I can reproduce it.
[ Quoting <notifications@github.com> in "Re: [coredns/coredns] kubernetes: V..." ]
> > NXDOMAIN... I hope not, SERVFAIL should hopefully be returned in that case.
>
> Agreed - SERVFAIL would be the correct response. I don't know how we respond in this case. I'll see if I can reproduce it.
iptables? And closing off the endpoint from a VM?
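A sketch of that approach, assuming CoreDNS runs on a VM outside the cluster and the apiserver is reachable at 10.0.0.1:6443 (placeholder address):

```
# Start the outage: silently drop packets to the apiserver. DROP (rather
# than REJECT) leaves client connections hanging until their own timeouts
# fire, which is closer to a real network partition.
iptables -A OUTPUT -p tcp -d 10.0.0.1 --dport 6443 -j DROP

# ...query CoreDNS for kubernetes records and note the rcodes...

# End the outage and observe how the client lib reconnects and resyncs.
iptables -D OUTPUT -p tcp -d 10.0.0.1 --dport 6443 -j DROP
```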
I was just thinking I'd delete the kubernetes-api pod... and see how that goes. But making CoreDNS run external to the cluster would allow us to control the length of the outage using iptables as you suggest... it would make timing observation of API cache state easier. I haven't had time to play with it yet. It may be easier to automate than I fear.
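For the pod-deletion variant, something like this (the pod name is illustrative; in kubeadm-style clusters the apiserver is a static pod that the kubelet recreates, so the outage is short and its length is not easily controlled):

```
# Locate and delete the apiserver pod; the kubelet restarts the static pod.
kubectl -n kube-system get pods -l component=kube-apiserver
kubectl -n kube-system delete pod kube-apiserver-<node-name>
```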
What is actually the problem? Reading through the issue, it says that if the control plane is down, we can't answer correctly?
I don't know if there is a real problem here or not... need to look at the mode of failure first hand. I have not validated this myself, but it sounds as if we lose the API cache too quickly when the API connection fails. Perhaps it is lost immediately as the connection is lost. This is, IIUC, controlled by the kubernetes client lib. Hopefully there is a connection option to add some persistence to it.
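For reference, a minimal client-go sketch of the cache in question, assuming stock informer behavior (the kubeconfig path and poll interval are placeholders). The informer's local store is what the plugin reads from; the open question is whether its contents stay readable once the watch connection drops:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; point this at the test cluster.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Shared informer: list+watch services, mirrored into a local store.
	factory := informers.NewSharedInformerFactory(client, 0)
	svcs := factory.Core().V1().Services()

	stop := make(chan struct{})
	factory.Start(stop)
	cache.WaitForCacheSync(stop, svcs.Informer().HasSynced)

	// Poll the local store. These reads never hit the apiserver, so during
	// an induced outage the counts below show whether (and when) cached
	// entries disappear, which is the persistence question in this issue.
	for {
		list, err := svcs.Lister().List(labels.Everything())
		fmt.Printf("%s cached services: %d, err: %v\n",
			time.Now().Format(time.RFC3339), len(list), err)
		time.Sleep(5 * time.Second)
	}
}
```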