"serve_stale" option in the "cache" plugin behaves incorrectly #3586
Comments
Excellent bug report. Thank you!
cc: @gonzalop
/plugin: cache
I can reproduce the issue with/without the serve_stale patch and with/without the serve_stale option in Corefile. The first request gets the NXDOMAIN, subsequent requests are from cache with the configured 3600s TTL, which skips prefetching. Lowering the default cache denial TTL to a few seconds mitigates the problem:
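A minimal Corefile sketch of that mitigation (the capacity, upstream address, and exact TTL here are illustrative assumptions, not the reporter's actual config):

```
. {
    cache {
        # denial CAPACITY [TTL]: cap negative answers at 5 seconds
        denial 9984 5
        serve_stale
    }
    forward . 8.8.8.8
}
```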
I've always looked at the ncache-related lines in get and getIgnoreTTL with suspicion...
Could this be related to #3037, though the prefetch skipping only affects low TTLs.
Default TTL for denial-of-existence responses is 1800 according to the docs, and the example config doesn't override TTLs. I haven't verified that this is what happens here, but it looks like the expected behavior, independent of using serve_stale or not.
Should this be closed as WAI then?
@gonzalop With:
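(The inline config was not preserved; a hedged reconstruction matching the description, with the upstream address and cache capacity assumed, in which denial responses should be capped at a 10s TTL:)

```
. {
    forward . 10.0.0.10
    cache {
        denial 9984 10
        serve_stale
    }
}
```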
And a response from my upstream:
I see the NXDOMAIN response cached for 10m, where I would expect it to be cached for 10s. This doesn't seem like correct behavior? This is with 1.6.7. If I remove the serve_stale option, the problem goes away.
Oh! I'll take a look later this week.
Does the
@gonzalop, have you had a chance to look at the issue?
Looking now :)
Serving stale responses from the negative cache interacts poorly with other plugins and seems to confuse users. Fixes coredns#3586.

Signed-off-by: Gonzalo Paniagua Javier <gonzalo.mono@gmail.com>
Looking at this thread because of the PR now opened that disables serve_stale for negative responses only, which would be a strange thing to do from a DNS perspective; there isn't a difference between negative and positive caching apart from where to look for the TTL and the data. So this may be a deeper problem?
I think #3744 addresses the underlying issue here.
#3744 has fixed the issue. Thanks, @chrisohaver!
What happened:
The serve_stale option does not update the NXDOMAIN status if it gets constantly hammered by requests.

What you expected to happen:
CoreDNS updates the record once an upstream DNS starts to return an A record after returning NXDOMAIN.
How to reproduce it (as minimally and precisely as possible):
1. Begin by repeatedly hammering a CoreDNS instance, in a while loop, with requests to a non-existent domain. It correctly returns NXDOMAIN.
2. Create an appropriate A record on the upstream DNS (I've just created a Service in Kubernetes) and verify it resolves there.
3. The aforementioned while loop will return NXDOMAIN indefinitely.

Removing the serve_stale option alleviates the issue.

Anything else we need to know?:
Note that these tests are performed not against the primary CoreDNS of a Kubernetes cluster, but against a secondary one that forwards requests to the primary (a node-level caching mechanism).
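For reference, the hammering loop from the reproduction steps might look like the following sketch (the queried name and server address are assumptions; this needs a live CoreDNS to run against):

```
# Query a (still) non-existent name in a tight loop; with serve_stale
# enabled, the cached NXDOMAIN reportedly never gets refreshed, even
# after the record is created upstream.
while true; do
  dig +short @127.0.0.1 my-service.default.svc.cluster.local A
  sleep 0.5
done
```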
Environment:
OS (e.g: cat /etc/os-release):