Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin/cache: remove item.Autoritative #2885

Merged
merged 2 commits into from Jun 13, 2019

Conversation

Projects
None yet
4 participants
@miekg
Copy link
Member

commented Jun 12, 2019

Confuses clients if not set; remove it.

Fixes #2887

plugin/cache: remove item.Autoritative
Confuses clients if not set; remove it.

Signed-off-by: Miek Gieben <miek@miek.nl>

@corbot corbot bot requested a review from grobie Jun 12, 2019

@corbot

This comment has been minimized.

Copy link

commented Jun 12, 2019

Thank you for your contribution. I've just checked the OWNERS files to find a suitable reviewer. This search was successful and I've asked grobie (via plugin/cache/OWNERS) for a review.

If you have questions or suggestions for this bot, please file an issue against the miekg/dreck repository.

The bot understands the commands that are listed here.

@codecov-io

This comment has been minimized.

Copy link

commented Jun 12, 2019

Codecov Report

Merging #2885 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2885      +/-   ##
==========================================
- Coverage    55.7%   55.67%   -0.04%     
==========================================
  Files         205      205              
  Lines       10291    10290       -1     
==========================================
- Hits         5733     5729       -4     
- Misses       4141     4144       +3     
  Partials      417      417
Impacted Files Coverage Δ
plugin/cache/item.go 88.37% <100%> (-0.27%) ⬇️
plugin/forward/connect.go 81.69% <0%> (-4.23%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27ca097...643d6ea. Read the comment docs.

@chrisohaver

This comment has been minimized.

Copy link
Member

commented Jun 12, 2019

When we cache a non-authoritative answer, now cache will send to the client as if it is was authoritative. Seems like it would less wrong to retain the original value and pass that to the client.

@miekg

This comment has been minimized.

Copy link
Member Author

commented Jun 13, 2019

@m4r10k

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2019

@miekg the PR patched version (1.5.0-pr2885) now runs on-premises without any problems now. LGTM.

Would you like to have a Readme update too, or should we got with it as it is?

Thanks!

@miekg

This comment has been minimized.

Copy link
Member Author

commented Jun 13, 2019

@m4r10k

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2019

Your are welcome!

As for the Reame: It was just a question if you would like to mention this change anyway in the readme or if it is enough to have it in the changelog. I only raised the question to fullfil the usual process. 🙈

@miekg

This comment has been minimized.

Copy link
Member Author

commented Jun 13, 2019

There is nothing about this in the README, so I'll won't add anything. For the release notes you're correct this will be on there.

Add extra comments on why we do this
Signed-off-by: Miek Gieben <miek@miek.nl>

@miekg miekg merged commit 481dea5 into master Jun 13, 2019

3 checks passed

codecov/project 55.67% (target 50%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
stickler-ci No lint errors found.

@corbot corbot bot deleted the no-auth branch Jun 13, 2019

@chrisohaver

This comment has been minimized.

Copy link
Member

commented Jun 13, 2019

Won't clients continue to be confused by non-authoritative answers when the cache plugin isn't being used? If so, would this hack be better moved to a more global context than just in the cache plugin?

@miekg

This comment has been minimized.

Copy link
Member Author

commented Jun 13, 2019

@m4r10k

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2019

Hmm, I am not that deep into DNS here and maybe I got this wrong (please forgive me) but as I understand, as long as the cache plugin is not enabled, every domain specified in the Corefile which coreDNS is responsible for is authoritative and if it is forwarded, it is not maybe.

The case here is, that the coreDNS, with enabled cache plugin, only responses the first time with an authoritative answer and later, as the data comes from the cache, it answers with non-authoritative, even if it is a domain it is always authoritative for (Corefile).

This behavior reflects, what the tcpdumps are showing, what the same client with the same software and the same DNS query is (wrongly) doing - only because the cache plugin is enabled.

But hey, I could be wrong here for sure 🙈 😄

@miekg

This comment has been minimized.

Copy link
Member Author

commented Jun 14, 2019

@m4r10k

This comment has been minimized.

Copy link
Contributor

commented Jun 14, 2019

Is there an upstream bug the for Python resolver btw?

I do not know. What we encounter is, that with some quite old Linuxes we got resolving errors from different programs, like Puppet agent or rsync for example. Both can maybe tracked down to an old glibc (or something), as somehow mention on the internet. I guess that the program languages, in our case Ruby will somewhere be linked against a C library or uses some DNS code(?) of an C library which may cause this symptoms. There are legacy systems out there (also with Kubernetes and containers as such) which are not that actual as they should be of course, but nevertheless nobody is perfect. 😄

In our case, the cache is useful, as we use the ETCD backend massively and without caching, if the storages where the ETCD backends are running (virtual machines on SAN storage) have a hiccup, the coreDNS cannot answer the queries just in time. Especially in a container driven environment, like we have, this can hurt cluster raft protocols - only some seconds (2-3) of a storage switchover are enough to face strange situations. Therefore, for reliability, we have to enable caching to mitigate these possible timeouts. But without this patch, we break older legacy services. Meh..

This use case of DNS - containers and fast dynamic service discovery - wasn't on the radar 20-30 years ago, but now it is and as @miekg said, who cares about this flag in case of cache? There seems to be, as I understand, no hard protocol standards for this in DNS.

So this PR does what it should with minimal impact on the rest. 👍

PS: containers do not have a DNS cache (nscd, systemd-resolved, ...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.