Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin/loop: fix loop detection when an upstream DNS server is not reachable #2255

Merged
merged 1 commit into from Oct 31, 2018

Conversation

matlec
Copy link
Contributor

@matlec matlec commented Oct 31, 2018

1. Why is this pull request needed and what does it do?

The loop plugin erroneously detects a loop (and causes the server startup to fail) when the server is configured with an upstream DNS server that is unavailable during startup. The issue can be reproduced with the following minimal configuration file (choose any IP address that is unavailable for you):

.:53 {
	proxy . 10.20.30.40
	loop
}

Running the server with this configuration file yields the following output:

[FATAL] plugin/loop: Forwarding loop detected in "." zone. Exiting. See https://coredns.io/plugins/loop#troubleshooting. Probe query: "HINFO 1132263262137561571.8670159584435333845.".

The loop detection fails for the following reason: when the server receives the HINFO query, the query first passes the loop plugin which registers an occurence of the query by calling inc(). When the query is passed to the proxy plugin, the plugin tries to forward the request to the (unavailable) upstream server which ultimately causes the HINFO query request to fail. In its current implementation, the loop plugin retries the query several times over a total time span of 30 seconds. With each retry the loop plugin registers another occurrence of the HINFO query request. When loop plugin receives the second retry (i.e. when seen() returns 3) , the plugin erroneously believes to have detected a loop.

2. Which issues (if any) are related?

None

3. Which documentation changes (if any) need to be made?

None

@corbot corbot bot requested a review from chrisohaver October 31, 2018 16:10
@corbot
Copy link

corbot bot commented Oct 31, 2018

Thank you for your contribution. I've just checked the OWNERS files to find a suitable reviewer. This search was successful and I've asked chrisohaver (via plugin/loop/OWNERS) for a review.

If you have questions or suggestions for this bot, please file an issue against the miekg/dreck repository.

The bot understands the commands that are listed here.

@chrisohaver
Copy link
Member

chrisohaver commented Oct 31, 2018

Ah ha! Thanks for identifying and fixing this. And thanks for the clear explanation of how it was broken.

We should add a test for this ... possibly in coredns/ci.

@yongtang yongtang merged commit e332c8d into coredns:master Oct 31, 2018
@fturib
Copy link
Contributor

fturib commented Oct 31, 2018

@miekg , @chrisohaver : in your opinion,

  • How critical is this issue for CoreDNS in Kubernetes ?
  • Should we plan another release of CoreDNS to integrate into kubernetes v1.13 before its release ?

@chrisohaver
Copy link
Member

IMO - Critical enough that we should include this in the k8s 1.13 release (via a new release CoreDNS 1.2.6).

@johnbelamaric
Copy link
Member

johnbelamaric commented Oct 31, 2018 via email

@miekg
Copy link
Member

miekg commented Oct 31, 2018 via email

@codecov-io
Copy link

Codecov Report

Merging #2255 into master will decrease coverage by 0.04%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2255      +/-   ##
==========================================
- Coverage   56.26%   56.21%   -0.05%     
==========================================
  Files         203      203              
  Lines       10154    10159       +5     
==========================================
- Hits         5713     5711       -2     
- Misses       4009     4015       +6     
- Partials      432      433       +1
Impacted Files Coverage Δ
plugin/loop/setup.go 45.83% <0%> (-0.98%) ⬇️
plugin/loop/loop.go 22.5% <0%> (-2.5%) ⬇️
plugin/file/reload.go 65.78% <0%> (-5.27%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d42e0d4...1d9f8bf. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants