Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

c-ares 1.22.1 (and newer) does not change servers on a request timeout #650

Closed
b-spencer opened this issue Dec 7, 2023 · 2 comments
Closed

Comments

@b-spencer
Copy link
Contributor

b-spencer commented Dec 7, 2023

If I understand correctly, with the new dynamic server list behaviour of #594, the first server in the server list is always used by ares__requeue_query(). But, when a query times out in process_timeouts(), there is no call to server_increment_failure(), so the server list always stays in the same order, and thus the retry is send to the same server.

This means that when a DNS server is not responding, c-ares gets stuck querying it forever, at least in all the cases I've seen. Real users often have misconfigurations such as non-functional IPv6 DNS server entries, etc. If a "bad" server is first in the list, no DNS lookups work at all.

All other (similar) calls to ares__requeue_query() are preceded by a call to server_increment_failure(). Adding that call in process_timeouts() immediately before ares__requeue_query() locally fixed the behaviour in my testing so far.

@bradh352
Copy link
Member

bradh352 commented Dec 7, 2023

Hmm, that seems really bad if true. I thought the test cases handled the retry scenarios and would switch servers. Let me investigate.

@bradh352
Copy link
Member

bradh352 commented Dec 7, 2023

ok, confirmed. Created PR #651 which also adds a test case for this. Once all CI/CD tests complete I'll merge into main as well as the 1.23 branch.

bradh352 added a commit that referenced this issue Dec 7, 2023
As of c-ares 1.22.0, server timeouts were erroneously not incrementing server failures meaning the server in use wouldn't rotate.  There was apparently never a test case for this condition.

This PR fixes the bug and adds a test case to ensure it behaves properly.

Fixes Bug: #650 
Fix By: Brad House (@bradh352)
@bradh352 bradh352 closed this as completed Dec 7, 2023
bradh352 added a commit that referenced this issue Dec 7, 2023
As of c-ares 1.22.0, server timeouts were erroneously not incrementing server failures meaning the server in use wouldn't rotate.  There was apparently never a test case for this condition.

This PR fixes the bug and adds a test case to ensure it behaves properly.

Fixes Bug: #650 
Fix By: Brad House (@bradh352)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants