Member

@bradh352 bradh352 commented Sep 8, 2024

The previous implementation would redirect a query to a failed server based on a timeout and a random chance per query. If that server was not yet back online, the query had to wait out a server timeout, which added latency. Instead, we should continue to use the known-good servers for the query itself, but spawn a second query with the same question to a different, downed server. That second query is processed in the background and, if it succeeds, can bring the server back online.
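
As a rough illustration of the strategy described above, here is a minimal sketch of the dispatch decision. It is not the code from this PR: `probe_server_t`, `dispatch_t`, `plan_dispatch`, and the `next_retry_ms` field are hypothetical names invented for this example, and the rule that a failed server is only probed once its retry time has passed is an assumption.

```c
#include <stddef.h>

/* Hypothetical server record, not the c-ares internal structure. */
typedef struct {
  const char *name;
  size_t      consec_failures; /* 0 means the server is considered healthy */
  long        next_retry_ms;   /* assumed: earliest time a failed server may be probed */
} probe_server_t;

typedef struct {
  int primary; /* index of the server that gets the real query, -1 if none */
  int probe;   /* index of the failed server that gets the duplicate, -1 if none */
} dispatch_t;

/* Decide where the real query goes and whether a background probe is sent. */
dispatch_t plan_dispatch(const probe_server_t *servers, size_t n, long now_ms)
{
  dispatch_t d = { -1, -1 };
  size_t     i;

  for (i = 0; i < n; i++) {
    if (servers[i].consec_failures == 0) {
      /* First healthy server handles the caller's query. */
      if (d.primary < 0)
        d.primary = (int)i;
    } else if (d.probe < 0 && now_ms >= servers[i].next_retry_ms) {
      /* First failed server that is due for a retry gets the duplicate. */
      d.probe = (int)i;
    }
  }

  /* No healthy server at all: send the real query to the least-failed one. */
  if (d.primary < 0 && n > 0) {
    size_t best = 0;
    for (i = 1; i < n; i++) {
      if (servers[i].consec_failures < servers[best].consec_failures)
        best = i;
    }
    d.primary = (int)best;
  }

  /* Never probe the same server that already carries the real query. */
  if (d.probe == d.primary)
    d.probe = -1;

  return d;
}
```

The point of the sketch is that the caller's query never waits on a server that is known to be down; only the duplicate does, and its result is used solely to update that server's health.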

Also, when using the rotate option, servers were previously chosen at random from the complete list. This PR changes that to choose only from the servers that share the same highest priority.
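
The rotate change can be sketched as follows. This is an illustration only, under two assumptions that the description does not spell out: the server list is already sorted best-first, and priority is represented by a failure count where fewer failures means higher priority. `rot_server_t`, `top_priority_count`, and `pick_rotate` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical server record; "failures" stands in for the priority key. */
typedef struct {
  const char *name;
  size_t      failures; /* assumed: fewer failures = higher priority */
} rot_server_t;

/* Count the leading servers that tie with the first (best) entry.
 * Assumes the array is sorted best-first. */
size_t top_priority_count(const rot_server_t *servers, size_t n)
{
  size_t count = 0;
  while (count < n && servers[count].failures == servers[0].failures)
    count++;
  return count;
}

/* Pick one of the tied top-priority servers. "rnd" is any random number
 * supplied by the caller. Returns -1 if the list is empty. */
int pick_rotate(const rot_server_t *servers, size_t n, unsigned int rnd)
{
  size_t top = top_priority_count(servers, n);
  if (top == 0)
    return -1;
  return (int)(rnd % top);
}
```

With this shape, a lower-priority server is never selected by rotation, regardless of the random value passed in.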

Authored-By: Brad House (@bradh352)

Member Author

bradh352 commented Sep 8, 2024

@oliverwelsh I reworked the server retry logic you contributed: to improve responsiveness, it now duplicates the query and uses the copy as a probe, rather than scheduling the actual query to the failed server. Do you see any issues with this strategy?

@bradh352 bradh352 merged commit 8d36033 into c-ares:main Sep 9, 2024
@bradh352 bradh352 deleted the failed_servers branch September 9, 2024 14:04