Probe for failed servers instead of redirecting query #877
The previous implementation redirected a query to a failed server based on a timeout and a per-query random chance. If that server was not back online yet, the query had to wait out the server timeout, which added latency. Instead, the query itself should continue to use the known-good servers, while a second query with the same question is spawned to a downed server. That second query can be processed in the background and potentially bring the server back online.
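The split between the real query and the background probe can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `server_t`, `pick_query_server`, `pick_probe_server`, and the `retry_delay` parameter are all hypothetical names, and the assumption that a failed server is only probed again after a delay is ours.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical server record; NOT the actual c-ares structure. */
typedef struct {
    const char *name;
    size_t      failures;   /* consecutive failures; 0 means known good */
    long        next_probe; /* earliest time (seconds) a probe may be sent */
} server_t;

/* Server for the real query: the one with the fewest failures, so a
 * known-good server is always preferred over a downed one. */
static server_t *pick_query_server(server_t *servers, size_t n)
{
    assert(servers != NULL && n > 0);
    server_t *best = &servers[0];
    for (size_t i = 1; i < n; i++) {
        if (servers[i].failures < best->failures)
            best = &servers[i];
    }
    return best;
}

/* Server for the background probe: the first failed server whose probe
 * time has arrived, or NULL if none is due. The chosen server's next
 * probe is pushed back by retry_delay (an assumed parameter) so that
 * it is not probed again by every subsequent query. */
static server_t *pick_probe_server(server_t *servers, size_t n,
                                   long now, long retry_delay)
{
    assert(servers != NULL);
    for (size_t i = 0; i < n; i++) {
        if (servers[i].failures > 0 && now >= servers[i].next_probe) {
            servers[i].next_probe = now + retry_delay;
            return &servers[i];
        }
    }
    return NULL;
}
```

In this sketch the caller would send the real query to the result of `pick_query_server` and, if `pick_probe_server` returns a server, send a copy of the same question there without making the caller wait on its answer.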
Also, when using the `rotate` option, servers were previously chosen at random from the complete list. This PR changes that to choose only from the servers that share the same highest priority.

Authored-By: Brad House (@bradh352)
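For the rotate change, the selection rule could look roughly like the sketch below. It is an illustration only: `rot_server_t` and `pick_rotated` are hypothetical names, the convention that a lower `priority` value means a more preferred server is an assumption, and the random value `r` is passed in by the caller so the function stays deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical server record; NOT the actual c-ares structure. */
typedef struct {
    const char *name;
    unsigned    priority; /* assumption: lower value = more preferred */
} rot_server_t;

/* Choose one server among those sharing the best priority.
 * r is a caller-supplied random number. */
static const rot_server_t *pick_rotated(const rot_server_t *servers,
                                        size_t n, unsigned r)
{
    assert(servers != NULL && n > 0);

    /* First pass: find the best priority and how many servers share it. */
    unsigned best  = servers[0].priority;
    size_t   count = 0;
    for (size_t i = 0; i < n; i++) {
        if (servers[i].priority < best) {
            best  = servers[i].priority;
            count = 1;
        } else if (servers[i].priority == best) {
            count++;
        }
    }

    /* Second pass: return the (r % count)-th server with that priority. */
    size_t target = r % count;
    for (size_t i = 0; i < n; i++) {
        if (servers[i].priority != best)
            continue;
        if (target == 0)
            return &servers[i];
        target--;
    }
    return NULL; /* not reached: count >= 1 whenever n > 0 */
}
```

Servers with a worse priority are never returned here, which mirrors the described behaviour of choosing only among the top-priority group rather than the complete list.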