Support jitter and maxtimeout for retry penalties #606
A few things here:
Thank you for the fast reply! I have fixed 1, 2 and 3. About 4: the new code just adds some noise to the retry backoff, and the jitter value controls the amplitude of this noise. The code may look tricky because I am trying to avoid overflowing timeplus after adding this noise.
What's the purpose of the backoff "noise"? I'm just trying to understand the issue it's resolving. The current code only adjusts how long it waits for a response when the first round across all servers fails with timeouts. Since c-ares didn't get a response from any server within the configured timeout interval, we have to assume that the configured timeout values may be too low for this system, so we should try another round with a longer wait time. The current code appears to double the timeout on each full pass across all servers; yours appears to keep it below that in a randomized way.
The issue is the following: when I start a large cluster (or a large distributed computation) with 10,000+ processes, all of these processes want to resolve the addresses of some control hosts (or source hosts) at startup, and they all go to the DNS server. In that case the server can start throttling resolve requests, and we want to avoid any synchronisation of retries (which really does happen in these situations).
Interesting. Is there a true need for this to be configurable, or any reason it shouldn't be always on? If we could come up with parameters that are acceptable for all use cases, my preference would be not to add more config options (first because it's more to maintain, and second because other people could benefit from this but may not realize there's a config option that fixes their use case). I'm really not tied in any way to the current algorithm. I'd think as long as
"Is there a true need for this to be configurable?" That is a good question! In our project we have actually used a fixed jitter value for years. But my initial intent was to leave the default behaviour of such a low-level library as c-ares unchanged, because my experience tells me that engineers often rely on very specific aspects of its behaviour. If we are OK with changing the default behaviour, then I can suggest the following:
Are you OK with such a plan?
I think that sounds reasonable to me :)
Looking at this, I think
I have fixed it, thank you for the remark.
I'm not seeing another commit after my last comment; did you forget to push it?
Oops, fixed.
Can you look at 4acd575 to make sure I didn't negatively impact your logic? Just some fixes I noticed.
We use c-ares for DNS resolution in the project https://github.com/YTsaurus/YTsaurus
In our installations we face a problem with synchronized DNS requests from hundreds of thousands of processes, and I want to add jitter to the retry backoffs to spread these requests out over time.