Handle rate limits when updating parents #680

Closed
opensourcegal opened this issue Oct 1, 2021 · 3 comments · Fixed by #709

Comments

@opensourcegal

The following problem can be encountered:

A CA has a lot of parents (about a hundred), and all of them reside at the same RIR.

Once the refresh cycle kicks in, all parents appear to be refreshed simultaneously (my guess from reading the code).
This triggers rate limiting by the parent, and the synchronization fails with a timeout for most of them (not all; the first ones are fine).

Possible solutions:

  • Configurable batch size for the parent refreshes, with a configurable wait time between batches

  • Configurable ca_refresh times depending on whether the last refresh succeeded or failed
    (for example, if there are 100 parents and the rate limit only allows 10, retry the failed ones every 10 minutes and the completed ones only every 6 hours)
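
To make the first proposal concrete, here is a minimal Python sketch of batched refreshes with a pause between batches. It is an illustration only, not code from this project: `refresh_in_batches`, `batches`, the `refresh` callback, and the `batch_size` / `wait_seconds` parameters are all hypothetical names chosen for the example.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def refresh_in_batches(
    parents: Sequence[str],
    refresh: Callable[[str], bool],      # hypothetical: returns True on success
    batch_size: int = 10,                # hypothetical setting
    wait_seconds: float = 60.0,          # hypothetical setting
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Refresh parents batch by batch; return the handles that failed."""
    failed: List[str] = []
    chunks = list(batches(parents, batch_size))
    for index, chunk in enumerate(chunks):
        for parent in chunk:
            if not refresh(parent):
                failed.append(parent)
        # Pause between batches, but not after the last one.
        if index < len(chunks) - 1:
            sleep(wait_seconds)
    return failed
```

The returned list of failed handles could then feed the second proposal, where failed parents are retried on a shorter interval than the ones that completed.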

@opensourcegal
Author

Another proposal related to this problem:

  • Make the refresh of a single parent available through the CLI, so that important parents (for example after resource changes) can be refreshed manually without having to refresh hundreds of parents, which may fail

@timbru
Contributor

timbru commented Oct 4, 2021

Ah yes, this code was written on the assumption that there would not be 100+ parents, and that the parents would be different entities. What it does now is fire off all syncs as asynchronous calls and then wait to join them all (without failing the whole job if any of them fail, of course). Earlier code was sequential, but that led to the job sometimes taking too long.
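
The "start everything, then join" pattern described above can be illustrated with a small Python sketch. This is not the project's actual code; `sync_parent` and `sync_all` are hypothetical names, and the simulated timeout only stands in for a rate-limited exchange.

```python
import asyncio
from typing import Dict, List, Optional


async def sync_parent(parent: str) -> str:
    """Hypothetical stand-in for one parent sync; fails for 'bad' handles."""
    await asyncio.sleep(0)  # placeholder for the real network exchange
    if parent.startswith("bad"):
        raise TimeoutError(f"sync with {parent} timed out")
    return parent


async def sync_all(parents: List[str]) -> Dict[str, Optional[BaseException]]:
    """Start every sync at once, wait for all, and record errors per parent."""
    results = await asyncio.gather(
        *(sync_parent(p) for p in parents),
        return_exceptions=True,  # one failure must not abort the whole job
    )
    return {
        parent: (result if isinstance(result, BaseException) else None)
        for parent, result in zip(parents, results)
    }
```

With a hundred parents at the same server, this shape sends a hundred requests at nearly the same moment, which is exactly what a rate limiter reacts to.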

I will have a think about this. I believe it would be good if the code solved this without requiring user action. Perhaps it should be smart enough to see that multiple parents are the same entity (i.e. your CA acts as multiple children under one parent CA) and do those calls sequentially or in smaller batches.
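
One way to picture the "same entity" idea is to group parents by the server they talk to, run each group sequentially, and let different servers proceed in parallel. The Python sketch below assumes that the service URI is a good enough key for "same entity"; that key, and the names `group_by_server` and `sync_grouped`, are assumptions made for this example only.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Tuple


def group_by_server(parents: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (handle, service_uri) pairs by service_uri, keeping input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for handle, service_uri in parents:
        groups[service_uri].append(handle)
    return dict(groups)


async def sync_grouped(
    parents: List[Tuple[str, str]],
    sync: Callable[[str], Awaitable[None]],  # hypothetical per-parent sync call
) -> List[str]:
    """Sync sequentially per server, servers in parallel; return failed handles."""
    failed: List[str] = []

    async def run_group(handles: List[str]) -> None:
        for handle in handles:  # only one request in flight per server
            try:
                await sync(handle)
            except Exception:
                failed.append(handle)  # record it and carry on with the next

    groups = group_by_server(parents)
    await asyncio.gather(*(run_group(handles) for handles in groups.values()))
    return failed
```

The trade-off is the one already mentioned: a long sequential group can make the whole job slow, so a cap on how many parents are handled per cycle may still be needed.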

I can also think of trying the failed parents first.

Btw, you can configure the ca_refresh rate, but my guess is that this won't help you in this case, as it's the number of parallel calls that seems to be the issue and not so much the frequency.

@timbru
Contributor

timbru commented Nov 4, 2021

A fix will be available in release 0.9.3 using the following logic:

  • limit the batch size to 25 parents per cycle
  • process new parents first; i.e. for which no sync has been attempted yet
  • then order parents by last exchange time, minute granularity, plan oldest first
  • for parents with a last exchange in the same minute -> do failed first

Hopefully this means that rate limits will not be triggered, but even if they are, all parents still get scheduled eventually.
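
The ordering rules listed above can be written down as a single sort key. The Python sketch below is only an illustration of those rules, not the code shipped in 0.9.3; `ParentStatus`, its fields, and `plan_batch` are hypothetical names, and timestamps are assumed to be Unix seconds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ParentStatus:
    handle: str
    last_exchange: Optional[int]  # Unix seconds of last attempt; None if never tried
    last_success: bool = True     # outcome of that last attempt


def sort_key(parent: ParentStatus) -> Tuple[int, int, int]:
    if parent.last_exchange is None:
        return (0, 0, 0)                   # new parents go first
    minute = parent.last_exchange // 60    # minute granularity, oldest first
    # Within the same minute, failed (0) sorts before successful (1).
    return (1, minute, 1 if parent.last_success else 0)


def plan_batch(parents: List[ParentStatus], batch_size: int = 25) -> List[str]:
    """Return the handles to sync in this cycle, in planned order."""
    return [p.handle for p in sorted(parents, key=sort_key)[:batch_size]]
```

Because every attempt, failed or not, updates the last exchange time, a parent that keeps failing moves to the back of the queue like any other, so it cannot starve the rest.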

Relevant default values can be tweaked in config. Default values are:

ca_refresh_parents_batch_size = 25
ca_refresh_seconds = 600
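
As a rough back-of-the-envelope check for the scenario in this issue, the small Python sketch below estimates how long it takes to get around to every parent once. It assumes that exactly one batch is processed every ca_refresh_seconds, which is an assumption made for this calculation and not something stated above; the function name is hypothetical.

```python
import math


def full_pass_seconds(parent_count: int, batch_size: int = 25,
                      refresh_seconds: int = 600) -> int:
    """Seconds between two attempts for the same parent, assuming one
    batch of `batch_size` parents is processed every `refresh_seconds`."""
    cycles = math.ceil(parent_count / batch_size)
    return cycles * refresh_seconds


# About a hundred parents with the defaults: 4 cycles of 600 seconds.
print(full_pass_seconds(100))  # 2400 seconds, i.e. 40 minutes
```

Under that assumption, raising ca_refresh_parents_batch_size shortens the pass but puts more calls on the parent per cycle, so the value has to stay under whatever the parent's rate limit allows.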

Closing this issue for now, but of course it can be re-opened if this does not solve the problem.

@timbru timbru closed this as completed Nov 4, 2021
@timbru timbru moved this from In progress to Done in Release 0.9.3 Nov 4, 2021
@timbru timbru changed the title from "Bulk refresh of CA/parent leads to timeout" to "Handle rate limits when updating parents" Nov 5, 2021

2 participants