Handle rate limits when updating parents #680

opensourcegal · 2021-10-01T22:34:26Z

The following problem can be encountered:

A CA has a lot of parents (about a hundred), and all of them reside at the same RIR.

Once the refresh cycle kicks in, it looks from the code as if all parents are refreshed simultaneously.
This triggers rate limiting by the parent, and the synchronization fails with a timeout for most of them (not all, the first few are fine).

Possible solutions:

- Configurable batch size for the parent refreshes, with a configurable wait time between batches
- Configurable ca_refresh times depending on whether the last sync succeeded or failed (for example, if there are 100 parents and the rate limit only allows 10, retry the failed ones every 10 minutes and the completed ones only every 6 hours)
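
A minimal sketch of these two proposals, for illustration only. It is not taken from the project's code: the helper names `batches` and `next_refresh_delay` are hypothetical, and the intervals are simply the 10 minutes and 6 hours from the example above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def next_refresh_delay(
    last_sync_ok: bool,
    success_interval: int = 6 * 3600,  # assumed: 6 hours after a successful sync
    retry_interval: int = 600,         # assumed: 10 minutes after a failed sync
) -> int:
    """Return the number of seconds to wait before refreshing this parent again."""
    return success_interval if last_sync_ok else retry_interval
```

A caller would loop over `batches(parents, 10)`, sync each batch, and sleep for the configured wait time before starting the next one.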

opensourcegal · 2021-10-01T22:43:19Z

Another proposal related to this problem:

Make the refresh of a single parent available through the CLI, so that important parents (for example after resource changes) can be refreshed manually without having to refresh hundreds of parents, some of which may fail.

timbru · 2021-10-04T07:22:43Z

Ah yes, this code was written under the assumptions that there would not be 100+ parents, and that they would be different parents. So what it does now is fire up all syncs as asynchronous calls and then wait to join them all (without failing the whole job if any of them fail, of course). Earlier code was sequential, but that led to issues where the job could take too long.

I will have a think about this. I believe it would be good if the code solved this without requiring user action. Perhaps it should be smart enough to see that multiple parents are the same entity (i.e. your CA acts as multiple children under one parent CA) and do those calls sequentially or in smaller batches.
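
A rough sketch of that idea, in Python for illustration only and not based on the project's code. It assumes each parent is known by a handle and a service URI, and that parents sharing a host are the same entity. The names `group_by_host` and `sync_all`, and the `sync_one` callback, are hypothetical.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Optional
from urllib.parse import urlsplit


def group_by_host(parents: Dict[str, str]) -> Dict[Optional[str], List[str]]:
    """Group parent handles by the host name of their (assumed) service URI."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for handle, service_uri in parents.items():
        groups[urlsplit(service_uri).hostname].append(handle)
    return dict(groups)


async def sync_all(
    parents: Dict[str, str],
    sync_one: Callable[[str], Awaitable[None]],
    per_host_limit: int = 1,
) -> Dict[str, Optional[Exception]]:
    """Sync every parent, with at most per_host_limit concurrent calls per host.

    Different hosts still run in parallel. A failed sync is recorded in the
    result instead of aborting the whole job.
    """
    results: Dict[str, Optional[Exception]] = {}

    async def sync_group(handles: List[str]) -> None:
        limit = asyncio.Semaphore(per_host_limit)

        async def guarded(handle: str) -> None:
            async with limit:
                try:
                    await sync_one(handle)
                    results[handle] = None
                except Exception as err:  # keep going if one parent fails
                    results[handle] = err

        await asyncio.gather(*(guarded(h) for h in handles))

    await asyncio.gather(*(sync_group(hs) for hs in group_by_host(parents).values()))
    return results
```

With `per_host_limit=1` the calls to one host are effectively sequential, and a larger value gives the "smaller batches" behaviour.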

I can also think about doing failed parents first.

Btw, you can configure the ca_refresh rate, but my guess is that this won't help you in this case, as it's the number of parallel calls that seems to be the issue and not so much the frequency.

timbru · 2021-11-04T15:35:21Z

A fix will be available in release 0.9.3 using the following logic:

- limit the batch size to 25 parents per cycle
- process new parents first, i.e. those for which no sync has been attempted yet
- then order parents by last exchange time, at minute granularity, and schedule the oldest first
- for parents with a last exchange in the same minute -> do failed ones first
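
The ordering above can be sketched as a single sort key. This is an illustrative Python version, not the code from the fix: the `ParentStatus` type, its fields, and `select_batch` are hypothetical names, and timestamps are assumed to be Unix seconds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ParentStatus:
    handle: str
    last_exchange: Optional[int]  # Unix seconds of the last sync attempt, None if never tried
    last_success: bool = True     # outcome of that last attempt


def select_batch(parents: List[ParentStatus], batch_size: int = 25) -> List[ParentStatus]:
    """Pick the parents to sync in this cycle, following the rules listed above."""

    def sort_key(parent: ParentStatus) -> Tuple[int, int, int]:
        if parent.last_exchange is None:
            return (0, 0, 0)                # new parents go first
        minute = parent.last_exchange // 60  # minute granularity
        failed_first = 0 if not parent.last_success else 1
        return (1, minute, failed_first)     # oldest minute first, failed before ok

    return sorted(parents, key=sort_key)[:batch_size]
```

Because the key truncates the timestamp to whole minutes, two parents last contacted within the same minute tie on age, and the failed one is scheduled ahead of the successful one.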

Hopefully this means that rate limits will not be triggered, but even if they are, all parents still get scheduled eventually.

The relevant settings can be tweaked in the config. The default values are:

ca_refresh_parents_batch_size = 25
ca_refresh_seconds = 600
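
As a back-of-the-envelope check of what these defaults imply, the sketch below estimates how long it takes before every parent has had a turn. It assumes exactly one batch is processed per `ca_refresh_seconds` interval, which this thread does not confirm, and the function name `full_rotation_seconds` is hypothetical.

```python
import math


def full_rotation_seconds(
    parent_count: int,
    batch_size: int = 25,        # ca_refresh_parents_batch_size default
    refresh_seconds: int = 600,  # ca_refresh_seconds default
) -> int:
    """Seconds needed to cycle through all parents once.

    Assumes one batch of at most batch_size parents per refresh interval.
    """
    cycles = math.ceil(parent_count / batch_size)
    return cycles * refresh_seconds
```

Under that assumption, 100 parents need 4 cycles of 600 seconds, so each parent is refreshed roughly every 40 minutes.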

Closing this issue for now, but of course it can be reopened if this does not solve the problem.

timbru added this to To do in Release 0.9.3 Oct 8, 2021

timbru moved this from To do to In progress in Release 0.9.3 Nov 3, 2021

timbru pushed a commit that referenced this issue Nov 3, 2021

Use max batch size for parent sync (cope with rate limits) (#680)

81a7e96

timbru linked a pull request Nov 3, 2021 that will close this issue

Use max batch size for parent sync (cope with rate limits) (#680) #709

Merged

timbru pushed a commit that referenced this issue Nov 4, 2021

Use max batch size for parent sync (cope with rate limits) (#680)

2580fa9

timbru closed this as completed Nov 4, 2021

timbru moved this from In progress to Done in Release 0.9.3 Nov 4, 2021

timbru changed the title ~~Bulk refresh of CA/parent leads to timeout~~ Handle rate limits when updating parents Nov 5, 2021

timbru pushed a commit that referenced this issue Nov 5, 2021

Use max batch size for parent sync (cope with rate limits) (#680)

7faded1
