cloudflare throttling for DNS api #1941

Open
vonp opened this issue Dec 1, 2018 · 6 comments
Comments

@vonp

vonp commented Dec 1, 2018

we have been dealing with senior CS people at cloudflare (CF) since early October 2018, but they must in turn deal with the actual network engineers. this has made the process protracted and somewhat opaque.

we noticed that some apps were suddenly consuming huge amounts of wall-clock time for very little work done through CF. this is in a DC where CF has a POP and we normally see sub-millisecond RTTs. research revealed that CF was adding 5000 ms to each query made through their maintenance api. the engineers have not been able to pinpoint what triggers this. we have noticed that the throttling goes away within 24 hours, and its duration seems independent of whether CF's api is used at all after the throttling starts.
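One way to spot the added delay from the client side is to time each API call and flag anything close to the 5000 ms penalty. The following is only a minimal sketch: `timed_call`, `looks_throttled`, and the 4.5 s threshold are hypothetical names and values chosen for illustration, not part of acme.sh or the CF api.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Assumption: a threshold just under the ~5000 ms penalty reported above.
THROTTLE_THRESHOLD_S = 4.5


def timed_call(fn: Callable[[], T],
               clock: Callable[[], float] = time.monotonic) -> Tuple[T, float]:
    """Run fn() and return (result, elapsed wall-clock seconds)."""
    start = clock()
    result = fn()
    return result, clock() - start


def looks_throttled(elapsed_s: float,
                    threshold_s: float = THROTTLE_THRESHOLD_S) -> bool:
    """Heuristic only: treat a call as throttled if it took at least threshold_s."""
    return elapsed_s >= threshold_s
```

Wrapping each API request in `timed_call` and logging the calls where `looks_throttled` is true would show when the penalty starts and stops.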

the throttling has also started during acme.sh runs for several domains, each of which had 70-84 wildcard sub-domains. the transaction logs show a query for the zone data for every sub-domain, since acme.sh does not cache the initial response. it would not be unheard of for a system-protection mechanism such as throttling to be triggered by many duplicate queries in a short time frame.
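To illustrate the duplicate-query pattern, here is a rough sketch of remembering the zone lookup per base domain so that many sub-domains trigger a single query. Everything in it is an assumption made for illustration: `ZoneCache`, `base_domain`, and `fake_lookup` are hypothetical names, the lookup is a stand-in that makes no network call, and this is not how acme.sh is implemented.

```python
from typing import Callable, Dict


def base_domain(fqdn: str) -> str:
    """Naive guess: the last two labels. Assumption: ignores suffixes like co.uk."""
    labels = fqdn.rstrip(".").split(".")
    return ".".join(labels[-2:])


class ZoneCache:
    """Remember the zone id per base domain so each base is looked up once."""

    def __init__(self, lookup: Callable[[str], str]) -> None:
        self._lookup = lookup            # hypothetical function that queries the API
        self._zone_ids: Dict[str, str] = {}
        self.queries = 0                 # number of real lookups performed

    def zone_id(self, fqdn: str) -> str:
        base = base_domain(fqdn)
        if base not in self._zone_ids:
            self.queries += 1
            self._zone_ids[base] = self._lookup(base)
        return self._zone_ids[base]


def fake_lookup(base: str) -> str:
    """Stand-in for a real zone query; returns a made-up id."""
    return "zone-for-" + base


cache = ZoneCache(fake_lookup)
names = [f"_acme-challenge.sub{i}.example.com" for i in range(84)]
ids = {cache.zone_id(name) for name in names}
```

With 84 sub-domains under one base, the lookup runs once instead of 84 times. Only the short zone id is kept, not the full response.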

whatever the cause may be, people should be aware that something causes CF to begin throttling queries when a large number of sub-domains are processed from a single domain.tld base.

attached is a commented log of a sub-domain transaction, as submitted to CF engineering, that highlights the latency problem.

CF_RR_latency_acme.txt

Neilpang pushed a commit that referenced this issue Dec 28, 2018
1. fix #1977
2. The cache is too long to as a line to save in the conf
@Neilpang
Member

it's reverted.

@vonp
Author

vonp commented Dec 29, 2018

Neilpang:

from what i infer, you attempted to cache the domain and eliminate the duplicate queries in response to #1941, and that caused #1977/#1980 to appear. so 'revert fix for #1941' seems to indicate that #1941 stands uncorrected and unmodified.

@Neilpang
Member

@vonp
Yes, it was reverted.
I tried to cache the response, but the response is too long to cache, and we must use the API with name=example.com.
We will see if there is anything we can do about this issue.

@vonp
Author

vonp commented Dec 30, 2018 via email

@Neilpang
Member

Neilpang commented Jan 8, 2019

let's keep it open.

Neilpang reopened this Jan 8, 2019
@vonp
Author

vonp commented Jan 8, 2019

FYI, i am making some progress now that i am dealing with a CF engineer who actually works on the bind9 part of the API. here is what i can report:
A) adds: right now, adds can be done through this method. any dups of current RRs encountered in the uploaded file are reported back as errors, but the adds all go through.
B) edits: in relatively short order, the dups presently handled as errors will be handled as edits through this same submission method. the coding for this is already finished and approved, and is awaiting completion of QC/testing/implementation.
C) mass deletes: this is apparently a hot topic amongst CF's clients and is being explored, but is not presently on anyone's front burner. i have suggested that using the bind9 method would be a dramatic network/resource saving for CF, and the engineer has agreed to look at deletes from that angle since it is within his own purview.

so, adds are already on the table, with edits coming on-line in the near future. hopefully, deletes will join the adds/edits in the bind9 method so that there will be a universal method for submitting multiple RRs in one "throw".

the only note of caution about this method is that out-of-zone domains must be handled separately … more or less as they are handled now.
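As a rough illustration of the batch idea, the sketch below renders many TXT records as bind9-style zone-file lines for a single upload and sets aside any names that fall outside the zone. The helper names (`split_by_zone`, `to_bind_lines`), the 120-second TTL, and the record values are all hypothetical; the upload step itself is not shown because its details are not described in this thread.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (fully qualified name, TXT value)


def split_by_zone(records: List[Record], zone: str) -> Tuple[List[Record], List[Record]]:
    """Separate records inside `zone` from out-of-zone ones, which need separate handling."""
    zone = zone.rstrip(".").lower()
    in_zone: List[Record] = []
    out_of_zone: List[Record] = []
    for name, value in records:
        bare = name.rstrip(".").lower()
        if bare == zone or bare.endswith("." + zone):
            in_zone.append((name, value))
        else:
            out_of_zone.append((name, value))
    return in_zone, out_of_zone


def to_bind_lines(records: List[Record], ttl: int = 120) -> List[str]:
    """Render each record as one zone-file line: name. TTL IN TXT "value"."""
    return [f'{name.rstrip(".")}. {ttl} IN TXT "{value}"' for name, value in records]


records = [
    ("_acme-challenge.a.example.com", "tokenA"),
    ("_acme-challenge.b.example.com", "tokenB"),
    ("_acme-challenge.example.org", "tokenC"),  # out of zone for example.com
]
in_zone, out_of_zone = split_by_zone(records, "example.com")
zone_file = "\n".join(to_bind_lines(in_zone))
```

Here `zone_file` holds the two example.com records as one payload, while the example.org record lands in `out_of_zone` to be submitted on its own.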
