cloudflare throttling for DNS api #1941
1. fix #1977 2. the cached response is too long to save as a single line in the conf file
it's reverted. |
Neilpang: from what i infer, you attempted to cache the domain data and eliminate the duplicate queries in response to #1941, and that caused #1977/#1980 to appear. so, 'revert fix for #1941' seems to indicate that #1941 stands uncorrected and unmodified. |
@vonp |
i saw your problem. i may have a working suggestion available if i can
get CF in motion.
when i discovered the forced latency, i revived an effort i had started
with CF several years ago, because i have an on-going need to handle
obtaining initial certs. i do pro bono systems work for a little over
300 non-profits (NGOs), and i also mentor 2-3 dozen military veterans
(vets) per year in linux systems admin. naturally, i always stress the
need for system-wide
TLS and other security measures. the advent of letsencrypt has made the
TLS part free, your efforts have made that availability utile, and CF
offers both low latency and bandwidth conservation as an affordable
(or, even, free) possibility.
you may have missed the CF proc buried in their voluminous API docs, or
you may have experienced the same thing i did when you previously tried
it and the proc failed. the proc permits one to "bundle" all the RRs
into a bind9-formatted config file and submit it in one go. when i first
gave it a try, i seem to remember the problem was that it would not
handle MX records, and nobody in CF's customer service could figure out
who could or would fix it at the engineering level. thus, i stayed with
the same RR-by-RR creation method that you are using now.
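as a sketch of what that one-shot submission looks like: the curl call below follows my understanding of CF's zone-file import endpoint, but the zone id, token, and file name are placeholders and the command is only printed (a dry run), so verify the endpoint against CF's API docs before relying on it.

```shell
#!/bin/sh
# placeholders -- substitute real values before actually submitting
ZONE_ID="0123456789abcdef"
ZONE_FILE="example.com.zone"
API="https://api.cloudflare.com/client/v4"

# build the one-shot import request: the whole bind9 zone file is
# uploaded as multipart form data in a single call
CMD="curl -s -X POST $API/zones/$ZONE_ID/dns_records/import -H 'Authorization: Bearer \$CF_API_TOKEN' -F 'file=@$ZONE_FILE'"

# dry run: show the command rather than hitting the API
echo "$CMD"
```

one submission per zone, instead of one call per RR.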
what i do now is start with a mysql template and create a cert-unique
table containing all the RRs. another script then extracts the RRs and
formats them into a bind9 config file. this file is then verified with
bind9's zone-checking utils before being submitted to CF. also, since
CF handles NS and SOA creation, these have to be stripped out of the
valid "named-type" file before it goes to CF. other than those changes,
CF now works with anything bind9 supports!
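a minimal sketch of that verify-then-strip step, with illustrative zone contents; it runs bind9's named-checkzone only when it is actually installed:

```shell
#!/bin/sh
# illustrative zone file (single-line records, RFC 1035 master format)
cat > example.com.zone <<'EOF'
$ORIGIN example.com.
$TTL 300
@    IN SOA ns1.example.com. hostmaster.example.com. ( 1 7200 3600 1209600 300 )
@    IN NS  ns1.example.com.
ns1  IN A   192.0.2.53
@    IN A   192.0.2.10
www  IN A   192.0.2.10
@    IN MX  10 mail.example.com.
EOF

# verify the full zone first, while the SOA/NS are still in place
if command -v named-checkzone >/dev/null 2>&1; then
    named-checkzone example.com example.com.zone
fi

# CF creates NS and SOA itself, so strip them before submission;
# $3 is the record-type field in these single-line records
awk '$3 != "SOA" && $3 != "NS"' example.com.zone > example.com.cf
```

the stripped example.com.cf is what would go to CF's import proc.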
although the NGOs have only limited support for or need of mysql, and
almost no need of bind9, i just install-and-forget these on their
systems to support the cert business. the vets, of course, go on to
commercial work where these apps are far more prevalent or can be
installed, so their training in them is vital.
for your purposes the mysql step should be handled in some different,
script-like way of getting to the bind9 config file. not only is the
existence of mysql in question, but sys admins are unlikely to grant
you access to mysql anyway. i do know there are apps that can create a
mysql-like table file, but this is probably overkill for what you need,
and it is actually not a necessity, since it merely reflects the way i
created the process.
i would have to get guidance from ISC as to whether their zone-checking
utils are truly "stand-alone" or not. the zone-check step not only
checks syntax, it also validates params (such as the validity of MX
addresses, for one). so, even though you will need few of the checks
ISC provides, the little 41 KB util is still worth putting in your
install pkg to guarantee that the CF submission succeeds, and you do
not have to duplicate what already works perfectly.
is this all worth it? i went back and analyzed the last 41 certs we
recently handled. due to the high average number of sub-domains
involved, i figure that it took 18,942 api queries to CF to do that
part. with my limited knowledge of what letsencrypt is doing, i
estimate that the bind9 process would have cut your CF queries to only
164. the JSON response has all the wealth of CF data (RR id, dates, …)
which could be left in a file in the domain's directory you create.
however, your sole interest is in the last JSON field:
'"success": true'. for us, we update the mysql tables for persistence
and add some of the data to a multi-dimensional, domain-indexed array
for sub-millisecond access (i.e., without the mysql overhead).
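checking that single flag from a shell script might look like this; the response body is an abbreviated, made-up sample, not a verbatim CF payload:

```shell
#!/bin/sh
# made-up sample of a CF-style reply; only the "success" flag matters here
RESPONSE='{"result":{"id":"abc123","name":"www.example.com"},"success":true,"errors":[],"messages":[]}'

# a real JSON parser (jq, python) is safer in general, but for this one
# boolean flag a pattern match is enough
if printf '%s' "$RESPONSE" | grep -q '"success": *true'; then
    STATUS=ok
else
    STATUS=failed
fi
echo "$STATUS"
```

the rest of the JSON (RR id, dates, …) can simply be dropped into a file in the domain's directory, as described above.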
right now i am awaiting a CF response to several questions. obviously,
the proc works for loading all the RRs of a domain which is virgin or
has had all its RRs deleted, which matches our own set-up use-case. your
use-cases are incremental/decremental, and those questions remain to be
answered by CF.
### QUESTIONS TO CF (open since 2018.DEC.01):
A.) will CF's proc take a bind9 file consisting of only additions and
fold them in? (THIS IS PART OF acme.sh's USE-CASE.)
B.) last year (mid 2017) someone in CF engineering said that api work
was being done to handle mass deletes; i neither heard back nor
remembered to follow up. what is the status of this work? (THIS IS PART
OF acme.sh's USE-CASE.)
C.) if the bind9 file contains matching RRs, what will your proc do
(i.e., reject everything, reject only dups, update existing from dups
and process additions, ignore dups and process additions, delete
everything and start with only what is in the file, or …)? (this
primarily relates to our own use-cases. it might apply to acme.sh if RR
deletes can somehow be handled through a bind9 update submission.)
on my side i certainly can test QUESTION A and let you know what i
find. i cannot directly test QUESTION B, since there is nothing yet in
CF's docs about how this can be done through the API. of course, i
could play around with variants of curl's '-X DELETE' option to see if
multiple RRs can be packed into the 'data' object, but this might not
be how CF intends to implement this proc. as to QUESTION C, while it
might not impact you directly, i would be happy, if you desire, to
update you when i get something definitive from CF.
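for reference, this is what deletes look like today: one api call per RR id, per CF's documented per-record DELETE endpoint. the zone id and RR ids are placeholders, and the commands are only echoed, not issued:

```shell
#!/bin/sh
ZONE_ID="0123456789abcdef"   # placeholder zone id
DELETED=0

# one DELETE per record id -- there is no documented bulk delete here
for RR_ID in rr-id-1 rr-id-2 rr-id-3; do
    # dry run: echo the call instead of issuing it
    echo "curl -s -X DELETE https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RR_ID -H 'Authorization: Bearer \$CF_API_TOKEN'"
    DELETED=$((DELETED + 1))
done

echo "issued $DELETED delete calls"
```

with 70-84 sub-domains per apex, that per-record pattern is exactly what inflates the api call count.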
i laud your extensive efforts on acme.sh. please feel free to contact
me re any of the foregoing.
--
Thank you,
Johann
On Sat, 29 Dec 2018 14:00:57 +0000 (UTC) neil ***@***.***> wrote:
@vonp
Yes, it's reverted.
I tried to cache the response, but the response is too long to cache,
and we must use the api with `name=example.com`. We will see if
there is anything we can do for this issue.
|
let's keep it open. |
FYI, i am making some progress, since i am now dealing with a CF engineer who actually works on the bind9 part of the API. here is what i can report: adds are already on the table, with edits coming on-line in the near future. hopefully, deletes will also join the adds/edits in the bind9 method, so that there will be a universal method for submitting multiple RRs in one "throw". the only note of caution re this method is that out-of-zone domains must be handled separately … more-or-less as they are handled now. |
we have been dealing with senior CS people at cloudflare (CF) since early 2018.OCT, but they must then turn around and deal with actual network engineers. this has made the process protracted and somewhat opaque.
we had noticed that some apps suddenly were consuming huge amounts of wall-clock time for very little being done through CF. this is in a DC where CF has a POP and we normally experience sub-millisecond RTTs. research revealed that CF was tacking 5000 ms onto each query through their maintenance api. the engineers have not been able to pinpoint what triggers this event. we have noticed that the throttling goes away within 24 hours, and the duration seems independent of whether there is any use of CF's api after it starts.
this has also started up during the use of acme.sh for several domains where each of them had 70-84 wildcard sub-domains. we noticed from the logging of the transactions that there was a query for the zone data for each sub-domain since acme.sh does not cache the initial response. it would not be unheard-of for a system-protection mechanism such as throttling to be triggered by many duplicate queries in a short time-frame.
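a sketch of the kind of per-apex caching that would avoid those duplicate queries: fetch the zone data once per apex domain and reuse it for every sub-domain. here _fetch_zone is a stand-in for the real CF query and just counts how often it is called.

```shell
#!/bin/sh
# stand-in for the real CF zone query; counts invocations for the demo
FETCHES=0
_fetch_zone() {
    FETCHES=$((FETCHES + 1))
    echo "zone-id-for-$1"
}

# simple file-backed cache, keyed by apex domain
CACHE_DIR="${TMPDIR:-/tmp}/cf_zone_cache.$$"
mkdir -p "$CACHE_DIR"

get_zone_id() {
    domain="$1"
    cache="$CACHE_DIR/$domain"
    if [ ! -f "$cache" ]; then
        _fetch_zone "$domain" > "$cache"
    fi
    cat "$cache"
}

# 70-84 wildcard sub-domains of one apex would normally mean that many
# zone queries; with the cache, the apex is fetched exactly once
for sub in www api mail vpn; do
    get_zone_id "example.com" >/dev/null
done
echo "fetches: $FETCHES"
```

one remote query per apex, no matter how many sub-domains follow, which should stay well clear of any duplicate-query heuristic on CF's side.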
whatever the cause may be, people should be aware that something is causing CF to begin throttling queries when a large number of sub-domains is being processed from a single domain.tld base.
attached is a commented log of a sub-domain transaction that was submitted to CF engineering that highlights the latency problem.
CF_RR_latency_acme.txt