
auth: Do an ANY lookup for all types then filter #9007

Closed
rgacogne wants to merge 8 commits into master from auth-cache-any

Conversation

@rgacogne (Member) commented Apr 6, 2020

Short description

Most of our backends have a very high latency, meaning that it takes a long time to send a query and get the answer, regardless of whether we are asking for one type or several. Our code base often asks for a specific type, and our current code stores separately the answers for ANY queries and the ones for a specific type. This seems wasteful since the answer to an ANY query already contains the records for a more specific one, and our in-memory records cache is much faster than going to the backend. We could save a round-trip by looking for ANY answers when we don't find a specific one in the record cache, but our first query is often for the specific NS type because we are looking for a referral.
This PR converts all lookups to an ANY lookup instead, making sure that we fill the cache as fast as possible to save round-trips to the backend later.
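In essence (a minimal sketch of the idea, not the actual patch; lookupFromBackend() and cacheStore() are hypothetical helpers standing in for the real backend and cache plumbing, while DNSZoneRecord, DNSName and QType mirror the PowerDNS types):

```cpp
#include <utility>
#include <vector>

// Sketch of the ANY-then-filter approach described above.
std::vector<DNSZoneRecord> lookupWithAnyCache(const DNSName& qname, const QType& qtype)
{
  // Always ask the backend for ANY, filling the cache for every type at once.
  std::vector<DNSZoneRecord> anyRecs = lookupFromBackend(qname, QType(QType::ANY));
  cacheStore(qname, QType(QType::ANY), anyRecs);

  // Then keep only the records matching the type the caller asked for.
  std::vector<DNSZoneRecord> rrs;
  for (auto& rec : anyRecs) {
    if (qtype.getCode() == QType::ANY || rec.dr.d_type == qtype.getCode()) {
      rrs.push_back(std::move(rec));
    }
  }
  return rrs;
}
```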

My tests showed roughly one third fewer queries to the backend in simple cases, and probably more in DNSSEC cases, while achieving a higher QPS ceiling (~ +30%). CPU usage is also significantly reduced while replaying a real-world PCAP.
The number of entries in the records cache is also significantly lower since we don't need to store a record twice, for ANY and for the exact type itself.

We could easily enable that change for specific backends only if we believe it might have a negative effect on some of them, although testing with the bind backend showed a slight improvement there as well, even though lookups in the bind backend are already quite fast.
I have not tested LMDB.

We could reduce the number of round-trips to the backend even more by getting rid of the 'SOA' special case, since I'm not aware of any backend currently implementing it in a special way.

Checklist

I have:

  • read the CONTRIBUTING.md document
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

@Habbie (Member) commented Apr 6, 2020

Our metronome templates currently use query-cache-miss / udp-answers to calculate 'query to DB amplification', is your backend-queries metric very different?

@rgacogne (Member, Author) commented Apr 6, 2020

Our metronome templates currently use query-cache-miss / udp-answers to calculate 'query to DB amplification', is your backend-queries metric very different?

Yes, if only because this PR currently does up to two lookups into the records cache (ANY then the specific type) because of the SOA case, but only one will go to the backend. The 'uncached' SOA query is not accounted for in query-cache-miss. It also differs if you have more than one backend. There might be other cases as well.

@rgacogne (Member, Author) commented Apr 6, 2020

Failure of ci/circleci: test-auth-regress-gpgsql is unrelated (ALIAS).

@rgacogne (Member, Author) commented Apr 6, 2020

Failure of test-auth-regress-gsqlite3 this time:

Apr 06 14:11:39 Error resolving for ALIAS google-public-dns-a.google.com., aborting AXFR
Apr 06 14:11:39 Unable to AXFR zone 'example.com' from remote '127.0.0.1:5200' (resolver): AXFR chunk error: Server Failure (This was the first time. Excluding zone from slave-checks until 1586182599)
Apr 06 14:11:39 Signing thread died because of std::exception: Reading from socket in Signing Pipe loop: Connection reset by peer

@Habbie (Member) commented Apr 6, 2020

Ugh, ALIAS was quite stable in CircleCI for a while. I'll see if I can make the ALIAS target something local.

@rgacogne (Member, Author) commented

We could reduce the number of round-trips to the backend even more by getting rid of the 'SOA' special case, since I'm not aware of any backend currently implementing it in a special way.

I looked into that idea a bit, but it's complicated by the fact that some backends (bind and LMDB, as of today) return their best SOA when the qtype passed to the ::lookup() operation is SOA. I'm not sure how useful that optimization is today (I'm pretty sure it's useless for the bind backend, since the round-trip to the Ueberbackend is a function call; for LMDB it depends on the cost of opening a RO transaction) but we might not want to lose it.

@Habbie (Member) commented Apr 14, 2020

That optimisation has been requested for the pipebackend; I did not check if we do that today.

@rgacogne (Member, Author) commented

In fact it looks like this optimization is gone in both the Bind and LMDB backends since 6678e5a (#8050).

@rgacogne (Member, Author) commented

The more I look into this "optimization" the less I'm actually convinced it ever worked that way in the bind backend.

@rgacogne (Member, Author) commented

Unless I'm missing something huge, none of the backends I know of (including the not-yet-merged DLSO and Cassandra backends) overrides DNSBackend::getAuth(). DNSBackend::getAuth() directly calls DNSBackend::getSOA(), which is not overridden either. DNSBackend::getSOA() does not support returning a higher SOA than the one asked for, and it actually forces the qname back to the requested one, so a higher SOA would not be ignored but would instead cause issues.
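A rough paraphrase of the call chain being described (signatures and bodies are simplified and illustrative, not copied from the real dnsbackend code):

```cpp
// Default behaviour, per the description above: no backend overrides either method.
bool DNSBackend::getAuth(const DNSName& target, SOAData* sd)
{
  return this->getSOA(target, *sd);
}

bool DNSBackend::getSOA(const DNSName& domain, SOAData& sd)
{
  this->lookup(QType(QType::SOA), domain, -1);
  // ... fetch the answer from the backend ...
  sd.qname = domain; // forced back to the requested name: a "higher" SOA
                     // returned by a backend would be mislabelled, not ignored
  return true;
}
```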


```cpp
for (auto& rec : anyRecs) {
  if (q.qtype.getCode() == QType::ANY || rec.dr.d_type == q.qtype.getCode()) {
    rrs.push_back(std::move(rec));
  }
}
```
Contributor commented:

emplace_back?

@rgacogne (Member, Author) replied:

Won't the move constructor be called in exactly the same way in both cases?

Contributor replied:

Possibly. I just wonder if it could be rrs.emplace_back(rec)?

@rgacogne (Member, Author) replied:

That would allocate and copy two DNSNames and one shared_ptr<DNSRecordContent> instead of moving them, so I don't think that would be better.
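For illustration, the difference with a stand-in type (Rec here is hypothetical, not the PR's code):

```cpp
#include <string>
#include <utility>
#include <vector>

struct Rec {
  std::string name; // imagine DNSName + shared_ptr<DNSRecordContent> members
};

void demo(std::vector<Rec>& rrs, Rec a, Rec b, Rec c)
{
  rrs.push_back(std::move(a));    // move-constructs the new element: cheap
  rrs.emplace_back(b);            // 'b' is an lvalue, so this copy-constructs
  rrs.emplace_back(std::move(c)); // moves, same cost as push_back(std::move(...))
}
```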

@rgacogne (Member, Author) commented Jun 2, 2020

After discussing it with @mind04 on the IRC channel, this might cause issues with some setups involving multiple backends. Unless we are willing to take the risk of breaking some setups, we should either limit this new behaviour to single-backend setups (easiest) or make it configurable (enabling this optimization for most multi-backend setups while not breaking the other ones).

@zeha (Collaborator) commented Jun 2, 2020

If I read the diff right, only setups that have the same name in multiple backends could be affected?
If this is true, and if delegations from one backend to another work - great, and those mystical people that do other weird shit need to step out of the shadows and explain what they are doing.

@rgacogne (Member, Author) commented Jun 2, 2020

Yes, my worry is that it might be more common than we think for custom backends, in order to avoid implementing DNSSEC there and delegate that to a second backend instead. That's quite ugly in my humble opinion since it requires duplicating the names and types in the second backend, but it clearly exists.

@zeha (Collaborator) commented Jun 2, 2020

My problem with this is: there are only rumors of this existing, and nobody can say whether it would be a supported thing to do or not. The mere rumor of it existing prevents making stuff better for everyone else.
If this is a use case, maybe something actually well-supported should be implemented instead.
However, my fear is that until we actually break these setups, no code will be written to make anything better, and nothing will ever change.

@mind04 (Contributor) commented Jun 2, 2020

Limiting this behaviour to single-backend setups is a nice middle ground. And maybe add a configuration option to enable this for multi-backend setups as well.
This will make it better for most people, without breaking more complex multi-backend setups.

@klaus3000 commented

I just wonder if this approach could be extended to fetch all RRs of a zone from the backend and filter by label afterwards. Then all RRs of a zone could be put in the query cache. And if PDNS knows that the query cache contains either all or none of the RRs of a zone, it could defeat random subdomain attacks, since an NXDOMAIN would be the result of not finding the requested label in the query cache, without having to ask the backend for every query.

@rgacogne (Member, Author) commented

I just wonder if this approach could be extended to fetch all RRs of a zone from the backend and filter by label afterwards. Then all RRs of a zone could be put in the query cache. And if PDNS knows that the query cache contains either all or none of the RRs of a zone, it could defeat random subdomain attacks, since an NXDOMAIN would be the result of not finding the requested label in the query cache, without having to ask the backend for every query.

There are several issues with that idea, mostly because the underlying database might be updated, and records might have different TTLs, which would mean we would need to invalidate the whole records cache once the shortest TTL expires.
I think the closest that we can get to that would be:

  • an in-memory database like the bind or LMDB backends;
  • a (partial?) implementation of RFC 8198 in the auth.

@klaus3000 commented Jun 22, 2020

I just wonder if this approach could be extended to fetch all RRs of a zone from the backend and filter by label afterwards. Then all RRs of a zone could be put in the query cache. And if PDNS knows that the query cache contains either all or none of the RRs of a zone, it could defeat random subdomain attacks, since an NXDOMAIN would be the result of not finding the requested label in the query cache, without having to ask the backend for every query.

There are several issues with that idea, mostly because the underlying database might be updated, and records might have different TTLs, which would mean we would need to invalidate the whole records cache once the shortest TTL expires.

Since when does PDNS Auth respect TTLs for the packet/query cache? Actually, I think that TTLs should NOT be considered for these caches. TTLs are designed for recursive DNS. Cache policies of the Auth are a decision of the DNS operator, regardless of what the TTL says.

I think the closest that we can get to that would be:

  • an in-memory database like the bind or LMDB backends;

Lacks replication. I do not know the LMDB details, but SOA checks and AXFR do not scale to millions of zones and hundreds of secondaries.

  • a (partial?) implementation of RFC 8198 in the auth.

Sounds good. But when you have a random subdomain attack, within a second the aggressive NSEC(3) cache is filled with the complete zone, so there is not much difference compared to loading the whole zone.

Anyway, aggressive NSEC(3) caching sounds useful for reducing backend queries even without random subdomain attacks. Of course the technique should also be used for queries without the DO bit set, to improve non-DNSSEC queries as well.

@rgacogne (Member, Author) commented

Since when does PDNS Auth respect TTLs for the packet/query cache?

We have capped the TTD of the packet and query caches at the lowest record TTL for as long as I can remember.
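A sketch of what that capping amounts to (illustrative names, not the actual cache code):

```cpp
#include <algorithm>
#include <cstdint>

// The time-to-die (TTD) of a cache entry is capped by the lowest record TTL,
// so an operator-configured cache TTL of X effectively becomes
// min(X, lowest record TTL).
uint32_t computeTTD(uint32_t now, uint32_t configuredCacheTTL, uint32_t lowestRecordTTL)
{
  return now + std::min(configuredCacheTTL, lowestRecordTTL);
}
```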

Actually, I think that TTLs should NOT be considered for these caches. TTLs are designed for recursive DNS. Cache policies of the Auth are a decision of the DNS operator, regardless of what the TTL says.

I would agree with you if all backends could properly detect a change in the zone, which is not the case for database backends. That means we need to keep the TTD duration in the caches quite small, which greatly reduces the benefit of loading the whole zone into memory.
Note that it might not be possible to load everything for quite a few setups anyway, because they have a huge number of very large zones that would not fit in memory at the same time.

Sounds good. But when you have a random subdomain attack, within a second the aggressive NSEC(3) cache is filled with the complete zone, so there is not much difference compared to loading the whole zone.

The SOA, NSEC and corresponding RRSIG records are loaded, not all the records. This might make a significant difference.

Anyway, aggressive NSEC(3) caching sounds useful for reducing backend queries even without random subdomain attacks. Of course the technique should also be used for queries without the DO bit set, to improve non-DNSSEC queries as well.

Agreed.

@klaus3000 commented

Since when does PDNS Auth respect TTLs for the packet/query cache?

We have capped the TTD of the packet and query caches at the lowest record TTL for as long as I can remember.

I was not aware of this. It should be mentioned in the docs.

Actually, I think that TTLs should NOT be considered for these caches. TTLs are designed for recursive DNS. Cache policies of the Auth are a decision of the DNS operator, regardless of what the TTL says.

I would agree with you if all backends could properly detect a change in the zone, which is not the case for database backends. That means we need to keep the TTD duration in the caches quite small, which greatly reduces the benefit of loading the whole zone into memory.

I use a database backend. Actually, when I as the DNS operator set a query cache TTL of X but my customer sets a record TTL of Y < X, PDNS ignores the admin and follows the user. IMO that is not fine. The cache TTL should be the cache TTL, regardless of record values.

Note that it might not be possible to load everything for quite a few setups anyway, because they have a huge number of very large zones that would not fit in memory at the same time.

Of course it should be a config option. I guess fast backends, which do not suffer during random subdomain attacks, have no need to cache the whole zone, as the whole zone is already in memory. And if someone with huge zones uses a DB backend, then they probably will not use this feature.

Sounds good. But when you have a random subdomain attack, within a second the aggressive NSEC(3) cache is filled with the complete zone, so there is not much difference compared to loading the whole zone.

The SOA, NSEC and corresponding RRSIG records are loaded, not all the records. This might make a significant difference.

In my experience the size of "normal" RRs is negligible compared to NSEC(3) and RRSIG RRs.

Anyway, aggressive NSEC(3) caching sounds useful for reducing backend queries even without random subdomain attacks. Of course the technique should also be used for queries without the DO bit set, to improve non-DNSSEC queries as well.

Agreed.

Of course there always exists some use case where things get worse rather than better. Hence such features should always be a config option for power users.

@rgacogne (Member, Author) commented Jul 7, 2020

Limiting this behaviour to single-backend setups is a nice middle ground. And maybe add a configuration option to enable this for multi-backend setups as well.
This will make it better for most people, without breaking more complex multi-backend setups.

I pushed a commit making that behaviour configurable. It's enabled by default, even in multi-backend setups, because I believe it will be fine for most users and will bring a noticeable performance improvement.

@zeha (Collaborator) commented Jul 7, 2020

While I think the config option is a good idea, maybe we want to name it differently, so further performance improvements can be done under the same option. IIRC consistent-backends was suggested at some point.

@rgacogne (Member, Author) commented Jul 9, 2020

While I think the config option is a good idea, maybe we want to name it differently, so further performance improvements can be done under the same option. IIRC consistent-backends was suggested at some point.

Agreed, I'll change the name and the description so it means that all records for a given name should be unique to a backend. Or perhaps we should have a setting declaring that zones are not spread across multiple backends, which is a bit stricter but would allow more optimizations later?

@zeha (Collaborator) commented Jul 9, 2020

Agreed, I'll change the name and the description so it means that all records for a given name should be unique to a backend. Or perhaps we should have a setting declaring that zones are not spread across multiple backends, which is a bit stricter but would allow more optimizations later?

@Habbie I hoped you'd chime in here.

I think "no overlays" would make sense as a general idea. Supporting real, delegated sub-zones should still work IMO?

@Habbie (Member) commented Aug 12, 2020

I believe this is good to merge once the option is renamed and defaulted to off!

@Habbie added this to the auth-4.4.0-alpha1 milestone Aug 18, 2020
@Habbie (Member) commented Aug 19, 2020

I believe this is good to merge once the option is renamed and defaulted to off!

I pushed a commit for this.

@zeha (Collaborator) commented Aug 19, 2020

circle-ci build-auth failed with:

unknown location(0): fatal error: in "test_ueberbackend_cc/test_multi_backends_overlay_name": unknown type
test-ueberbackend_cc.cc(779): last checkpoint: "test_multi_backends_overlay_name" test entry
Undefined but needed argument: 'consistent-backends'

rgacogne and others added 5 commits August 19, 2020 12:42
Most of our backends have a very high latency, meaning that it takes
a long time to send a query and get the answer, regardless of whether
we are asking for one type or several.
Our code base often asks for a specific type, and our current code
stores separately the answers for ANY queries and the ones for a
specific type. This seems wasteful since the answer to an ANY query
already contains the records for a more specific one, and our
in-memory records cache is much faster than going to the backend.
We could save a round-trip by looking for ANY answers when we don't
find a specific one in the record cache, but our first query is often
for the specific NS type because we are looking for a referral.
This PR converts all lookups to an ANY lookup instead, making sure
that we fill the cache as fast as possible to save round-trips to
the backend later.
My tests showed roughly one third fewer queries to the backend in simple
cases, and probably more in DNSSEC cases, while achieving a higher QPS
ceiling (~ +30%). CPU usage is also significantly reduced while
replaying a real-world PCAP.
The number of entries in the records cache is also significantly
lower since we don't need to store a record twice, for ANY and
for the exact type itself.

We could easily enable that change for specific backends only if
we believe it might have a negative effect on some of them, although
testing with the bind backend showed a slight improvement there as well,
even though lookups in the bind backend are already quite fast.
I have not tested LMDB.

We could reduce the number of round-trips to the backend even more
by getting rid of the 'SOA' special case, since I'm not aware of
any backend currently implementing it in a special way.
Counting the number of queries sent to the backend(s), instead of
relying on the number of cache misses.
It controls whether we only send 'ANY' lookups to our backend, instead
of a mix of 'ANY' and exact types. This behaviour is enabled by default
since it should save a lot of round-trips for most setups, but can be
disabled for multi-backend setups that require it.
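For reference, a sketch of how the setting described in that commit message would be used in pdns.conf (it was later renamed to consistent-backends and defaulted to off before merge, per the surrounding discussion; check the released documentation for the exact semantics):

```
# pdns.conf sketch: declare that all records for a given name live in a
# single backend, allowing the server to do one ANY lookup and filter locally.
consistent-backends=yes
```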
@Habbie (Member) commented Aug 19, 2020

Thanks, that confused me for a bit, but I have now seen the light and am rebasing :)

@Habbie (Member) commented Aug 19, 2020

Rebased, fixed, pushed. This passes locally. If I enable consistent-backends, make check no longer passes; I will investigate this.

@mind04 (Contributor) left a comment:

All lookups with id = -1 (SOA and lookups in findNS()) may result in answers from multiple zones. Please add logic to prevent/detect answers from multiple zones in the cache.

```diff
- d_question.qname=shorter;
- addNegCache(d_question);
+ d_question.qname = shorter;
+ addNegCache(d_question, d_question.qtype);
```
Contributor commented:

d_question.qtype and d_question.zoneId are uninitialized here when getAuth was called with cachedOk = false

Also passing d_question and d_question.qtype seems redundant. The same applies to addCache() a few lines down.

@rgacogne (Member, Author) replied:

All lookups with id = -1 (SOA and lookups in findNS()) may result in answers from multiple zones.

Will fix, thanks!

Also passing d_question and d_question.qtype seems redundant. The same applies to addCache() a few lines down.

We actually need it to be passed separately so we can override the qtype in some cases: d_question.qtype holds the requested qtype, but when s_doANYLookupsOnly is set we want to store records for ANY in the cache.

Contributor replied:

I'm pretty sure you can update d_question.qtype with ANY when s_doANYLookupsOnly is set.

@rgacogne (Member, Author) replied:

I'm confused: how would we know the requested qtype then, so that we can filter the records in UeberBackend::get(), if we don't store it in d_question.qtype? d_handle.qtype is already set to ANY in that case, so we do the correct lookup when we iterate over backends in UeberBackend::handle::get().
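A condensed sketch of the split being described (heavily simplified, illustrative member names and control flow, not the PR's code):

```cpp
void UeberBackend::lookup(const QType& qtype, const DNSName& qname, int zoneId)
{
  d_question.qtype = qtype; // remember the requested type, used for filtering
  d_question.qname = qname;
  // Backends see ANY when s_doANYLookupsOnly is set, so the cache gets filled
  // for every type in one round-trip.
  d_handle.qtype = s_doANYLookupsOnly ? QType(QType::ANY) : qtype;
  // ... query cache lookup / backend iteration setup ...
}

bool UeberBackend::get(DNSZoneRecord& rr)
{
  while (d_handle.get(rr)) { // iterates over backends using d_handle.qtype
    if (d_question.qtype.getCode() == QType::ANY || rr.dr.d_type == d_question.qtype.getCode()) {
      return true; // only hand back records matching the requested type
    }
  }
  return false;
}
```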

(Two further review threads on pdns/ueberbackend.cc, marked outdated and resolved.)
```diff
- rr=*d_cachehandleiter++;;
+ if (d_cached) {
+   if (d_cachehandleiter != d_answers.end()) {
+     rr = *d_cachehandleiter++;
```
Contributor commented:

I think we need a check here to make sure the zone_id in rr matches the id passed to lookup()

Collaborator replied:

QC.getEntry already uses the zoneId for the cache lookup, no?

@rgacogne (Member, Author) commented

All lookups with id = -1 (SOA and lookups in findNS()) may result in answers from multiple zones. Please add logic to prevent/detect answers from multiple zones in the cache.

I'm wondering how that works today? I'm not sure this PR changes that behaviour much: the SOA should not appear in more than one zone, and the NSes will if we have a parent and a child zone, but wasn't that already the case?

@zeha (Collaborator) commented Sep 14, 2020

All lookups with id = -1 (SOA and lookups in findNS()) may result in answers from multiple zones. Please add logic to prevent/detect answers from multiple zones in the cache.

I'm wondering how that works today? I'm not sure this PR changes that behaviour much: the SOA should not appear in more than one zone, and the NSes will if we have a parent and a child zone, but wasn't that already the case?

I gave this a stern look and would agree. IMO there are two remaining B->getSOA cases that could go away; and then we're down to a) FindNS and b) pdnsutil benchmark for callers passing -1. Neither of them should have a problem here, though.

@zeha mentioned this pull request Sep 15, 2020
@Habbie modified the milestones: auth-4.4.0-alpha1, auth-4.4.0-alpha2 Sep 24, 2020
@zeha (Collaborator) commented Oct 12, 2020

I'm running this for today (in combination with #9464) and so far it's looking good.

@zeha (Collaborator) commented Oct 12, 2020

For some reason this needs a trivial rebase on master when merging locally, but I don't see why.

@Habbie modified the milestones: auth-4.4.0-alpha2, auth-4.5.0-alpha0 Oct 28, 2020
@Habbie (Member) commented Oct 28, 2020

I merged #9483 (inspired by, and partially taken from, this PR) instead, as it appears to give the same benefits, while changing a lot less code. Thanks!

@Habbie closed this Oct 28, 2020
@rgacogne deleted the auth-cache-any branch October 28, 2020 14:54