
auth: Do an ANY lookup for all types then filter #9007

Closed
rgacogne wants to merge 8 commits into master from auth-cache-any

Conversation

@rgacogne (Member) commented Apr 6, 2020

Short description

Most of our backends have a very high latency, meaning that it takes a long time to send a query and get the answer, regardless of whether we are asking for one type or several. Our code base often asks for a specific type, and our current code stores separately the answers for ANY queries and the ones for a specific type. This seems wasteful since the answer to an ANY query already contains the records for a more specific one, and our in-memory records cache is much faster than going to the backend. We could save a round-trip by looking for ANY answers when we don't find a specific one in the record cache, but our first query is often for the specific NS type because we are looking for a referral.
This PR converts all lookups to an ANY lookup instead, making sure that we fill the cache as fast as possible to save round-trips to the backend later.
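In essence (a minimal sketch of the idea, not the actual patch; lookupFromBackend() and cacheStore() are hypothetical helpers standing in for the real backend and cache plumbing, while DNSZoneRecord, DNSName and QType mirror the PowerDNS types):

```cpp
#include <utility>
#include <vector>

// Sketch of the ANY-then-filter approach described above.
std::vector<DNSZoneRecord> lookupWithAnyCache(const DNSName& qname, const QType& qtype)
{
  // Always ask the backend for ANY, filling the cache for every type at once.
  std::vector<DNSZoneRecord> anyRecs = lookupFromBackend(qname, QType(QType::ANY));
  cacheStore(qname, QType(QType::ANY), anyRecs);

  // Then keep only the records matching the type the caller asked for.
  std::vector<DNSZoneRecord> rrs;
  for (auto& rec : anyRecs) {
    if (qtype.getCode() == QType::ANY || rec.dr.d_type == qtype.getCode()) {
      rrs.push_back(std::move(rec));
    }
  }
  return rrs;
}
```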

My tests showed roughly one third fewer queries to the backend in simple cases, and probably more in DNSSEC cases, while achieving a higher QPS ceiling (~ +30%). CPU usage is also significantly reduced while replaying a real-world PCAP.
The number of entries in the records cache is also significantly lower since we don't need to store a record twice, for ANY and for the exact type itself.

We could easily enable that change for specific backends only if we believe it might have a negative effect on some of them, although testing with the bind backend showed a slight improvement there as well, even though lookups in the bind backend are already quite fast.
I have not tested LMDB.

We could reduce the number of round-trips to the backend even more by getting rid of the 'SOA' special case, since I'm not aware of any backend currently implementing it in a special way.

Checklist

I have:

  • read the CONTRIBUTING.md document
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

@Habbie (Member) commented Apr 6, 2020

Our metronome templates currently use query-cache-miss / udp-answers to calculate 'query to DB amplification', is your backend-queries metric very different?

@rgacogne (Member, Author) commented Apr 6, 2020

Our metronome templates currently use query-cache-miss / udp-answers to calculate 'query to DB amplification', is your backend-queries metric very different?

Yes, if only because this PR currently does up to two lookups into the records cache (ANY then the specific type) because of the SOA case, but only one will go to the backend. The 'uncached' SOA query is not accounted for in query-cache-miss. It also differs if you have more than one backend. There might be other cases as well.

@rgacogne (Member, Author) commented Apr 6, 2020

Failure of ci/circleci: test-auth-regress-gpgsql is unrelated (ALIAS).

@rgacogne (Member, Author) commented Apr 6, 2020

Failure of test-auth-regress-gsqlite3 this time:

Apr 06 14:11:39 Error resolving for ALIAS google-public-dns-a.google.com., aborting AXFR
Apr 06 14:11:39 Unable to AXFR zone 'example.com' from remote '127.0.0.1:5200' (resolver): AXFR chunk error: Server Failure (This was the first time. Excluding zone from slave-checks until 1586182599)
Apr 06 14:11:39 Signing thread died because of std::exception: Reading from socket in Signing Pipe loop: Connection reset by peer

@Habbie (Member) commented Apr 6, 2020

Ugh, ALIAS was quite stable in CircleCI for a while. I'll see if I can make the ALIAS target something local.

@rgacogne (Member, Author) commented

We could reduce the number of round-trips to the backend even more by getting rid of the 'SOA' special case, since I'm not aware of any backend currently implementing it in a special way.

I looked into that idea a bit, but it's complicated by the fact that some backends (bind and LMDB, as of today) return their best SOA when the qtype passed to the ::lookup() operation is SOA. I'm not sure how useful that optimization is today (I'm pretty sure it's useless for the bind backend, since the round-trip to the Ueberbackend is a function call; for LMDB it depends on the cost of opening a RO transaction) but we might not want to lose it.

@Habbie (Member) commented Apr 14, 2020

That optimisation has been requested for the pipebackend; I did not check if we do that today.

@rgacogne (Member, Author) commented

In fact it looks like this optimization is gone in both the Bind and LMDB backends since 6678e5a (#8050).

@rgacogne (Member, Author) commented

The more I look into this "optimization" the less I'm actually convinced it ever worked that way in the bind backend.

@rgacogne (Member, Author) commented

Unless I'm missing something huge, none of the backends I know of (including the not-yet-merged DLSO and Cassandra backends) overrides DNSBackend::getAuth(). DNSBackend::getAuth() directly calls DNSBackend::getSOA(), which is not overridden either. DNSBackend::getSOA() does not support returning a higher SOA than the one asked for, and it actually forces the qname back to the requested one, so a higher SOA would not be ignored but would instead cause issues.
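A rough paraphrase of the call chain being described (signatures and bodies are simplified and illustrative, not copied from the real dnsbackend code):

```cpp
// Default behaviour, per the description above: no backend overrides either method.
bool DNSBackend::getAuth(const DNSName& target, SOAData* sd)
{
  return this->getSOA(target, *sd);
}

bool DNSBackend::getSOA(const DNSName& domain, SOAData& sd)
{
  this->lookup(QType(QType::SOA), domain, -1);
  // ... fetch the answer from the backend ...
  sd.qname = domain; // forced back to the requested name: a "higher" SOA
                     // returned by a backend would be mislabelled, not ignored
  return true;
}
```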


```cpp
for (auto& rec : anyRecs) {
  if (q.qtype.getCode() == QType::ANY || rec.dr.d_type == q.qtype.getCode()) {
    rrs.push_back(std::move(rec));
  }
}
```
Contributor commented:

emplace_back?

@rgacogne (Member, Author) replied:

Won't the move constructor be called in exactly the same way in both cases?

Contributor replied:

Possibly. I just wonder if it could be rrs.emplace_back(rec)?

@rgacogne (Member, Author) replied:

That would allocate and copy two DNSNames and one shared_ptr<DNSRecordContent> instead of moving them, so I don't think that would be better.
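For illustration, the difference with a stand-in type (Rec here is hypothetical, not the PR's code):

```cpp
#include <string>
#include <utility>
#include <vector>

struct Rec {
  std::string name; // imagine DNSName + shared_ptr<DNSRecordContent> members
};

void demo(std::vector<Rec>& rrs, Rec a, Rec b, Rec c)
{
  rrs.push_back(std::move(a));    // move-constructs the new element: cheap
  rrs.emplace_back(b);            // 'b' is an lvalue, so this copy-constructs
  rrs.emplace_back(std::move(c)); // moves, same cost as push_back(std::move(...))
}
```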

@rgacogne (Member, Author) commented Jun 2, 2020

After discussing it with @mind04 on the IRC channel, this might cause issues with some setups involving multiple backends. Unless we are willing to take the risk of breaking some setups, we should either limit this new behaviour to single-backend setups (easiest) or make it configurable (enabling this optimization for most multi-backend setups while not breaking the other ones).

@zeha (Collaborator) commented Jun 2, 2020

If I read the diff right, only setups that have the same name in multiple backends could be affected?
If this is true, and if delegations from one backend to another work - great, and those mystical people that do other weird shit need to step out of the shadows and explain what they are doing.

@rgacogne (Member, Author) commented Jun 2, 2020

Yes, my worry is that it might be more common than we think for custom backends, in order to avoid implementing DNSSEC there and delegate that to a second backend instead. That's quite ugly in my humble opinion since it requires duplicating the names and types in the second backend, but it clearly exists.

@zeha (Collaborator) commented Jun 2, 2020

My problem with this is: there are only rumors of this existing, and nobody can say whether it would be a supported thing to do or not. The mere rumor of it existing prevents making stuff better for everyone else.
If this is a use case, maybe something actually well-supported should be implemented instead.
However, my fear is that until we actually break these setups, no code will be written to make anything better, and nothing will ever change.

@mind04 (Contributor) commented Jun 2, 2020

Limiting this behaviour to single-backend setups is a nice middle ground. And maybe add a configuration option to enable this for multi-backend setups as well.
This will make it better for most people, without breaking more complex multi-backend setups.

@klaus3000 commented

I just wonder if this approach could be extended to fetch all RRs of a zone from the backend and filter by label afterwards. Then all RRs of a zone could be put in the query cache. And if PDNS knows that the query cache contains either all or none of the RRs of a zone, it could defeat random subdomain attacks, since an NXDOMAIN would be the result of not finding the requested label in the query cache, without having to ask the backend for every query.

@rgacogne (Member, Author) commented

I just wonder if this approach could be extended to fetch all RRs of a zone from the backend and filter by label afterwards. Then all RRs of a zone could be put in the query cache. And if PDNS knows that the query cache contains either all or none of the RRs of a zone, it could defeat random subdomain attacks, since an NXDOMAIN would be the result of not finding the requested label in the query cache, without having to ask the backend for every query.

There are several issues with that idea, mostly because the underlying database might be updated, and records might have different TTLs, which would mean we would need to invalidate the whole records cache once the shortest TTL expires.
I think the closest that we can get to that would be:

  • an in-memory database like the bind or LMDB backends;
  • a (partial?) implementation of RFC 8198 in the auth.

@klaus3000 commented Jun 22, 2020

I just wonder if this approach could be extended to fetch all RRs of a zone from the backend and filter by label afterwards. Then all RRs of a zone could be put in the query cache. And if PDNS knows that the query cache contains either all or none of the RRs of a zone, it could defeat random subdomain attacks, since an NXDOMAIN would be the result of not finding the requested label in the query cache, without having to ask the backend for every query.

There are several issues with that idea, mostly because the underlying database might be updated, and records might have different TTLs, which would mean we would need to invalidate the whole records cache once the shortest TTL expires.

Since when does PDNS Auth respect TTLs for the packet/query cache? Actually, I think that TTLs should NOT be considered for these caches. TTLs are designed for recursive DNS. Cache policies of the Auth are a decision of the DNS operator, regardless of what the TTL says.

I think the closest that we can get to that would be:

  • an in-memory database like the bind or LMDB backends;

Lacks replication. I do not know the LMDB details, but SOA checks and AXFR do not scale to millions of zones and hundreds of secondaries.

  • a (partial?) implementation of RFC 8198 in the auth.

Sounds good. But when you have a random subdomain attack, within a second the aggressive NSEC(3) cache is filled with the complete zone, so there is not much difference compared to loading the whole zone.

Anyway, aggressive NSEC(3) caching sounds useful for reducing backend queries even without random subdomain attacks. Of course the technique should also be used for queries without the DO bit set, to improve non-DNSSEC queries as well.

@rgacogne (Member, Author) commented

Since when does PDNS Auth respect TTLs for the packet/query cache?

We have capped the TTD of the packet and query caches at the lowest record TTL for as long as I can remember.
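A sketch of what that capping amounts to (illustrative names, not the actual cache code):

```cpp
#include <algorithm>
#include <cstdint>

// The time-to-die (TTD) of a cache entry is capped by the lowest record TTL,
// so an operator-configured cache TTL of X effectively becomes
// min(X, lowest record TTL).
uint32_t computeTTD(uint32_t now, uint32_t configuredCacheTTL, uint32_t lowestRecordTTL)
{
  return now + std::min(configuredCacheTTL, lowestRecordTTL);
}
```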

Actually, I think that TTLs should NOT be considered for these caches. TTLs are designed for recursive DNS. Cache policies of the Auth are a decision of the DNS operator, regardless of what the TTL says.

I would agree with you if all backends could properly detect a change in the zone, which is not the case for database backends. That means we need to keep the TTD duration in the caches quite small, which greatly reduces the benefit of loading the whole zone into memory.
Note that it might not be possible to load everything for quite a few setups anyway, because they have a huge number of very large zones that would not fit in memory at the same time.

Sounds good. But when you have a random subdomain attack, within a second the aggressive NSEC(3) cache is filled with the complete zone, so there is not much difference compared to loading the whole zone.

The SOA, NSEC and corresponding RRSIG records are loaded, not all the records. This might make a significant difference.

Anyway, aggressive NSEC(3) caching sounds useful for reducing backend queries even without random subdomain attacks. Of course the technique should also be used for queries without the DO bit set, to improve non-DNSSEC queries as well.

Agreed.

@klaus3000 commented

Since when does PDNS Auth respect TTLs for the packet/query cache?

We have capped the TTD of the packet and query caches at the lowest record TTL for as long as I can remember.

I was not aware of this. It should be mentioned in the docs.

Actually, I think that TTLs should NOT be considered for these caches. TTLs are designed for recursive DNS. Cache policies of the Auth are a decision of the DNS operator, regardless of what the TTL says.

I would agree with you if all backends could properly detect a change in the zone, which is not the case for database backends. That means we need to keep the TTD duration in the caches quite small, which greatly reduces the benefit of loading the whole zone into memory.

I use a database backend. Actually, when I as the DNS operator set a query cache TTL of X but my customer sets a record TTL of Y < X, PDNS ignores the admin and follows the user. IMO that is not fine. The cache TTL should be the cache TTL, regardless of record values.

Note that it might not be possible to load everything for quite a few setups anyway, because they have a huge number of very large zones that would not fit in memory at the same time.

Of course it should be a config option. I guess fast backends, which do not suffer during random subdomain attacks, have no need to cache the whole zone, as the whole zone is already in memory. And if someone with huge zones uses a DB backend, then they probably will not use this feature.

Sounds good. But when you have a random subdomain attack, within a second the aggressive NSEC(3) cache is filled with the complete zone, so there is not much difference compared to loading the whole zone.

The SOA, NSEC and corresponding RRSIG records are loaded, not all the records. This might make a significant difference.

In my experience the size of "normal" RRs is negligible compared to NSEC(3) and RRSIG RRs.

Anyway, aggressive NSEC(3) caching sounds useful for reducing backend queries even without random subdomain attacks. Of course the technique should also be used for queries without the DO bit set, to improve non-DNSSEC queries as well.

Agreed.

Of course there always exists some use case where things get worse rather than better. Hence such features should always be a config option for power users.

@rgacogne (Member, Author) commented Jul 7, 2020

Limiting this behaviour to single-backend setups is a nice middle ground. And maybe add a configuration option to enable this for multi-backend setups as well.
This will make it better for most people, without breaking more complex multi-backend setups.

I pushed a commit making that behaviour configurable. It's enabled by default, even in multi-backend setups, because I believe it will be fine for most users and will bring a noticeable performance improvement.

@zeha (Collaborator) commented Jul 7, 2020

While I think the config option is a good idea, maybe we want to name it differently, so further performance improvements can be done under the same option. IIRC consistent-backends was suggested at some point.

@rgacogne (Member, Author) commented Jul 9, 2020

While I think the config option is a good idea, maybe we want to name it differently, so further performance improvements can be done under the same option. IIRC consistent-backends was suggested at some point.

Agreed, I'll change the name and the description so it means that all records for a given name should be unique to a backend. Or perhaps we should have a setting declaring that zones are not spread across multiple backends, which is a bit stricter but would allow more optimizations later?

@zeha (Collaborator) commented Jul 9, 2020

Agreed, I'll change the name and the description so it means that all records for a given name should be unique to a backend. Or perhaps we should have a setting declaring that zones are not spread across multiple backends, which is a bit stricter but would allow more optimizations later?

@Habbie I hoped you'd chime in here.

I think "no overlays" would make sense as a general idea. Supporting real, delegated sub-zones should still work IMO?

@Habbie (Member) commented Aug 12, 2020

I believe this is good to merge once the option is renamed and defaulted to off!

@Habbie added this to the auth-4.4.0-alpha1 milestone Aug 18, 2020
@Habbie (Member) commented Aug 19, 2020

I believe this is good to merge once the option is renamed and defaulted to off!

I pushed a commit for this.

@zeha (Collaborator) commented Aug 19, 2020

circle-ci build-auth failed with:

unknown location(0): fatal error: in "test_ueberbackend_cc/test_multi_backends_overlay_name": unknown type
test-ueberbackend_cc.cc(779): last checkpoint: "test_multi_backends_overlay_name" test entry
Undefined but needed argument: 'consistent-backends'

rgacogne and others added 5 commits August 19, 2020 12:42
Most of our backends have a very high latency, meaning that it takes
a long time to send a query and get the answer, regardless of whether
we are asking for one type or several.
Our code base often asks for a specific type, and our current code
stores separately the answers for ANY queries and the ones for a
specific type. This seems wasteful since the answer to an ANY query
already contains the records for a more specific one, and our
in-memory records cache is much faster than going to the backend.
We could save a round-trip by looking for ANY answers when we don't
find a specific one in the record cache, but our first query is often
for the specific NS type because we are looking for a referral.
This PR converts all lookups to an ANY lookup instead, making sure
that we fill the cache as fast as possible to save round-trips to
the backend later.
My tests showed roughly one third fewer queries to the backend in simple
cases, and probably more in DNSSEC cases, while achieving a higher QPS
ceiling (~ +30%). CPU usage is also significantly reduced while
replaying a real-world PCAP.
The number of entries in the records cache is also significantly
lower since we don't need to store a record twice, for ANY and
for the exact type itself.

We could easily enable that change for specific backends only if
we believe it might have a negative effect on some of them, although
testing with the bind backend showed a slight improvement there as well,
even though lookups in the bind backend are already quite fast.
I have not tested LMDB.

We could reduce the number of round-trips to the backend even more
by getting rid of the 'SOA' special case, since I'm not aware of
any backend currently implementing it in a special way.
Counting the number of queries sent to the backend(s), instead of
relying on the number of cache misses.
It controls whether we only send 'ANY' lookups to our backend, instead
of a mix of 'ANY' and exact types. This behaviour is enabled by default
since it should save a lot of round-trips for most setups, but can be
disabled for multi-backend setups that require it.
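For reference, a sketch of how the setting described in that commit message would be used in pdns.conf (it was later renamed to consistent-backends and defaulted to off before merge, per the surrounding discussion; check the released documentation for the exact semantics):

```
# pdns.conf sketch: declare that all records for a given name live in a
# single backend, allowing the server to do one ANY lookup and filter locally.
consistent-backends=yes
```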
@Habbie (Member) commented Aug 19, 2020

Thanks, that confused me for a bit, but I have now seen the light and am rebasing :)

@Habbie (Member) commented Aug 19, 2020

Rebased, fixed, pushed. This passes locally. If I enable consistent-backends, make check no longer passes; I will investigate this.

@mind04 (Contributor) left a comment:

All lookups with id = -1 (SOA and lookups in findNS()) may result in answers from multiple zones. Please add logic to prevent/detect answers from multiple zones in the cache.

```diff
- d_question.qname=shorter;
- addNegCache(d_question);
+ d_question.qname = shorter;
+ addNegCache(d_question, d_question.qtype);
```
Contributor commented:

d_question.qtype and d_question.zoneId are uninitialized here when getAuth was called with cachedOk = false

Also passing d_question and d_question.qtype seems redundant. The same applies to addCache() a few lines down.

@rgacogne (Member, Author) replied:

All lookups with id = -1 (SOA and lookups in findNS()) may result in answers from multiple zones.

Will fix, thanks!

Also passing d_question and d_question.qtype seems redundant. The same applies to addCache() a few lines down.

We actually need it to be passed separately so we can override the qtype in some cases: d_question.qtype holds the requested qtype, but when s_doANYLookupsOnly is set we want to store records for ANY in the cache.

Contributor replied:

I'm pretty sure you can update d_question.qtype with ANY when s_doANYLookupsOnly is set.

@rgacogne (Member, Author) replied:

I'm confused: how would we know the requested qtype then, so that we can filter the records in UeberBackend::get(), if we don't store it in d_question.qtype? d_handle.qtype is already set to ANY in that case, so we do the correct lookup when we iterate over backends in UeberBackend::handle::get().
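A condensed sketch of the split being described (heavily simplified, illustrative member names and control flow, not the PR's code):

```cpp
void UeberBackend::lookup(const QType& qtype, const DNSName& qname, int zoneId)
{
  d_question.qtype = qtype; // remember the requested type, used for filtering
  d_question.qname = qname;
  // Backends see ANY when s_doANYLookupsOnly is set, so the cache gets filled
  // for every type in one round-trip.
  d_handle.qtype = s_doANYLookupsOnly ? QType(QType::ANY) : qtype;
  // ... query cache lookup / backend iteration setup ...
}

bool UeberBackend::get(DNSZoneRecord& rr)
{
  while (d_handle.get(rr)) { // iterates over backends using d_handle.qtype
    if (d_question.qtype.getCode() == QType::ANY || rr.dr.d_type == d_question.qtype.getCode()) {
      return true; // only hand back records matching the requested type
    }
  }
  return false;
}
```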

(Two further review threads on pdns/ueberbackend.cc, marked outdated and resolved.)
```diff
- rr=*d_cachehandleiter++;;
+ if (d_cached) {
+   if (d_cachehandleiter != d_answers.end()) {
+     rr = *d_cachehandleiter++;
```
Contributor commented:

I think we need a check here to make sure the zone_id in rr matches the id passed to lookup()

Collaborator replied:

QC.getEntry already uses the zoneId for the cache lookup, no?

@rgacogne (Member, Author) commented

All lookups with id = -1 (SOA and lookups in findNS()) may result in answers from multiple zones. Please add logic to prevent/detect answers from multiple zones in the cache.

I'm wondering how that works today? I'm not sure this PR changes that behaviour much: the SOA should not appear in more than one zone, and the NSes will if we have a parent and a child zone, but wasn't that already the case?

@zeha (Collaborator) commented Sep 14, 2020

All lookups with id = -1 (SOA and lookups in findNS()) may result in answers from multiple zones. Please add logic to prevent/detect answers from multiple zones in the cache.

I'm wondering how that works today? I'm not sure this PR changes that behaviour much: the SOA should not appear in more than one zone, and the NSes will if we have a parent and a child zone, but wasn't that already the case?

I gave this a stern look and would agree. IMO there are two remaining B->getSOA cases that could go away; and then we're down to a) FindNS and b) pdnsutil benchmark for callers passing -1. Neither of them should have a problem here, though.

@zeha mentioned this pull request Sep 15, 2020
@Habbie modified the milestones: auth-4.4.0-alpha1, auth-4.4.0-alpha2 Sep 24, 2020
@zeha (Collaborator) commented Oct 12, 2020

I'm running this for today (in combination with #9464) and so far it's looking good.

@zeha (Collaborator) commented Oct 12, 2020

For some reason this needs a trivial rebase on master when merging locally, but I don't see why.

@Habbie modified the milestones: auth-4.4.0-alpha2, auth-4.5.0-alpha0 Oct 28, 2020
@Habbie (Member) commented Oct 28, 2020

I merged #9483 (inspired by, and partially taken from, this PR) instead, as it appears to give the same benefits, while changing a lot less code. Thanks!

@Habbie closed this Oct 28, 2020
@rgacogne deleted the auth-cache-any branch October 28, 2020 14:54