Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rec: handle negative answers with scope > 0 as if they have scope 0 #10994

Closed
seanzhang422 opened this issue Nov 17, 2021 · 8 comments · Fixed by #11010
Closed

rec: handle negative answers with scope > 0 as if they have scope 0 #10994

seanzhang422 opened this issue Nov 17, 2021 · 8 comments · Fixed by #11010
Labels

Comments

@seanzhang422
Copy link

  • Program: Recursor
  • Issue type: Bug report

Short description

When dnssec and ecs are both enabled, query to tosei-hotel.co.jp.e.afq.hp.transer.com.cdn.cloudflare.net will timeout with servfail and trace-regex shows looped DNS recursion. This can be reproduced on latest 4.4 and 4.3 but not 4.5.

Environment

Steps to reproduce

  1. Install pdns-recursor 4.4.7 on Ubuntu Focal
  2. recurse.conf
dnssec=validate
local-address=0.0.0.0, ::
loglevel=9
threads=1
edns-subnet-whitelist=cloudflare.net
  1. Run
rec_control trace-regex 'tosei-hotel\.co\.jp\.e\.afq\.hp\.transer\.com\.cdn\.cloudflare\.net' &&
    dig @127.0.0.1 tosei-hotel.co.jp.e.afq.hp.transer.com.cdn.cloudflare.net`

Expected behaviour

Resolution should succeed.

Actual behaviour

Servfail response.

Other information

@omoerbeek omoerbeek added the rec label Nov 17, 2021
@omoerbeek
Copy link
Member

Thanks for the report. I can reproduce on 4.4.x.
The recursion is actually not infinite. If you set max-qperq=260 the resolving works.
With 4.4.x, it looks like this hits a case where the caching of some records is not happening or is not effective, so the recursor needs a lot of queries to resolve the name.

Will investigate some more.

@omoerbeek
Copy link
Member

On further analysis: cloudflare is not following

https://datatracker.ietf.org/doc/html/rfc7871#section-7.4.

Specifically, negative answers should use zero scope. However:

$ dig cdn.cloudflare.net DS @173.245.59.31 +dnssec +subnet=192.0.2.1/24

; <<>> DiG 9.16.22 <<>> cdn.cloudflare.net DS @173.245.59.31 +dnssec +subnet=192.0.2.1/24
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9541
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; CLIENT-SUBNET: 192.0.2.0/24/24

The consequence is that we do not cache this negative indication and re-query this all the time for zone-cut determination.
4.5.x does not suffer because it has a completely different way of determining zone cuts.

@omoerbeek
Copy link
Member

I changed the title to reflect the root cause of the issue.

@omoerbeek omoerbeek changed the title rec<4.5: Infinite recursion loop when dnssec and ecs are both enabled. rec: handle negative answers with scope > 0 as if they have scope 0 Nov 17, 2021
@seanzhang422
Copy link
Author

Thank you Otto for your quick response. For me the max-qperq has to be set to 400 for 4.4.7 to resolve this name. It will still timeout on first query due to the number of queries required but pdns will cache the answer for later queries. I will report to cloudflare and see if they can do anything about it.

Meanwhile, according to RFC7871, it seems OK for resolver to assume scope 0 for negative answers. In reality, I can't think of many scenarios that scoped negative answer would be useful. Can we expect this interoperability issue to be worked around in 4.4.x?

@omoerbeek
Copy link
Member

omoerbeek commented Nov 18, 2021

We are currently investigating how to fix this on the recursor side. After we have a fix, we will check if it can be backported to 4.4.x Thank you for reporting this to Cloudflare!

@hunts
Copy link

hunts commented Nov 22, 2021

Hi from Cloudflare DNS team. Thank you for reporting the issue, we've released a fix for it. You should find it works as expected now.

@omoerbeek
Copy link
Member

I'm now seeing /0 for the negative answers indeed. This is not enough to solve the issue on our side though, since an intermediate result carrying answer records (e.g. as a result of an A query) still gets a scope > 0, marking the whole resolving process variable. So the fix in #11010 is still needed and should be backported.

@omoerbeek
Copy link
Member

As a workaround until a fixed 4.4.x is released , setting qname-minimization=off avoids the problems in cases I tried, though in theory if a scoped intermediate result lands in the cache via another query I think the issue could still be hit.

webfutureiorepo pushed a commit to webfutureio/pdns that referenced this issue Jun 14, 2022
* dnsdist: Don't copy the header twice when editing the ID

As suggested by Otto.

* dnsdist: Make ConnectionToBackend::release() purely virtual

* dnsdist: Also allocate buffers and connect sockets for CLI backends

* Some things you should do when handling untrusted zone files.

* dnsdist: Do not try to reconnect UDP sockets for TCP-only backends

* wipe-cache-typed  should check if a qtype arg is present and valid

* dnsdist: Handle existing EDNS content for SetMacAddrAction/SetEDNSOptionAction

* A SHA-384 DS should not trump a SHA-256 one, so only potentially zap SHA-1

* Reformat a bunch of recursor specific files

* dnsdist: Remove the useless step parameter in TCP and HTTP/2 unit tests

As suggested by Otto.

* Fix a linking issue with GCC 11.1.0

* dnsdist: Fix build without nghttp2

* dnsdist: Add the ability to retain select capabilities at runtime

* Fix regression of carbon-ourname introduced in 58d7ad6
Noted by @mnordhoff

* dnsdist: Reply with NODATA for non-SVC types in the SVC documentation

* Carbon regression test for rec and auth.

Copied from dnsdist one. That test uses two difference instance names, but
rec and auth settings do not allow that.

* dnsdist: Only update 'ednsAdded' and 'optionAdded' if we process the query

* Move to a stream based socket for the control channel

* Use FDWrapper

* Do not read further than the length we received, the string might
be followed by a passed fd.

Interesting to see that OpenBSD chops up recvs based on the sends,
while Linux is happy to read more than was passed to the corresponding
send call if another send was called after that.

* Update security.rst

* Update security.rst

* Remove debug print line flooding logs

Remove debug log line printing a statement into stdout all the time.

* Initial stab at running the recursor regress and bulk tests on GH workflows

Some issues had to be worked around: libfaketime and bulk test network load

* Reduce size of bulk test and run it using a matrix to vary settings

* Separate bulk test deps from regression test deps

* Rename bulk test to make it clear it is mini

* Prep for rec-4.4.7 and rec-4.5.7

* dnsdist: Add 'InCsumErrors' UDP metric

* rec: Add 'InCsumErrors' UDP metric

* auth: Add 'InCsumErrors' UDP metric

* dnsdist: Add IPv6 UDP error metrics

* dnsdist: Fix dumpStats() formatting

* auth: Add IPv6 UDP error metrics

* rec: Add IPv6 UDP error metrics

* dnsdist: Document new UDP error metrics

* Add 'csum' to the list of allowed words (SNMP checksum metrics)

* Remove tabs in misc.cc

* dnsdist: Remove one last forgotten boost::bind in dnsdist.hh

* Remove dnsdist specific tests that remained

* Credentials: EVP_PKEY_CTX_set1_scrypt_salt() takes an `unsigned char*`

* auth: pdns_control needs to be linked against libcrypto now

* dnsdist: Disable 'IncludeDir' tests on GH actions

* Update dnsdist.cc

* cast arg of EVP_PKEY_CTX_set1_pbe_pass() to const void * to satisfy both openssl 1.1 and 3.0

* auth, rec: put some json on /api/v1

* remove a bunch of unnecessary &

* Reformat

* Update pdns/dnsdist.cc

Co-authored-by: Remi Gacogne <github@coredump.fr>

* some updates to CONTRIBUTING.md

* If possible, use SuffixMatchNodeRule() instead of RegexRule

* Update CONTRIBUTING.md

Co-authored-by: Otto Moerbeek <otto.moerbeek@open-xchange.com>

* better docblock note

* namespace

* auth: improve SOA dnsname exception handling

* Fix quad9 example servers that had run together

* dnsdist: Remove unreachable code in HTTP/2 connections cleanup

Reported by Coverity (CID 373724).

* dnsdist: Fix missing 'continue's when cleaning the connections cleaning code

* dnsdist: Refactoring of the TCP connection caches

* dnsdist: Fix a race condition in the XFR regression tests

* dnsdist: Add a unit test for the outgoing connection cache

* dnsdist: Fix formatting of the connections cache unit tests

* dnsdist: Split the list of downstream connections in two, active and idle

This way we can easily keep track of how many idle connections we have,
and try to reuse these first.

* dnsdist: Test that reused connections are moved to the active queue

* dnsdist: Formatting...

* auth: api, check qtype location. Some types only live apex and some are not allowed (or useful) there.

* auth: check domain_id in the info-all-master-query

* auth: api, remove CDS and CDNSKEY for now, in favor of https://www.ietf.org/archive/id/draft-thomassen-dnsop-dnssec-bootstrapping-02.html

* Review remarks from Habbie

* auth: make the zonecache more robust for bad data and save some SOA queries
for dnssec zones

* Use a global timeout for the various recv's we're doing to get a control message

* Limit max arg length

* Use sysconf if needed to get ARG_MAX

* Prep for rec-4.6.0-beta1

* Correct versionadded

* Fix formatting for versionadded

* auth bindbackend: skip rejected zones during list and search, fixes PowerDNS#10885

* stop saying mysql is a good choice for performance

* improve chroot text

* auth-4.5.2: secpoll and changelog

* We need libcurl dev lib for the zone-to-cache function.

Also fix config summary line and print curl feature on --version

* Clarify docs

* Move asan plus ubsan settings to strategy, in preparation for also doing tsan

* Remove other remains of GnuTLS config that was never useful

* Enable tsan build plus tests for rec GH actions

* Run fewer CircleCI bulk tests. These are covered by builbot anyway

* Supress g_stats data races

* Disable bulk test with TSAN for now and add some debug code to api test

* print stderr and stdout for api test

* rec: Allow worker threads to send tasks to handler thread

Extend the ThreadMSG mechanism to allow worker threads to submit
tasks to be executed by the handler thread (one-way only, no
answers can be returned).

* rec: Refactor cache-wiping code into a common function

Eliminates multiple copies of the code and eliminates
inconsistencies between them.

* rec: Add support for NOTIFY operations to wipe cache entries

NOTIFY operations can be sent to trigger removal of cache entries which
match the zone specified in the operation. All entries, regardless of
type, in or below the specified zone, are removed.  Control over
permission to send such operations is provided by an ACL, and control over
zones which can be wiped is provided by a new configuration setting.

The default configuration ignores all NOTIFY operations.

This patch adds:

* 'allow-notify-from' and 'allow-notify-from-file' settings, operating
  almost identically to 'allow-from' and 'allow-from-file' (the only
  difference being the default value).

* 'allow-notify-for' and 'allow-notify-for-file' settings, which provide
  a list of zones for which NOTIFY operations are allowed.

* modification to 'forward-zones-file' setting, allowing zones specified
  there to optionally allow NOTIFY operations.

* 'source-disallowed-notify' metric, counting the number of NOTIFY operations
  which have been denied by the ACL.

* 'zone-disallowed-notify' metric, counting the number of NOTIFY operations
  which have been denied by the zone list.

* API support for modifying 'allow-notify-from' ACL.

* Regression tests for new ACL settings.

* Disable the actual connect() in the test_dnsdisttcp_cc_c unit tests.

They are not needed and cause (at least on OpenBSD) firewall state table
clashes: they remain in a embryotic state because no actual activity
occurs on them due to the rest of the tests using mockup code.

tcpiohandler.cc is not linked into the tests, so define it locally in
test-dnsdisttcp_cc.cc as well.

* Update pdns/recursordist/RECURSOR-MIB.txt

Co-authored-by: Otto Moerbeek <otto.moerbeek@open-xchange.com>

* Return the proper ede on validation failure; fixes PowerDNS#10936

* dnsdist: Apply suggestions from code review (thanks, Otto!)

* Add test

* Add a test for ede sig expired that does not rely on external servers

* some more ()'s for readability

Co-authored-by: Remi Gacogne <github@coredump.fr>

* More strict secpoll check; hope I got the yaml quoting right

* Move check to a shell script

* tweaks and undo error entry

* no else after exit

Co-authored-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* auth 2136: improve some log messages

* auth 2136: apply new TTL to whole RRset, not only to the added record

fixes PowerDNS#10921

* indent

* dnsdist: Implement filesystem pinning for eBPF maps

This makes the filter (v4, v6 and qnames) maps persistent across a
restart and allow external programs to read and update them without
the need to use dnsdist's console.

* dnsdist: Add a new eBPF map format, support external eBPF programs

Supporting external eBPF programs makes it possible to populate the
eBPF tables from dnsdist, manually or via our dynamic blocking mechanisms,
but to actually do the filtering in an external program, like an XDP one.

We cannot increase the size of eBPF programs if we want to stay
below 4k instructions for older kernels, so this commit implements
a compatibility layer with the new map format.

The 4k limit for unprivileged was removed in 5.2 but the complexity limit remains:
The complexity limit was actually changed several times since the
32k value from its introduction in Linux 3.18: it was raised to 64k
in Linux 4.7, then to 96k in Linux 4.12, again to 128k in Linux 4.14,
and at last to 1M in Linux 5.2.

* dnsdist: Add a sample XDP program and associated python script in contrib

Both contributed by Pierre Grié <pierre.grie@nameshield.net>.

* contrib/xdp.py: Apply the change suggested by Pieter (thanks!)

Co-authored-by: Pieter Lexis <pieter@plexis.eu>

* dnsdist: Apply suggestions from code review on the new eBPF map type

* dnsdist: Switch to a uint8_t for the XDP match action type

* dnsdist: Add ChangeLog and secpoll update for 1.7.0-beta1

* spellcheck: Allow 'XDP', 'Grié'

* dnsdist: Fix the description of 10920 in the ChangeLog, remove useless entries

* Basic notify test.

It turns out a notify increments cache-hits, that feel a bit strange.

(cherry picked from commit 8662d26)

* Do not count notifies in record cache hits/misses

(cherry picked from commit 0fd8cf2)

* Positive instead of negative test

Co-authored-by: Pieter Lexis <pieter@plexis.eu>
(cherry picked from commit 5a642e0)

* Include sys/time.h; needed on musl; fixes PowerDNS#11000

(cherry picked from commit 671ca0d)

* Fix logic botch introduced by notify handing

See PowerDNS#10751, some extra scrutiny review is needed to make sure no
other similat issue remains.

(cherry picked from commit 09a22e8)

* When we drop a notify over TCP, terminate the connection

(cherry picked from commit 9aa6eec)

* Do cache negcache results, even when wasVariable() is true

See https://datatracker.ietf.org/doc/html/rfc7871#section-7.4
Fixes PowerDNS#10994

(cherry picked from commit 2bcec14)

* Condition to HAVE_SYSTEMD_WITH_RUNTIME_DIR_ENV is reversed

(cherry picked from commit fc1f6fb)

* Disable tsan regression runs for rec for now, there is a failure
mode that if it hits makes almost all remaining test fail.  Symptom
is that the auths do not start up properly.

(cherry picked from commit 2ef0d14)

* Fix v6 setup and start using a more modern auth on circleci

(cherry picked from commit 2d0fc47)

* One more occurence of --local-ipv6

(cherry picked from commit 9b3fc86)

* Fix error in test zone that auth-45 does not like

(cherry picked from commit f1f41a8)

* Do not generate eventtrace records if no Lua hook is defined

(cherry picked from commit 6a94813)

* Remove capability requirements from Docker images

(cherry picked from commit f28c81e)

* Additional note on Docker Engine version where the requirement of the additional capability was dropped

(cherry picked from commit 07b24e5)

* Two more features to print

(cherry picked from commit 030c376)

* Try shorter thread names

https://man7.org/linux/man-pages/man3/pthread_setname_np.3.html

       ... The thread name is a
       meaningful C language string, whose length is restricted to 16
       characters, including the terminating null byte ('\0').

(cherry picked from commit f3813e0)

* Make trySetThreadName static

Co-authored-by: Otto Moerbeek <otto.moerbeek@open-xchange.com>
(cherry picked from commit 80f15f6)

* rec-4.6.x: Fix build with OpenSSL 3.0.0

(cherry picked from commit 8535f66)

* CI: dnspython 2.2.0 breaks auth and ixfrdist testing, pin to 2.1.0

(cherry picked from commit 46a40ed)

* servfail.nl changed theuir setup.

I think in an ideal world we should return 9 (DNSSEC key missing) but I don't see a
easy way to do that at the moment.

(cherry picked from commit 2cd34ba)

* stop testing auth+dnsdist on rec branch

* builder: archs for oraclelinux-8; el-8 symlinks

(cherry picked from commit dee53cf)

* builder CI: switch oraclelinux-8 to el-8

cleanup

(cherry picked from commit 1b27721)

* take centos 8-stream from quay

(cherry picked from commit 2f9eddd)

* builder: add el-7 alias for centos-7

(cherry picked from commit 6bd3c9b)

* add ubuntu jammy build target

(cherry picked from commit 6c1e5fd)

* test ubuntu jammy build target

(cherry picked from commit e3d5079)

* rec: Reject non-apex NSEC(3)s that have both the NS and SOA bits set

Ancestor NSEC(3)s have the SOA bit clear (delegation), and the remaining
non-apex ones should not have the NS set.

(cherry picked from commit be5d851)

* rec: The NSEC3 ancestor check must be done against the original owner name

(cherry picked from commit f37a904)

* NSEC -> NSEC3

Co-authored-by: Otto Moerbeek <otto.moerbeek@open-xchange.com>
(cherry picked from commit 78cee42)

* NSEC -> NSEC3

Co-authored-by: Otto Moerbeek <otto.moerbeek@open-xchange.com>
(cherry picked from commit c67b13a)

* If we get a CNAME when asking for a DS, we should give up and return vState::BogusUnableToGetDSs

(cherry picked from commit 271ae63)

* Add a test for the case where an (Insecure) domain fails to get a DS
record because of a CNAME loop, avoiding a SERVFAIL.

(cherry picked from commit c10acee)

* Fix indent

(cherry picked from commit 5db4dca)

* Initialize isNew before calling a exception throwing function

(cherry picked from commit 4043238)

* Allow disabling of processing the root hints

This also make sure we use the right dnssec mode for processing hints
and changes a few log levels to Debug to be less verbose.

(cherry picked from commit e46b0f2)

* Upgrade guide and doc tweaks

(cherry picked from commit 067a807)

* Log an error if pdns.DROP is used as rcode in Lua callbacks

(cherry picked from commit f3f042e)

* Apply suggestions from code review

Co-authored-by: Remi Gacogne <github@coredump.fr>
(cherry picked from commit f7c973d)

* If we get NODATA on an AAAA in followCNAMERecords, try dns64

Fixes PowerDNS#11320

(cherry picked from commit aa59465)

* Add test case for PowerDNS#11320:  followCNAMERecords leads to a result that
should be subject to dns64 processing

(cherry picked from commit 63ad9c9)

* QType ADDR is supposed to be used internally only.

Should fix PowerDNS#11337

(cherry picked from commit 7a27879)

* Backport of 11300 to rec-4.6.x: Use the Lua context stored in SyncRes when calling hooks

* Be more careful using refresh mode only for the record asked.
Otherwise we get bad interaction with QM, as newly discovered
delegation points are stored in the cache, but not seen the QM
algorithm. Might/should fix PowerDNS#11371.

(cherry picked from commit 7502f5f)

* Reinstate refresh mode for {C,D}NAME cache lookups

(cherry picked from commit 3263b3a)

* auth, rec IXFR-in: Fix a case where an incomplete read caused by network error might result in a truncated zone.

As we might break from the loop early, we need to check if the end SOA was seen after the loop.
Also make sure we detect end conditions for both AXFR and IXFR style properly, to avoid processing
data after the end marker.

* CI: dnspython 2.2.0 breaks auth and ixfrdist testing, pin to 2.1.0

(cherry picked from commit 46a40ed)

* stop testing auth+dnsdist on rec branch

* builder: archs for oraclelinux-8; el-8 symlinks

(cherry picked from commit dee53cf)

* builder CI: switch oraclelinux-8 to el-8

cleanup

(cherry picked from commit 1b27721)

* take centos 8-stream from quay

(cherry picked from commit 2f9eddd)

* builder: add el-7 alias for centos-7

(cherry picked from commit 6bd3c9b)

* rec: Fix the path to the recursor's UBSan suppression file in forks

* docs: Pin jinja2 to < 3.1.0

Jinja2 3.1.0 removed deprecated code that is still used by sphinx
1.8.x, and it looks like our custom sphinx extensions are not working
with more recent versions of sphinx..

See:
- pallets/jinja#1631
- readthedocs/readthedocs.org#9037

and

- PowerDNS#7712

The exact error is:
```
Extension error:
Could not import extension sphinx.builders.latex (exception: cannot import name 'contextfunction' from 'jinja2' (/dnsdist/pdns/dnsdistdist/.venv/lib/python3.7/site-packages/jinja2/__init__.py))
```

(cherry picked from commit 92ad297)

* 4.6.x has no waitForTCPSocket plus counts are different due to rpz loading changes in master

* rec: Backport of 11496 to rec-4.6.x: Prevent segfault with empty allow-from-file and allow-from options

Co-authored-by: Remi Gacogne <remi.gacogne@powerdns.com>
Co-authored-by: Otto <otto.moerbeek@open-xchange.com>
Co-authored-by: Eugen Mayer <136934+EugenMayer@users.noreply.github.com>
Co-authored-by: Peter van Dijk <peter.van.dijk@powerdns.com>
Co-authored-by: phonedph1 <20867105+phonedph1@users.noreply.github.com>
Co-authored-by: Pieter Lexis <pieter.lexis@powerdns.com>
Co-authored-by: Remi Gacogne <github@coredump.fr>
Co-authored-by: Kees Monshouwer <mind04@monshouwer.org>
Co-authored-by: Nivex <nivex@nivex.net>
Co-authored-by: Frank Louwers <frank@louwers.be>
Co-authored-by: Kevin P. Fleming <kevin@km6g.us>
Co-authored-by: Kevin P. Fleming <kpfleming@users.noreply.github.com>
Co-authored-by: Otto Moerbeek <otto@pepper.intra.drijf.net>
Co-authored-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
Co-authored-by: Remi Gacogne <rgacogne+github@valombre.net>
Co-authored-by: Pieter Lexis <pieter@plexis.eu>
Co-authored-by: Nico Vaatstra <nico.vaatstra@open-xchange.com>
Co-authored-by: Josh Soref <jsoref@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants