Control listener aborts with weird message #1662

Closed
afics opened this Issue Aug 12, 2014 · 6 comments

Projects

None yet

3 participants

@afics
afics commented Aug 12, 2014

OS: Fedora 20
pdns version: 3.3.1, installed via yum from the official repos
Backends: BIND
DNSSEC: Yes

Aug 12 12:28:02 host pdns[2206]: Reload was requested
Aug 12 12:28:02 host pdns[2206]: Notification request for domain 'zone2' received from operator
Aug 12 12:28:02 host pdns[2206]: Zone 'zone2' (/zones/zone2) needs reloading
Aug 12 12:28:02 host pdns[2206]: Zone 'zone2' (/zones/zone2) reloaded
Aug 12 12:28:02 host pdns[2206]: Zone 'zone1' (/zones/zone1) needs reloading
Aug 12 12:28:02 host pdns[2206]: Zone 'zone1' (/zones/zone1) reloaded
Aug 12 12:28:02 host pdns[2206]: Fatal error in control listener: Zone for 'zone1' in '/zones/zone1' being reloaded
Aug 12 12:28:02 host pdns[2206]: Control listener aborted, please file a bug!

My actual zone names are in the form domain.tld
This happens after I run the following script to deploy my locally (re)signed zones.

scp zone1.signed root@host:
ssh root@host mv /root/zone1.signed /srv/dns/named/zones/zone1
ssh root@host pdns_control reload
ssh root@host pdns_control notify zone1
@Habbie Habbie added this to the auth-3.4.0 milestone Aug 12, 2014
@cmouse
Contributor
cmouse commented Aug 12, 2014

This is fully repeatable with just one zone. I used the following setup:

  • Make the environment read-only for pdns_server process
  • Provision one pre-signed bind zonefile and configure it
  • I used following settings:
allow-axfr-ips=127.0.0.0/8,::1
disable-axfr=no
disable-tcp=no
master=yes
setgid=nogroup
setuid=nobody
slave=yes
launch=bind
bind-config=/home/cmouse/pdns/etc/named.conf
bind-dnssec-db=/home/cmouse/pdns/etc/ro/bind.db
daemon=no 
loglevel=65535 
query-logging=yes 
log-dns-queries=yes 
  • Start powerdns process
  • Run pdns_control reload && pdns_control notify zonename repeatedly (3-10 times)
  • Wait for about 10 minutes
  • You can no longer get any responses for lookup or notify and with gdb you should see several threads stuck on pthreads_rwlocks_wrlocks() waiting for s_state_lock
@cmouse
Contributor
cmouse commented Aug 12, 2014

PowerDNS log when stuck

Aug 12 22:19:14 Reading random entropy from '/dev/urandom'
Aug 12 22:19:14 This is a standalone pdns
Aug 12 22:19:14 Listening on controlsocket in './pdns.controlsocket'
Aug 12 22:19:14 UDP server bound to 0.0.0.0:53
Aug 12 22:19:14 TCP server bound to 0.0.0.0:53
Aug 12 22:19:14 PowerDNS Authoritative Server git-rec-3.6.0-296-g19cf1f6 (cmouse@desk01) (C) 2001-2014 PowerDNS.COM BV
Aug 12 22:19:14 Using 64-bits mode. Built on 20140812200939 by cmouse@desk01, gcc 4.8.2.
Aug 12 22:19:14 PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.
Aug 12 22:19:14 Set effective group id to 65534
Aug 12 22:19:14 Set effective user id to 65534
Aug 12 22:19:14 Creating backend connection for TCP
Aug 12 22:19:14 [bindbackend] Parsing 1 domain(s), will report when done
Aug 12 22:19:14 Master/slave communicator launching
Aug 12 22:19:14 [bindbackend] parsing 'z-ix.net' from file '/home/cmouse/pdns/etc/z-ix.net.txt'
Aug 12 22:19:14 Query: select content from domainmetadata where domain='z-ix.net' and kind='NSEC3PARAM'
Aug 12 22:19:14 [bindbackend] Done parsing domains, 0 rejected, 1 new, 0 removed
Aug 12 22:19:14 No new unfresh slave domains, 0 queued for AXFR already
Aug 12 22:19:14 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:19:14 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:14 Returning: 'SOA' of 'z-ix.net', content: 'ns0.z-ix.net. dnsmaster.z-ix.net. 2014081256 10800 3600 604800 10800', prio: 0
Aug 12 22:19:14 End of answers
Aug 12 22:19:14 No master domains need notifications
Aug 12 22:19:14 About to create 3 backend threads for UDP
Aug 12 22:19:14 Done launching threads, ready to distribute questions
Aug 12 22:19:21 Reload was requested
Aug 12 22:19:21 Notification request for domain 'z-ix.net' received from operator
Aug 12 22:19:21 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:19:21 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:21 Returning: 'SOA' of 'z-ix.net', content: 'ns0.z-ix.net. dnsmaster.z-ix.net. 2014081256 10800 3600 604800 10800', prio: 0
Aug 12 22:19:21 End of answers
Aug 12 22:19:21 Lookup for 'NS' of 'z-ix.net'
Aug 12 22:19:21 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:21 Returning: 'NS' of 'z-ix.net', content: '*.net', prio: 0
Aug 12 22:19:21 Returning: 'NS' of 'z-ix.net', content: '*.net', prio: 0
Aug 12 22:19:21 Returning: 'NS' of 'z-ix.net', content: '*.net', prio: 0
Aug 12 22:19:21 End of answers
Aug 12 22:19:21 Lookup for 'ANY' of 'ns0.q-ix.net'
Aug 12 22:19:21 Found no authoritative zone for ns0.q-ix.net
Aug 12 22:19:21 Lookup for 'ANY' of 'ns0.sotecware.net'
Aug 12 22:19:21 Found no authoritative zone for ns0.sotecware.net
Aug 12 22:19:21 Lookup for 'ANY' of 'ns0.z-ix.net'
Aug 12 22:19:21 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:21 Returning: 'A' of 'ns0.z-ix.net', content: '*.54', prio: 0
Aug 12 22:19:21 Returning: 'AAAA' of 'ns0.z-ix.net', content: '*:cafe:0:1', prio: 0
Aug 12 22:19:21 Returning: 'RRSIG' of 'ns0.z-ix.net', content: 'A 10 3 10800 20140911131133 20140812131133 57041 z-ix.net. g4Oi...', prio: 0
Aug 12 22:19:21 Returning: 'RRSIG' of 'ns0.z-ix.net', content: 'AAAA 10 3 10800 20140911131133 20140812131133 57041 z-ix.net. hm1A...', prio: 0
Aug 12 22:19:21 Returning: 'RRSIG' of 'ns0.z-ix.net', content: 'NSEC 10 3 10800 20140911131133 20140812131133 57041 z-ix.net. J/q4....', prio: 0
Aug 12 22:19:21 End of answers
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to *.54:53
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to *.248:53
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to [*:b01::1]:53
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to [*:cafe:0:1]:53
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to [*:1305::2]:53
Aug 12 22:19:22 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:22 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:22 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:23 Received unsuccessful notification report for 'z-ix.net' from *.54:53, error: Not Implemented
Aug 12 22:19:23 Removed from notification list: 'z-ix.net' to *.54:53Not Implemented
Aug 12 22:19:23 Received unsuccessful notification report for 'z-ix.net' from *.248:53, error: Query Refused
Aug 12 22:19:23 Removed from notification list: 'z-ix.net' to *.248:53Query Refused
Aug 12 22:19:25 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [2*:b01::1]:53: Network is unreachable
Aug 12 22:19:25 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:25 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:28 Reload was requested
Aug 12 22:19:28 Notification request for domain 'z-ix.net' received from operator
Aug 12 22:19:28 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:19:28 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:28 Returning: 'SOA' of 'z-ix.net', content: 'ns0.z-ix.net. dnsmaster.z-ix.net. 2014081256 10800 3600 604800 10800', prio: 0
Aug 12 22:19:28 End of answers
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to *.54:53
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to *.248:53
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to [*:b01::1]:53
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to [*:cafe:0:1]:53
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to [*:1305::2]:53
Aug 12 22:19:29 Error trying to resolve '[*:e101:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:e101:b01::1]:53: Network is unreachable
Aug 12 22:19:29 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:29 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to *:1305::2]:53: Network is unreachable
Aug 12 22:19:30 Received unsuccessful notification report for 'z-ix.net' from *.54:53, error: Not Implemented
Aug 12 22:19:30 Removed from notification list: 'z-ix.net' to *.54:53Not Implemented
Aug 12 22:19:30 Received unsuccessful notification report for 'z-ix.net' from *.248:53, error: Query Refused
Aug 12 22:19:30 Removed from notification list: 'z-ix.net' to *.248:53Query Refused
Aug 12 22:19:30 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:30 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:30 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:32 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:32 Error trying to resolve ' [*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:32 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:37 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:37 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:37 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:39 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:39 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:39 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:46 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:46 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:46 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:56 Notification for z-ix.net to [*:b01::1]:53 failed after retries
Aug 12 22:19:56 Notification for z-ix.net to [*:cafe:0:1]:53 failed after retries
Aug 12 22:19:56 Notification for z-ix.net to [*:1305::2]:53 failed after retries
Aug 12 22:20:03 Notification for z-ix.net to [*:b01::1]:53 failed after retries
Aug 12 22:20:03 Notification for z-ix.net to [*:cafe:0:1]:53 failed after retries
Aug 12 22:20:03 Notification for z-ix.net to *:1305::2]:53 failed after retries
Aug 12 22:20:14 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:20:14 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:20:14 Zone 'z-ix.net' (/home/cmouse/pdns/etc/z-ix.net.txt) needs reloading
Aug 12 22:20:14 Query: select content from domainmetadata where domain='z-ix.net' and kind='NSEC3PARAM'
Aug 12 22:27:56 Remote 127.0.0.1 wants 'z-ix.net|SOA', do = 0, bufsize = 1680: packetcache MISS
Aug 12 22:28:01 Remote 127.0.0.1 wants 'z-ix.net|SOA', do = 0, bufsize = 1680: packetcache MISS
Aug 12 22:28:06 Remote 127.0.0.1 wants 'z-ix.net|SOA', do = 0, bufsize = 1680: packetcache MISS
Aug 12 22:28:16 Reload was requested
Aug 12 22:28:22 Notification request for domain 'z-ix.net' received from operator
Aug 12 22:28:22 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:28:22 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:28:22 Zone 'z-ix.net' (/home/cmouse/pdns/etc/z-ix.net.txt) needs reloading
Aug 12 22:28:22 Query: select content from domainmetadata where domain='z-ix.net' and kind='NSEC3PARAM'
 ^CAug 12 22:32:28 PDNS instance is going down, taking children with us
Killed
@cmouse
Contributor
cmouse commented Aug 12, 2014

It appears that the culrpit could be this, as this is the first thread to get stuck.

#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85
#1  0x0000000000694eba in WriteLock::WriteLock (this=0x7ffff3b8b1d0, lock=0xb84c40 <Bind2Backend::s_state_lock>) at ../../pdns/lock.hh:63
#2  0x0000000000687e5d in Bind2Backend::safePutBBDomainInfo (bbd=...) at bindbackend2.cc:166
#3  0x000000000068fbba in Bind2Backend::queueReloadAndStore (this=0x7fffe4000a50, id=1) at bindbackend2.cc:856
#4  0x000000000069195f in Bind2Backend::lookup (this=0x7fffe4000a50, qtype=..., qname=..., pkt_p=0x0, zoneId=-1) at bindbackend2.cc:1063
#5  0x000000000070d7f8 in DNSBackend::getSOA (this=0x7fffe4000a50, domain=..., sd=..., p=0x0) at dnsbackend.cc:246
#6  0x00000000006891d2 in Bind2Backend::getUpdatedMasters (this=0x7fffe4000a50, changedDomains=0x7ffff3b8bcb0) at bindbackend2.cc:303
#7  0x000000000074c3d0 in UeberBackend::getUpdatedMasters (this=0x7ffff3b8bd90, domains=0x7ffff3b8bcb0) at ueberbackend.cc:298
#8  0x0000000000787c93 in CommunicatorClass::masterUpdateCheck (this=0xb85800 <Communicator>, P=0x7ffff3b8bd80) at mastercommunicator.cc:129
#9  0x000000000078c5aa in CommunicatorClass::mainloop (this=0xb85800 <Communicator>) at communicator.cc:102
#10 0x000000000078cc5d in CommunicatorClass::launchhelper (p=0xb85800 <Communicator>) at communicator.hh:173
#11 0x00007ffff7436182 in start_thread (arg=0x7ffff3b8c700) at pthread_create.c:312
#12 0x00007ffff5ea038d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
@cmouse
Contributor
cmouse commented Aug 19, 2014

The final reason was a double-locking in getUpdatedMasters, where the same lock is first locked read-only and then deeper in the stack separate attempt is made for write lock.

@ahupowerdns ahupowerdns added a commit that referenced this issue Aug 19, 2014
@ahupowerdns ahupowerdns should fix #1662, reloading while checking for domains that need to b…
…e notified in BIND, causing lock
c7287b6
@ahupowerdns ahupowerdns added a commit that closed this issue Aug 19, 2014
@ahupowerdns ahupowerdns should fix #1662, reloading while checking for domains that need to b…
…e notified in BIND, causing lock
c7287b6
@Habbie
Member
Habbie commented Aug 19, 2014

@afics @cmouse can you test?

@Habbie Habbie reopened this Aug 19, 2014
@cmouse
Contributor
cmouse commented Aug 19, 2014

I cannot repeat this issue anymore. So it would appear fixed.

@Habbie Habbie closed this Aug 29, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment