gmysql backend, dig axfr results in "Got bad packet: bad label type" #5494

Closed
thechile opened this Issue Jul 5, 2017 · 9 comments

thechile commented Jul 5, 2017

I am running the following packages on a CentOS 7 server:

pdns-backend-mysql-4.0.4-1pdns.el7.x86_64
pdns-4.0.4-1pdns.el7.x86_64
pdns-backend-geoip-4.0.4-1pdns.el7.x86_64
dnsdist-0.0.1559gc3b0dc6-1pdns.el7.x86_64
pdns-recursor-4.0.5-1pdns.el7.x86_64
...
MariaDB-common-10.1.24-1.el7.centos.x86_64
MariaDB-client-10.1.24-1.el7.centos.x86_64
MariaDB-shared-10.1.24-1.el7.centos.x86_64
MariaDB-server-10.1.24-1.el7.centos.x86_64

Every second or third AXFR for one of our zones fails with the error below, followed by a varying number of bytes of data. In this example it was 3096, but it can be higher or lower.

;; Got bad packet: bad label type
3096 bytes
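
For readers unfamiliar with the message: in the DNS wire format, the top two bits of each label's length byte say whether it is an ordinary label (00) or a compression pointer (11); the other two combinations are reserved or obsolete. A parser that starts reading a name at the wrong byte offset will often hit such a byte. The sketch below is a minimal illustration of that check in Python, not dig's actual implementation; the function names are made up for this example.

```python
def classify_label_byte(b: int) -> str:
    """Classify the first byte of a DNS label by its top two bits (RFC 1035, section 4.1.4)."""
    if b == 0:
        return "root"            # zero-length label: end of the name
    top = b & 0xC0
    if top == 0x00:
        return "label"           # ordinary label, length 1..63
    if top == 0xC0:
        return "pointer"         # compression pointer
    return "bad label type"      # 01 and 10 prefixes are reserved or obsolete


def walk_name(msg: bytes, offset: int) -> list:
    """Walk a name starting at offset and report what each length byte looks like."""
    kinds = []
    while offset < len(msg):
        kind = classify_label_byte(msg[offset])
        kinds.append(kind)
        if kind != "label":
            break
        offset += 1 + msg[offset]  # skip the length byte and the label itself
    return kinds


name = b"\x03www\x07example\x03com\x00"
print(walk_name(name, 0))  # parsed from the correct offset
print(walk_name(name, 1))  # parsed one byte late: 'w' (0x77) has top bits 01
```

Reading the same bytes one position late turns the letter "w" into a length byte with an invalid type, which is the kind of symptom a stream that has lost or shifted bytes would produce.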

Every time I get a failed AXFR, the following shows up in the logs.

pdns_server[1734]: Jul 05 06:35:01 AXFR of domain '______.com' initiated by 216.8.2.7
Jul 05 06:35:01 pdns_server[1734]: Jul 05 06:35:01 AXFR of domain '______.com' allowed: client IP 216.8.2.7 is in allow-axfr-ips
Jul 05 06:35:01 pdns_server[1734]: Jul 05 06:35:01 gmysql Connection successful. Connected to database 'powerdns' on '/var/lib/mysql/mysql.sock'.
Jul 05 06:35:01 pdns_server[1734]: Jul 05 06:35:01 gmysql Connection successful. Connected to database 'powerdns' on '/var/lib/mysql/mysql.sock'.
Jul 05 06:35:01 pdns_server[1734]: Jul 05 06:35:01 gmysql Connection successful. Connected to database 'powerdns' on '/var/lib/mysql/mysql.sock'.
Jul 05 06:35:02 pdns_server[1734]: Jul 05 06:35:02 AXFR of domain '______.com' to 216.98.52.7 finished

We are sending various zones to third-party DNS providers, and this specific zone is 347311 bytes, 5391 lines, when successfully dumped to a file.

Member

pieterlexis commented Jul 5, 2017

Can you enable query logging by running pdns_control set query-logging on and provide these logs?

Furthermore, can you tcpdump the failed AXFR so we can see why dig complains?

thechile commented Jul 5, 2017

Here is the query log output. Looking again at the logs, this message is logged every time I do an AXFR.

I tried another zone, the next biggest, which is 142078 bytes and 2456 lines/records. The dig AXFR never fails for it. Perhaps the zone size causes the problem? It seems odd that it's intermittent.

pdns_server[1734]: Jul 05 08:15:43 AXFR of domain '____.com' initiated by 216.8.2.7
pdns_server[1734]: Jul 05 08:15:43 AXFR of domain '____.com' allowed: client IP 216.8.2.7 is in allow-axfr-ips
pdns_server[1734]: Jul 05 08:15:43 gmysql Connection successful. Connected to database 'powerdns' on '/var/lib/mysql/mysql.sock'.
pdns_server[1734]: Jul 05 08:15:43 Query: SELECT content,ttl,prio,type,domain_id,disabled,name,auth FROM records WHERE disabled=0 and type=? and name=?
pdns_server[1734]: Jul 05 08:15:43 gmysql Connection successful. Connected to database 'powerdns' on '/var/lib/mysql/mysql.sock'.
pdns_server[1734]: Jul 05 08:15:43 gmysql Connection successful. Connected to database 'powerdns' on '/var/lib/mysql/mysql.sock'.
pdns_server[1734]: Jul 05 08:15:43 Query: select content from domains, domainmetadata where domainmetadata.domain_id=domains.id and name=? and domainmetadata.kind=?
pdns_server[1734]: Jul 05 08:15:43 Query: select content from domains, domainmetadata where domainmetadata.domain_id=domains.id and name=? and domainmetadata.kind=?
pdns_server[1734]: Jul 05 08:15:43 Query: select content from domains, domainmetadata where domainmetadata.domain_id=domains.id and name=? and domainmetadata.kind=?
pdns_server[1734]: Jul 05 08:15:43 Query: SELECT content,ttl,prio,type,domain_id,disabled,name,auth FROM records WHERE (disabled=0 OR ?) and domain_id=? order by name, type
pdns_server[1734]: Jul 05 08:15:43 AXFR of domain '____.com' to 216.8.2.7 finished

I can't share the tcpdump on GitHub, I'm afraid, because it would expose too much data for a very important corporate zone.

Member

ahupowerdns commented Jul 5, 2017

This defect is almost certainly caused/triggered by the contents of the zone. Can you run pdnsutil check-zone _____.com?
Finally, please read https://blog.powerdns.com/2016/01/18/open-source-support-out-in-the-open/ on the limits of our free support on corporate data that is too important to share.

thechile commented Jul 5, 2017

Here is the output: some duplicates for CNAME and A records. I can fix them, but I wouldn't have thought they would cause the AXFR to fail. We are upgrading from an old PDNS 2.x installation that serves the same zone and data. Both are running in parallel, and the AXFR works fine against the old server.

pdnsutil check-zone ___.com
Jul 05 08:36:40 Reading random entropy from '/dev/urandom'
Jul 05 08:36:40 gmysql Connection successful. Connected to database 'powerdns' on '/var/lib/mysql/mysql.sock'.
Jul 05 08:36:40 gmysql Connection successful. Connected to database 'powerdns' on '/var/lib/mysql/mysql.sock'.
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
[Error] Duplicate record found in rrset: .....
Checked 5417 records of '___.com', 30 errors, 0 warnings.
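
To locate duplicates like the ones reported above before cleaning them up, a rough approach is to group records by name, type and content and look for groups with more than one member. This is only a sketch of that idea in Python: it is not how pdnsutil performs its check, and the record tuples and example.com names are hypothetical stand-ins for rows exported from the records table.

```python
from collections import Counter


def find_duplicates(records):
    """Return (name, type, content) keys that occur more than once.

    Names are lowercased and stripped of a trailing dot before comparison,
    since DNS names compare case-insensitively.
    """
    counts = Counter(
        (name.lower().rstrip("."), rtype.upper(), content)
        for name, rtype, content in records
    )
    return [key for key, n in counts.items() if n > 1]


# Hypothetical sample rows: (name, type, content)
rows = [
    ("www.example.com", "A", "192.0.2.1"),
    ("WWW.example.com.", "a", "192.0.2.1"),   # same record, different spelling
    ("mail.example.com", "A", "192.0.2.2"),
]
print(find_duplicates(rows))
```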

Member

ahupowerdns commented Jul 5, 2017

See what happens if you remove the duplicate records; we have little else to go on.

thechile commented Jul 6, 2017

I removed dnsdist from the equation and the problem went away. If I perform the AXFR directly against pdns, it works fine; via dnsdist it is broken.

thechile commented Jul 6, 2017

To start with, I had the following configuration so that I could control the AXFR whitelist from the pdns configuration. In my trials, though, I wasn't able to get the EDNS settings to work with allow-axfr-ips: the original source IP passed on by dnsdist wasn't used for the allow-axfr-ips comparison. Maybe someone can confirm whether this should work. Does allow-axfr-ips compare against the source address from the EDNS data passed by dnsdist?

broken configuration

pdns.conf

allow-axfr-ips=1.1.1.1, 2.2.2.2
edns-subnet-processing=yes

dnsdist.conf

setECSOverride(true)
setECSSourcePrefixV4(32)
setECSSourcePrefixV6(128)
newServer({ ... useClientSubnet=true .. })

current configuration

But since the above didn't work, I moved the whitelist back to dnsdist, i.e.:

axfr_whitelist = newNMG()
axfr_whitelist:addMask('1.1.1.1/32')
axfr_whitelist:addMask('2.2.2.2/32')
addAction(AndRule({OrRule({QTypeRule(dnsdist.AXFR), QTypeRule(dnsdist.IXFR)}), NotRule(NetmaskGroupRule(axfr_whitelist))}), RCodeAction(dnsdist.REFUSED))

And this works as far as controlling who can initiate an AXFR.
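
The decision that rule encodes is "refuse if the query is an AXFR or IXFR and the client is not in the whitelist". As a plain restatement of that boolean logic, here is a small Python sketch. It does not use or emulate the dnsdist API; the function name and the 3.3.3.3 client address are assumptions for illustration, while 252 and 251 are the registered QTYPE values for AXFR and IXFR.

```python
import ipaddress

AXFR, IXFR = 252, 251  # registered DNS QTYPE values


def should_refuse(qtype: int, client_ip: str, whitelist) -> bool:
    """Mirror AndRule({OrRule({AXFR, IXFR}), NotRule(whitelist)}) as plain booleans."""
    addr = ipaddress.ip_address(client_ip)
    is_transfer = qtype in (AXFR, IXFR)
    allowed = any(addr in net for net in whitelist)
    return is_transfer and not allowed


whitelist = [ipaddress.ip_network("1.1.1.1/32"), ipaddress.ip_network("2.2.2.2/32")]
print(should_refuse(AXFR, "3.3.3.3", whitelist))  # transfer from a non-listed client
print(should_refuse(AXFR, "1.1.1.1", whitelist))  # transfer from a listed client
print(should_refuse(1, "3.3.3.3", whitelist))     # ordinary A query, rule does not apply
```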

But that is secondary to the main problem, which seems to be that dnsdist is preventing a successful AXFR for one of our zones.

If I remove dnsdist and bind pdns to public_interface:53, the dig AXFR works every time. As soon as I put dnsdist back in front, it fails. I can repeat this, and every time I see the same behavior. For the other six smaller zones, the AXFR works fine. For the larger zone I mentioned previously, it fails 95% of the time.

I stripped the dnsdist configuration right back to the one below in case I had done something silly, but the AXFR still fails.

local onfe_ipaddr          = "xx.xx.xx.xx"

controlSocket("127.0.0.1")
setLocal("0.0.0.0:53", true, true)
setACL({"0.0.0.0/0", "::/0"})

onfe_dest = newNMG()
onfe_dest:addMask(onfe_ipaddr .. "/32")

axfr_whitelist = newNMG()
axfr_whitelist:addMask('xx.xx.xx.xx/32')
addAction(AndRule({OrRule({QTypeRule(dnsdist.AXFR), QTypeRule(dnsdist.IXFR)}), NotRule(NetmaskGroupRule(axfr_whitelist))}), RCodeAction(dnsdist.REFUSED))

newServer{address=onfe_ipaddr .. ":5353", source="eth1", name="authoritative", qps=1000000, order=1, useClientSubnet=true}

addAction(NetmaskGroupRule(onfe_dest, false), PoolAction(""))
setServerPolicy(firstAvailable)

I ran dnsdist in verbose mode but couldn't see anything in the logs during the failed AXFR request. I don't think this has anything to do with the zone contents. It works directly against pdns, so there seems to be a bug in dnsdist. Please let me know what I can do to help fix it.

Member

rgacogne commented Jul 6, 2017

I couldn't reproduce this issue, but that's not surprising given that it seems to only fail for specific zones. If you can't share a network trace, could you try to look at it and describe its content? Specifically the differences between what dnsdist receives from the server and what it sends back to the client would be very useful.
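
One way to make that comparison without sharing the capture: DNS over TCP prefixes every message with a two-byte big-endian length (RFC 1035, section 4.2.2), so each direction's reassembled payload can be split into messages and compared one by one. The Python sketch below assumes the two byte streams have already been extracted from the trace by some other tool; the function names are hypothetical. A proxy may legitimately alter header fields such as the message ID, which this simple byte-for-byte comparison does not account for.

```python
import struct


def split_tcp_dns_stream(stream: bytes):
    """Split a reassembled DNS-over-TCP byte stream into messages.

    Returns (messages, leftover), where leftover is any trailing partial frame.
    """
    messages = []
    pos = 0
    while pos + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, pos)
        if pos + 2 + length > len(stream):
            break  # incomplete final frame
        messages.append(stream[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return messages, stream[pos:]


def first_difference(upstream: bytes, downstream: bytes):
    """Return the index of the first message that differs, or None if all match."""
    up, _ = split_tcp_dns_stream(upstream)
    down, _ = split_tcp_dns_stream(downstream)
    for i, (a, b) in enumerate(zip(up, down)):
        if a != b:
            return i
    if len(up) != len(down):
        return min(len(up), len(down))
    return None
```

If the stream sent to the client has lost bytes in the middle of a message, every later length prefix is read from the wrong position, so the first differing index points at where the two sides diverge.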

rgacogne added the dnsdist label and removed the auth label on Jul 6, 2017

rgacogne added this to the dnsdist-1.2.0 milestone on Jul 6, 2017

rgacogne referenced this issue on Jul 6, 2017, in merged pull request #5501: "dnsdist: Fix TCP short writes handling"

2 of 6 tasks complete

Member

rgacogne commented Jul 6, 2017

I believe this issue is fixed by #5501.
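
For context on the term: a "short write" is a write or send call on a socket that accepts fewer bytes than it was given, which becomes more likely with large responses such as a big AXFR. If the caller does not retry with the remainder, the receiver sees a stream with missing bytes. The Python sketch below illustrates the general retry loop only; it is not the code from #5501 (dnsdist is written in C++), and `send_all` and `ShortWriteSocket` are hypothetical names introduced for this example.

```python
def send_all(sock, data: bytes) -> int:
    """Write all of data to sock, retrying after short writes."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        sent = sock.send(view[total:])  # may accept fewer bytes than offered
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    return total


class ShortWriteSocket:
    """Hypothetical test double that accepts at most `chunk` bytes per send()."""

    def __init__(self, chunk: int):
        self.chunk = chunk
        self.received = bytearray()

    def send(self, data) -> int:
        accepted = bytes(data[: self.chunk])
        self.received += accepted
        return len(accepted)


fake = ShortWriteSocket(chunk=3)
send_all(fake, b"\x00\x05hello")
print(bytes(fake.received) == b"\x00\x05hello")
```

For blocking sockets, Python's standard library already provides `socket.sendall`, which performs the same loop; the explicit version is shown only to make the retry visible.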
