v1.7.0 Product crash each 20 seconds with "Floating point exception" #11383

Closed
dtouzeau opened this issue Mar 1, 2022 · 10 comments · Fixed by #11546

Comments

@dtouzeau

dtouzeau commented Mar 1, 2022

  • Program: dnsdist
  • Issue type: Bug report

Short description

dnsdist crashes every 20 seconds with "Floating point exception".

Environment

  • Operating system: Debian 10
  • Software version: 1.7.0 or dnsdist-0.0.23469.0.master.g5c9086aa11
  • Software source: Compiled yourself

Steps to reproduce

No Lua used.

  1. Run dnsdist -C /etc/dnsdist.conf --disable-syslog

Actual behaviour

dnsdist crashes without giving any information. Nothing appears in the log; the console only shows "Floating point exception" when dnsdist is run from it:

ACL allowing queries from: 10.0.0.0/8, 127.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
Console ACL allowing connections from: 0.0.0.0/0
Webserver launched on 127.0.0.1:5600
Warning, this configuration can use more than 10155 file descriptors, web server and console connections not included, and the current limit is 1024.
Accepting control connections on 127.0.0.1:3199
You can increase this value by using LimitNOFILE= in the systemd unit file or ulimit.
Marking downstream DNS20-5 (192.168.1.118:53) as 'up'
Marking downstream DNS16-3 (192.168.90.10:53) as 'up'
Marking downstream DNS17-4 (192.168.92.99:53) as 'up'
Marking downstream DNS12-1 (8.8.8.8:53) as 'up'
Marking downstream DNS12-1 (8.8.4.4:53) as 'up'
Marking downstream DNS14-2 (1.1.1.1:53) as 'up'
Marking downstream DNS14-2 (1.0.0.1:53) as 'up'
Marking downstream dns1 (8.8.8.8:53) as 'up'
Marking downstream dns2 (1.1.1.1:53) as 'up'
> Error while retrieving the security update for version dnsdist-0.0.23469.0.master.g5c9086aa11: Unable to get a valid Security Status update
Not validating response for security status update, this is a non-release version.
Floating point exception

Configuration


addLocal('127.0.0.1',{reusePort=true,cpus={0}, numberOfShards=16 })
addLocal('127.0.0.1',{reusePort=true,cpus={1}, numberOfShards=16 })
addLocal('127.0.0.1',{reusePort=true,cpus={2}, numberOfShards=16 })
addLocal('127.0.0.1',{reusePort=true,cpus={3}, numberOfShards=16 })
addLocal('0.0.0.0',{reusePort=true,cpus={0}, numberOfShards=16 })
addLocal('0.0.0.0',{reusePort=true,cpus={1}, numberOfShards=16 })
addLocal('0.0.0.0',{reusePort=true,cpus={2}, numberOfShards=16 })
addLocal('0.0.0.0',{reusePort=true,cpus={3}, numberOfShards=16 })
-- * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-- Forged Hosts
addAction('vcenter.touzeau.maison', SpoofAction('192.168.1.251'))
addAction("vcenter", SpoofCNAMEAction("vcenter.touzeau.maison"))
-- * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-- ACLS BEGIN -----------------------------------------
-- *****************************************************************
-- Rule [ touzeau.maison ] Type: 1

-- Group touzeau domains (19)
selector5 = AndRule{ selector5, OrRule{ RegexRule('touzeau.maison') } }
-- Group touzeau domains (19)
DSTDOM5 = newSuffixMatchNode()
DSTDOM5:add(newDNSName("touzeau.maison"))
DSTDOM5:add(newDNSName("touzeau.maison."))
selector5 = AndRule{ selector5, SuffixMatchNodeRule(DSTDOM5) }
-- BUILD_ACLS_LB_SERVERS
-- Group 192.168.1.118 (20)
-- Cache enabled = 0
newServer({address="192.168.1.118", name="DNS20-5", qps=1, pool="Pool5"})
addAction(selector5,  PoolAction("Pool5"))
-- *****************************************************************
-- Rule [ To Public Google DNS ] Type: 1

-- Group domain to query (11)
selector1 = AndRule{ selector1, OrRule{ RegexRule('google.com'), RegexRule('googlebot.com'), RegexRule('googlevideo.com'), RegexRule('kaspersky.com'), RegexRule('kaspersky-labs.com') } }
-- Group domain to query (11)
DSTDOM1 = newSuffixMatchNode()
DSTDOM1:add(newDNSName("google.com"))
DSTDOM1:add(newDNSName("google.com."))
DSTDOM1:add(newDNSName("googlebot.com"))
DSTDOM1:add(newDNSName("googlebot.com."))
DSTDOM1:add(newDNSName("googlevideo.com"))
DSTDOM1:add(newDNSName("googlevideo.com."))
DSTDOM1:add(newDNSName("kaspersky.com"))
DSTDOM1:add(newDNSName("kaspersky.com."))
DSTDOM1:add(newDNSName("kaspersky-labs.com"))
DSTDOM1:add(newDNSName("kaspersky-labs.com."))
selector1 = AndRule{ selector1, SuffixMatchNodeRule(DSTDOM1) }
-- BUILD_ACLS_LB_SERVERS
-- Group Google Public DNS (12)
-- Cache enabled = 1
newServer({address="8.8.8.8", name="DNS12-1", qps=1, pool="Pool1"})
newServer({address="8.8.4.4", name="DNS12-1", qps=1, pool="Pool1"})
addAction(selector1,  PoolAction("Pool1"))
-- *****************************************************************
-- Rule [ Explicit rule for the FireWall DNS ] Type: 1
-- Group Artica DNS Firewall (13)
-- Selector, src IP addresses
SRCIP2 = newNMG()
SRCIP2:addMask('192.168.92.99/32')
selector2 = AndRule{ selector2, NetmaskGroupRule(SRCIP2) }
-- BUILD_ACLS_LB_SERVERS
-- Group Cloudflare Public DNS servers (14)
-- Cache enabled = 0
newServer({address="1.1.1.1", name="DNS14-2", qps=1, pool="Pool2"})
newServer({address="1.0.0.1", name="DNS14-2", qps=1, pool="Pool2"})
addAction(selector2,  PoolAction("Pool2"))
-- *****************************************************************
-- Rule [ articatech.int to the Active Directory server ] Type: 1

-- Group internal domains (15)
selector3 = AndRule{ selector3, OrRule{ RegexRule('articatech.int') } }
-- Group internal domains (15)
DSTDOM3 = newSuffixMatchNode()
DSTDOM3:add(newDNSName("articatech.int"))
DSTDOM3:add(newDNSName("articatech.int."))
selector3 = AndRule{ selector3, SuffixMatchNodeRule(DSTDOM3) }
-- BUILD_ACLS_LB_SERVERS
-- Group Active Directory DNS server (16)
-- Cache enabled = 0
newServer({address="192.168.90.10", name="DNS16-3", qps=1, pool="Pool3"})
addAction(selector3,  PoolAction("Pool3"))
-- *****************************************************************
-- Rule [ Use FireWall DNS for all ] Type: 1

selector4 = AndRule{ selector4, makeRule("0.0.0.0/0") }
-- BUILD_ACLS_LB_SERVERS
-- Group FireWall DNS (17)
-- Cache enabled = 1
newServer({address="192.168.92.99", name="DNS17-4", qps=1, pool="Pool4"})
addAction(selector4,  PoolAction("Pool4"))
-- ACLS END -----------------------------------------
-- 0 Restrictions
setACL({'192.168.0.0/16','10.0.0.0/8','172.16.0.0/12','127.0.0.0/8'})
webserver("127.0.0.1:5600")
setKey("uPDd2yOtg16DT8r71fZ5BNOwTuuWAGNtrcv4g8ovsYE=")
setWebserverConfig({password="",apiKey="$scrypt$ln=10,p=1,r=8$i1RTaZzCtsuG0NaCQ8UiHA==$M5TCdd96U7QPKfH8jIJvEtDSCewXjxXNir8ZmH8RI5Y=",acl="127.0.0.1,10.10.1.33,192.168.1.1,10.1.14.136" })
newServer({address="8.8.8.8", useClientSubnet=true, name="dns1", pool="defaults",qps=10}) 
newServer({address="1.1.1.1", useClientSubnet=true, name="dns2", pool="defaults", qps=10}) 
local dbr = dynBlockRulesGroup()
dbr:setQueryRate(30, 10, "Exceeded query rate", 3000)
dbr:setRCodeRate(DNSRCode.NXDOMAIN, 20, 10, "Exceeded NXD rate", 3000)
dbr:setRCodeRate(DNSRCode.SERVFAIL, 20, 10, "Exceeded ServFail rate", 300)
dbr:setQTypeRate(DNSQType.ANY, 5, 10, "Exceeded ANY rate", 300)
dbr:setResponseByteRate(10000, 10, "Exceeded resp BW rate", 300)
controlSocket('127.0.0.1:3199')
setKey("uPDd2yOtg16DT8r71fZ5BNOwTuuWAGNtrcv4g8ovsYE=")
setConsoleACL('0.0.0.0/0')
addAction('google.com', SpoofAction('216.239.38.12'))
setServerPolicy(leastOutstanding)
addAction({'touzeau.biz', 'touzeau.biz.'}, PoolAction("defaults"));
setECSOverride(true)
setECSSourcePrefixV4(32)
setECSSourcePrefixV6(128)
addAction("192.168.0.0/16", PoolAction("defaults"))
addAction("10.0.0.0/8", PoolAction("defaults"))
addAction("172.16.0.0/12", PoolAction("defaults"))
addAction("127.0.0.0/8", PoolAction("defaults"))
addAction(PoolAvailableRule(""), PoolAction(""))
addAction(MaxQPSIPRule(5, 32, 48), DelayAction(100))
PacketCache1 = newPacketCache(100000, {maxTTL=100000, minTTL=0, temporaryFailureTTL=60, staleTTL=60, dontAge=false})
getPool("Pool1"):setCache(PacketCache1)
PacketCache4 = newPacketCache(10000, {maxTTL=10000, minTTL=0, temporaryFailureTTL=60, staleTTL=60, dontAge=false})
getPool("Pool4"):setCache(PacketCache4)
pcdefaults = newPacketCache(0, {maxTTL=172800, minTTL=3600, temporaryFailureTTL=3600, staleTTL=60, dontAge=false})
@Habbie
Member

Habbie commented Mar 1, 2022

With your config, but no traffic, on version 1.8.0-alpha0.557.master.g5c9086aa1 or 1.7.0, I cannot reproduce the problem.

  • Are you comfortable with using a debugger like gdb? It would be great to know where exactly in dnsdist this is happening.
  • Do you think you can see if you can make it fail with a smaller config?

@dtouzeau
Author

dtouzeau commented Mar 1, 2022

Hi,
Here is the output when running under gdb:

Thread 43 "dnsdist/main" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7fff56ffd700 (LWP 8537)]
0x000055555568d341 in DNSDistPacketCache::purgeExpired (this=0x555555e4cf40, upTo=0, now=now@entry=1646160324) at dnsdist-cache.cc:293
293	dnsdist-cache.cc: No such file or directory.
(gdb) Killed

@omoerbeek
Member

I can reproduce the crash with this extra config line, which is missing from the posted config:

getPool(""):setCache(pcdefaults)

Smells like a divide by zero

@jsoref
Contributor

jsoref commented Mar 1, 2022

This has to be this:

const size_t maxPerShard = upTo / d_shardCount;
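
The line above is an integer division, and on Linux/x86 an integer division by zero is typically reported as SIGFPE, which would match both the "Floating point exception" message and the gdb output. A minimal sketch of a guarded version is shown here. `safeMaxPerShard` is a hypothetical helper, not a dnsdist function, and returning 0 when there are no shards is an assumption about the desired behaviour, not the fix that was actually merged.

```cpp
#include <cstddef>

// Hypothetical helper (not part of dnsdist): computes the per-shard value
// without ever dividing by zero.
std::size_t safeMaxPerShard(std::size_t upTo, std::size_t shardCount)
{
  if (shardCount == 0) {
    // Assumption: a cache without shards holds nothing, so there is
    // nothing to distribute across shards.
    return 0;
  }
  return upTo / shardCount;
}
```

With this guard, a call such as `safeMaxPerShard(0, 0)` returns 0 instead of raising an arithmetic exception.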

@dtouzeau
Author

dtouzeau commented Mar 1, 2022

If I add this:
getPool(""):setCache(pcdefaults)

should that fix the issue?

@omoerbeek
Member

omoerbeek commented Mar 1, 2022

If I add this: getPool(""):setCache(pcdefaults)

should that fix the issue?

No, I'm saying that I can reproduce the issue with it. Your posted config is missing the line that provokes the issue.

@omoerbeek
Member

omoerbeek commented Mar 1, 2022

The issue is that you are creating a packet cache with zero entries. That leads to a cache with 0 shards and a division by zero. Dnsdist warns on startup:

The number of entries (0) in the packet cache is smaller than the number of shards (20), decreasing the number of shards to 0
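
The warning suggests that the shard count is lowered to the number of entries whenever the cache is smaller than the requested number of shards. A rough sketch of that adjustment, next to a variant that never drops below one shard, might look like the following. Both function names are hypothetical, and the logic is inferred from the warning text, not taken from the dnsdist source.

```cpp
#include <algorithm>
#include <cstddef>

// Inferred from the warning message: the shard count is reduced to the
// number of entries, which yields 0 shards for a cache of 0 entries.
std::size_t adjustShards(std::size_t maxEntries, std::size_t shards)
{
  return std::min(maxEntries, shards);
}

// Hardened variant (an assumption, not the merged fix): always keep at
// least one shard so that later divisions by the shard count are safe.
std::size_t adjustShardsAtLeastOne(std::size_t maxEntries, std::size_t shards)
{
  return std::max<std::size_t>(1, std::min(maxEntries, shards));
}
```

For the reported configuration, `adjustShards(0, 20)` gives 0, the value that later ends up as a divisor, while `adjustShardsAtLeastOne(0, 20)` gives 1.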

@dtouzeau
Author

dtouzeau commented Mar 1, 2022

Yes, I see that:
pcdefaults = newPacketCache(0,...
That did not make sense; I will fix it. OK, so this is not a bug on your side, but a sanity check would make the code more robust.

@dtouzeau
Author

dtouzeau commented Mar 1, 2022

Confirmed: using newPacketCache(1000,... fixes the issue, and the service is stable.
Thanks.

@dtouzeau dtouzeau closed this as completed Mar 1, 2022
@omoerbeek
Member

This is still something that should be fixed.
