Recursor segmentation fault at startup #735

Closed
Habbie opened this Issue Apr 26, 2013 · 8 comments


@Habbie
Member
Habbie commented Apr 26, 2013

Sometimes the recursor fails to start on the first try. When this happens, there is a segfault message in syslog. It takes about 20 tries to trigger the error.

This error is not specific to 3.5-rc; I already saw it in 3.4-pre.

Apr  9 14:02:30 node26 pdns_recursor[32675]: PowerDNS recursor 3.5-rc5 (C) 2001-2013 PowerDNS.COM BV (Apr  9 2013, 10:21:53, gcc 4.3.4 [gcc-4_3-branch revision 152973]) starting up
Apr  9 14:02:30 node26 pdns_recursor[32675]: PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.
Apr  9 14:02:30 node26 pdns_recursor[32675]: Operating in 64 bits mode
Apr  9 14:02:30 node26 pdns_recursor[32675]: Reading random entropy from '/dev/urandom'
Apr  9 14:02:30 node26 pdns_recursor[32675]: Only allowing queries from: 0.0.0.0/0, ::/0
Apr  9 14:02:30 node26 pdns_recursor[32675]: Will not send queries to: 127.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 169.254.0.0/16, 192.168.0.0/16, 172.16.0.0/12, ::1/128, fe80::/10, 0.0.0.0, ::
Apr  9 14:02:30 node26 pdns_recursor[32675]: Enabling IPv6 transport for outgoing queries
Apr  9 14:02:30 node26 pdns_recursor[32675]: Inserting rfc 1918 private space zones
Apr  9 14:02:30 node26 pdns_recursor[32675]: Calling daemonize, going to background
Apr  9 14:02:30 node26 pdns_recursor[32676]: Set effective group id to 2
Apr  9 14:02:30 node26 pdns_recursor[32676]: Set effective user id to 2
Apr  9 14:02:30 node26 pdns_recursor[32676]: Launching 6 threads
Apr  9 14:02:30 node26 pdns_recursor[32676]: Done priming cache with root hints
Apr  9 14:02:30 node26 pdns_recursor[32676]: Done priming cache with root hints
Apr  9 14:02:30 node26 pdns_recursor[32676]: Done priming cache with root hints
Apr  9 14:02:30 node26 pdns_recursor[32676]: Done priming cache with root hints
Apr  9 14:02:30 node26 pdns_recursor[32676]: Done priming cache with root hints
Apr  9 14:02:30 node26 pdns_recursor[32676]: Done priming cache with root hints
Apr  9 14:02:30 node26 pdns_recursor[32676]: Refreshed . records
Apr  9 14:02:30 node26 pdns_recursor[32676]: Refreshed . records
Apr  9 14:02:30 node26 pdns_recursor[32676]: Refreshed . records
Apr  9 14:02:30 node26 pdns_recursor[32676]: Refreshed . records
Apr  9 14:02:30 node26 pdns_recursor[32676]: Refreshed . records
Apr  9 14:02:30 node26 pdns_recursor[32676]: Refreshed . records
Apr  9 14:02:31 node26 kernel: [22197556.564621] pdns_recursor[32678]: segfault at 3b5 ip 000000000055e378 sp 00007f86fa844400 error 4 in pdns_recursor[400000+1c9000]

Config:

aaaa-additional-processing=off
allow-from=0.0.0.0/0, ::/0
disable-edns=yes
disable-edns-ping=yes
local-address=###.###.###.###
local-port=53
log-common-errors=no
logging-facility=0
max-cache-entries=6000000
max-packetcache-entries=2000000
max-mthreads=2048
max-tcp-per-client=10
query-local-address6=####:####:####:####:####:####:####:####
server-id=DNS
setgid=daemon
setuid=daemon
threads=6
version-string=DNS

make:

/usr/bin/make 'OPTFLAGS=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables' LUA=1 STATIC=semi LUA_CPPFLAGS_CONFIG=-I/usr/include/ LUA_LIBS_CONFIG=-llua
@Habbie Habbie was assigned Apr 26, 2013
@Habbie Habbie closed this Apr 26, 2013
@Habbie
Member
Habbie commented Apr 26, 2013

Author: anon
And it seems that it happens only if I restart a busy process.

@Habbie
Member
Habbie commented Apr 26, 2013

Author: ahu
Winfried,

Can you recompile without the various -f flags? I don't know what these do and they might be obscuring the issue.

Can you raise -O2 to -O3?

Thanks.

@Habbie
Member
Habbie commented Apr 26, 2013

Author: ahu
Ok, we double checked and think it unlikely your -f options are causing the issue, but we'd still appreciate a run without them. Thanks!

@Habbie
Member
Habbie commented Apr 26, 2013

Author: anon
New make:

/usr/bin/make OPTFLAGS=-O3 LUA=1 STATIC=semi LUA_CPPFLAGS_CONFIG=-I/usr/include/ LUA_LIBS_CONFIG=-llua

Same crash:

Apr  9 16:15:37 node26 kernel: [22205541.090467] pdns_recursor[3431]: segfault at 75 ip 0000000000561778 sp 00007f71e1effdc0 error 4 in pdns_recursor[400000+1cd000]
@Habbie
Member
Habbie commented Apr 26, 2013

Author: ahu
Thank you. Can you recompile with -ggdb and, after the crash, try:
addr2line -e ./pdns_recursor 7f71e1effdc0

(the number from 'sp'). Alternatively, launch pdns in gdb:

gdb --args ./pdns_recursor --daemon=no

On crash, enter 'bt'.

Thanks again for testing!

@Habbie
Member
Habbie commented Apr 26, 2013

Author: ahu
(not the sp number, try the ip number, but gdb works best!)
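
For readers following along, the fields of the kernel's segfault line can be pulled apart with a few lines of Python. This is only a sketch: the function name `parse_segfault` and the regular expression are illustrative and not part of PowerDNS. It assumes the x86 page-fault error code layout (bit 0: page present, bit 1: write access, bit 2: user mode) and that the kernel prints the error code in hex.

```python
import re

# Matches lines such as:
#   segfault at 75 ip 0000000000561778 sp 00007f71e1effdc0 error 4 in pdns_recursor[400000+1cd000]
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "fault_addr": int(m["addr"], 16),  # address the process tried to access
        "ip": ip,                          # instruction pointer at the time of the fault
        "sp": int(m["sp"], 16),            # stack pointer (not useful for addr2line)
        "object": m["obj"],                # mapped file containing the faulting code
        "base": base,                      # load address of that mapping
        "offset": ip - base,               # ip relative to the mapping
        "page_present": bool(err & 1),     # assumed x86 error-code bit 0
        "write": bool(err & 2),            # assumed x86 error-code bit 1
        "user_mode": bool(err & 4),        # assumed x86 error-code bit 2
    }


info = parse_segfault(
    "pdns_recursor[3431]: segfault at 75 ip 0000000000561778 "
    "sp 00007f71e1effdc0 error 4 in pdns_recursor[400000+1cd000]"
)
print(hex(info["ip"]), hex(info["offset"]))
```

Under those assumptions, "error 4" decodes to a user-mode read of an unmapped page, and the tiny fault address (0x75) suggests a read through a near-null pointer. Because the binary here appears to be mapped at 0x400000, the usual base for a non-relocatable x86-64 executable, the ip value itself is probably the address to give addr2line; for a position-independent binary or a shared library, the offset would be the one to try.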

@Habbie
Member
Habbie commented Apr 26, 2013

Author: anon
We see many messages like this:

...
Apr 09 16:57:05 startDoResolve problem: Making a socket for resolver: Bad file descriptor
Apr 09 16:57:05 startDoResolve problem: Making a socket for resolver: Bad file descriptor
Apr 09 16:57:05 startDoResolve problem: Making a socket for resolver: Bad file descriptor
Apr 09 16:57:05 startDoResolve problem: Making a socket for resolver: Bad file descriptor
Apr 09 16:57:05 startDoResolve problem: Making a socket for resolver: Bad file descriptor
Apr 09 16:57:05 startDoResolve problem: Making a socket for resolver: Bad file descriptor
Apr 09 16:57:05 startDoResolve problem: Making a socket for resolver: Bad file descriptor
Apr 09 16:57:05 startDoResolve problem: Making a socket for resolver: Bad file descriptor
Apr 09 16:57:05 startDoResolve problem: Making a socket for resolver: Bad file descriptor
...

Bert Hubert wrote:

> Can you increase the number of available file descriptors?
> ulimit -n 16384 or so.

This solved the "file descriptor" problem. Thanks!

> Can you apply http://wiki.powerdns.com/trac/changeset/3153 and retry?

With that changeset applied, the segfaults are gone. Great!
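
As an aside, the effect of the `ulimit -n 16384` advice above can be inspected from Python's standard `resource` module. The sketch below is illustrative only: the helper name `raise_nofile_limit` is made up, and it changes the limit for the calling process and its children, not for an already running recursor. For the recursor itself, the limit would have to be raised in the shell or init script that launches it.

```python
import resource


def raise_nofile_limit(wanted=16384):
    """Raise the soft open-file limit toward `wanted`, capped by the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft
    # An unprivileged process may only raise the soft limit up to the hard limit.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # Some platforms reject the request; keep the existing limit.
        return soft
    return target


print("soft open-file limit now:", raise_nofile_limit())
```

If the hard limit is below 16384, the soft limit can only be raised that far without extra privileges, which is worth checking when "Bad file descriptor" messages persist after the change.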

@Habbie
Member
Habbie commented Apr 26, 2013

Author: ahu
Fixed in 3153, good catch Winfried!
