New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR 3.0.17] Reccurent crash of the server #2321

Open
gkoudjou opened this Issue Oct 3, 2018 · 4 comments

Comments

Projects
None yet
3 participants
@gkoudjou

gkoudjou commented Oct 3, 2018

Issue type

  • Defect - Crash or memory corruption.

Defect

How to reproduce the issue

  • Compile the FR 3.0.17 with the fix detailled here => #2300
  • Configure FreeRadius and activate module sqlippool
  • Configure SQL module to use a PostgresSQL database
  • Configure rlm_perl to update information in the DNS (see below for the content of the script)
  • After some time, FR crash. See below for details

Output of [radiusd|freeradius] -X showing issue occurring

Wed Oct  3 02:05:57 2018 : Info: rlm_perl: STOP 47500204821140167030818085A0B455 xx.xx.xx.xx
Wed Oct  3 02:05:57 2018 : Info: rlm_perl: STOP 475002048211401680010AC08D50BB31 xx.xx.xx.xx
Wed Oct  3 02:05:57 2018 : Info: rlm_perl: DNS update succeeded. 
CAUGHT SIGNAL: Aborted
Calling: gdb -silent -x /appli/freeradius/panic.gdb /usr/sbin/radiusd 446 2>&1 | tee /var/radius/gdb-radiusd-446.log
Wed Oct  3 02:05:57 2018 : Info: rlm_perl: DNS update succeeded. 
Panic action exited with 0

gdb-radiusd-446.log

Full backtrace from LLDB or GDB

Here is just an extract. The full GDB file is attached.

#0  0x0000003fb78ac6cd in waitpid () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003fb783e609 in do_system () from /lib64/libc.so.6
No symbol table info available.
#2  0x0000003fb783e940 in system () from /lib64/libc.so.6
No symbol table info available.
#3  0x00000033f3a0c7a1 in fr_fault (sig=<value optimized out>) at src/lib/debug.c:725
        disable = false
        cmd = "gdb -silent -x /appli/projects/IC1/freeradius/panic.gdb /usr/sbin/radiusd 446 2>&1 | tee /var/projects/IC1/radius/gdb-radiusd-446.log", '\000' <repeats 398 times>
        out = 0x7fffea9688f1 ".log"
        left = 403
        ret = <value optimized out>
        p = 0x33f3c3cb7f ".log"
        q = 0x0
        code = <value optimized out>
#4  <signal handler called>
No symbol table info available.
#5  0x0000003fb7832625 in raise () from /lib64/libc.so.6
No symbol table info available.
#6  0x0000003fb7833e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000003fb7870537 in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#8  0x0000003fb7875f4e in malloc_printerr () from /lib64/libc.so.6
No symbol table info available.
#9  0x0000003fb7876353 in malloc_consolidate () from /lib64/libc.so.6
No symbol table info available.
#10 0x0000003fb7879bd8 in _int_malloc () from /lib64/libc.so.6
No symbol table info available.
#11 0x0000003fb787ab55 in malloc () from /lib64/libc.so.6
No symbol table info available.
#12 0x00000033f360a96b in talloc_pool () from /usr/lib64/libtalloc.so.2
No symbol table info available.
#13 0x000000000041c1c8 in acct_socket_recv (listener=0x17faae0) at src/main/listen.c:1644

Here is an extract of the PERL script used to update the DNS if it can give informations where the bug might be : (data are anonymised)

sub dns_register() {
    my $label = $RAD_CHECK{'xxx-label-xxx'};
    my $address = $RAD_REQUEST{'Framed-IP-Address'};
    my $msisdn = $RAD_REQUEST{'Calling-Station-Id'};

    # Create the update packet for the primary zone.
    my $primary_update = new Net::DNS::Update($DNS_ZONE);
    $primary_update->push( pre => yxdomain($DNS_ZONE) );
    $primary_update->push( update => rr_del("${label}.${DNS_ZONE}. ${TTL} A") );
    $primary_update->push( update => rr_del("${label}.${DNS_ZONE}. ${TTL} TXT") );
    $primary_update->push( update => rr_add("${label}.${DNS_ZONE}. ${TTL} A ${address}") );
    $primary_update->push( update => rr_add("${label}.${DNS_ZONE}. ${TTL} TXT ${msisdn}") );
    $primary_update->push( additional => &tsig() );

    my $primary_reply = $resolver->send($primary_update);
    return &validate_reply($primary_reply, 'DNS_Update');
}

sub validate_reply {
    my $reply = $_[0];
    my $description = $_[1];
    if ($reply) {
        if ($reply->header->rcode eq 'NOERROR') {
            &log(L_INFO, "${description} succeeded.\n");
            return RLM_MODULE_OK;
        } else {
            &log(L_ERR, "${description} failed: " . $reply->header->rcode);
            return RLM_MODULE_FAIL;
        }
    }
    &log(L_ERR, "${description} failed: no reply.");
    return RLM_MODULE_FAIL;
}

Thanks for any help

@arr2036

This comment has been minimized.

Show comment
Hide comment
@arr2036

arr2036 Oct 3, 2018

Member

Nothing to do with Perl unless its corrupting memory somehow. Looks like another accounting request came in and hit a corrupted parent context during request alloc.

Member

arr2036 commented Oct 3, 2018

Nothing to do with Perl unless its corrupting memory somehow. Looks like another accounting request came in and hit a corrupted parent context during request alloc.

@gkoudjou

This comment has been minimized.

Show comment
Hide comment
@gkoudjou

gkoudjou Oct 3, 2018

Considering this hypothesis, do you have any way to fix it please ?

I notice the two following line in the GDB output but don't know how to interpret them. Could someone give a hint ?

cmd = "gdb -silent -x /appli/projects/IC1/freeradius/panic.gdb /usr/sbin/radiusd 446 2>&1 | tee /var/projects/IC1/radius/gdb-radiusd-446.log", '\000' <repeats 398 times>
src_ipaddr = {af = 2, ipaddr = {ip4addr = {s_addr = 381782794}, ip6addr = {__in6_u = {__u6_addr8 = "\n\213\301\026", '\000' <repeats 11 times>, __u6_addr16 = {35594, 5825, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {381782794, 0, 0, 0}}}}, prefix = 32 ' ', scope = 0}

What does this repeat 'xxx times' means ?

Here is my thread pool configuration :

thread pool {
        start_servers = 50
        max_servers = 512
        min_spare_servers = 20
        max_spare_servers = 50
#       max_queue_size = 65536
        max_requests_per_server = 0
        auto_limit_acct = no
}

gkoudjou commented Oct 3, 2018

Considering this hypothesis, do you have any way to fix it please ?

I notice the two following line in the GDB output but don't know how to interpret them. Could someone give a hint ?

cmd = "gdb -silent -x /appli/projects/IC1/freeradius/panic.gdb /usr/sbin/radiusd 446 2>&1 | tee /var/projects/IC1/radius/gdb-radiusd-446.log", '\000' <repeats 398 times>
src_ipaddr = {af = 2, ipaddr = {ip4addr = {s_addr = 381782794}, ip6addr = {__in6_u = {__u6_addr8 = "\n\213\301\026", '\000' <repeats 11 times>, __u6_addr16 = {35594, 5825, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {381782794, 0, 0, 0}}}}, prefix = 32 ' ', scope = 0}

What does this repeat 'xxx times' means ?

Here is my thread pool configuration :

thread pool {
        start_servers = 50
        max_servers = 512
        min_spare_servers = 20
        max_spare_servers = 50
#       max_queue_size = 65536
        max_requests_per_server = 0
        auto_limit_acct = no
}
@alandekok

This comment has been minimized.

Show comment
Hide comment
@alandekok

alandekok Oct 3, 2018

Member

Please try using the latest v3.0.x code. There have been many, many, fixes that have gone in since 3.0.17 was released. This may be a bug which has already been found and fixed. In that case, it's not a good use of our time to track it down && fix it again.

Member

alandekok commented Oct 3, 2018

Please try using the latest v3.0.x code. There have been many, many, fixes that have gone in since 3.0.17 was released. This may be a bug which has already been found and fixed. In that case, it's not a good use of our time to track it down && fix it again.

@gkoudjou

This comment has been minimized.

Show comment
Hide comment
@gkoudjou

gkoudjou Oct 3, 2018

Okay. I will build from the branch and give feedback.

Just before the issue is closed/put in standby : we updated the DNS via the rlm_perl module. We wonder if there are known issue about rlm_perl because there are many warning found in the log about this module being blocked. Plus, is it more appropriate to use this instead of a custom exec one ? (the DNS is updated in Start and Stop Acct)

Could someone have any ideas why we get such this and what would be the impacts of such warning ?

Wed Sep 26 10:25:15 2018 : WARNING: (66) WARNING: Module rlm_perl became unblocked

Thanks again

PS : we have the same warning on rlm_rest module

gkoudjou commented Oct 3, 2018

Okay. I will build from the branch and give feedback.

Just before the issue is closed/put in standby : we updated the DNS via the rlm_perl module. We wonder if there are known issue about rlm_perl because there are many warning found in the log about this module being blocked. Plus, is it more appropriate to use this instead of a custom exec one ? (the DNS is updated in Start and Stop Acct)

Could someone have any ideas why we get such this and what would be the impacts of such warning ?

Wed Sep 26 10:25:15 2018 : WARNING: (66) WARNING: Module rlm_perl became unblocked

Thanks again

PS : we have the same warning on rlm_rest module

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment