Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault at 58 ip 00007f95d54ddabd sp 00007f95c5ba50f0 error 4 in libpython2.7.so.1.0 #408

Closed
avdoshkin opened this issue Aug 20, 2013 · 45 comments
Labels
defect category: a defect or misbehaviour v3.x.x meta: relates to the v3.0.x branch v4.0.x meta: relates to the v4.0.x branch

Comments

@avdoshkin
Copy link

gdb /home/radius/sbin/radiusd core.radiusd.15180_ 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/radius/sbin/radiusd...done.
[New LWP 15208]
[New LWP 15210]
[New LWP 15209]
[New LWP 15180]
[New LWP 15206]
[New LWP 15207]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/radius/sbin/radiusd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f95d54ddabd in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
(gdb) thread apply all bt

Thread 6 (Thread 0x7f95c63a8700 (LWP 15207)):
#0  0x00007f95d6b64fd0 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000042273c in request_handler_thread (arg=0x2dbeea0) at threads.c:499
#2  0x00007f95d6b5ee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f95d606fccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f95c6ba9700 (LWP 15206)):
#0  0x00007f95d6b64fd0 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000042273c in request_handler_thread (arg=0x2dbee60) at threads.c:499
#2  0x00007f95d6b5ee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f95d606fccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f95d71c9700 (LWP 15180)):
#0  0x00007f95d6069033 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f95d6d9bd10 in fr_event_loop (el=0x2f07da0) at event.c:391
#2  0x0000000000409524 in main (argc=<optimized out>, argv=<optimized out>) at radiusd.c:421

Thread 3 (Thread 0x7f95c53a6700 (LWP 15209)):
#0  0x00007f95d6b64fd0 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000042273c in request_handler_thread (arg=0x2ce0700) at threads.c:499
#2  0x00007f95d6b5ee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f95d606fccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f95c4ba5700 (LWP 15210)):
#0  0x00007f95d6b64fd0 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000042273c in request_handler_thread (arg=0x2cc82d0) at threads.c:499
#2  0x00007f95d6b5ee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f95d606fccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f95c5ba7700 (LWP 15208)):
#0  0x00007f95d54ddabd in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#1  0x00007f95d54ad605 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#2  0x00007f95d54ad7bd in ?? () from /usr/lib/libpython2.7.so.1.0
#3  0x00007f95d5591e83 in PyObject_Call () from /usr/lib/libpython2.7.so.1.0
#4  0x00007f95d55761cf in ?? () from /usr/lib/libpython2.7.so.1.0
#5  0x00007f95d5591e83 in PyObject_Call () from /usr/lib/libpython2.7.so.1.0
#6  0x00007f95d5582720 in ?? () from /usr/lib/libpython2.7.so.1.0
#7  0x00007f95d5592f08 in ?? () from /usr/lib/libpython2.7.so.1.0
#8  0x00007f95d5591e83 in PyObject_Call () from /usr/lib/libpython2.7.so.1.0
#9  0x00007f95d54e146d in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#10 0x00007f95d54e35eb in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#11 0x00007f95d54e35eb in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#12 0x00007f95d54ad605 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#13 0x00007f95d54ad7bd in ?? () from /usr/lib/libpython2.7.so.1.0
#14 0x00007f95d5591e83 in PyObject_Call () from /usr/lib/libpython2.7.so.1.0
#15 0x00007f95d55761cf in ?? () from /usr/lib/libpython2.7.so.1.0
---Type <return> to continue, or q <return> to quit--- 
#16 0x00007f95d5591e83 in PyObject_Call () from /usr/lib/libpython2.7.so.1.0
#17 0x00007f95d555fe94 in ?? () from /usr/lib/libpython2.7.so.1.0
#18 0x00007f95d54e0fd3 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#19 0x00007f95d54e35eb in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#20 0x00007f95d54ad605 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#21 0x00007f95d54ad7bd in ?? () from /usr/lib/libpython2.7.so.1.0
#22 0x00007f95d5591e83 in PyObject_Call () from /usr/lib/libpython2.7.so.1.0
#23 0x00007f95d55d7a40 in PyObject_CallFunctionObjArgs () from /usr/lib/libpython2.7.so.1.0
#24 0x00007f95d595df81 in do_python (worker=1, funcname=0x7f95d595e732 "post_auth", pFunc=0x2e256e0, request=0x2e382a0, inst=<optimized out>) at rlm_python.c:485
#25 do_python (request=0x2e382a0, pFunc=0x2e256e0, funcname=0x7f95d595e732 "post_auth", worker=1, inst=<optimized out>) at rlm_python.c:393
#26 0x000000000041e1c1 in call_modsingle (request=0x2e382a0, component=7, sp=<optimized out>) at modcall.c:304
#27 modcall (component=7, c=0x2e9c830, request=<optimized out>) at modcall.c:684
#28 0x000000000041c9a3 in indexed_modcall (comp=7, idx=8, request=0x2e382a0) at modules.c:750
#29 0x00000000004127d5 in dhcp_process (request=0x2e382a0) at dhcpd.c:294
#30 0x000000000042bf5e in radius_handle_request (request=0x2e382a0, fun=0x412740 <dhcp_process>) at event.c:3817
#31 0x0000000000422aef in request_handler_thread (arg=0x2ce06c0) at threads.c:544
#32 0x00007f95d6b5ee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#33 0x00007f95d606fccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#34 0x0000000000000000 in ?? ()
@arr2036
Copy link
Member

arr2036 commented Aug 20, 2013

FreeRADIUS version? Python sub version?

@avdoshkin
Copy link
Author

Python 2.7.3, radiusd: FreeRADIUS Version 2.2.1, for host x86_64-unknown-linux-gnu, built on Aug 20 2013 at 15:37:04

@avdoshkin
Copy link
Author

What other information do you need? First time post core dump.

@arr2036
Copy link
Member

arr2036 commented Aug 20, 2013

Is it reproducible easily? Does it occur when running single threaded?

@arr2036
Copy link
Member

arr2036 commented Aug 20, 2013

Well I have no idea what's going on, it's occurring deep inside the python code.

@avdoshkin
Copy link
Author

Not easily reproducible, played only under load. On the stand mode radiusd-X does not play. How else can I help?

@avdoshkin
Copy link
Author

radiusd: FreeRADIUS Version 2.2.0. No such problems, months are running under load.

@avdoshkin
Copy link
Author

Sorry, this bug will be fixed?

@arr2036
Copy link
Member

arr2036 commented Aug 22, 2013

It's not clear why it's happening. It seems like it's the result of improper locking, but the current scheme being used should work, unless there's some additional python magic i've missed.

The TLS macros are new as well so there could be a bug in those. Unfortunately there are three different flavours so it's difficult to test them all on one platform.

Could you add:

RDEBUG("New threadstate %p", (void *) my_thread_state);

Just above:

if (!my_thread_state) {

(Line 416 rlm_python.c)

rebuild and run the server with -x -f -l stdout

The number of messages should equal the initial starting size of your thread pool, and you not increase on subsequent requests unless the server is under load.

The address should change each time as well.

@avdoshkin
Copy link
Author

Sorry, there must be so?

  1. https://github.com/FreeRADIUS/freeradius-server/blob/v2.x.x/src/modules/rlm_python/rlm_python.c
  2. if (!my_thread_state) {
    //add line
    RDEBUG("New threadstate %p", (void *) my_thread_state);
    //add line
    my_thread_state = PyThreadState_New(inst->main_thread_state->interp);
    if (!my_thread_state) {
    radlog(L_ERR, "Failed initialising local PyThreadState on first run");
    PyGILState_Release(gstate);
    return RLM_MODULE_FAIL;
    }
  3. The problem is reproduced on alert server in 30 minutes. I use rlm_python and dhcp.
  4. The load on the server is not a big battle, you can start-x-f-l stdout

@avdoshkin
Copy link
Author

So should write to the log [python] New threadstate (nil)?

@arr2036
Copy link
Member

arr2036 commented Aug 22, 2013

On 22 Aug 2013, at 17:00, avdoshkin notifications@github.com wrote:

So should write to the log [python] New threadstate (nil)?

Sorry that should be above if (!my_thread_state)


Reply to this email directly or view it on GitHub.

@avdoshkin
Copy link
Author

if (worker) {
PyThreadState *my_thread_state;
my_thread_state = fr_thread_local_init(local_thread_state, do_python_cleanup);
if (!my_thread_state) {
RDEBUG("New threadstate %p", (void *) my_thread_state);
my_thread_state = PyThreadState_New(inst->main_thread_state->interp);
if (!my_thread_state) {
RDEBUG("New threadstate %p", (void *) my_thread_state);
radlog(L_ERR, "Failed initialising local PyThreadState on first run");
PyGILState_Release(gstate);
return RLM_MODULE_FAIL;
}
ret = fr_thread_local_set(local_thread_state, my_thread_state);
if (ret != 0) {
radlog(L_ERR, "Failed storing PyThreadState in TLS: %s", strerror(ret));
PyThreadState_Clear(my_thread_state);
PyThreadState_Delete(my_thread_state);
PyGILState_Release(gstate);
return RLM_MODULE_FAIL;
}
}

  1. gdb http://pastebin.com/kSKPDfbH
  2. stdout http://pastebin.com/vJ3Xd8VM

@arr2036
Copy link
Member

arr2036 commented Aug 22, 2013

No... Just use this.

    gstate = PyGILState_Ensure();
    if (worker) {
        PyThreadState *my_thread_state;
        my_thread_state = fr_thread_local_init(local_thread_state, do_python_cleanup);
        if (!my_thread_state) {
            my_thread_state = PyThreadState_New(inst->main_thread_state->interp);
            RDEBUG("New threadstate %p", (void *) my_thread_state);
            if (!my_thread_state) {
                radlog(L_ERR, "Failed initialising local PyThreadState on first run");
                PyGILState_Release(gstate);
                return RLM_MODULE_FAIL;
            }

            ret = fr_thread_local_set(local_thread_state, my_thread_state);
            if (ret != 0) {
                radlog(L_ERR, "Failed storing PyThreadState in TLS: %s", strerror(ret));
                PyThreadState_Clear(my_thread_state);
                PyThreadState_Delete(my_thread_state);
                PyGILState_Release(gstate);
                return RLM_MODULE_FAIL;
            }
        }
        RDEBUG("Using threadstate %p", (void *) my_thread_state);
        prev_thread_state = PyThreadState_Swap(my_thread_state);    /* Swap in our local thread state */
    }

@avdoshkin
Copy link
Author

What else to do to find the error?

@arr2036
Copy link
Member

arr2036 commented Aug 23, 2013

You've given me GDB logs but not the logs from the server... As the error occurs within the python code, and as the thread state is not on the stack at the time, they are useless.

I need the logs from the server itself...

@avdoshkin
Copy link
Author

@avdoshkin
Copy link
Author

I can help with something else?

@arr2036
Copy link
Member

arr2036 commented Aug 23, 2013

OK. That debug output is helpful. You notice the thread state is the same for every request? I would expect that to change.

It's probably the pthread variant of TLS (or more, the wrappers around it). Could you edit src/include/autoconf.h, and change:

/* #undef HAVE_THREAD_TLS */

to

#define HAVE_THREAD_TLS 1

Then:

make clean
make

This should enable compiler support for TLS instead of relying on pthread. If this works, then it would appear to be a bug in the src/includes/threads.h file which should be relatively easy to fix.

@avdoshkin
Copy link
Author

Now do, always happy to help!

@avdoshkin
Copy link
Author

In file https://github.com/FreeRADIUS/freeradius-server/blob/v2.x.x/src/include/autoconf.h.in
change:

/* Define if the compiler supports __thread */
#undef HAVE_THREAD_TLS

changed to:
#define HAVE_THREAD_TLS 1

ok?

@arr2036
Copy link
Member

arr2036 commented Aug 23, 2013

On 23 Aug 2013, at 15:15, Igor Avdoshkin notifications@github.com wrote:

In file https://github.com/FreeRADIUS/freeradius-server/blob/v2.x.x/src/include/autoconf.h.in
change:

/* Define if the compiler supports __thread */
#undef HAVE_THREAD_TLS

changed to:
#define HAVE_THREAD_TLS 1

ok?

No, src/include/autoconf.h

autoconf.h will only appear after you have run ./configure

@avdoshkin
Copy link
Author

Good!

@avdoshkin
Copy link
Author

Sorry, you're with someone of the city and the country?

@arr2036
Copy link
Member

arr2036 commented Aug 23, 2013

I didn't understand your question. If you're using a translator please post the message in your original language as well.

@avdoshkin
Copy link
Author

В какой стране живете?

@avdoshkin
Copy link
Author

After the hotfix is working freeradius, wait for the night, in the morning we'll see.

@arr2036
Copy link
Member

arr2036 commented Aug 23, 2013

Excellent. :) and England.

@avdoshkin
Copy link
Author

@avdoshkin
Copy link
Author

Ideas for solving the problem will be?

@arr2036
Copy link
Member

arr2036 commented Aug 26, 2013

What's odd is the thread state is still the same. I have some holiday time today, so i'll look at this myself.

@avdoshkin
Copy link
Author

Good! How will I test?

@arr2036
Copy link
Member

arr2036 commented Aug 26, 2013

On 26 Aug 2013, at 10:37, Igor Avdoshkin notifications@github.com wrote:

Good! How will I test? If you do not have conditions like mine?

I really doubt it's a product of being used as a DHCP server. There were some modifications in between 2.2.0 and current HEAD to support the equivalent of thread local storage within Python and they're probably what's causing the issue.

I can send parallel requests which will probably trigger the issue sooner.

-Arran

@avdoshkin
Copy link
Author

When I test your ideas?

@arr2036
Copy link
Member

arr2036 commented Aug 26, 2013

When I have some code. Seeing as I had to work through my holiday, I guess it won't be today. You are welcome to try and tackle the problem in the interim.

@avdoshkin
Copy link
Author

Hi! What's new found in the code?

@arr2036
Copy link
Member

arr2036 commented Sep 16, 2013

Temporary fix is in v2.x.x branch now.

@avdoshkin
Copy link
Author

Hi! Check?

@arr2036
Copy link
Member

arr2036 commented Sep 16, 2013

On 16 Sep 2013, at 17:09, Igor Avdoshkin notifications@github.com wrote:

Hi! Check?

Yes, thanks.

@avdoshkin
Copy link
Author

On a production server modules(python,dhcp) running!

@arr2036
Copy link
Member

arr2036 commented Sep 17, 2013

Cool thanks for the feedback :)

@alandekok
Copy link
Member

Seems to be fixed.

@arr2036 arr2036 reopened this Apr 10, 2014
@arr2036
Copy link
Member

arr2036 commented Apr 10, 2014

No, it's not.

@arr2036
Copy link
Member

arr2036 commented Apr 10, 2014

It was 'fixed' by setting the THREAD_UNSAFE flag, the underlying cause isn't clear.

@arr2036 arr2036 added v4.0.x meta: relates to the v4.0.x branch and removed v2.x.x labels Jun 23, 2015
@arr2036 arr2036 added this to the 3.1.0 release milestone Jun 23, 2015
@arr2036 arr2036 closed this as completed May 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect category: a defect or misbehaviour v3.x.x meta: relates to the v3.0.x branch v4.0.x meta: relates to the v4.0.x branch
Projects
None yet
Development

No branches or pull requests

3 participants