process stuck when freeing an authentication packet #2274

Open
aend opened this issue Aug 16, 2018 · 5 comments

@aend aend commented Aug 16, 2018

Issue type

  • Defect - Crash or memory corruption.

Defect

The issue has been seen only once so far:
the application stopped processing authentication requests and one CPU core was busy at 100%.
Radiusd was stuck in this state for about 2 hours and had to be forcefully restarted.

Full backtrace from LLDB or GDB

Stack trace generated while the application was running

#0  0x00007f7d69d6b935 in _tc_free_internal () from /usr/lib64/libtalloc.so.2
#1  0x00007f7d69d6b9f7 in _tc_free_internal () from /usr/lib64/libtalloc.so.2
#2  0x00000000004301ae in request_done (request=0x25f81d0, action=action@entry=2) at src/main/process.c:816
#3  0x0000000000431f9a in request_receive (ctx=ctx@entry=0x28cc3b0, listener=listener@entry=0x25a4c90, packet=0x28cc410, client=client@entry=0x25af260, fun=fun@entry=0x410b70 <rad_authenticate>) at src/main/process.c:1672
#4  0x000000000041c436 in auth_socket_recv (listener=0x25a4c90) at src/main/listen.c:1571
#5  0x000000000042ee5b in event_socket_handler (xel=<optimized out>, fd=<optimized out>, ctx=<optimized out>) at src/main/process.c:4619
#6  0x00007f7d6a67464c in fr_event_loop (el=0x24f5d10) at src/lib/event.c:649
#7  0x0000000000435b51 in radius_event_process () at src/main/process.c:5694
#8  0x000000000040fe2a in main (argc=<optimized out>, argv=<optimized out>) at src/main/radiusd.c:587

stack extracted when inspecting with GDB 5 minutes later

#0  0x00007f7d69d6b935 in _tc_free_internal () from /usr/lib64/libtalloc.so.2
#1  0x00007f7d69d6b9f7 in _tc_free_internal () from /usr/lib64/libtalloc.so.2
#2  0x00000000004301ae in request_done (request=0x25f81d0, action=action@entry=2) at src/main/process.c:816
#3  0x0000000000431f9a in request_receive (ctx=ctx@entry=0x28cc3b0, listener=listener@entry=0x25a4c90, packet=0x28cc410, client=client@entry=0x25af260, fun=fun@entry=0x410b70 <rad_authenticate>)
    at src/main/process.c:1672
#4  0x000000000041c436 in auth_socket_recv (listener=0x25a4c90) at src/main/listen.c:1571
#5  0x000000000042ee5b in event_socket_handler (xel=<optimized out>, fd=<optimized out>, ctx=<optimized out>) at src/main/process.c:4619
#6  0x00007f7d6a67464c in fr_event_loop (el=0x24f5d10) at src/lib/event.c:649
#7  0x0000000000435b51 in radius_event_process () at src/main/process.c:5694
#8  0x000000000040fe2a in main (argc=<optimized out>, argv=<optimized out>) at src/main/radiusd.c:587

request being processed by talloc_free

{number = 144117722, timestamp = 1534343974, data = 0x0, listener = 0x25a4c90, client = 0x0, packet = 0x25f7ee0, username = 0x25f9c10, password = 0x0, reply = 0x25f8380, config = 0x7f771001bbe0,
  state_ctx = 0x26f77b0, state = 0x0, proxy_listener = 0x0, proxy = 0x0, proxy_reply = 0x0, home_server = 0x0, home_pool = 0x0, process = 0x430040 <request_done>, response_delay = {tv_sec = 0, tv_usec = 0},
  timer_action = FR_ACTION_TIMER, ev = 0x0, handle = 0x410b70 <rad_authenticate>, rcode = RLM_MODULE_OK, module = 0x44e0d3 "", component = 0x454c45 "<REQUEST_CLEANUP_DELAY>", delay = 5,
  master_state = REQUEST_COUNTED, child_state = REQUEST_DONE, child_pid = 140176639453248, root = 0x66b180 <main_config>, simul_max = 0, simul_count = 0, simul_mpp = 0, priority = RAD_LISTEN_AUTH,
  in_request_hash = false, in_proxy_hash = false, num_proxied_requests = 0, num_proxied_responses = 0, server = 0x24b5c10 "auth", parent = 0x0, log = {func = 0x7f7d6a89ca70 <vradlog_request>,
  lvl = L_DBG_LVL_OFF, indent = 0 '\000'}, options = 2, coa = 0x0, num_coa_requests = 0}

@alandekok alandekok commented Aug 16, 2018

You are running an old version of the server. If you can reproduce this with 3.0.17, then please re-open the issue.

@alandekok alandekok closed this Aug 16, 2018

@aend aend commented Aug 16, 2018

Sorry, I did not specify the version, but the issue happened while running 3.0.17.

@alandekok alandekok reopened this Aug 16, 2018

@alandekok alandekok commented Aug 16, 2018

It sounds like a memory issue. If it only happens once and isn't reproducible, then it's hard to track down.

Looking at the talloc code, it seems like there are situations where it can get stuck when something goes wrong. So that's bad.
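
To make "get stuck" concrete, here is a minimal sketch of one way a free routine could spin forever: if memory corruption turns a list of sibling chunks into a cycle, a loop that follows `next` until it reaches `NULL` never terminates. This is an assumption for illustration only; nothing in this thread confirms that this is what happened here. The `struct chunk` type and both function names are hypothetical and are not talloc's real structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an allocator chunk; NOT talloc's real struct. */
struct chunk {
    struct chunk *next; /* next sibling in the parent's child list */
};

/*
 * Floyd's tortoise-and-hare cycle check. Returns true if following
 * ->next from head can never reach NULL, i.e. a loop that walks this
 * list until NULL would never finish.
 */
static bool sibling_list_has_cycle(const struct chunk *head)
{
    const struct chunk *slow = head;
    const struct chunk *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;       /* advance one step */
        fast = fast->next->next; /* advance two steps */
        if (slow == fast)
            return true;         /* pointers met: the list loops back */
    }
    return false;                /* reached NULL: the list terminates */
}

/* Link an array of n chunks into a NULL-terminated list. */
static void link_chunks(struct chunk *c, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i + 1 < n; i++)
        c[i].next = &c[i + 1];
    c[n - 1].next = NULL;
}
```

A check like this is only a diagnostic idea: the same walk could be done by hand in GDB on the stuck process, following the pointers from the chunk being freed and watching for a repeated address.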


@aend aend commented Aug 23, 2018

I had another instance of the issue. How can I help debug it?


@alandekok alandekok commented Aug 23, 2018

It's difficult to track down, unfortunately. I'm not even sure where to begin, as we can't reproduce it here.
