process stuck when freeing an authentication packet #2274

Open
aend opened this Issue Aug 16, 2018 · 5 comments

aend commented Aug 16, 2018

Issue type

  • Defect - Crash or memory corruption.

Defect

The issue has been seen only once so far:
the application stopped processing authentication requests and one CPU core was busy at 100%.
radiusd stayed stuck in this state for about 2 hours and was then forcefully restarted.

Full backtrace from LLDB or GDB

Stack trace generated while the application was running

#0  0x00007f7d69d6b935 in _tc_free_internal () from /usr/lib64/libtalloc.so.2
#1  0x00007f7d69d6b9f7 in _tc_free_internal () from /usr/lib64/libtalloc.so.2
#2  0x00000000004301ae in request_done (request=0x25f81d0, action=action@entry=2) at src/main/process.c:816
#3  0x0000000000431f9a in request_receive (ctx=ctx@entry=0x28cc3b0, listener=listener@entry=0x25a4c90, packet=0x28cc410, client=client@entry=0x25af260, fun=fun@entry=0x410b70 <rad_authenticate>) at src/main/process.c:1672
#4  0x000000000041c436 in auth_socket_recv (listener=0x25a4c90) at src/main/listen.c:1571
#5  0x000000000042ee5b in event_socket_handler (xel=<optimized out>, fd=<optimized out>, ctx=<optimized out>) at src/main/process.c:4619
#6  0x00007f7d6a67464c in fr_event_loop (el=0x24f5d10) at src/lib/event.c:649
#7  0x0000000000435b51 in radius_event_process () at src/main/process.c:5694
#8  0x000000000040fe2a in main (argc=<optimized out>, argv=<optimized out>) at src/main/radiusd.c:587

Stack trace captured with GDB 5 minutes later

#0  0x00007f7d69d6b935 in _tc_free_internal () from /usr/lib64/libtalloc.so.2
#1  0x00007f7d69d6b9f7 in _tc_free_internal () from /usr/lib64/libtalloc.so.2
#2  0x00000000004301ae in request_done (request=0x25f81d0, action=action@entry=2) at src/main/process.c:816
#3  0x0000000000431f9a in request_receive (ctx=ctx@entry=0x28cc3b0, listener=listener@entry=0x25a4c90, packet=0x28cc410, client=client@entry=0x25af260, fun=fun@entry=0x410b70 <rad_authenticate>)
    at src/main/process.c:1672
#4  0x000000000041c436 in auth_socket_recv (listener=0x25a4c90) at src/main/listen.c:1571
#5  0x000000000042ee5b in event_socket_handler (xel=<optimized out>, fd=<optimized out>, ctx=<optimized out>) at src/main/process.c:4619
#6  0x00007f7d6a67464c in fr_event_loop (el=0x24f5d10) at src/lib/event.c:649
#7  0x0000000000435b51 in radius_event_process () at src/main/process.c:5694
#8  0x000000000040fe2a in main (argc=<optimized out>, argv=<optimized out>) at src/main/radiusd.c:587

request being processed by talloc_free

{number = 144117722, timestamp = 1534343974, data = 0x0, listener = 0x25a4c90, client = 0x0, packet = 0x25f7ee0, username = 0x25f9c10, password = 0x0, reply = 0x25f8380, config = 0x7f771001bbe0,  state_ctx = 0x26f77b0, state = 0x0, proxy_listener = 0x0, proxy = 0x0, proxy_reply = 0x0, home_server = 0x0, home_pool = 0x0, process = 0x430040 <request_done>, response_delay = {tv_sec = 0, tv_usec = 0},   timer_action = FR_ACTION_TIMER, ev = 0x0, handle = 0x410b70 <rad_authenticate>, rcode = RLM_MODULE_OK, module = 0x44e0d3 "", component = 0x454c45 "<REQUEST_CLEANUP_DELAY>", delay = 5,   master_state = REQUEST_COUNTED, child_state = REQUEST_DONE, child_pid = 140176639453248, root = 0x66b180 <main_config>, simul_max = 0, simul_count = 0, simul_mpp = 0, priority = RAD_LISTEN_AUTH,   in_request_hash = false, in_proxy_hash = false, num_proxied_requests = 0, num_proxied_responses = 0, server = 0x24b5c10 "auth", parent = 0x0, log = {func = 0x7f7d6a89ca70 <vradlog_request>,   lvl = L_DBG_LVL_OFF, indent = 0 '\000'}, options = 2, coa = 0x0, num_coa_requests = 0}

alandekok commented Aug 16, 2018

You are running an old version of the server. If you can reproduce this with 3.0.17, then please re-open the issue.

alandekok closed this Aug 16, 2018

aend commented Aug 16, 2018

Sorry, I did not specify it, but the issue happened while running 3.0.17.

alandekok reopened this Aug 16, 2018

alandekok commented Aug 16, 2018

It sounds like a memory issue. If it only happens once and isn't reproducible, then it's hard to track down.

Looking at the talloc code, it seems like there are situations where it can get stuck when something goes wrong. So that's bad.
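
To illustrate the kind of failure this could be, here is a minimal sketch in C. It assumes the hang comes from a corrupted sibling list that loops back on itself, which is only a guess; nothing in this issue confirms it. `struct chunk`, `walk_terminates` and `has_cycle` are hypothetical names for illustration and are not part of talloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified chunk header. This is NOT talloc's real struct. */
struct chunk {
    struct chunk *next; /* next sibling in the parent's child list */
};

/*
 * Walk a sibling list the way a naive "visit every child" loop would,
 * but give up after max_steps iterations. Returns true if the walk
 * reached the NULL terminator, false if the step budget ran out
 * (for example because the list contains a cycle).
 */
static bool walk_terminates(const struct chunk *head, size_t max_steps)
{
    size_t steps = 0;

    assert(max_steps > 0);
    for (const struct chunk *c = head; c != NULL; c = c->next) {
        if (++steps > max_steps)
            return false; /* an unbounded loop would spin here forever */
    }
    return true;
}

/* Floyd's tortoise-and-hare check: true if the list loops back on itself. */
static bool has_cycle(const struct chunk *head)
{
    const struct chunk *slow = head;
    const struct chunk *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}
```

A loop without the step budget would never return on a cyclic list and would keep one core at 100%, which would match the symptom reported above. Whether that is what actually happened here is not known.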

aend commented Aug 23, 2018

I had another instance of the issue. How can I help debug it?

alandekok commented Aug 23, 2018

It's difficult to track down, unfortunately. I'm not even sure where to begin, as we can't reproduce it here.
