Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
ctrl-C doesnt kill off server in radiusd -X #1604
Comments
|
What version? Fine here on v3.1.x HEAD. |
|
3.1.x HEAD - i can CTRL-C when its starting up...so theres some point alan On 6 May 2016 at 10:40, Matthew Newton notifications@github.com wrote:
|
|
git bisect? :-) |
|
It will now always gracefully shut down, which means it calls the detach methods of the modules, closes open file descriptors, frees memory etc... This is so triggers work correctly. If you run it under a debugger and send it a SIGINT you should be able to see where it's hanging. Probably in a module. |
|
Nah, no need to go hunting, I know what the general cause probably is. -m was removed fairly recently. It's not an issue with removing -m and enabling its functionality by default, just that this has probably exposed a pre-existing issue. |
|
...go on... ;-) On 6 May 2016 at 14:23, Arran Cudbard-Bell notifications@github.com wrote:
|
|
That's about as specific as it gets :p If you can see where it's hanging we can probably fix it. |
|
detail (/dev/shm/detail): Polling for detail file Program received signal SIGTERM, Terminated. Program received signal SIGTERM, Terminated. Program received signal SIGTERM, Terminated. detail (/dev/shm/detail): Polling for detail file Program received signal SIGTERM, Terminated. Program received signal SIGTERM, Terminated. Program received signal SIGTERM, Terminated. detail (/dev/shm/detail): Polling for detail file |
|
|
|
its late. i'm tired. how do i get to type that in? the gdb window is filled with the radiusd -X output, if I ctrl-C in that window and signal sigint etc I get told "No symbol "sigint" in current context." |
|
ah, maybe it doesn't do the name translation. |
|
(gdb) signal 2 |
|
Hmm I guess it's that detail thread not being killed correctly. |
:100644 100644 c312260... 95280f1... M src/modules/rlm_detail/rlm_detail.c ? |
|
The exfiles patch is the one breaking SIGINT? I fail to see how that works... And you can't really test SIGINT trapping with gdb. That's because gdb will trap the signal, and return you to the gdb prompt |
|
Shouldn't do if you raise the signal from within gdb? |
|
aha! getting closer..... i had 5 ldap connections defined ldap ldap1 {} etc..... and calling each one in some virtual servers..... hence lots of connections...hence needing to use a higher ulimit for radius/root user... I've now reverted that 'best practice' a little - just talking to one ldap server for now.... so my redundant-load-balance now only calls ldap1 and now ctrl-C works again! and more interesting....look at the bottom line that I now get... Ready to process requests alan |
|
The last line just means that we're missing some synchronization on exit, which is a low priority to fix. It doesn't cause any problems during normal operation because that code path is only used on exit. I also don't see how opening more file descriptors causes CTRL-C to stop working. I'll put it down to a platform-specific bug TBH. |
|
ctrl-C now failing again on 3.1.x (was working until last week) |
|
Works for me on OSX and on Ubuntu 15.10. Can you try with a default configuration, see if it works, and then try to isolate it from that. |
|
Nothing much has changed in the 3.1.x code since last week apart from reverting a chunk of code that's better in v4. TBH I'd suspect it's some race type condition triggered by the number of instances of rlm_ldap or similar that you're running, and because -m is now default so FR tries to clean up nicely. Although the cleanest thing is to try and fix the bug, one option is just to restore previous non-"-m" behaviour for 3.1.x, and try and fix it up so everything exits cleanly in v4. If the latest commit broke it again, then it should be fixed in v4 anyway. |
|
Restoring non '-m' means the exit triggers don't work correctly. |
|
I'd prefer to figure out why it works for most systems, but not for Buxey's. The fix should be simple once we know what it is. |
|
I've taken a standard config + four or so different instantiations of ldap each with 64 pool servers, called in redundant load-balance, which is about as similar as I can work out from the info provided so far. And can't reproduce diddly squat. @alanbuxey as Alan D said can you chop your config down to the minimal amount that triggers this? Agree that's probably going to be the only way to find it. |
|
Forget that... reproduced it :-) |
|
If there is an ldap search underway and you hit Ctrl-C, you'll have to hit it repeatedly thread pool.max_servers + 1 number of times before you exit, or if the ldap search responds. So if your ldap search gets stuck and you have a crazy number of servers in the pool, then you'll have to hit Ctrl-C a crazy number of times before the server stops. i.e. each time you hit Ctrl-C the current ldap search stops and it moves on to the next server in the pool.
All default config except mods-enabled/ldap symlink exists, only config changed from default:
What happens here is the ctrl-c interrupts Essentially the main event loop has signalled itself to quit ( Should This is of course assuming I've hit the same issue... |
|
On Jun 27, 2016, at 5:30 PM, Matthew Newton notifications@github.com wrote:
Hmm... that's not good.
That's just rude. Is the OpenLDAP client library catching CTRL-C? If so, why?
It should also close the old connection. Or even better, just return something like LDAP_PROC_INTR, which would mean that the caller keeps the same connection, but just retries.
That's the problem with threads... any of that signalling is hard to get right.
The thread pool code could do pthread_kill(..., SIGTERM) for all child threads when it's told to exit... And yes, the connection pool should stop doing work when the server is exiting.
Sounds right. Alan DeKok. |
|
Actually, libldap shows:
so hitting Ctrl-C is breaking the select(). Which I guess is fair enough. So I think it's down to the selection pool realising that the server is on its way out, and not returning a new connection to use. In the
But ultimately it's the same: if Matthew |
mcnewton
added defect v3.1.x
labels
Jun 27, 2016
|
looks like it - as you say, many many many ctrl-Cs (we have 5 ldap servers On 27 June 2016 at 23:13, Matthew Newton notifications@github.com wrote:
|
|
This should be fixed in v4.0.x by server re-architecture. We're planing to have one connection per thread. |
|
That would be good. I hacked around with some ideas to fix this, but ran out of time. But in the case of -X single threaded ISTR the signal handler still needs to do more than just record the fact that the server needs to quit - otherwise the connection pool just gets interrupted and then immediately reconnects with the next connection. |
|
The long-term solution is to differentiate EINTR due to CTRL-C from EINTR due to other sources. We'll need to update every piece of code which checks for EINTR to also check for "server is supposed to shut down", and then do the right thing. For the connection pool, it's relatively simple. Just refuse to continue after EINTR, and refuse to return new connections after SIGQUIT (via checking global variable, or some such thing). For 4.0, if every thread is responsible for it's own connections, which can make it harder. But... each thread also has it's own event loop. Which means that it can check for a global variable there, and then refuse to further service the event loop after SIGQUIT. |
alandekok
added a commit
that referenced
this issue
Sep 25, 2016
|
|
alandekok |
82f556b
|
|
Please try git v3.0.x head. I've pushed a patch which should fix it. Test it soon, so we can get release 3.0.12 out. |
|
is this patch in 3.1.x too? of not, trivial for me to cherry-pick that into alan On 25 September 2016 at 16:08, Alan DeKok notifications@github.com wrote:
|
|
You'll have to see if it goes to 3.1.x. I just did it for 3.0 Sent from my iPhone
|
|
@alanbuxey FYI just had a look at this; doesn't work cleanly on 3.1: it's fine for the main server, but included tools don't define main_config, so it bombs out during linking. Presumably the same in v4. |
|
thanks. i think its the same in v4 (maybe will go when the new
code/methods etc are in place?)
…
|
alanbuxey commentedMay 6, 2016
when starting server in debug mode
radiusd -X
can no longer Ctrl-C in the terminal to kill it - just carries on running.... have to 'kill -9 %PID' in another window