-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadlocks on crash caused calling free/malloc #324
Comments
Really? There should be failsaves in place that detects recursive crashes (which would not be deadlock of course but still bad) and to the best of my knowledge there's no mutex in place within the signal handler that would trigger a deadlock. Please provide a code snipped that can reproduce this exact issue. .... of course if your code is crashing inside |
Yeah, agreed, the root problem is in the code that called free() on a bad pointer. The issue, however, is that free() and malloc() both actually do lock a mutex. So when free() raises a signal, the mutex is locked. In the above stack trace, it looks like the backtrace() function itself is trying to allocate memory via a series of calls to dl_open() and it's support functions, which of course causes the deadlock. Which means, this may only be a problem if the backtrace crosses module boundries. I'm not certain exactly how backtrace() works. see: https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#_int_free for proof that free() locks a mutex along with malloc(). Even if that wasn't an issue though, there is still this: Which definitely allocates memory, due to the requirement of: I'm not really expecting you to fix this since I honestly have no idea what you would do about it and still provide backtrace functionality, but I thought you'd like to know about the issue. I'll try to come up with a small, self contained example that causes this behavior. The code that's doing this currently isn't my code, and is far to complicated (and this behavior is far too rare) for it to be useful to you. |
Well, in trying to create a small, simple, self contained example, I have experienced utter failure. It looks like _int_free() only locks the mutex in certain conditions, and backtrace() only calls dlopen() and friends, thus attempting to allocate memory, in certain conditions. The backtraces I'm getting from my experimental code looks nothing like the backtrace above. I've spent all the time on it that I can today. I'll try to come back to it at some point, but this is not low hanging fruit. All I can say is that allocating memory from a signal handler is generally considered bad form, for exactly this reason. Perhaps you could just add an option for turning off the backtrace functionality. If that option already exists and I'm missing it, I apologize. |
Thanks for looking into this at depth. Yes, You can turn off the signal handler or provide your own without backtrace. In case of stack trace when you have a fatal signal you have already left the land of defined control. G3log does the best it can in this situation and since 2010 I’ve never seen this issue happen. I.e it’s extremely robust. At this point I’m not considering any changes since
|
So, I did some more research, and I found this:
Might be of interest, whether it fixes my particular problem or not. It seems like it might actually be effective in my case though. Other than that, I see no changes that you could make since the source of the deadlock is backtrace() allocating memory from a signal handler. I will look into disabling the signal handler from g3log, since I already have my own implemented anyway, and doing the backtrace manually from a safe location. Thanks for being responsive to this. |
@lenonk Did you find any fix for it? |
No, there is no fix for it. The best you can do is turn off glog's backtrace handling, and minimize potential problems by callling backtrace() once after your initialization to make sure everything it needs is is preloaded. Make sure you don't allocate any memory on fatal signals and write the backtrace to a file using backtrace_symbols_fd() instead of to the console using backtrace_symbols(). I don't remember why any of this works since it's been almost 5 years, but I didn't have any issues with it after that. |
@lenonk I actually am getting segfault in malloc due to writing to freed memory so when backtrace is called it is again going to malloc which is waiting on lock. I don't think these kind of things can be fixed. |
I know there are workarounds, but I thought the following might be of interest:
#0 0x00007f47d12ec3ae in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007f47d127135b in _L_lock_10288 () from /lib64/libc.so.6
#2 0x00007f47d126eb83 in malloc () from /lib64/libc.so.6
#3 0x00007f47d358ffdd in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2
#4 0x00007f47d3595f52 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#5 0x00007f47d35915e9 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#6 0x00007f47d359599a in _dl_open () from /lib64/ld-linux-x86-64.so.2
#7 0x00007f47d131af80 in do_dlopen () from /lib64/libc.so.6
#8 0x00007f47d35915e9 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9 0x00007f47d131b0d7 in __libc_dlopen_mode () from /lib64/libc.so.6
#10 0x00007f47d12f2b75 in init () from /lib64/libc.so.6
#11 0x00007f47d3007e03 in pthread_once () from /lib64/libpthread.so.0
#12 0x00007f47d12f2ca4 in backtrace () from /lib64/libc.so.6
#13 0x00007f47d27ac496 in g3::internal::stackdump(char const*) () from /var/alertlogic/lib/libg3logger.so
#14 0x00007f47d27ad897 in (anonymous namespace)::signalHandler(int, siginfo*, void*) () from /lib/libg3logger.so
#15
#16 0x00007f47d1269f93 in malloc_consolidate () from /lib64/libc.so.6
#17 0x00007f47d126cc88 in _int_free () from /lib64/libc.so.6
Any crash that happens from free() or malloc() will cause a deadlock if backtraces are enabled. This is due to the attempt to allocate memory from g3log's signal handler.
The text was updated successfully, but these errors were encountered: