Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlocks on crash caused calling free/malloc #324

Closed
lenonk opened this issue Sep 19, 2019 · 8 comments
Closed

Deadlocks on crash caused calling free/malloc #324

lenonk opened this issue Sep 19, 2019 · 8 comments

Comments

@lenonk
Copy link

lenonk commented Sep 19, 2019

I know there are workarounds, but I thought the following might be of interest:

#0 0x00007f47d12ec3ae in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007f47d127135b in _L_lock_10288 () from /lib64/libc.so.6
#2 0x00007f47d126eb83 in malloc () from /lib64/libc.so.6
#3 0x00007f47d358ffdd in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2
#4 0x00007f47d3595f52 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#5 0x00007f47d35915e9 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#6 0x00007f47d359599a in _dl_open () from /lib64/ld-linux-x86-64.so.2
#7 0x00007f47d131af80 in do_dlopen () from /lib64/libc.so.6
#8 0x00007f47d35915e9 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9 0x00007f47d131b0d7 in __libc_dlopen_mode () from /lib64/libc.so.6
#10 0x00007f47d12f2b75 in init () from /lib64/libc.so.6
#11 0x00007f47d3007e03 in pthread_once () from /lib64/libpthread.so.0
#12 0x00007f47d12f2ca4 in backtrace () from /lib64/libc.so.6
#13 0x00007f47d27ac496 in g3::internal::stackdump(char const*) () from /var/alertlogic/lib/libg3logger.so
#14 0x00007f47d27ad897 in (anonymous namespace)::signalHandler(int, siginfo*, void*) () from /lib/libg3logger.so
#15
#16 0x00007f47d1269f93 in malloc_consolidate () from /lib64/libc.so.6
#17 0x00007f47d126cc88 in _int_free () from /lib64/libc.so.6

Any crash that happens from free() or malloc() will cause a deadlock if backtraces are enabled. This is due to the attempt to allocate memory from g3log's signal handler.

@lenonk lenonk changed the title Deadlocks on crash caused by double free Deadlocks on crash caused calling free/malloc Sep 19, 2019
@KjellKod
Copy link
Owner

KjellKod commented Sep 19, 2019

Really? There should be failsaves in place that detects recursive crashes (which would not be deadlock of course but still bad) and to the best of my knowledge there's no mutex in place within the signal handler that would trigger a deadlock.

Please provide a code snipped that can reproduce this exact issue. .... of course if your code is crashing inside free or malloc you have bigger concerns ;)

@lenonk
Copy link
Author

lenonk commented Sep 19, 2019

Yeah, agreed, the root problem is in the code that called free() on a bad pointer. The issue, however, is that free() and malloc() both actually do lock a mutex. So when free() raises a signal, the mutex is locked. In the above stack trace, it looks like the backtrace() function itself is trying to allocate memory via a series of calls to dl_open() and it's support functions, which of course causes the deadlock. Which means, this may only be a problem if the backtrace crosses module boundries. I'm not certain exactly how backtrace() works.

see: https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#_int_free for proof that free() locks a mutex along with malloc().

Even if that wasn't an issue though, there is still this:
char* real_name = abi::__cxa_demangle(mangled_name, 0, 0, &status);

Which definitely allocates memory, due to the requirement of:
free(real_name);
so that's always going to be a deadlock in the case of crashing due to freeing a bad pointer if we make it that far.

I'm not really expecting you to fix this since I honestly have no idea what you would do about it and still provide backtrace functionality, but I thought you'd like to know about the issue.

I'll try to come up with a small, self contained example that causes this behavior. The code that's doing this currently isn't my code, and is far to complicated (and this behavior is far too rare) for it to be useful to you.

@lenonk
Copy link
Author

lenonk commented Sep 19, 2019

Well, in trying to create a small, simple, self contained example, I have experienced utter failure. It looks like _int_free() only locks the mutex in certain conditions, and backtrace() only calls dlopen() and friends, thus attempting to allocate memory, in certain conditions. The backtraces I'm getting from my experimental code looks nothing like the backtrace above.

I've spent all the time on it that I can today. I'll try to come back to it at some point, but this is not low hanging fruit. All I can say is that allocating memory from a signal handler is generally considered bad form, for exactly this reason. Perhaps you could just add an option for turning off the backtrace functionality. If that option already exists and I'm missing it, I apologize.

@KjellKod
Copy link
Owner

Thanks for looking into this at depth. Yes, You can turn off the signal handler or provide your own without backtrace.

In case of stack trace when you have a fatal signal you have already left the land of defined control. G3log does the best it can in this situation and since 2010 I’ve never seen this issue happen. I.e it’s extremely robust.

At this point I’m not considering any changes since

  1. No code is shown that can pin point the issue and there are a myriad other ways that could trigger the issue (dll misuse, forking, general memory corruption).

  2. You have the option of not using the signal handler. See the documentation and CMake configurable options.

@lenonk
Copy link
Author

lenonk commented Sep 20, 2019

So, I did some more research, and I found this:

   *  backtrace() and backtrace_symbols_fd() don't call malloc()
      explicitly, but they are part of libgcc, which gets loaded
      dynamically when first used.  Dynamic loading usually triggers a
      call to malloc(3).  If you need certain calls to these two
      functions to not allocate memory (in signal handlers, for
      example), you need to make sure libgcc is loaded beforehand.

Might be of interest, whether it fixes my particular problem or not. It seems like it might actually be effective in my case though.

Other than that, I see no changes that you could make since the source of the deadlock is backtrace() allocating memory from a signal handler. I will look into disabling the signal handler from g3log, since I already have my own implemented anyway, and doing the backtrace manually from a safe location.

Thanks for being responsive to this.

@narayana-dev
Copy link

@lenonk Did you find any fix for it?

@lenonk
Copy link
Author

lenonk commented Jul 5, 2024

@lenonk Did you find any fix for it?

No, there is no fix for it. The best you can do is turn off glog's backtrace handling, and minimize potential problems by callling backtrace() once after your initialization to make sure everything it needs is is preloaded. Make sure you don't allocate any memory on fatal signals and write the backtrace to a file using backtrace_symbols_fd() instead of to the console using backtrace_symbols(). I don't remember why any of this works since it's been almost 5 years, but I didn't have any issues with it after that.

@narayana-dev
Copy link

@lenonk I actually am getting segfault in malloc due to writing to freed memory so when backtrace is called it is again going to malloc which is waiting on lock. I don't think these kind of things can be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants