Infinite recursion in TLS tramp guard #98

cuviper · 2016-06-16T00:08:49Z

Using systemtap, I have tried to instrument the mutex_entry static probe point in libpthread.so. When any new thread tries to take a lock, it recurses on itself until the stack overflows. The sequence looks something like:

Thread reaches the mutex_entry address and starts the instrumentation.
We call DYNINST_lock_tramp_guard() to make sure we aren't recursing.
This accesses the TLS DYNINST_tls_tramp_guard as an offset from __tls_get_addr().
Since it's a new thread, this has to set up the TLS addr, calling tls_get_addr_tail().
This takes the rtld lock, re-entering __pthread_mutex_lock().
Hey there, mutex_entry... repeat until the stack runs out and we hit SIGSEGV.

I don't think there's anything actually special about the exact mutex_entry point here -- it's just a conveniently low-level place to probe. Probably any point within __tls_get_addr() or below will have similar hazards.

The text was updated successfully, but these errors were encountered:

cuviper · 2016-06-16T00:13:05Z

I'll work on getting an easy reproducer for this scenario, but I wanted to get this problem description up now so others can start thinking about it.

Another troublesome aspect is that the thread's SIGSEGV leads to the mutator stalling too. I haven't characterized that issue yet though.

wrwilliams · 2016-06-16T14:23:04Z

Two questions to think about: is it preferable to pay the costs for safety at mutator design time, at instrumentation time, or at run time, and do we need a solution to scale to an arbitrary number of threads?

cuviper · 2016-06-16T18:06:40Z

is it preferable to pay the costs for safety at mutator design time, at instrumentation time, or at run time

It would be an extremely high burden to design our mutator without any potential recursion, so that tramp guards could be disabled. Especially since systemtap gives the user a lot of freedom in what to do in their instrumentation. The only other option I see from the mutator design is to forbid instrumenting anywhere in the __tls_get_addr chain -- we might have to do that, but mutex_entry sure is a desirable point to look at, so I'd hate to forbid it.

I can't really evaluate the costs at instrumentation time or run time until we have proposals for what those fixes would entail. I have one hack written to insert a global atomic tramp counter before tls is accessed, to limit how deep it can go, but this has its own drawbacks. I'm also exploring whether there are any ways to reconfigure TLS itself. I'm curious to hear what ideas you have.

do we need a solution to scale to an arbitrary number of threads?

It sure would be nice. I know this is why we switched to a TLS tramp guard in the first place.

wrwilliams · 2016-06-16T18:29:23Z

Bart and I had a talk about this; all solutions we have come up with thus far are kind of gross. Recall that we do have the ability to instrument or execute RPCs at thread creation time. So here's what I can figure as our options, in two orthogonal sets:

TLS CREATION TIME

Force TLS for the new thread to be created at thread creation, and in a safe manner
Allow TLS to be created as it's used (current)

ENSURING SAFETY

Have the __tls_get_addr called by the RTlib execute original uninstrumented code
Ensure that anything in the __tls_get_addr chain emulates tramp guards through some non-TLS mechanism
Forbid anything in the __tls_get_addr chain from making function calls (limiting, but better than 'you can't instrument')
(blue-sky, but there may be something in this) Translate function calls during the __tls_get_addr chain to snippets that record parameters and then make the calls at the exit of __tls_get_addr

With respect to DYNINST_lock_tramp_guard executing unmodified __tls_get_addr, there are multiple approaches: redirecting that particular call to an unmodified copy is probably the simplest, but I'm not positive I'm seeing all the angles on it.

cuviper · 2016-06-16T20:04:03Z

Force TLS for the new thread to be created at thread creation, and in a safe manner

There's a similar sort of problem with signal handlers that use TLS variables. The general advice there is to touch TLS early in each thread before a signal can get to it.

The safe way in this case seems like it would require freezing all threads, removing or disabling ld/libc/libpthread instrumentation, touch TLS in the new thread, re-instrument and resume. A rather heavy solution.

Have the __tls_get_addr called by the RTlib execute original uninstrumented code

That sounds great, but I don't think this call chain can be statically determined. It looks like __rtld_lock_lock_recursive goes through a function pointer to reach __pthread_mutex_lock.

I spoke with Carlos O'Donell about this issue, and he let me in on the secret that is Static TLS, via the tls-model "initial-exec". This can be chosen either by -ftls-model or a direct attribute on the variable. It's a "secret" in that this is a limited resource, only meant to be used by system libraries. But I think we can justify that RTlib is a special enough case, with very targeted usage and a small need - we can even reduce this to one byte. With this model applied, DYNINST_lock_tramp_guard() gets very simple with no calls at all. I will play with this a bit more and then probably send a pull request.

Briefly reviewing the MSDN articles on TLS and thread, it sounds like Windows has a pretty similar static model for __declspec(thread), suggesting TlsAlloc for general DLLs that might be loaded with LoadLibrary. But I guess RTlib has squeezed in OK so far...

This was referenced Jun 16, 2016

RTlib: use static TLS for the tramp guard #99

Merged

SIGSEGV in a thread hangs the mutator #100

Closed

wrwilliams closed this as completed in #99 Jun 17, 2016

wrwilliams added the bug label Apr 5, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Infinite recursion in TLS tramp guard #98

Infinite recursion in TLS tramp guard #98

cuviper commented Jun 16, 2016

cuviper commented Jun 16, 2016

wrwilliams commented Jun 16, 2016

cuviper commented Jun 16, 2016

wrwilliams commented Jun 16, 2016

cuviper commented Jun 16, 2016

Infinite recursion in TLS tramp guard #98

Infinite recursion in TLS tramp guard #98

Comments

cuviper commented Jun 16, 2016

cuviper commented Jun 16, 2016

wrwilliams commented Jun 16, 2016

cuviper commented Jun 16, 2016

wrwilliams commented Jun 16, 2016

cuviper commented Jun 16, 2016