Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite recursion in TLS tramp guard #98

Closed
cuviper opened this issue Jun 16, 2016 · 5 comments
Closed

Infinite recursion in TLS tramp guard #98

cuviper opened this issue Jun 16, 2016 · 5 comments
Labels

Comments

@cuviper
Copy link
Contributor

cuviper commented Jun 16, 2016

Using systemtap, I have tried to instrument the mutex_entry static probe point in libpthread.so. When any new thread tries to take a lock, it recurses on itself until the stack overflows. The sequence looks something like:

  • Thread reaches the mutex_entry address and starts the instrumentation.
  • We call DYNINST_lock_tramp_guard() to make sure we aren't recursing.
  • This accesses the TLS DYNINST_tls_tramp_guard as an offset from __tls_get_addr().
  • Since it's a new thread, this has to set up the TLS addr, calling tls_get_addr_tail().
  • This takes the rtld lock, re-entering __pthread_mutex_lock().
  • Hey there, mutex_entry... repeat until the stack runs out and we hit SIGSEGV.

I don't think there's anything actually special about the exact mutex_entry point here -- it's just a conveniently low-level place to probe. Probably any point within __tls_get_addr() or below will have similar hazards.

@cuviper
Copy link
Contributor Author

cuviper commented Jun 16, 2016

I'll work on getting an easy reproducer for this scenario, but I wanted to get this problem description up now so others can start thinking about it.

Another troublesome aspect is that the thread's SIGSEGV leads to the mutator stalling too. I haven't characterized that issue yet though.

@wrwilliams
Copy link
Member

Two questions to think about: is it preferable to pay the costs for safety at mutator design time, at instrumentation time, or at run time, and do we need a solution to scale to an arbitrary number of threads?

@cuviper
Copy link
Contributor Author

cuviper commented Jun 16, 2016

is it preferable to pay the costs for safety at mutator design time, at instrumentation time, or at run time

It would be an extremely high burden to design our mutator without any potential recursion, so that tramp guards could be disabled. Especially since systemtap gives the user a lot of freedom in what to do in their instrumentation. The only other option I see from the mutator design is to forbid instrumenting anywhere in the __tls_get_addr chain -- we might have to do that, but mutex_entry sure is a desirable point to look at, so I'd hate to forbid it.

I can't really evaluate the costs at instrumentation time or run time until we have proposals for what those fixes would entail. I have one hack written to insert a global atomic tramp counter before tls is accessed, to limit how deep it can go, but this has its own drawbacks. I'm also exploring whether there are any ways to reconfigure TLS itself. I'm curious to hear what ideas you have.

do we need a solution to scale to an arbitrary number of threads?

It sure would be nice. I know this is why we switched to a TLS tramp guard in the first place.

@wrwilliams
Copy link
Member

Bart and I had a talk about this; all solutions we have come up with thus far are kind of gross. Recall that we do have the ability to instrument or execute RPCs at thread creation time. So here's what I can figure as our options, in two orthogonal sets:

TLS CREATION TIME

  1. Force TLS for the new thread to be created at thread creation, and in a safe manner
  2. Allow TLS to be created as it's used (current)

ENSURING SAFETY

  1. Have the __tls_get_addr called by the RTlib execute original uninstrumented code
  2. Ensure that anything in the __tls_get_addr chain emulates tramp guards through some non-TLS mechanism
  3. Forbid anything in the __tls_get_addr chain from making function calls (limiting, but better than 'you can't instrument')
  4. (blue-sky, but there may be something in this) Translate function calls during the __tls_get_addr chain to snippets that record parameters and then make the calls at the exit of __tls_get_addr

With respect to DYNINST_lock_tramp_guard executing unmodified __tls_get_addr, there are multiple approaches: redirecting that particular call to an unmodified copy is probably the simplest, but I'm not positive I'm seeing all the angles on it.

@cuviper
Copy link
Contributor Author

cuviper commented Jun 16, 2016

Force TLS for the new thread to be created at thread creation, and in a safe manner

There's a similar sort of problem with signal handlers that use TLS variables. The general advice there is to touch TLS early in each thread before a signal can get to it.

The safe way in this case seems like it would require freezing all threads, removing or disabling ld/libc/libpthread instrumentation, touch TLS in the new thread, re-instrument and resume. A rather heavy solution.

Have the __tls_get_addr called by the RTlib execute original uninstrumented code

That sounds great, but I don't think this call chain can be statically determined. It looks like __rtld_lock_lock_recursive goes through a function pointer to reach __pthread_mutex_lock.


I spoke with Carlos O'Donell about this issue, and he let me in on the secret that is Static TLS, via the tls-model "initial-exec". This can be chosen either by -ftls-model or a direct attribute on the variable. It's a "secret" in that this is a limited resource, only meant to be used by system libraries. But I think we can justify that RTlib is a special enough case, with very targeted usage and a small need - we can even reduce this to one byte. With this model applied, DYNINST_lock_tramp_guard() gets very simple with no calls at all. I will play with this a bit more and then probably send a pull request.

Briefly reviewing the MSDN articles on TLS and thread, it sounds like Windows has a pretty similar static model for __declspec(thread), suggesting TlsAlloc for general DLLs that might be loaded with LoadLibrary. But I guess RTlib has squeezed in OK so far...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants