Fix for issue 11981 - deadlock during thread initialization on Posix #718
// until it has been initialized. Manual entry and exit
// avoids TLS access (see issue 11981).
synchronized(Thread.criticalRegionLock) obj.m_isInCriticalRegion = true;
scope( exit ) synchronized(Thread.criticalRegionLock) obj.m_isInCriticalRegion = false;
I don't want to rely on the critical regions implementation for core functionality, because adding it was quite controversial and there are a few more ideas for non-suspendable threads.
Also, this pull request is slightly incorrect. Calling malloc in the signal handler (through TLS access) is wrong because __tls_get_addr isn't async-signal-safe. Here you mitigate the problem because rt.tlsgc.init does a TLS access itself, so any further TLS access in this thread will no longer call malloc.
It's still possible that malloc is already locked by another thread when SIGUSR1 is delivered before this critical region is entered.
A simple fix would be to go back to pthread_getspecific for Thread.sm_this.
Yeah, I overlooked the fact that malloc may already be locked. I'll dig into the history and see what can be done using pthread_setspecific/pthread_getspecific.
It's a bit convoluted, but the initial change was made in #456.
I see, thanks. I'll leave this request open while working on the new one, in case someone has more ideas or problems to point out.
Thanks for doing this 👍.
Here's an updated version.
// to avoid TLS access in signal handlers (malloc deadlock)
// when using shared libraries, see issue 11981.
status = pthread_key_create( &Thread.sm_this, null );
assert( status == 0 );
I would create the key in a shared static this() and destroy it in a shared static ~this().
You could define them close to getThis/setThis; then a single one of those comments would suffice.
Hmm. Makes sense.
...Except we can't: creation and deletion have to happen in thread_init() and thread_term().
To clarify: thread_init() runs before even the shared static constructors, and the main thread is initialized there as well, so a shared static constructor is too late to create the key.
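A minimal D sketch of the approach the discussion converges on, assuming a Posix target and druntime's `core.sys.posix.pthread` bindings: the per-thread object lives in a pthread key instead of a D thread-local variable, the key is created in a thread_init()-style startup function that also registers the main thread, and it is deleted in a thread_term()-style shutdown function. `ThreadObj`, `threadInitSketch`, and `threadTermSketch` are hypothetical names, and `sm_this`/`getThis`/`setThis` only borrow the names from the discussion as module-level stand-ins; this is not druntime's actual code.

```d
import core.sys.posix.pthread;

// Hypothetical stand-in for druntime's Thread class (not core.thread.Thread).
class ThreadObj
{
    string name;
    this(string name) { this.name = name; }
}

// D module-level variables are thread-local by default, so the key itself
// is marked __gshared: reading it must not be a TLS access.
__gshared pthread_key_t sm_this;

// Hypothetical analogue of thread_init(): runs before any shared static
// constructor, creates the key, then registers the calling (main) thread.
void threadInitSketch(ThreadObj mainThread)
{
    immutable status = pthread_key_create(&sm_this, null);
    assert(status == 0);
    setThis(mainThread);
}

// Hypothetical analogue of thread_term(): releases the key at shutdown.
void threadTermSketch()
{
    immutable status = pthread_key_delete(sm_this);
    assert(status == 0);
}

void setThis(ThreadObj t)
{
    // Stores a raw pointer. The GC does not scan pthread-specific slots, so
    // the object must stay reachable elsewhere (e.g. a global thread list).
    immutable status = pthread_setspecific(sm_this, cast(void*) t);
    assert(status == 0);
}

ThreadObj getThis()
{
    // Goes through pthread_getspecific instead of a D thread-local variable,
    // i.e. the TLS access the review flags inside the signal handler.
    return cast(ThreadObj) pthread_getspecific(sm_this);
}
```

POSIX initializes a newly created key's value to null in every thread, so a thread that has not registered itself yet simply sees getThis() return null.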
You're right, I forgot about that.
Interesting. Darwin_32 failed with a bus error. Does it use version(Posix) instead of version(OSX)?
Thanks!
Thanks for fixing the bug.
I think that's your bounty https://www.bountysource.com/issues/1337742-shared-library-segv, go claim it.
Oh wow, thanks for tracking this! Ever since that issue was marked as a duplicate, though, I've been somewhat inclined to double-check. That get_nprocs() call throws me off. I just haven't found time yet to match the environment of that issue (libc and ld). Now I'm a bit more inclined :)
Well, I checked on Ubuntu 12.10, and the ld.bfd-related bug causes every test to fail, not just the host one.
Looks like it. I've run some tests too and commented in Bugzilla.
Issue 11981: deadlock during thread initialization on Posix