Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.

Fix for issue 11981 - deadlock during thread initialization on Posix #718

Merged
merged 4 commits into from
Feb 10, 2014

Conversation

radcapricorn
Copy link
Contributor

Issue 11981: deadlock during thread initialization on Posix

// until it has been initialized. Manual entry and exit
// avoids TLS access (see issue 11981).
synchronized(Thread.criticalRegionLock) obj.m_isInCriticalRegion = true;
scope( exit ) synchronized(Thread.criticalRegionLock) obj.m_isInCriticalRegion = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to rely on the critical regions implementation for core functionality, because adding it was quite controversial and there are a few more ideas for non-suspendable threads.
Also this pull is slightly incorrect. Calling malloc in the signal handler (through TLS access) is wrong because __tls_get_addr isn't async safe. Here you mitigate this problem because rt.tlsgc.init does a TLS access itself, thus any further TLS access in this thread will no longer call malloc.
It's still possible that malloc is already locked in one another thread while SIGUSR1 is delivered before this critical region.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple fix would be to go back to pthread_getspecific for Thread.sm_this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've overlooked the fact that malloc may still be already locked. I'll dig into history and see what can be done using pthread_set/getspecific.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit convoluted, but the initial change was made here #456.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. I'll leave this request open while working on the new one, in case someone has more ideas or problems to point out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this 👍.

@radcapricorn
Copy link
Contributor Author

Here's an updated version.

// to avoid TLS access in signal handlers (malloc deadlock)
// when using shared libraries, see issue 11981.
status = pthread_key_create( &Thread.sm_this, null );
assert( status == 0 );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would create the key in a shared static this() and destroy it in a shared static ~this().
You could define them close to getThis/setThis, then a single of those comments would suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Makes sense.

...Except we can't. create/delete should happen in thread_init() and thread_term().

To clarify: thread_init() is run before even shared static ctors, and main thread is initialized then as well. I.e. it's too late to create the key in shared static ctor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your right, I forgot about that.

@radcapricorn
Copy link
Contributor Author

Interesting. Darwin_32 failed with a bus error. Does it use version(Posix) instead of version(OSX)?
Oh, I see, version(Posix) includes version(OSX).

@MartinNowak
Copy link
Member

Auto-merge toggled on

MartinNowak added a commit that referenced this pull request Feb 10, 2014
Fix for issue 11981 - deadlock during thread initialization on Posix
@MartinNowak MartinNowak merged commit 4ce7f1d into dlang:master Feb 10, 2014
@radcapricorn radcapricorn deleted the issue_11981 branch February 11, 2014 04:43
@radcapricorn
Copy link
Contributor Author

Thanks!

@MartinNowak
Copy link
Member

Thanks for fixing the bug.

@MartinNowak
Copy link
Member

I think that's your bounty https://www.bountysource.com/issues/1337742-shared-library-segv, go claim it.

@radcapricorn
Copy link
Contributor Author

Oh wow, thanks for tracking this! Although ever since that issue has been marked as a duplicate I was somewhat inclined to doublecheck. That get_nprocs() call throws me off. Just haven't found time yet to match the environment of that issue (lbc and ld). Now I'm a bit more inclined :)

@MartinNowak
Copy link
Member

Well, I checked an Ubuntu 12.10 and the ld.bfd related bug causes every test to fail, not just the host one.

@radcapricorn
Copy link
Contributor Author

Looks like it. I've ran some tests too and commented in bugzilla.

dnadlinger pushed a commit to ldc-developers/druntime that referenced this pull request Jul 6, 2014
Fix for issue 11981 - deadlock during thread initialization on Posix
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants