Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono] Fix deadlock in static constructor initialization #93875

Merged
merged 6 commits into from
Oct 25, 2023

Conversation

lambdageek
Copy link
Member

@lambdageek lambdageek commented Oct 23, 2023

If two threads (A and B) need to call the static constructors for 3 classes X, Y, Z in this order:

Thread A: X { Z, Y }
Thread B: Y { Z, X }

where the cctor for X in thread A invokes the cctor for Z and for Y, and the cctor for Y in thread B invokes the cctor for Z and X, then we can get into a situation where A and B both start the cctors for X and Y (so they will be in the type_initialization_hash for those two classes) and then both will try to init Z. In that case it could be that A will be responsible for initializing Z and B will block. Then A could finish initializing Z but B may not have woken up yet (and so it will be in blocked_thread_hash waiting for Z). At that point A (who is at this point already need to init Y) may think that it can wait for B to finish initializing Y. That is we get to
mono_coop_cond_timedwait (&lock->cond, &lock->mutex, timeout_ms) with "lock" being the lock for Y. But in fact thread B will not be able to complete initializing Y because it will attempt to init X next - but meanwhile we did not indicate that thread A is blocked. As a result in thread A the timed wait will eventually timeout. And in this case we need to go back to the top and now correctly detect that A is waiting for Y and B is waiting for X. (At that point there is a cctor deadlock and ECMA rules allow one of the threads to return without calling the cctor)

The old code here used to do an infinite wait:

  while (!lock->done)
    mono_coop_cond_wait (&lock->cond, &lock->mutex)

This cannot succeed because "lock" (in thread A it's the lock for Y) will not be signaled since B (who is supposed to init Y) will instead block on the cctor for X.

Fixes #93778

If two threads (A and B) need to call the static constructors for 3
classes X, Y, Z in this order:

Thread A: X, Z, Y
Thread B: Y, Z, X

where the cctor for X in thread A invokes the cctor for Z and for Y, and the
cctor for Y in thread B invokes the cctor for Z and X, then we can get
into a situation where A and B both start the cctors for X and Y (so
they will be in the type_initialization_hash for those two classes)
and then both will try to init Z.  In that case it could be that A
will be responsible for initializing Z and B will block.  Then A could
finish initializing Z but B may not have woken up yet (and so it will
be in blocked_thread_hash waiting for Z).  At that point A (who is at
this point already need to init Y) may think that it can wait for B to
finish initializing Y.  That is we get to
`mono_coop_cond_timedwait (&lock->cond, &lock->mutex, timeout_ms)`
with "lock" being the lock for `Y`.  But in fact thread B will not be
able to complete initializing Y because it will attempt to init X next
- but meanwhile we did not indicate that thread A is blocked.
As a result in thread A the timed wait will eventually timeout.  And
in this case we need to go back to the top and now correctly detect
that A is waiting for Y and B is waiting for X.  (At that point there
is a cctor deadlock and ECMA rules allow one of the threads to return
without calling the cctor)

The old code here used to do an infinite wait:
  while (!lock->done)
    mono_coop_cond_wait (&lock->cond, &lock->mutex)

This cannot succeed because "lock" (in thread A it's the lock for Y)
will not be signaled since B (who is supposed to init Y) will instead
block on the cctor for X.

Fixes dotnet#93778
@lambdageek lambdageek marked this pull request as ready for review October 23, 2023 19:05
@lambdageek
Copy link
Member Author

/backport to release/8.0-staging

@github-actions
Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/6632161553

@lambdageek
Copy link
Member Author

I wonder if I need a test where a static cctor dependency takes a long time to unblock (longer than the timeout that I added here). Just to verify that I didn't break the ability for one thread to wait for another to complete.

Copy link
Member

@lateralusX lateralusX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Implementation details looks reasonable and since PR just switched from a infinite wait to a timeout wait with a retry loop following the current patterns shouldn't introduce any new side effects.

@lambdageek
Copy link
Member Author

/backport to release/8.0-staging

@github-actions
Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/6640645768

@lambdageek lambdageek merged commit 3cd6455 into dotnet:main Oct 25, 2023
160 checks passed
liveans pushed a commit to liveans/dotnet-runtime that referenced this pull request Nov 9, 2023
* [mono] Fix deadlock in static constructor initialization

If two threads (A and B) need to call the static constructors for 3
classes X, Y, Z in this order:

Thread A: X, Z, Y
Thread B: Y, Z, X

where the cctor for X in thread A invokes the cctor for Z and for Y, and the
cctor for Y in thread B invokes the cctor for Z and X, then we can get
into a situation where A and B both start the cctors for X and Y (so
they will be in the type_initialization_hash for those two classes)
and then both will try to init Z.  In that case it could be that A
will be responsible for initializing Z and B will block.  Then A could
finish initializing Z but B may not have woken up yet (and so it will
be in blocked_thread_hash waiting for Z).  At that point A (who is at
this point already need to init Y) may think that it can wait for B to
finish initializing Y.  That is we get to
`mono_coop_cond_timedwait (&lock->cond, &lock->mutex, timeout_ms)`
with "lock" being the lock for `Y`.  But in fact thread B will not be
able to complete initializing Y because it will attempt to init X next
- but meanwhile we did not indicate that thread A is blocked.
As a result in thread A the timed wait will eventually timeout.  And
in this case we need to go back to the top and now correctly detect
that A is waiting for Y and B is waiting for X.  (At that point there
is a cctor deadlock and ECMA rules allow one of the threads to return
without calling the cctor)

The old code here used to do an infinite wait:
  while (!lock->done)
    mono_coop_cond_wait (&lock->cond, &lock->mutex)

This cannot succeed because "lock" (in thread A it's the lock for Y)
will not be signaled since B (who is supposed to init Y) will instead
block on the cctor for X.

Fixes dotnet#93778

* Add test case

* remove prototyping log messages

* disable mt test on wasm

* code review feedback
lambdageek added a commit that referenced this pull request Nov 14, 2023
…alization (#93943)


Backport of #93875 to release/8.0-staging

* [mono] Fix deadlock in static constructor initialization

If two threads (A and B) need to call the static constructors for 3
classes X, Y, Z in this order:

Thread A: X, Z, Y
Thread B: Y, Z, X

where the cctor for X in thread A invokes the cctor for Z and for Y, and the
cctor for Y in thread B invokes the cctor for Z and X, then we can get
into a situation where A and B both start the cctors for X and Y (so
they will be in the type_initialization_hash for those two classes)
and then both will try to init Z.  In that case it could be that A
will be responsible for initializing Z and B will block.  Then A could
finish initializing Z but B may not have woken up yet (and so it will
be in blocked_thread_hash waiting for Z).  At that point A (who is at
this point already need to init Y) may think that it can wait for B to
finish initializing Y.  That is we get to
`mono_coop_cond_timedwait (&lock->cond, &lock->mutex, timeout_ms)`
with "lock" being the lock for `Y`.  But in fact thread B will not be
able to complete initializing Y because it will attempt to init X next
- but meanwhile we did not indicate that thread A is blocked.
As a result in thread A the timed wait will eventually timeout.  And
in this case we need to go back to the top and now correctly detect
that A is waiting for Y and B is waiting for X.  (At that point there
is a cctor deadlock and ECMA rules allow one of the threads to return
without calling the cctor)

The old code here used to do an infinite wait:
  while (!lock->done)
    mono_coop_cond_wait (&lock->cond, &lock->mutex)

This cannot succeed because "lock" (in thread A it's the lock for Y)
will not be signaled since B (who is supposed to init Y) will instead
block on the cctor for X.

Fixes #93778

* Add test case

* remove prototyping log messages

* disable mt test on wasm

* better issues.target exclusion

* code review feedback


Co-authored-by: Aleksey Kliger <alklig@microsoft.com>
@ghost ghost locked as resolved and limited conversation to collaborators Nov 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[mono] Deadlock on TypeInitializationLock in mono_runtime_class_init_full
3 participants