Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[3.1] Fix Unix named mutex crash during some race conditions #28039

Closed
wants to merge 2 commits into from

Conversation

kouvel
Copy link
Member

@kouvel kouvel commented May 12, 2020

Port of dotnet/runtime#36268 to 3.1

Below when I refer to "mutex" I'm referring to the underlying mutex object, not an instance of the Mutex class.

  • When the last reference to a mutex is closed while the lock is held by some thread and a pthread mutex is used, the mutex was attempted to be destroyed but that has undefined behavior
  • There doesn't seem to be a way to behave exactly like on Windows for this corner case, where the mutex is destroyed when the last reference to it is released, regardless of which process has the mutex locked and which process releases the last reference to it (they could be two different processes), including in cases of abrupt shutdown
  • For this corner case I settled on what seems like a decent solution and compatible with older runtimes:
    • When a process releases its last reference to the mutex
      • If that mutex is locked by the same thread, the lock is abandoned and the process no longer references the mutex
      • If that mutex is locked by a different thread, the lifetime of the mutex is extended with an implicit ref. The implicit ref prevents this or other processes from attempting to destroy the mutex while it is locked. The implicit ref is removed in either of these cases:
        • The mutex gets another reference from within the same process
        • The thread that owns the lock exits and abandons the mutex, at which point that would be the last reference to the mutex and the process would not reference the mutex anymore
    • The implementation based on file locks is less restricted, but for consistency that implementation also follows the same behavior
  • There was also a race between an exiting thread abandoning one of its locked named mutexes and another thread releasing the last reference to it, fixed by using the creation/deletion process lock to synchronize

Fix for dotnet/runtime#34271 in 3.1

Customer impact

  • Random crashes were seen on Unixes
  • When a named mutex is locked and the last reference to it is released from a different thread in any process, the underlying pthread mutex object is destroyed and that has undefined behavior. Crashes were seen where some data associated with the destroyed pthread mutex continues to be used when acquiring another pthread mutex in racy situations, causing a crash because the memory containing the destroyed pthread mutex is unmapped.
  • There was also a race where a thread has acquired the lock of a named mutex, between that thread exiting and abandoning the mutex, and another thread attempting to destroy the mutex

Regression?

No

Testing

  • PAL named mutex tests on Linux and OSX including in stress mode (1 hour) in debug and release
  • System.Threading tests including mutex tests on Linux and OSX with debug and release coreclr
  • PAL, coreclr, and libraries tests

Risk

Low:

  • It is a bit of a corner case, where a lock is acquired and the mutex is disposed without releasing the lock, especially necessary for this issue is that the thread that disposes the mutex is different from the one that holds the lock
  • Since the fix involves extending the lifetime of a mutex in such a case, the underlying shared named mutex may observably live longer than before. That is necessary for correctness, as the behavior there was incorrect before and leading to the crashes.

Below when I refer to "mutex" I'm referring to the underlying mutex object, not an instance of the `Mutex` class.
- When the last reference to a mutex is closed while the lock is held by some thread and a pthread mutex is used, the mutex was attempted to be destroyed but that has undefined behavior
- There doesn't seem to be a way to behave exactly like on Windows for this corner case, where the mutex is destroyed when the last reference to it is released, regardless of which process has the mutex locked and which process releases the last reference to it (they could be two different processes), including in cases of abrupt shutdown
- For this corner case I settled on what seems like a decent solution and compatible with older runtimes:
  - When a process releases its last reference to the mutex
    - If that mutex is locked by the same thread, the lock is abandoned and the process no longer references the mutex
    - If that mutex is locked by a different thread, the lifetime of the mutex is extended with an implicit ref. The implicit ref prevents this or other processes from attempting to destroy the mutex while it is locked. The implicit ref is removed in either of these cases:
      - The mutex gets another reference from within the same process
      - The thread that owns the lock exits and abandons the mutex, at which point that would be the last reference to the mutex and the process would not reference the mutex anymore
  - The implementation based on file locks is less restricted, but for consistency that implementation also follows the same behavior
- There was also a race between an exiting thread abandoning one of its locked named mutexes and another thread releasing the last reference to it, fixed by using the creation/deletion process lock to synchronize

Fixes dotnet/runtime#34271 in 3.1
@janvorli
Copy link
Member

@kouvel can you please update the description with the standard template for porting requests? See e.g. #28036.

@kouvel
Copy link
Member Author

kouvel commented May 12, 2020

update the description with the standard template for porting requests

Done

@janvorli janvorli added the Servicing-consider Issue for next servicing release review label May 14, 2020
@jeffschwMSFT jeffschwMSFT removed the Servicing-consider Issue for next servicing release review label May 18, 2020
@kouvel
Copy link
Member Author

kouvel commented Dec 11, 2020

@jeffschwMSFT my understanding was that this PR was approved for servicing but that it's relatively low in priority and would be taken for servicing at an appropriate time. It doesn't have the approval label and has been open for quite some time, it seems more likely that 3.1 would run out of its servicing timeline before this would be considered. Should we just close this PR and the dependent PR dotnet/corefx#42917? If it is to be considered at some point perhaps it can be reopened for consideration.

@kouvel
Copy link
Member Author

kouvel commented Dec 11, 2020

If it helps I haven't seen any new reports of this issue

@kouvel
Copy link
Member Author

kouvel commented Dec 11, 2020

Closing for now, can reevaluate later if desired

@kouvel kouvel closed this Dec 11, 2020
@kouvel kouvel deleted the NamedMutexFix31 branch December 11, 2020 11:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
3 participants