Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash caused by orphaned locks #44071

Closed
kevingosse opened this issue Oct 30, 2020 · 3 comments · Fixed by #107168
Closed

Crash caused by orphaned locks #44071

kevingosse opened this issue Oct 30, 2020 · 3 comments · Fixed by #107168
Assignees
Labels
area-System.Threading bug in-pr There is an active PR which will close this issue when it is merged tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Milestone

Comments

@kevingosse
Copy link
Contributor

I just run into an issue on .NET Framework, but it impacts .NET Core as well. In both cases, it causes the app to crash with an access violation.

Repro code:

static void Main(string[] args)
{
    var locks = new object[4];

    for (int i = 0; i < locks.Length; i++)
    {
        locks[i] = new object();
    }

    for (int i = 0; i < 4; i++)
    {
        var thread = new Thread(state =>
        {
            Monitor.Enter(locks[(int)state]);
        });

        thread.Start(i);
        thread.Join();
    }

    // Force the thread to be collected, orphaning the lock
    GC.Collect(2, GCCollectionMode.Forced, true, true);
    GC.WaitForPendingFinalizers();
    GC.Collect(2, GCCollectionMode.Forced, true, true);

    // Promote to a syncblock
    for (int i = 0; i < locks.Length; i++)
    {
        _ = locks[i].GetHashCode();
    }

    // Trigger the crash
    for (int i = 0; i < 4; i++)
    {
        var thread = new Thread(state =>
        {
            _ = Monitor.IsEntered(locks[(int)state]);
        });

        thread.Start(i);
        thread.Join();
    }
}

First, a thread locks an object. A thin-lock is used. The thread dies without releasing the lock.
Later, the thin-lock is promoted to a syncblock (either because of a call to GetHashCode, or because another thread tries to acquire the lock). When it does, the runtime tries to map the thin-lock id to the address of the thread: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/syncblk.cpp#L2214
If the cleaned up thread was the last in the map, it causes the IdToThreadWithValidation method to return NULL (that's the if ((size_t)result <= m_idToThreadCapacity) check) :

Thread *IdToThreadWithValidation(DWORD id)
{
WRAPPER_NO_CONTRACT;
CrstHolder ch(&m_Crst);
Thread *result = NULL;
if (id <= m_highestId)
result = m_idToThread[id];
// m_idToThread may have Thread*, or the next free slot
if ((size_t)result <= m_idToThreadCapacity)
result = NULL;
_ASSERTE(result == NULL || ((size_t)result & 0x3) == 0 || ((Thread*)result)->GetThreadId() == id);
return result;
}

If it does, then GetSyncBlock sets the thread address in the syncblock to -1:

if (pThread == NULL)
{
// The lock is orphaned.
pThread = (Thread*) -1;
}

Later, another thread tries to acquire the lock. ObjectNative::IsLockHeld is called, which in turn calls GetThreadOwningMonitorLock (https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/syncblk.cpp#L1804). The method fetches the thread address from the syncblock, checks if it's null, but omits to check if it's equal to -1. This causes the access violation when trying to dereference it to fetch the thread id:

Thread* pThread = psb->GetMonitor()->GetHoldingThread();
if(pThread == NULL)
{
*pThreadId = 0;
*pAcquisitionCount = 0;
return FALSE;
}
else
{
*pThreadId = pThread->GetThreadId();
*pAcquisitionCount = psb->GetMonitor()->GetRecursionLevel();
return TRUE;
}

@wahmedswl
Copy link

wahmedswl commented Oct 30, 2020

@kevingosse i guess the only thing left is to fix it, how much detailed it is :-P

@kevingosse
Copy link
Contributor Author

The fix would probably to set pThread to NULL instead of -1. It seems that other places in the code already expect to find NULL for an orphaned lock, so I don't know why -1 was used in this scenario in particular.

@jkotas jkotas added this to the 6.0.0 milestone Oct 30, 2020
@jkotas jkotas added the tenet-reliability Reliability/stability related issue (stress, load problems, etc.) label Oct 30, 2020
@mangod9 mangod9 modified the milestones: 6.0.0, 7.0.0 Jul 20, 2021
@ShreyasJejurkar
Copy link
Contributor

More context and more detail => https://youtu.be/k_tavcIrrss

@mangod9 mangod9 modified the milestones: 7.0.0, 8.0.0 Aug 3, 2022
@kouvel kouvel modified the milestones: 8.0.0, 9.0.0 Aug 10, 2023
@mangod9 mangod9 modified the milestones: 9.0.0, Future Aug 1, 2024
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Aug 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Threading bug in-pr There is an active PR which will close this issue when it is merged tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants