Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid clearing thread local handles for already unloaded loader #99998

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alexey-zakharov
Copy link

@alexey-zakharov alexey-zakharov commented Mar 20, 2024

Motivation:

Fix crash in ThreadLocalBlock::FreeTLM code on thread exit which happens during Assembly unloading when Assembly contains a class with a ThreadStatic variable.

Details:

There seem to be a race condition between LoaderAllocator cleanup during garbage collection and thread locals cleanup on thread exit. Racing stacks are the following:

ALC/LoaderAllocator cleanup

 	coreclr.dll!EEToProfInterfaceImpl::ModuleUnloadStarted(unsigned __int64 moduleId) Line 3735	C++	Symbols loaded.
 	coreclr.dll!ProfControlBlock::DoProfilerCallbackHelper<int (__cdecl*)(ProfilerInfo *),long (__cdecl*)(EEToProfInterfaceImpl *,unsigned __int64),unsigned __int64>(ProfilerInfo * pProfilerInfo, int(*)(ProfilerInfo *) condition, HRESULT(*)(EEToProfInterfaceImpl *, unsigned __int64) callback, HRESULT * pHR, unsigned __int64 <args_0>) Line 284	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ProfControlBlock::DoOneProfilerIteration(ProfilerInfo *) Line 199	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ProfControlBlock::IterateProfilers(ProfilerCallbackType) Line 207	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ProfControlBlock::DoProfilerCallback(ProfilerCallbackType) Line 295	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ProfControlBlock::ModuleUnloadStarted(unsigned __int64) Line 691	C++	Symbols loaded.
 	coreclr.dll!Module::Destruct() Line 662	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ClassLoader::FreeModules() Line 1884	C++	Symbols loaded.
 	coreclr.dll!ClassLoader::~ClassLoader() Line 1946	C++	Symbols loaded.
 	coreclr.dll!Assembly::Terminate(int) Line 311	C++	Symbols loaded.
 	coreclr.dll!Assembly::~Assembly() Line 244	C++	Symbols loaded.
 	coreclr.dll!Assembly::`scalar deleting destructor'(unsigned int __flags)	C++	Symbols loaded.
 	coreclr.dll!DomainAssembly::~DomainAssembly() Line 91	C++	Symbols loaded.
 	coreclr.dll!DomainAssembly::`scalar deleting destructor'(unsigned int __flags)	C++	Symbols loaded.
>	coreclr.dll!LoaderAllocator::GCLoaderAllocators(LoaderAllocator * pOriginalLoaderAllocator) Line 570	C++	Symbols loaded.
 	coreclr.dll!LoaderAllocator::Destroy(QCall::LoaderAllocatorHandle pLoaderAllocator) Line 702	C++	Symbols loaded.
 	coreclr.dll!LoaderAllocator_Destroy(QCall::LoaderAllocatorHandle pLoaderAllocator) Line 718	C++	Symbols loaded.
 	System.Private.CoreLib.dll!00007ffa9c4308f9()	Unknown	No symbols loaded.
 	coreclr.dll!FastCallFinalizeWorker() Line 26	Unknown	Symbols loaded.
 	coreclr.dll!MethodTable::CallFinalizer(Object * obj) Line 4908	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!CallFinalizer(Object *) Line 75	C++	Symbols loaded.
 	coreclr.dll!FinalizerThread::FinalizeAllObjects() Line 110	C++	Symbols loaded.
 	coreclr.dll!FinalizerThread::FinalizerThreadWorker(void * args) Line 354	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ManagedThreadBase_DispatchInner(ManagedThreadCallState *) Line 7222	C++	Symbols loaded.
 	coreclr.dll!ManagedThreadBase_DispatchMiddle(ManagedThreadCallState * pCallState) Line 7266	C++	Symbols loaded.
 	coreclr.dll!ManagedThreadBase_DispatchOuter(ManagedThreadCallState * pCallState) Line 7425	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ManagedThreadBase_NoADTransition(void(*)(void *)) Line 7494	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ManagedThreadBase::FinalizerBase(void(*)(void *)) Line 7513	C++	Symbols loaded.
 	coreclr.dll!FinalizerThread::FinalizerThreadStart(void * args) Line 403	C++	Symbols loaded.
 	kernel32.dll!BaseThreadInitThunk()	Unknown	Symbols loaded.
 	ntdll.dll!RtlUserThreadStart()	Unknown	Symbols loaded.

Thread exit with threadlocal cleanup

>	coreclr.dll!LoaderAllocator::SetHandleValue(unsigned __int64 handle, Object *) Line 992	C++	Symbols loaded.
 	coreclr.dll!LoaderAllocator::FreeHandle(unsigned __int64 handle) Line 884	C++	Symbols loaded.
 	coreclr.dll!ThreadLocalBlock::FreeTLM(unsigned __int64 i, int isThreadShuttingdown) Line 59	C++	Symbols loaded.
 	coreclr.dll!ThreadLocalBlock::FreeTable() Line 93	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!Thread::DeleteThreadStaticData() Line 7704	C++	Symbols loaded.
 	coreclr.dll!Thread::OnThreadTerminate(int holdingLock) Line 2956	C++	Symbols loaded.
 	coreclr.dll!DestroyThread(Thread * th) Line 924	C++	Symbols loaded.
 	coreclr.dll!ThreadNative::KickOffThread(void * pass) Line 239	C++	Symbols loaded.
 	kernel32.dll!BaseThreadInitThunk()	Unknown	Symbols loaded.
 	ntdll.dll!RtlUserThreadStart()	Unknown	Symbols loaded.

with an exception which is thrown with the following details:

Exception thrown: read access violation.
**loaderAllocator** was nullptr.
devenv_iJFpiWBvYe

loaderAllocator value from the LOADERALLOCATORREF loaderAllocator = (LOADERALLOCATORREF)ObjectFromHandle(m_hLoaderAllocatorObjectHandle); call is NULL which makes sense, because finalizer thread just disposed data for some of the same LoaderAllocator assemblies.

Note: enabling COR_PRF_MONITOR_MODULE_LOADS CLR profiler flag increases chances for the race condition as it slows down a bit complete loader destruction.

Changes:

I've looked at the following options as a solution to the problem:

  • Null check for loaderAllocator
  • Wait for GC/Finalizers done before killing thread
  • Skip cleaning up thread locals for unloaded loaders

I think skipping unloaded loader is the best option wrt performance/correctness considerations as loader deletion will cleanup all internal data, we won't access memory which may be already returned to pool/system and we won't lock other threads.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 20, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 20, 2024
@alexey-zakharov
Copy link
Author

@dotnet-policy-service agree company="Unity Technologies"

@alexey-zakharov alexey-zakharov marked this pull request as ready for review March 20, 2024 08:55
@jkotas jkotas added area-VM-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Mar 20, 2024
@jkotas jkotas requested a review from janvorli March 20, 2024 09:05
}
if (entry->m_hNonGCStatics != NULL)
// Skip assemblies that are already unloaded
if (!pLoaderAllocator->IsUnloaded())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we run into a situation where the thread gets rescheduled right after this check, the code in GCLoaderAllocators sets the flag and start destroying the loader allocators, and then the thread running this code gets scheduled back and still hits this crash?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question! I think it may happen and ideally FreeHandle call of LoaderAllocator itself should deal with its own state then, perhaps we could use m_crstLoaderAllocator lock, but it would mean some perf hit for the fast pointer path

my ideal expectation would be that since DeleteThreadStaticData is under GCX_COOP in Thread::OnThreadTerminate that should hopefully mean that once we are in DeleteThreadStaticData call we know for sure whether or not LoaderAllocator is alive or not and that state can not change - perhaps IsAlive() should be used as IsUnloaded is set on finalizer thread and not relevant to GCX_COOP 🤔

@davidwrighton
Copy link
Member

I'm in the process of trying to redo how all thread statics work at the moment, could we hold off on this change for a week or two while I see if I can get my larger change checked in. As it happens I had already identified the problem and the general gist of the solution I came up with is the same, although the mechanism is a fair bit different.

@jkotas jkotas added the blocked Issue/PR is blocked on something - see comments label Mar 20, 2024
@simonferquel
Copy link

We (Unity) can confirm this fix is not enough and the race is still here:
image
@davidwrighton would you mind to ping us when your fix is ready ? - also is it planned to be backported to .net 8 in addition to 9 ?

@jkotas
Copy link
Member

jkotas commented Apr 10, 2024

@davidwrighton 's fix is included in #99183.

it is it planned to be backported to .net 8

#99183 is a large refactoring, it is not back portable. The fix for .NET 8 servicing would have to be a small, targeted fix.

@alexey-zakharov
Copy link
Author

The fix for .NET 8 servicing would have to be a small, targeted fix.

@davidwrighton @jkotas do you think using pLoaderAllocator->GetExposedObject(); check in conjunction with GCX_COOP() section would be a better 8.0 targeted approach? (something like this)

the idea there is that if exposed object is alive we can safely free LoaderAllocator handles and while we are doing that it won't be collected and thus finalized

@davidwrighton
Copy link
Member

@alexey-zakharov Yes, that looks pretty reasonable, and should fix the issue. If you create a PR against current runtime, I'd approve that change, and then we can port it to .NET 8. If you can do that soonish, you'll likely beat my larger change in, as that's proving harder to finalize than I hoped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-VM-coreclr blocked Issue/PR is blocked on something - see comments community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants