Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExecutionEngineException in WinForms app (Paint.NET) on .NET 8 RC2, which does not occur in .NET 7 (GCStress) #94579

Closed
rickbrew opened this issue Nov 9, 2023 · 19 comments · Fixed by #94643
Assignees
Labels
area-VM-coreclr tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Milestone

Comments

@rickbrew
Copy link
Contributor

rickbrew commented Nov 9, 2023

Description

I've had two spurious EEE's pop out at me while I was in the debugger over the last week, working on the new version of my app (Paint.NET v5.1) that is targeting .NET 8. I was worried it was due to my new code, some of which is clipboard related (always a bug farm) ...

... But I'm now able to reproduce this 100% with my latest public release (Paint.NET v5.0.11, .NET 7.0.12), when it's retargeted for .NET 8. It has no trouble with .NET 7.

I believe this is a new bug in .NET 8, it's not due to any changes I've made in my code.

Reproduction Steps

You will need access to https://github.com/dotnet/paint.net/ -- just ask and I will add you (for MSFT employees that is)

You will want this branch: https://github.com/dotnet/paint.net/tree/net8_EEE_repro . This is branched from v5.0.11, the latest public release using .NET 7, and then I've done the minimal amount of work to get it running on .NET 8 rc2 (mostly some nullable fixes that are already in my newer v5.1 branch)

  1. You must have long file paths enabled for Windows, which may require a reboot, due to some source generators that end up using super long paths https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
  2. Load up ./paintdotnet.sln in Visual Studio 2022 17.8.0 Preview 7.0.
  3. Due to something I haven't quite yet figured out with respect to project dependencies, you will first have to manually rebuild the AppVersion project. It's at Solution Explorer -> Misc (the first node) -> AppVersion
  4. Now do a full rebuild
  5. Verify that the DOTNET_GCStress=2 environment variable is set up in the Debug properties
  6. Debug menu -> Start Debugging
  7. You will eventually (~20-60 seconds) see an ExecutionEngineException pop up before the app's main window is displayed

Expected behavior

App starts up just fine, albeit very slowly due to DOTNET_GCStress=2

Actual behavior

App is not able to initialize, it dies with an EEE before the main window is displayed. On my system it's usually somewhere in the constructor for the FileMenu class, but there's some random variance as to exactly where it dies.

It usually won't even show a callstack in the debugger, and when it does it will not load symbols for the relevant .NET framework/runtime DLLs:

image

Regression?

Yes. This works fine if you change ./paintdotnet/paintdotnet.csproj to target .NET 7. Just remove the <TargetFramework> element at the top and do a full rebuild (sometimes need to do that twice after changing TargetFramework, as the first time will error due to what I think is a VS or msbuild bug). It will then revert to .NET 7 based on the values in ./TargetFramework.props

Known Workarounds

No response

Configuration

Visual Studio 2022 17.8.0 Preview 7.0
x64
.NET 8 RC2

Happens on both my desktop:
Ryzen 9 7950X
Windows 11 23H2 build 22635.2700

and on my Lenovo P16 laptop:
Intel Core i7-12800HX
Windows 11 23H2 build 22631.2506

Other information

cc @EgorBo who was commenting on this a little over on Discord, and who already has access to the dotnet/paint.net repository. Also @MichalPetryka who recommended me to file the issue

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 9, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 9, 2023
@teo-tsirpanis teo-tsirpanis added area-GC-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Nov 9, 2023
@ghost
Copy link

ghost commented Nov 9, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

I've had two spurious EEE's pop out at me while I was in the debugger over the last week, working on the new version of my app (Paint.NET v5.1) that is targeting .NET 8. I was worried it was due to my new code, some of which is clipboard related (always a bug farm) ...

... But I'm now able to reproduce this 100% with my latest public release (Paint.NET v5.0.11, .NET 7.0.12), when it's retargeted for .NET 8. It has no trouble with .NET 7.

I believe this is a new bug in .NET 8, it's not due to any changes I've made in my code.

Reproduction Steps

You will need access to https://github.com/dotnet/paint.net/ -- just ask and I will add you (for MSFT employees that is)

You will want this branch: https://github.com/dotnet/paint.net/tree/net8_EEE_repro . This is branched from v5.0.11, the latest public release using .NET 7, and then I've done the minimal amount of work to get it running on .NET 8 rc2 (mostly some nullable fixes that are already in my newer v5.1 branch)

  1. You must have long file paths enabled for Windows, which may require a reboot, due to some source generators that end up using super long paths https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
  2. Load up ./paintdotnet.sln in Visual Studio 2022 17.8.0 Preview 7.0.
  3. Due to something I haven't quite yet figured out with respect to project dependencies, you will first have to manually rebuild the AppVersion project. It's at Solution Explorer -> Misc (the first node) -> AppVersion
  4. Now do a full rebuild
  5. Verify that the DOTNET_GCStress=2 environment variable is set up in the Debug properties
  6. Debug menu -> Start Debugging
  7. You will eventually (~20-60 seconds) see an ExecutionEngineException pop up before the app's main window is displayed

Expected behavior

App starts up just fine, albeit very slowly due to DOTNET_GCStress=2

Actual behavior

App is not able to initialize, it dies with an EEE before the main window is displayed. On my system it's usually somewhere in the constructor for the FileMenu class, but there's some random variance as to exactly where it dies.

It usually won't even show a callstack in the debugger, and when it does it will not load symbols for the relevant .NET framework/runtime DLLs:

image

Regression?

Yes. This works fine if you change ./paintdotnet/paintdotnet.csproj to target .NET 7. Just remove the <TargetFramework> element at the top and do a full rebuild (sometimes need to do that twice after changing TargetFramework, as the first time will error due to what I think is a VS or msbuild bug). It will then revert to .NET 7 based on the values in ./TargetFramework.props

Known Workarounds

No response

Configuration

Visual Studio 2022 17.8.0 Preview 7.0
x64
.NET 8 RC2

Happens on both my desktop:
Ryzen 9 7950X
Windows 11 23H2 build 22635.2700

and on my Lenovo P16 laptop:
Intel Core i7-12800HX
Windows 11 23H2 build 22631.2506

Other information

cc @EgorBo who was commenting on this a little over on Discord, and who already has access to the dotnet/paint.net repository

Author: rickbrew
Assignees: -
Labels:

area-GC-coreclr, untriaged

Milestone: -

@mangod9
Copy link
Member

mangod9 commented Nov 9, 2023

the fact that it repros with GCStress indicates its some heap corruption, so possibly something the WinForms team should initially investigate. Is it easy to share a few dumps?

@jkotas jkotas added area-VM-coreclr tenet-reliability Reliability/stability related issue (stress, load problems, etc.) and removed area-GC-coreclr labels Nov 9, 2023
@rickbrew
Copy link
Contributor Author

rickbrew commented Nov 9, 2023

the fact that it repros with GCStress indicates its some heap corruption, so possibly something the WinForms team should initially investigate. Is it easy to share a few dumps?

Sure; just via Task Manager's right click -> Create memory dump file?

@rickbrew
Copy link
Contributor Author

Here is a DMP file from the paintdotnet.exe process right when it crashed: https://1drv.ms/u/s!Ak4YjacO2C9yktlxwjC_frr4DNymkA?e=5JChdi

Let me know when you have the file so I can un-share/delete it

@mangod9
Copy link
Member

mangod9 commented Nov 10, 2023

As I had suspected the heap seems to be corrupted. Please provide access to the repo, so we can try to repro locally. Alternatively perhaps you could run with COMPLUS_HeapVerify=1 and share some dumps.

@mangod9
Copy link
Member

mangod9 commented Nov 10, 2023

Also adding @AaronRobinsonMSFT since there is ComAwareWeakReference on the stack. Not sure if there have been any recent changes in this space.

00000E66C17CDC8 00007fffee5b6e12 coreclr!ComAwareWeakReferenceNative::HasInteropInfo + 0x32 [D:\a\_work\1\s\src\coreclr\vm\weakreferencenative.cpp:176] ====> Exception Code c0000005 cxr@000000E66C17C080 exr@000000E66C17C570
000000E66C17C320 00007fffee3c64ef coreclr!CrstBase::Leave + 0x13 [D:\a\_work\1\s\src\coreclr\vm\crst.cpp:356], calling ntdll!RtlLeaveCriticalSection
000000E66C17C350 00007fffee432fa5 coreclr!HndResetAgeMap + 0xd1 [D:\a\_work\1\s\src\coreclr\gc\handletable.cpp:835], calling coreclr!CrstBase::Leave [D:\a\_work\1\s\src\coreclr\vm\crst.cpp:338]
000000E66C17C390 00007fffee30750f coreclr!WKS::gc_heap::verify_regions + 0x2f [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:47459], calling coreclr!WKS::heap_segment_rw [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:5143]
000000E66C17C3E0 00007fffee432e70 coreclr!Ref_RejuvenateHandles + 0xa4 [D:\a\_work\1\s\src\coreclr\gc\objecthandle.cpp:1832], calling coreclr!__security_check_cookie [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\gs\amd64\amdsecgs.asm:45]
000000E66C17C400 00007fffee302599 coreclr!WKS::gc_heap::thread_final_regions + 0x1e1 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:34485], calling coreclr!__security_check_cookie [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\gs\amd64\amdsecgs.asm:45]
000000E66C17C420 00007fffee3db873 coreclr!SyncBlockCache::GCDone + 0x4f [D:\a\_work\1\s\src\coreclr\vm\syncblk.cpp:1332], calling ntdll!LdrpDispatchUserCallTarget
000000E66C17C450 00007fffee30bd92 coreclr!WKS::gc_heap::make_unused_array + 0xa2 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:35344], calling coreclr!WKS::gc_heap::clear_card_for_addresses [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:40432]
000000E66C17C490 00007fffee30b840 coreclr!WKS::gc_heap::thread_gap + 0x74 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:35266], calling coreclr!WKS::allocator::thread_item [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:16307]
000000E66C17C4C0 00007fffee309f7b coreclr!WKS::gc_heap::plan_phase + 0xfbb [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:34042], calling coreclr!__security_check_cookie [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\gs\amd64\amdsecgs.asm:45]
000000E66C17C500 00007fffee39b878 coreclr!GCToEEInterface::AnalyzeSurvivorsFinished + 0x74 [D:\a\_work\1\s\src\coreclr\vm\gcenv.ee.cpp:1693], calling coreclr!__security_check_cookie [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\gs\amd64\amdsecgs.asm:45]
000000E66C17C6E0 00007fffee31a110 coreclr!GCToOSInterface::GetMemoryStatus + 0x88 [D:\a\_work\1\s\src\coreclr\gc\windows\gcenv.windows.cpp:1056], calling coreclr!__security_check_cookie [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\gs\amd64\amdsecgs.asm:45]
000000E66C17C700 00007fffee45029e coreclr!HndCountAllHandles + 0xd2 [D:\a\_work\1\s\src\coreclr\gc\handletable.cpp:1025], calling coreclr!HndCountHandles [D:\a\_work\1\s\src\coreclr\gc\handletable.cpp:924]
000000E66C17C730 00007fffee3075c8 coreclr!WKS::gc_heap::generation_size + 0x20 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:43494], calling coreclr!WKS::heap_segment_rw [D:\a\_work\1\s\src\coreclr\gc\gc.cpp:5143]
000000E66C17C750 00007fffee3c64ef coreclr!CrstBase::Leave + 0x13 [D:\a\_work\1\s\src\coreclr\vm\crst.cpp:356], calling ntdll!RtlLeaveCriticalSection
000000E66C17C760 00007fffee307247 coreclr!WKS::GCHeap::UpdatePostGCCounters + 0x1b7 [D:\a\_work\1\s\src\coreclr\gc\gcee.cpp:186], calling kernel32!QueryPerformanceCounterStub
000000E66C17C780 00007fffee39db95 coreclr!ThreadSuspend::UnlockThreadStore + 0x61 [D:\a\_work\1\s\src\coreclr\vm\threadsuspend.cpp:1947], calling coreclr!CrstBase::Leave [D:\a\_work\1\s\src\coreclr\vm\crst.cpp:338]
000000E66C17C7B0 00007fffee39db0d coreclr!ThreadSuspend::ResumeRuntime + 0xc5 [D:\a\_work\1\s\src\coreclr\vm\threadsuspend.cpp:3826], calling coreclr!StressLog::LogOn [D:\a\_work\1\s\src\coreclr\utilcode\stresslog.cpp:834]
000000E66C17C7C0 00007fffee358360 coreclr!MethodDesc::SetCodeEntryPoint + 0x124 [D:\a\_work\1\s\src\coreclr\vm\method.cpp:3232], calling coreclr!__security_check_cookie [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\gs\amd64\amdsecgs.asm:45]
000000E66C17C820 00007fffee39d87a coreclr!VirtualCallStubManager::ReclaimAll + 0x3a [D:\a\_work\1\s\src\coreclr\vm\virtualcallstub.cpp:818], calling coreclr!VirtualCallStubManagerIterator::Next [D:\a\_work\1\s\src\coreclr\vm\virtualcallstub.cpp:3700]
000000E66C17C830 00007ff941a87aed KERNELBASE!SetEvent + 0xd, calling ntdll!NtSetEvent
000000E66C17C860 00007fffee39ccdc coreclr!ThreadSuspend::RestartEE + 0x148 [D:\a\_work\1\s\src\coreclr\vm\threadsuspend.cpp:5638], calling coreclr!__security_check_cookie [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\gs\amd64\amdsecgs.asm:45]
000000E66C17C880 00007fffee3c64ef coreclr!CrstBase::Leave + 0x13 [D:\a\_work\1\s\src\coreclr\vm\crst.cpp:356], calling ntdll!RtlLeaveCriticalSection
000000E66C17C890 00007fffee403076 coreclr!MDInternalRO::GetSigOfMethodDef + 0x86 [D:\a\_work\1\s\src\coreclr\md\runtime\mdinternalro.cpp:1403], calling ntdll!LdrpDispatchUserCallTarget
000000E66C17C8B0 00007fffee392906 coreclr!LoaderHeap::RealAllocMemUnsafe + 0x146 [D:\a\_work\1\s\src\coreclr\inc\loaderheap.h:618], calling coreclr!__security_check_cookie [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\gs\amd64\amdsecgs.asm:45]
000000E66C17C8D0 00007fffee35fa02 coreclr!MethodDesc::GetSig + 0xb2 [D:\a\_work\1\s\src\coreclr\vm\method.cpp:410], calling ntdll!LdrpDispatchUserCallTarget
000000E66C17C900 00007ff941a87aed KERNELBASE!SetEvent + 0xd, calling ntdll!NtSetEvent
000000E66C17C910 00007fffee360b83 coreclr!FindECIndexForMethod + 0x63 [D:\a\_work\1\s\src\coreclr\vm\ecall.cpp:284], calling ucrtbase!strcmp
000000E66C17C990 00007fffee3d19f1 coreclr!MethodDesc::DoBackpatch + 0x2f1 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp:86], calling coreclr!MethodDesc::IsEligibleForReJIT [D:\a\_work\1\s\src\coreclr\vm\method.hpp:1084]
000000E66C17C9F0 00007fffee36b521 coreclr!ECall::GetFCallImpl + 0xd5 [D:\a\_work\1\s\src\coreclr\vm\ecall.cpp:516], calling coreclr!CrstBase::Leave [D:\a\_work\1\s\src\coreclr\vm\crst.cpp:338]
000000E66C17CA20 00007fffee4041a9 coreclr!Thread::RareEnablePreemptiveGC + 0x71 [D:\a\_work\1\s\src\coreclr\vm\threadsuspend.cpp:2407], calling coreclr!StressLog::LogOn [D:\a\_work\1\s\src\coreclr\utilcode\stresslog.cpp:834]
000000E66C17CA60 00007fffee3d13b1 coreclr!MethodDesc::DoPrestub + 0x151 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp:2740], calling coreclr!MethodDesc::DoBackpatch [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp:77]
000000E66C17CA90 00007fffee3d13b1 coreclr!MethodDesc::DoPrestub + 0x151 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp:2740], calling coreclr!MethodDesc::DoBackpatch [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp:77]
000000E66C17CAA0 00007fffee304678 coreclr!GCHeapUtilities::IsGCInProgress + 0x24 [D:\a\_work\1\s\src\coreclr\vm\gcheaputilities.h:93], calling ntdll!LdrpDispatchUserCallTarget
000000E66C17CAD0 00007fffee304527 coreclr!Thread::RareDisablePreemptiveGC + 0x4f [D:\a\_work\1\s\src\coreclr\vm\threadsuspend.cpp:2127], calling coreclr!GCHeapUtilities::IsGCInProgress [D:\a\_work\1\s\src\coreclr\vm\gcheaputilities.h:89]
000000E66C17CB20 00007ff9440c3330 ntdll!RtlSetLastWin32Error + 0x40, calling ntdll!_security_check_cookie
000000E66C17CB30 00007fffee4041a9 coreclr!Thread::RareEnablePreemptiveGC + 0x71 [D:\a\_work\1\s\src\coreclr\vm\threadsuspend.cpp:2407], calling coreclr!StressLog::LogOn [D:\a\_work\1\s\src\coreclr\utilcode\stresslog.cpp:834]
000000E66C17CB70 00007fffee3d11b4 coreclr!PreStubWorker + 0x284 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp:2444], calling kernel32!SetLastErrorStub
000000E66C17CD10 00007fffee45d785 coreclr!ThePreStub + 0x55 [D:\a\_work\1\s\src\coreclr\vm\amd64\ThePreStubAMD64.asm:21], calling coreclr!PreStubWorker [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp:2342]
000000E66C17CDC0 00007fffec526cd8 (MethodDesc 00007fff90fcb5c0 + 0x18 System.ComAwareWeakReference.ObjectToComWeakRef(System.Object, Int64 ByRef)), calling System_Private_CoreLib + 0x66be10
000000E66C17CDF0 00007fffec5270e9 (MethodDesc 00007fff90fcb0a0 + 0x19 System.ComAwareWeakReference+ComInfo.FromObjectSlow(System.Object)), calling System_Private_CoreLib + 0x66be10
000000E66C17CE40 00007fffec8632d2 (MethodDesc 00007fff8ea1a450 + 0x92 System.WeakReference`1[[System.__Canon, System.Private.CoreLib]].Create(System.__Canon, Boolean)), calling System_Private_CoreLib + 0x66be10
00000E66C17CE90 00007fffe215fa78 (MethodDesc 00007fff90b62d18 + 0x38 System.Windows.Forms.LayoutEventArgs..ctor(System.ComponentModel.IComponent, System.String)), calling 00007fff8e9b9b18
000000E66C17CEE0 00007fffe204a7a9 (MethodDesc 00007fff8f43bfb0 + 0x29 System.Windows.Forms.Control.System.Windows.Forms.Layout.IArrangedElement.PerformLayout(System.Windows.Forms.Layout.IArrangedElement, System.String)), calling 00007fff90b705e8
000000E66C17CEF0 00007fffe22b3455 (MethodDesc 00007fff90b61160 + 0x25 System.Windows.Forms.Layout.CommonProperties.xClearPreferredSizeCache(System.Windows.Forms.Layout.IArrangedElement)), calling 00007fff8fb5c108
000000E66C17CF00 00007fffe22b6a93 (MethodDesc 00007fff90c347f0 + 0x23 System.Windows.Forms.Layout.DefaultLayout.InitLayoutCore(System.Windows.Forms.Layout.IArrangedElement, System.Windows.Forms.BoundsSpecified)), calling 00007fff90c28150

@AaronRobinsonMSFT
Copy link
Member

AaronRobinsonMSFT commented Nov 11, 2023

@VSadov Did change the weak reference logic to be mostly managed last year - see #77196. Since this is failing in the weak reference logic specifically, I think that is likely the best place to start.

@VSadov
Copy link
Member

VSadov commented Nov 11, 2023

Some frames in the native stack do not look right, perhaps some symbol mismatc. However this was entered from ComAwareWeakReference, so it could be related to changes there.

@VSadov
Copy link
Member

VSadov commented Nov 11, 2023

Let me know when you have the file so I can un-share/delete it

@rickbrew I have the file.

@rickbrew
Copy link
Contributor Author

@mangod9 and @VSadov, you now have access to the repository and branch if you need it (see link and instructions above)

@jkotas
Copy link
Member

jkotas commented Nov 11, 2023

@VSadov A possible explanation of the crash is that the syncblock was collected between the PossiblyComObject check and HasInteropInfo check that assumes that the syncblock exists. It is invalid an assumption. Syncblocks that are not marked as SyncBlockPrecious can be collected by the GC.

@VSadov
Copy link
Member

VSadov commented Nov 11, 2023

A possible explanation of the crash is that the syncblock was collected between the PossiblyComObject check and HasInteropInfo

Yes, that is the most likely culprit since it happens under GCStress. Thanks for the hint about Precious.

@VSadov
Copy link
Member

VSadov commented Nov 11, 2023

It looks like everything that puts any data in a syncblock will do SetPrecious directly or indirectly.
I.E. setting a hashcode, interop info, m_BSTRTrailByte, etc... will all call SetPrecious().
Or it could be more indirect like the AwareLock may try using IncrementTransientPrecious, seemingly avoiding setting precious, but few lines after still calls AllocLockSemEvent, which will set the bit.

The whole scheme with precious and transirentPresious seems somewhat dubious. If you need a syncblock, how likely you will not need it again? Maybe there is a scenario where it is more useful than it appears at first glance. It is hard to rule out though that there are opportunities under GC stress for a syncblock to be created, collected by GC, and recreated again when needed next time or before the data is actually set.

Anyways. Since setting interop info would set the precious bit, our way of detecting a possible COM object by the presence of syncblock is valid.

bool SyncBlock::SetInteropInfo(InteropSyncBlockInfo* pInteropInfo)
{
WRAPPER_NO_CONTRACT;
SetPrecious();

If there is interop info, then there must be a syncblock.
I think a fix that would be the least disturbing to the rest of that code would be simply checking again in HasInteropInfo if the syncblock is still there (instead of just having an assert). If the syncblock has disappeared it is not our case and we can return false from the helper.

I will check if such fix is sufficient for the Paint.net.

I wonder if it is possible to have a simpler test-like repro for this bug.

@VSadov
Copy link
Member

VSadov commented Nov 12, 2023

I've managed to capture a time travel trace for this failure, so now we can know the root cause. There is code in monitor lock that creates syncblocks if acquiring via the thinlock was unsuccessful, but does not make them precious, so it keeps creating these syncblocks and GC keeps collecting them.

In my trace:

  • System.Windows.Forms.Control.RecreateHandleCore takes a lock on our object, so the object gets a syncblock
  • WeakReference<T>.Create sees that the syncblock is present, assumes there could be interop info, calls ObjectToComWeakRef to be sure
  • GC happens, collects the syncblock
  • we eventually get into HasInteropInfo, assuming the syncblock is still there and crash on that.

@VSadov
Copy link
Member

VSadov commented Nov 12, 2023

I have tried the fix proposed above and I no longer see the crash. (without the fix it always reproes for me)

I've tried several times - it takes quite a while (since it is with GCStress=2), but eventually we get to the UI showing up without seeing EE exceptions.

I will stage a PR for the main branch and then we can port the fix to 8.0.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 12, 2023
@AaronRobinsonMSFT
Copy link
Member

There is code in monitor lock that creates syncblocks if acquiring via the thinlock was unsuccessful, but does not make them precious, so it keeps creating these syncblocks and GC keeps collecting them.

Why not make them precious? This back and forth seems less than ideal.

@VSadov
Copy link
Member

VSadov commented Nov 12, 2023

Why not make them precious? This back and forth seems less than ideal.

Some early-2000 Java VMs experimented with lock-deflation. If a fat lock becomes uncontended, you may try to reduce it back into a thin lock when GC runs. It is however rare that locks change their habits permanently. Once deflated, the lock will likely inflate again. I am not aware if any modern VM does this.

The code in question looks like an attempt to do something similar, but it does not go all the way. It would not deflate a fully inflated lock. It only deflates if it catches a lock in an intermediate stage - when there is a syncblock, but no OS event allocated yet. That is probably why it is rare to see the effects outside of GC stress.

There is also a piece that does not allow thin lock acquires if GC wants to happen, which is nearly always the case under GC stress. That leads to locks inflating even if there is no contention and creates an artificial opportunity for the above "optimization" to happen.

The code could use some clean up, but not in a servicing fix that we'd like to port to 8.0

@ghost ghost removed in-pr There is an active PR which will close this issue when it is merged untriaged New issue has not been triaged by the area owner labels Nov 12, 2023
@jkotas
Copy link
Member

jkotas commented Nov 12, 2023

Reopening for backport

@jkotas jkotas reopened this Nov 12, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 12, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 12, 2023
@AaronRobinsonMSFT AaronRobinsonMSFT removed the untriaged New issue has not been triaged by the area owner label Nov 13, 2023
@AaronRobinsonMSFT AaronRobinsonMSFT added this to the 9.0.0 milestone Nov 13, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Nov 21, 2023
@teo-tsirpanis
Copy link
Contributor

This has been backported to .NET 8.0.1.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-VM-coreclr tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants