Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port to 3.1: Fix allocation of RuntimeTypeCache GC handle #28025

Merged

Conversation

@janvorli
Copy link
Member

janvorli commented Mar 9, 2020

When there is a race calling RuntimeType.InitializeCache, each of the
racing threads creates a new GC handle using
new
RuntimeTypeHandle(this).GetGCHandle(GCHandleType.WeakTrackResurrection);
This ends up calling RuntimeTypeHandle::GetGCHandle native method that
adds the allocated handle into the handle cleanup list of the
AssemblyLoaderAllocator specific for the runtime type.
All but the winning thread then call GCHandle.InternalFree on the just
allocated handle. That frees the handle, but leaves it on the cleanup
list of the loader allocator. The same handle can be later allocated for
some
other purpose. When the AssemblyLoaderAllocator is being destroyed, all
the handles on the cleanup list are destroyed too. So it destroys also
the handle that was left on the cleanup list incorrectly. That can cause
all kinds of hard to diagnose issues, like the
dotnet/runtime#32171.

This change fixes it by adding a FreeGCHandle method on the
RuntimeTypeHandle that besides freeing the handle also removes it from
the cleanup list of the related AssemblyLoadContext.

Customer impact

Hard to diagnose crashes in the runtime caused by closing of random
GC handles. The customer that has reported this issue was using
collectible assemblies and it was resulting in collecting
LoaderAllocator that was still in use and it lead to crashes at various
places.

Regression?

Yes, it was introduced in 3.0. In 2.1 and 2.2, the thread that loses
the race destroys the handle only if the type was not in a collectible
assembly. Since the non-collectible assemblies LoaderAllocator is never
destroyed, the handles were never cleaned up and so no problem could occur.
It was introduced in #21737

Testing

Customer affected by the issue heavily has tested a fixed version and
reported the issue doesn't occur anymore.

Risk

Low, the new code is executed at single place once per process runtine
only when a thread races for allocating the GC handle with another one
and loses the race.

When there is a race calling RuntimeType.InitializeCache, each of the
racing threads creates a new GC handle using
new
RuntimeTypeHandle(this).GetGCHandle(GCHandleType.WeakTrackResurrection);
This ends up calling RuntimeTypeHandle::GetGCHandle native method that
adds the allocated handle into the handle cleanup list of the
AssemblyLoaderAllocator specific for the runtime type.
All but the winning thread then call GCHandle.InternalFree on the just
allocated handle. That frees the handle, but leaves it on the cleanup
list of the loader allocator. The same handle can be later allocated for
some
other purpose. When the AssemblyLoaderAllocator is being destroyed, all
the handles on the cleanup list are destroyed too. So it destroys also
the handle that was left on the cleanup list incorrectly. That can cause
all kinds of hard to diagnose issues, like the
dotnet/runtime#32171.

This change fixes it by adding a FreeGCHandle method on the
RuntimeTypeHandle that besides freeing the handle also removes it from
the cleanup list of the related AssemblyLoadContext.

 ## Customer impact
Hard to diagnose crashes in the runtime caused by closing of random
GC handles. The customer that has reported this issue was using
collectible assemblies and it was resulting in collecting
LoaderAllocator that was still in use and it lead to crashes at various
places.

 ## Regression?
Yes, it was introduced in 3.0. In 2.1 and 2.2, the thread that loses
the race destroys the handle only if the type was not in a collectible
assembly. Since the non-collectible assemblies LoaderAllocator is never
destroyed, the handles were never cleaned up and so no problem could occur.
It was introduced in #21737

 ##Testing
Customer affected by the issue heavily has tested a fixed version and
reported the issue doesn't occur anymore.

 ## Risk
Low, the new code is executed at single place once per process runtine
only when a thread races for allocating the GC handle with another one
and loses the race.
@janvorli janvorli added this to the 3.1.x milestone Mar 9, 2020
@janvorli janvorli requested a review from jkotas Mar 9, 2020
@janvorli janvorli self-assigned this Mar 9, 2020
@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Mar 9, 2020

Build break...

@jkotas
jkotas approved these changes Mar 9, 2020
Copy link
Member

jkotas left a comment

LGTM once it builds

@jeffschwMSFT

This comment has been minimized.

Copy link
Member

jeffschwMSFT commented Mar 9, 2020

Approved, I will seek approval for our May release

@jamshedd

This comment has been minimized.

Copy link
Member

jamshedd commented Mar 10, 2020

Approved for 3.1.4.

@jeffschwMSFT jeffschwMSFT modified the milestones: 3.1.x, 3.1.4 Mar 10, 2020
@Anipik Anipik merged commit 0791a4a into dotnet:release/3.1 Mar 25, 2020
35 checks passed
35 checks passed
WIP Ready for review
Details
coreclr-ci Build #20200309.2 had test failures
Details
coreclr-ci (Linux arm checked) Linux arm checked succeeded
Details
coreclr-ci (Linux arm64 checked) Linux arm64 checked succeeded
Details
coreclr-ci (Linux arm64 release) Linux arm64 release succeeded
Details
coreclr-ci (Linux x64 checked) Linux x64 checked succeeded
Details
coreclr-ci (Linux_musl x64 checked) Linux_musl x64 checked succeeded
Details
coreclr-ci (Linux_musl x64 release) Linux_musl x64 release succeeded
Details
coreclr-ci (Linux_rhel6 x64 release) Linux_rhel6 x64 release succeeded
Details
coreclr-ci (OSX x64 checked) OSX x64 checked succeeded
Details
coreclr-ci (Test Pri0 CoreFX Linux x64 checked) Test Pri0 CoreFX Linux x64 checked succeeded
Details
coreclr-ci (Test Pri0 CoreFX Windows_NT x64 checked) Test Pri0 CoreFX Windows_NT x64 checked succeeded
Details
coreclr-ci (Test Pri0 Linux arm checked) Test Pri0 Linux arm checked succeeded
Details
coreclr-ci (Test Pri0 Linux arm64 checked) Test Pri0 Linux arm64 checked succeeded
Details
coreclr-ci (Test Pri0 Linux x64 checked) Test Pri0 Linux x64 checked succeeded
Details
coreclr-ci (Test Pri0 Linux_musl x64 checked) Test Pri0 Linux_musl x64 checked succeeded
Details
coreclr-ci (Test Pri0 Linux_musl x64 release) Test Pri0 Linux_musl x64 release succeeded
Details
coreclr-ci (Test Pri0 OSX x64 checked) Test Pri0 OSX x64 checked succeeded
Details
coreclr-ci (Test Pri0 R2R Linux x64 checked) Test Pri0 R2R Linux x64 checked succeeded
Details
coreclr-ci (Test Pri0 R2R OSX x64 checked) Test Pri0 R2R OSX x64 checked succeeded
Details
coreclr-ci (Test Pri0 R2R Windows_NT x64 checked) Test Pri0 R2R Windows_NT x64 checked succeeded
Details
coreclr-ci (Test Pri0 R2R Windows_NT x86 checked) Test Pri0 R2R Windows_NT x86 checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT arm64 checked) Test Pri0 Windows_NT arm64 checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT x64 checked) Test Pri0 Windows_NT x64 checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT x86 checked) Test Pri0 Windows_NT x86 checked succeeded
Details
coreclr-ci (Windows_NT arm checked) Windows_NT arm checked succeeded
Details
coreclr-ci (Windows_NT arm release) Windows_NT arm release succeeded
Details
coreclr-ci (Windows_NT arm64 checked) Windows_NT arm64 checked succeeded
Details
coreclr-ci (Windows_NT arm64 release) Windows_NT arm64 release succeeded
Details
coreclr-ci (Windows_NT x64 checked) Windows_NT x64 checked succeeded
Details
coreclr-ci (Windows_NT x64 debug) Windows_NT x64 debug succeeded
Details
coreclr-ci (Windows_NT x64 release) Windows_NT x64 release succeeded
Details
coreclr-ci (Windows_NT x86 checked) Windows_NT x86 checked succeeded
Details
coreclr-ci (Windows_NT x86 debug) Windows_NT x86 debug succeeded
Details
license/cla All CLA requirements met.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.