Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic dictionary lookup race condition on expansion due to JIT codegen issue #101131

Open
davidwrighton opened this issue Apr 16, 2024 · 4 comments · May be fixed by #101153
Open

Generic dictionary lookup race condition on expansion due to JIT codegen issue #101131

davidwrighton opened this issue Apr 16, 2024 · 4 comments · May be fixed by #101153
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@davidwrighton
Copy link
Member

Description

The situation is that we are in a method compiled for shared generics, which has several specific conditions.

  1. The dictionary layout has a minimum layout size of 4 slots, but an actual layout size of 8. (The exact number is irrelevant, but this is the particular case we have)
  2. There are multiple threads accessing the dictionary of the same type at roughly the same time.
    a. Thread A is attempting to access slot 4
    b. Thread B is attempting to access slot 5
    c. Thread C is attempting to access slot 4
  3. These threads run in the following interleaving
    a. Thread A attempts to access slot 4, determines the value which should be placed in the slot, and gathers the address of the slot which should be updated
    b. Thread B attempts to access slot 5, determines the value which should be placed in the slot, determines that the accessing the slot requires expanding the generic dictionary, expands it, and copies the current state of the slots into the newly expanded dictionary. Notably, since thread A has not written to slot 4, that slot remains as NULL. However, the expanded dictionary is not yet written into the MethodDesc
    c. Thread A fills in slot 4 in the old dictionary with the new value.
    d. Thread C reads the value in slot 4, and determines that it is not NULL, and so commits to the fast path of generic dictionary lookup.
    e. Thread B writes the newly expanded dictionary into the MethodDesc
    f. Thread C then reads the value in slot 4, and since the expanded dictionary does not have a non-NULL value in slot 4, the rest of execution fails unexpectedly.

The issue is caused by incorrect codegen by the JIT, at least in Tier0 mode.

The failing logic is here

/_/src/Compilers/Core/Portable/Collections/ImmutableArrayExtensions.cs @ 575:
00007ffb`b9e1b4e0 488b4d10        mov     rcx,qword ptr [rbp+10h]
00007ffb`b9e1b4e4 488b4910        mov     rcx,qword ptr [rcx+10h]
00007ffb`b9e1b4e8 4883792800      cmp     qword ptr [rcx+28h],0
00007ffb`b9e1b4ed 7415            je      Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x514 (00007ffb`b9e1b504)
00007ffb`b9e1b4ef 488b4d10        mov     rcx,qword ptr [rbp+10h]
00007ffb`b9e1b4f3 488b4910        mov     rcx,qword ptr [rcx+10h]
00007ffb`b9e1b4f7 488b4928        mov     rcx,qword ptr [rcx+28h]
00007ffb`b9e1b4fb 48898d10ffffff  mov     qword ptr [rbp-0F0h],rcx
00007ffb`b9e1b502 eb1a            jmp     Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x52e (00007ffb`b9e1b51e)
00007ffb`b9e1b504 488b4d10        mov     rcx,qword ptr [rbp+10h]
00007ffb`b9e1b508 48bab0cef0b9fb7f0000 mov rdx,7FFBB9F0CEB0h
00007ffb`b9e1b512 e8b9b8665e      call    coreclr!JIT_GenericHandleMethod (00007ffc`18486dd0)
00007ffb`b9e1b517 48898510ffffff  mov     qword ptr [rbp-0F0h],rax
00007ffb`b9e1b51e 488b8d10ffffff  mov     rcx,qword ptr [rbp-0F0h]

The problem is that the dictionary returned by
00007ffb`b9e1b4e4 488b4910 mov rcx,qword ptr [rcx+10h]
Can be updated by another thread, which can cause the actual dictionary entry to be resolved without having a proper NULL check.

There are 2 possibilities for valid code here.
Either we could generate code that looks like…

/_/src/Compilers/Core/Portable/Collections/ImmutableArrayExtensions.cs @ 575:
mov     rcx,qword ptr [rbp+10h]
mov     rcx,qword ptr [rcx+10h]
cmp     qword ptr [rcx+28h],0
je      Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x514 (00007ffb`b9e1b504)
mov     rcx,qword ptr [rcx+28h]
mov     qword ptr [rbp-0F0h],rcx
jmp     Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x52e (00007ffb`b9e1b51e)
mov     rcx,qword ptr [rbp+10h]
mov rdx,7FFBB9F0CEB0h
call    coreclr!JIT_GenericHandleMethod (00007ffc`18486dd0)
mov     qword ptr [rbp-0F0h],rax
mov     rcx,qword ptr [rbp-0F0h]

Where we only read the dictionary pointer once.

OR

/_/src/Compilers/Core/Portable/Collections/ImmutableArrayExtensions.cs @ 575:
mov     rcx,qword ptr [rbp+10h]
mov     rcx,qword ptr [rcx+10h]
mov     rcx,qword ptr [rcx+28h]
cmp     rcx,0
je      Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x514 (00007ffb`b9e1b504)
mov     qword ptr [rbp-0F0h],rcx
jmp     Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x52e (00007ffb`b9e1b51e)
mov     rcx,qword ptr [rbp+10h]
mov rdx,7FFBB9F0CEB0h
call    coreclr!JIT_GenericHandleMethod (00007ffc`18486dd0)
mov     qword ptr [rbp-0F0h],rax
mov     rcx,qword ptr [rbp-0F0h]

Where we read the dictionary entry once.

Reproduction Steps

Run code which uses many threads, on very subtly different code paths. This particular failure happens in Roslyn, but requires the machine running the code to be running under a tremendous amount of stress.

Expected behavior

Generic dictionary should be able to expand without triggering a crash.

Actual behavior

Under extremely rare conditions it will fail.

Regression?

No, this failing behavior has been in place since 2020.

Known Workarounds

No response

Configuration

No response

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 16, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 16, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Apr 16, 2024
@AndyAyersMS
Copy link
Member

FYI @dotnet/jit-contrib

@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Apr 16, 2024
@JulieLeeMSFT
Copy link
Member

@jakobbotsch, PTAL with a high priority.

@EgorBo
Copy link
Member

EgorBo commented Apr 16, 2024

I think I can take a look since I touched this logic recently

@EgorBo EgorBo assigned EgorBo and unassigned jakobbotsch Apr 16, 2024
@EgorBo EgorBo linked a pull request Apr 17, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants