Adjust how we access thread ID for header thin locks #124878

Open
jkoritzinsky wants to merge 1 commit into dotnet:main from jkoritzinsky:thin-lock-thread-memory-access

Conversation

@jkoritzinsky
Member

Pass the `Thread*` object instead of the thread ID and only compute thread ID in the paths that need it to convince the compiler to better separate memory accesses for better pipelining.

This brings the emitted assembly of the .NET 11 implementation closer to the .NET 10 implementation, which I think should be enough to get performance back (as the assembly is darn near identical at this point, arguably better for the new implementation).
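
To make the pattern concrete, here is a minimal, hypothetical C++ model of a header thin lock in which the acquire and release functions take a `Thread*` and load the thread ID only on the paths that use it. The header bit layout, the `Thread` stand-in (whose `idReads` counter exists purely as instrumentation), `LockResult`, and the `*Sketch` function names are all assumptions invented for this illustration; they are not CoreCLR's actual `AcquireHeaderThinLock`/`ReleaseHeaderThinLock` code. The sketch only shows the control-flow shape: the header word is loaded first, and the early-out for an inflated header never touches the `Thread` object. Whether this actually improves pipelining depends on the code the compiler emits.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical header layout (NOT CoreCLR's real bit assignments).
constexpr uint32_t kIsSyncBlockOrHash = 0x80000000u; // header is inflated or holds a hash code
constexpr uint32_t kRecursionMask     = 0x00FF0000u; // thin-lock recursion level
constexpr uint32_t kRecursionShift    = 16;
constexpr uint32_t kRecursionOne      = 1u << kRecursionShift;
constexpr uint32_t kTidMask           = 0x0000FFFFu; // owning thread ID, 0 = unowned

// Hypothetical stand-in for the runtime's Thread object.
struct Thread {
    uint32_t id;      // assumed nonzero, since 0 means "unowned"
    int idReads = 0;  // instrumentation only: counts loads of the ID
    uint32_t GetThreadId() { ++idReads; return id; }
};

enum class LockResult { Success, UseSlowPath };

LockResult AcquireThinLockSketch(std::atomic<uint32_t>& header, Thread* pCurThread) {
    uint32_t old = header.load(std::memory_order_relaxed);
    // Paths that never need the thread ID bail out before touching *pCurThread.
    if (old & kIsSyncBlockOrHash)
        return LockResult::UseSlowPath;

    uint32_t tid = pCurThread->GetThreadId(); // loaded only once it is needed
    assert(tid != 0 && "thread IDs are assumed to start at 1");
    if (tid > kTidMask)
        return LockResult::UseSlowPath; // ID does not fit in the thin-lock bits

    uint32_t owner = old & kTidMask;
    if (owner == 0) {
        // Unowned: try to install this thread as the owner.
        return header.compare_exchange_strong(old, old | tid, std::memory_order_acquire)
                   ? LockResult::Success : LockResult::UseSlowPath;
    }
    if (owner == tid && (old & kRecursionMask) != kRecursionMask) {
        // Recursive acquire by the current owner: bump the recursion level.
        return header.compare_exchange_strong(old, old + kRecursionOne, std::memory_order_relaxed)
                   ? LockResult::Success : LockResult::UseSlowPath;
    }
    return LockResult::UseSlowPath; // contended, or recursion level saturated
}

LockResult ReleaseThinLockSketch(std::atomic<uint32_t>& header, Thread* pCurThread) {
    uint32_t old = header.load(std::memory_order_relaxed);
    if (old & kIsSyncBlockOrHash)
        return LockResult::UseSlowPath; // again, no thread ID needed here

    if ((old & kTidMask) != pCurThread->GetThreadId())
        return LockResult::UseSlowPath; // not owned by this thread

    // Either drop one recursion level or clear the owner entirely.
    uint32_t desired = (old & kRecursionMask) ? old - kRecursionOne : old & ~kTidMask;
    return header.compare_exchange_strong(old, desired, std::memory_order_release)
               ? LockResult::Success : LockResult::UseSlowPath;
}
```

In this model a call site passes the `Thread*` itself (for example `AcquireThinLockSketch(header, &thread)`) rather than `thread.GetThreadId()`, which would force the ID load before the header word has even been read. This mirrors the switch from passing `GetThread()->GetThreadId()` to passing `GetThread()`.
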

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

@jkoritzinsky
Member Author

@MihuBot benchmark System.Collections.TryAddGiventSize

Copilot AI left a comment

Pull request overview

This PR optimizes the thin lock implementation by changing how the thread ID is obtained in `AcquireHeaderThinLock` and `ReleaseHeaderThinLock`. Instead of taking a precomputed thread ID (`DWORD tid`), these methods now take a `Thread*` pointer and call `GetThreadId()` only in the code paths that actually need the thread ID. This helps the compiler separate memory accesses for better CPU pipelining and brings the generated assembly for the .NET 11 implementation closer to that of .NET 10.

Changes:

  • Modified function signatures to accept Thread* pCurThread instead of DWORD tid
  • Moved GetThreadId() calls inside conditional branches that actually use the thread ID
  • Updated call sites to pass GetThread() instead of GetThread()->GetThreadId()

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/coreclr/vm/syncblk.h | Updated method signatures for `AcquireHeaderThinLock` and `ReleaseHeaderThinLock` to accept a `Thread*` parameter |
| src/coreclr/vm/syncblk.inl | Modified implementations to defer `GetThreadId()` calls to only the code paths that need the thread ID value |
| src/coreclr/vm/comsynchronizable.cpp | Updated call sites to pass `GetThread()` instead of computing the thread ID upfront |

@MihuBot

MihuBot commented Feb 26, 2026

System.Collections.TryAddGiventSize_String_
BenchmarkDotNet v0.16.0-custom.20260127.101, Linux Ubuntu 24.04.3 LTS (Noble Numbat)
AMD EPYC 9V74 2.60GHz, 1 CPU, 8 logical and 4 physical cores
  Job-POJOIC : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), X64 RyuJIT x86-64-v4
  Job-TATIAS : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), X64 RyuJIT x86-64-v4
OutlierMode=Default  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MemoryRandomization=Default  MinIterationCount=15
WarmupCount=1
| Method | Toolchain | Count | Mean | Error | Ratio | Allocated | Alloc Ratio |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Dictionary | Main | 512 | 6.060 μs | 0.0081 μs | 1.00 | 14.38 KB | 1.00 |
| Dictionary | PR | 512 | 6.024 μs | 0.0090 μs | 0.99 | 14.38 KB | 1.00 |
| ConcurrentDictionary | Main | 512 | 20.499 μs | 0.0281 μs | 1.00 | 59.21 KB | 1.00 |
| ConcurrentDictionary | PR | 512 | 20.536 μs | 0.0341 μs | 1.00 | 59.21 KB | 1.00 |
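
As a quick sanity check on the results, the Ratio column can be approximately reproduced from the displayed means, assuming it is (roughly) the PR mean divided by the Main mean. The helper names below are hypothetical, and because the table's means are rounded, a recomputed ratio may differ from the reported one in the last digit.

```cpp
#include <cassert>
#include <cmath>

// Ratio of the PR mean to the Main (baseline) mean.
// Assumption: this approximates BenchmarkDotNet's "Ratio" column.
double MeanRatio(double prMean, double mainMean) {
    assert(mainMean > 0.0);
    return prMean / mainMean;
}

// Round to two decimals, matching how the table displays ratios.
double RoundTo2(double x) {
    return std::round(x * 100.0) / 100.0;
}

// Relative change in percent; negative means the PR run had a lower mean time.
double PercentChange(double prMean, double mainMean) {
    return (MeanRatio(prMean, mainMean) - 1.0) * 100.0;
}
```

For the Dictionary rows, `MeanRatio(6.024, 6.060)` is about 0.994 (displayed as 0.99, a roughly 0.6% lower mean time); for ConcurrentDictionary, `MeanRatio(20.536, 20.499)` is about 1.002 (displayed as 1.00, a roughly 0.2% higher mean time).
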
