Reduce contention in UnwindInfoTable seen during high volume LCG method creation by eduardo-vp · Pull Request #128619 · dotnet/runtime

eduardo-vp · 2026-05-27T03:36:23Z

Contributes to #123124.

Changes:

Adds an initial code pointer to m_DynamicCodePointers in LCGMethodResolver. In most cases the list contains exactly one pointer, but because it starts empty, adding that first pointer triggers a chunk allocation while holding a lock. The preallocated initial pointer avoids that allocation in the common case.
Adds a publish and pending lock per unwind info table rather than using global locks.
Reduces contention on the publish lock by letting a thread detect whether another thread is already flushing entries. If so, the thread does not block waiting for the publish lock; it returns, and the flushing thread picks up its entries on another iteration of the flush loop.
Bumps the pending table size to 128 and uses qsort instead of a quadratic sort.
Removes entries using binary search instead of linear search. Removal now also has to check both the pending and the published table, because with this change an entry is no longer guaranteed to be published by the time AddToUnwindInfoTable returns (it may stay in the pending buffer and get picked up by another thread).
Adds a registration failed flag to early-return in AddToUnwindInfoTable to avoid taking locks unnecessarily.
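As a rough illustration of the sort-then-publish step in the list above, here is a minimal sketch. It assumes a simplified `Entry` struct and a `PublishPending` helper, both hypothetical names that do not come from the runtime sources; the real table layout and merge logic in codeman.cpp may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a RUNTIME_FUNCTION-style entry (not the runtime's type).
struct Entry {
    uint32_t begin;
    uint32_t end;
    uint32_t unwindData;
};

// qsort comparator: orders by begin address without overflow-prone subtraction.
static int CompareByBegin(const void* a, const void* b) {
    const Entry* x = static_cast<const Entry*>(a);
    const Entry* y = static_cast<const Entry*>(b);
    return (x->begin > y->begin) - (x->begin < y->begin);
}

static bool BeginLess(const Entry& l, const Entry& r) {
    return l.begin < r.begin;
}

// Sorts the pending buffer with qsort, then merges it into the already
// sorted published table. Returns the new published table and empties
// the pending buffer.
std::vector<Entry> PublishPending(const std::vector<Entry>& published,
                                  std::vector<Entry>& pending) {
    assert(std::is_sorted(published.begin(), published.end(), BeginLess));
    if (!pending.empty())
        std::qsort(pending.data(), pending.size(), sizeof(Entry), CompareByBegin);

    std::vector<Entry> merged;
    merged.reserve(published.size() + pending.size());
    std::merge(published.begin(), published.end(),
               pending.begin(), pending.end(),
               std::back_inserter(merged), BeginLess);
    pending.clear();
    return merged;
}
```

Sorting the batch once and merging keeps the published table ordered, which is what makes a binary-search lookup or removal possible afterwards.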

I tested these changes with both the original xUnit test with 30k methods mentioned in the issue and a separate benchmark that stresses LCG method creation. I mostly used the LCG benchmark for the investigation: .NET 10 runs ~150% slower than .NET 9 there, which made it easier to tell whether a change helped.

For the measurements, I built from release/9.0, release/10.0 and release/10.0 + the changes in this PR, since the goal is to backport to .NET 10.

LCG Stress Benchmark (300K methods)

LCG stress benchmark code

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;

internal static class Program
{
    private static int Main(string[] args)
    {
        int totalMethods   = args.Length > 0 ? int.Parse(args[0])   : 300_000;
        int testsPerClass  = args.Length > 1 ? int.Parse(args[1])   : 3;
        int threads        = Environment.ProcessorCount;

        Console.WriteLine($"totalMethods={totalMethods} testsPerClass={testsPerClass} threads={threads}");

        var sw = Stopwatch.StartNew();
        Run(totalMethods, threads, testsPerClass);
        sw.Stop();

        Console.WriteLine($"Elapsed: {sw.Elapsed.TotalSeconds:F3} s ");
        return 0;
    }

    private static void Run(int totalMethods, int threads, int testsPerClass)
    {
        Func<int, Func<int, int>> compileBody = k => CompileTestBody(k);
        int classCount = Math.Max(1, totalMethods / testsPerClass);
        var queue = new ConcurrentQueue<int>();
        for (int i = 0; i < classCount; i++) queue.Enqueue(i);

        var workers = new Task[threads];
        for (int w = 0; w < threads; w++)
        {
            workers[w] = Task.Factory.StartNew(() =>
            {
                int seed = 0;
                var classScope = new List<Func<int, int>>(capacity: testsPerClass);

                while (queue.TryDequeue(out int classIndex))
                {
                    classScope.Clear();

                    for (int i = 0; i < testsPerClass; i++)
                    {
                        Func<int, int> formatter = CompileFormatter(seed++);
                        int arg = formatter(classIndex + i);

                        Func<int, int> testBody = compileBody(seed++);
                        testBody(arg);
                        // testBody.DynamicInvoke(arg);

                        classScope.Add(testBody);
                    }

                }
            }, TaskCreationOptions.LongRunning);
        }

        Task.WaitAll(workers);
    }

    private static Func<int, int> CompileFormatter(int k)
    {
        var x = Expression.Parameter(typeof(int), "x");
        Expression body = Expression.ExclusiveOr(x, Expression.Constant(k));
        return Expression.Lambda<Func<int, int>>(body, x).Compile();
    }

    private static Func<int, int> CompileTestBody(int k)
    {
        var x = Expression.Parameter(typeof(int), "x");
        Expression body = Expression.Add(
            Expression.Multiply(x, Expression.Constant(k | 1)),
            Expression.Constant(k));
        body = Expression.Condition(
            Expression.GreaterThan(x, Expression.Constant(0)),
            body,
            Expression.Negate(body));
        return Expression.Lambda<Func<int, int>>(body, x).Compile();
    }
}

Metric	.NET 9	.NET 10	.NET 10 + this PR
Time (s)	5.178	13.116	5.677
vs .NET 9	-	+153 %	+10 %
vs .NET 10	-	-	-57 %

Original regression was ~150%. With this PR it is still ~10% slower than .NET 9, though I doubt this benchmark represents a realistic scenario.

Original xUnit Benchmark (30K methods)

Metric	.NET 9	.NET 10	.NET 10 + this PR
Time (s)	2.111	2.802	2.115
vs .NET 9	-	+33 %	+0 %
vs .NET 10	-	-	-25 %

Original regression was ~33%. The original benchmark now runs essentially as fast as .NET 9.
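For reference, the percentage rows in both tables above are consistent with a plain relative-change calculation on the reported times. The helper below is a hypothetical sketch (the name `PercentChange` is not from the PR) that rounds to the nearest whole percent.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `measured` against `baseline`, in percent,
// rounded to the nearest integer. Positive means slower.
long PercentChange(double baseline, double measured) {
    assert(baseline > 0.0);
    return std::lround((measured - baseline) / baseline * 100.0);
}
```

For example, `PercentChange(5.178, 13.116)` gives 153 for the LCG benchmark regression, and `PercentChange(2.802, 2.115)` gives -25 for the xUnit benchmark with this PR versus .NET 10.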

dotnet-policy-service · 2026-05-27T03:37:29Z

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR targets performance/scalability under high-volume dynamic method (LCG) creation by reducing lock contention and avoiding avoidable allocations in CoreCLR’s dynamic method and unwind-info publishing paths.

Changes:

Pre-seeds LCGMethodResolver::m_DynamicCodePointers with an inline “first” node to avoid chunk allocation in the common single-pointer case.
Reworks UnwindInfoTable synchronization to use per-table locks plus a per-table “flush gate” to reduce contention when flushing pending entries.
Increases pending buffer capacity and switches pending sorting and removal operations to more efficient algorithms.
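The "inline first node" idea from the first item can be sketched roughly as follows. `CodePointerNode` and `CodePointerList` are hypothetical names, and plain `new` stands in for the chunk allocation under a lock that the PR description mentions; the actual LCGMethodResolver code is not shown in this conversation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type; not the runtime's DynamicCodePointer.
struct CodePointerNode {
    void* code = nullptr;
    CodePointerNode* next = nullptr;
};

class CodePointerList {
public:
    CodePointerList() = default;
    CodePointerList(const CodePointerList&) = delete;
    CodePointerList& operator=(const CodePointerList&) = delete;

    ~CodePointerList() {
        // Free only heap nodes; the inline node lives inside this object.
        for (CodePointerNode* n = m_head; n != nullptr;) {
            CodePointerNode* next = n->next;
            if (n != &m_inlineNode)
                delete n;
            n = next;
        }
    }

    // Records a code pointer. The first record reuses the inline node, so
    // the common single-pointer case performs no heap allocation.
    void Add(void* code) {
        CodePointerNode* node;
        if (!m_inlineUsed) {
            node = &m_inlineNode;
            m_inlineUsed = true;
        } else {
            node = new CodePointerNode();  // stand-in for the locked chunk allocation
            ++m_heapAllocations;
        }
        node->code = code;
        node->next = m_head;
        m_head = node;
        ++m_count;
    }

    const CodePointerNode* Head() const { return m_head; }
    std::size_t Count() const { return m_count; }
    std::size_t HeapAllocations() const { return m_heapAllocations; }

private:
    CodePointerNode m_inlineNode;
    CodePointerNode* m_head = nullptr;
    bool m_inlineUsed = false;
    std::size_t m_count = 0;
    std::size_t m_heapAllocations = 0;
};
```

The trade-off is one extra node of storage per resolver in exchange for skipping the allocation, and the lock it requires, whenever a dynamic method ends up with a single code pointer.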

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
src/coreclr/vm/dynamicmethod.h	Adds an inline initial `DynamicCodePointer` node and initializes `m_DynamicCodePointers` to it.
src/coreclr/vm/dynamicmethod.cpp	Resets/allocates code pointer records using the inline initial node to avoid allocations under hot locks.
src/coreclr/vm/codeman.h	Expands pending buffer and introduces per-table locks, flush gate, and registration-failed state.
src/coreclr/vm/codeman.cpp	Implements per-table locking/flush gating, pending sort changes, binary-search removal, and lock-free table installation via CAS.

Comments suppressed due to low confidence (1)

src/coreclr/vm/codeman.cpp:500

RemoveFromUnwindInfoTable can incorrectly return after matching a soft-deleted (UnwindData==0) entry in the published table. This is problematic now that we also search the pending buffer: if code addresses are reused, a deleted published entry may still cover the new method’s Begin/End range, causing this path to return early and skip removing the real pending entry. Consider requiring UnwindData != 0 for the published-table match (or otherwise treating deleted matches as not-found so the pending buffer check can run).

        if (lo > 0)
        {
            ULONG i = lo - 1;
            if (relativeEntryPoint < RUNTIME_FUNCTION__EndAddress(&unwindInfo->pTable[i], unwindInfo->iRangeStart))
            {
                if (unwindInfo->pTable[i].UnwindData != 0)
                    unwindInfo->cDeletedEntries++;
                unwindInfo->pTable[i].UnwindData = 0;        // Mark the entry for deletion
                STRESS_LOG1(LF_JIT, LL_INFO100, "RemoveFromUnwindInfoTable Removed entry 0x%x\n", i);
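The suggestion in this suppressed comment can be illustrated with a small sketch. It assumes a simplified `Table` with a sorted published vector and an unsorted pending vector; `Entry`, `Table` and `Remove` are hypothetical names, and the linear scan of the pending buffer is an assumption made for brevity, not something taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical entry; unwindData == 0 marks a soft-deleted entry.
struct Entry {
    uint32_t begin;
    uint32_t end;
    uint32_t unwindData;
};

struct Table {
    std::vector<Entry> published;  // sorted by begin
    std::vector<Entry> pending;    // not yet published, unsorted
    uint32_t deletedCount = 0;
};

// Removes the entry covering `addr`. Returns true if a live entry was found.
bool Remove(Table& t, uint32_t addr) {
    // Binary search: find the first entry starting after addr; the
    // candidate is the entry just before it.
    auto it = std::upper_bound(
        t.published.begin(), t.published.end(), addr,
        [](uint32_t a, const Entry& e) { return a < e.begin; });
    if (it != t.published.begin()) {
        Entry& candidate = *(it - 1);
        // A soft-deleted match is treated as not-found so that the
        // pending buffer still gets checked below.
        if (addr < candidate.end && candidate.unwindData != 0) {
            candidate.unwindData = 0;
            ++t.deletedCount;
            return true;
        }
    }
    auto p = std::find_if(
        t.pending.begin(), t.pending.end(),
        [addr](const Entry& e) { return e.begin <= addr && addr < e.end; });
    if (p != t.pending.end()) {
        t.pending.erase(p);
        return true;
    }
    return false;
}
```

With a reused code address, a stale published entry for the old method no longer hides the new method's pending entry, which is the failure mode the comment describes.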

jkotas · 2026-05-27T04:40:47Z

+
+    // This thread attempts to become the sole flusher for this table by taking
+    // the flush gate. If it wins, publish the pending entries, then release the gate,
+    // re-check if more entries arrived and loop if so.


Do the other threads need to wait for their entries to be flushed?

I do not see any code that does that.

eduardo-vp · 2026-05-27

After AddToUnwindInfoTable finishes, they may need to wait a minimum amount of time. A thread that finds m_flushInProgress set to 1 (there's a flushing thread) just returns and waits for the flushing thread to just loop again and pick up its entries. In this version some threads essentially defer the work to another thread, as opposed to every thread taking the lock and publishing (which generated a lot of contention).

jkotas · 2026-05-27

Quoting eduardo-vp: "A thread that finds m_flushInProgress set to 1 (there's a flushing thread) just returns and waits for the flushing thread to just loop again"

What's the line of code that makes it wait?

eduardo-vp · 2026-05-27

Ah, it doesn't actually wait or block. I meant that there is some time until the entry gets published. The thread just continues.

jkotas · 2026-05-27

It means that stack traces captured by tracing on the thread that continues are going to be broken until the flushing thread catches up. This change is regressing tracing reliability.
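To make the window under discussion concrete, here is a minimal sketch of a flush gate built on an atomic flag. `TableSketch` and all of its members are hypothetical; only the name `m_flushInProgress` and the take-gate, publish, release, re-check loop come from this thread. `TryTakeGate` and `ReleaseGate` are public only so that a single-threaded caller can simulate another thread being mid-flush.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical model of one unwind info table with a per-table flush gate.
class TableSketch {
public:
    // Queues an entry, then tries to publish it. Does not block on the gate.
    void Add(uint32_t entry) {
        {
            std::lock_guard<std::mutex> guard(m_pendingLock);
            m_pending.push_back(entry);
        }
        TryFlush();
    }

    bool TryTakeGate() {
        int expected = 0;
        return m_flushInProgress.compare_exchange_strong(expected, 1);
    }

    void ReleaseGate() {
        assert(m_flushInProgress.load() == 1);
        m_flushInProgress.store(0);
    }

    // Returns true if this thread published at least one batch.
    bool TryFlush() {
        bool flushedAny = false;
        for (;;) {
            if (!TryTakeGate())
                return flushedAny;  // someone else is flushing; entry stays pending
            PublishPending();
            flushedAny = true;
            ReleaseGate();
            // Re-check after releasing: entries queued during the flush
            // were left behind by threads that saw the gate taken.
            std::lock_guard<std::mutex> guard(m_pendingLock);
            if (m_pending.empty())
                return flushedAny;
        }
    }

    bool IsPublished(uint32_t entry) {
        std::lock_guard<std::mutex> guard(m_publishLock);
        return std::binary_search(m_published.begin(), m_published.end(), entry);
    }

private:
    void PublishPending() {
        std::vector<uint32_t> batch;
        {
            std::lock_guard<std::mutex> guard(m_pendingLock);
            batch.swap(m_pending);
        }
        std::lock_guard<std::mutex> guard(m_publishLock);
        m_published.insert(m_published.end(), batch.begin(), batch.end());
        std::sort(m_published.begin(), m_published.end());
    }

    std::atomic<int> m_flushInProgress{0};
    std::mutex m_pendingLock;
    std::mutex m_publishLock;
    std::vector<uint32_t> m_pending;
    std::vector<uint32_t> m_published;
};
```

If the gate is already taken when `Add` runs, `Add` returns while `IsPublished` still reports false for the new entry. That interval, which lasts until the gate holder loops or a later caller flushes, is the period during which a stack walk through the new method would not find its unwind info.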

korchak-aleksandr · 2026-05-27T13:54:00Z

@eduardo-vp do you still need a minimal repro from us on Linux, based on your comment?

Eduardo Velarde added 6 commits May 25, 2026 22:42

Preallocate pointer

f9a0941

Locks per table

3fa7e5b

Reduce contention in FlushPendingEntries

1a46f70

Fix build

27d6a6d

Remove from unwind info table with binary search + bump cPendingMaxCount

d584d31

Add registration failed field and fix entries removal

dd49f0a

eduardo-vp self-assigned this May 27, 2026

Copilot AI review requested due to automatic review settings May 27, 2026 03:36

eduardo-vp added the tenet-performance (Performance related issue) and area-VM-coreclr labels May 27, 2026

Copilot started reviewing on behalf of eduardo-vp May 27, 2026 03:36

Copilot AI reviewed May 27, 2026

Copilot feedback

f6c3524

eduardo-vp requested review from AaronRobinsonMSFT and jkotas May 27, 2026 04:25

jkotas reviewed May 27, 2026

This was referenced May 27, 2026

XHarness package install failure on iOS due to devicectl NSPOSIXErrorDomain error 49 #123796

Open

Multiple Helix work items fail on maccatalyst/tvos CoreCLR Release #126460

Open

eduardo-vp wants to merge 7 commits into dotnet:main from eduardo-vp:xunit-reg-windows

4 participants