
Support hardware with more than 1024 CPUs #126763

Open
janvorli wants to merge 1 commit into dotnet:main from janvorli:support-more-than-1024-cpus

Conversation

@janvorli
Member

A customer has reported that the .NET runtime fails to initialize on machines with more than 1024 CPUs because sched_getaffinity was being passed the default cpu_set_t instance, which supports at most 1024 CPUs and fails when the machine has more.

This change fixes the sched_getaffinity calls to use a dynamically allocated CPU set data structure, so any number of CPUs can be supported.
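For reference, the glibc dynamic CPU-set API the fix relies on can be sketched as follows. This is an illustrative standalone helper under the assumption of a glibc Linux system, not the PR's code:

```cpp
#include <sched.h>    // CPU_ALLOC, CPU_ALLOC_SIZE, CPU_ZERO_S, CPU_COUNT_S, CPU_FREE (glibc, _GNU_SOURCE)
#include <unistd.h>   // sysconf

// Returns the number of CPUs in the current thread's affinity mask, or -1 on
// failure, using a dynamically sized CPU set instead of a fixed cpu_set_t.
// (Illustrative helper, not the PR's code.)
static int GetAffinitizedCpuCount()
{
    long configured = sysconf(_SC_NPROCESSORS_CONF);
    if (configured < 1)
        configured = CPU_SETSIZE;   // conservative fallback (1024)

    cpu_set_t* set = CPU_ALLOC(configured);   // heap-allocates a mask sized for `configured` CPUs
    if (set == nullptr)
        return -1;

    size_t size = CPU_ALLOC_SIZE(configured);
    CPU_ZERO_S(size, set);

    // Passing the real mask size avoids the EINVAL that a fixed
    // sizeof(cpu_set_t) triggers on machines with more than 1024 CPUs.
    int count = -1;
    if (sched_getaffinity(0, size, set) == 0)
        count = CPU_COUNT_S(size, set);

    CPU_FREE(set);
    return count;
}
```

The key point is that sched_getaffinity is given the size returned by CPU_ALLOC_SIZE rather than sizeof(cpu_set_t), so the kernel can report a mask covering any processor count.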

In the GC code, we keep the limit of at most 1024 heaps, but the CPU limit is now dynamic. The arrays proc_no_to_heap_no and numa_node_to_heap_map are now allocated dynamically based on the actual number of processors configured on the system. The AffinitySet was also modified so that it can hold affinities for a dynamic number of CPUs.

Several other arrays were originally sized by MAX_SUPPORTED_CPUS, but that was misleading, as they are really indexed by heap number. So I've renamed the constant to MAX_SUPPORTED_HEAPS to make it clear that the number of supported CPUs is no longer limited.

Close #126747

@janvorli janvorli added this to the 8.0.x milestone Apr 10, 2026
@janvorli janvorli requested a review from jkotas April 10, 2026 21:21
@janvorli janvorli self-assigned this Apr 10, 2026
Copilot AI review requested due to automatic review settings April 10, 2026 21:21
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @agocke, @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment


Pull request overview

This PR updates CoreCLR (PAL, GC, and NativeAOT PAL) to correctly handle Linux machines with >1024 CPUs by avoiding fixed-size cpu_set_t usage and by making GC affinity-related data structures CPU-count-aware while keeping the GC heap limit at 1024.

Changes:

  • Use dynamically-sized CPU affinity sets (CPU_ALLOC / CPU_ALLOC_SIZE) for sched_getaffinity/sched_setaffinity to support >1024 CPUs.
  • Introduce GCToOSInterface::GetMaxProcessorCount() and make AffinitySet dynamically sized (plus rename MAX_SUPPORTED_CPUS → MAX_SUPPORTED_HEAPS for clarity).
  • Allocate GC mapping tables based on actual processor capacity (e.g., proc_no_to_heap_no, numa_node_to_heap_map) while retaining the 1024-heap limit.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

File Description
src/coreclr/pal/src/thread/thread.cpp Switch thread-start affinity reset to dynamically-sized cpu_set allocation.
src/coreclr/pal/src/misc/sysinfo.cpp Make logical CPU count retrieval use dynamic cpu_set; add clamping for total CPU count.
src/coreclr/nativeaot/Runtime/unix/PalUnix.cpp Update NativeAOT processor count initialization to use dynamic cpu_set.
src/coreclr/gc/windows/gcenv.windows.cpp Initialize process affinity set dynamically; loop bounds updated to avoid 1024 CPU assumption.
src/coreclr/gc/unix/gcenv.unix.cpp Initialize process affinity set based on configured CPU count; use dynamic cpu_set for affinity enumeration.
src/coreclr/gc/env/gcenv.os.h Rename MAX_SUPPORTED_CPUS→MAX_SUPPORTED_HEAPS; make AffinitySet dynamically allocated; add GetMaxProcessorCount() API.
src/coreclr/gc/interface.cpp Initialize config affinity set with max processor count; handle init failure for NUMA heap mapping.
src/coreclr/gc/gcconfig.cpp Validate HeapAffinitizeRanges against dynamic max CPU count.
src/coreclr/gc/gc.cpp Allocate CPU→heap and NUMA→heap maps dynamically; remove modulo mapping for processor→heap.
src/coreclr/gc/*.cpp / gcpriv.h Replace array sizes indexed by heaps to use MAX_SUPPORTED_HEAPS and adjust signatures accordingly.

Comment on lines 1210 to 1215

     // Locate heap_number-th available processor
     uint16_t procIndex = 0;
     size_t cnt = heap_number;
-    for (uint16_t i = 0; i < MAX_SUPPORTED_CPUS; i++)
+    for (uint16_t i = 0; i < g_totalCpuCount; i++)
     {
         if (g_processAffinitySet.Contains(i))

Copilot AI Apr 10, 2026


In GetProcessorForHeap, the loop counter is uint16_t but the upper bound is g_totalCpuCount (uint32_t). On systems with 65,536 processors, i will wrap and this loop becomes infinite. Use a wider type (e.g., uint32_t/size_t) for the iterator and only cast to uint16_t when storing the selected processor index.
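A minimal sketch of the suggested fix follows. The global and the availability predicate are made-up stand-ins for the PR's g_totalCpuCount and g_processAffinitySet.Contains, not the actual code; only the iterator-widening pattern is the point:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the PR's globals, with made-up values.
static uint32_t g_totalCpuCount = 2048;

static bool IsProcessorAvailable(uint32_t i)
{
    return i >= 1024;   // pretend only the upper half of CPUs is in the affinity set
}

// Iterate with a type at least as wide as the bound. A uint16_t counter would
// silently wrap at 65536 once g_totalCpuCount exceeds that, making the loop
// infinite; the narrowing cast happens only at the single store site.
static bool FindNthAvailableProcessor(size_t n, uint16_t* procIndex)
{
    size_t cnt = n;
    for (uint32_t i = 0; i < g_totalCpuCount; i++)
    {
        if (IsProcessorAvailable(i))
        {
            if (cnt == 0)
            {
                *procIndex = (uint16_t)i;   // narrowing cast at the store site only
                return true;
            }
            cnt--;
        }
    }
    return false;
}
```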

Comment on lines +180 to +182
}

nrcpus = CPU_COUNT_S(CPU_ALLOC_SIZE(configuredCpuCount), pCpuSet);

Copilot AI Apr 10, 2026

If sched_getaffinity fails, nrcpus is computed from the zeroed mask and may become 0, which can propagate into GetSystemInfo().dwNumberOfProcessors. Consider falling back to PAL_GetTotalCpuCount() (or at least clamping to 1) when st != 0 to avoid returning an invalid CPU count.

Suggested change
-}
-
-nrcpus = CPU_COUNT_S(CPU_ALLOC_SIZE(configuredCpuCount), pCpuSet);
+    nrcpus = PAL_GetTotalCpuCount();
+}
+else
+{
+    nrcpus = CPU_COUNT_S(cpuSetSize, pCpuSet);
+    if (nrcpus < 1)
+    {
+        ASSERT("CPU_COUNT_S returned an invalid CPU count\n");
+        nrcpus = PAL_GetTotalCpuCount();
+    }
+}

Comment on lines +481 to +483
}

count = (uint32_t)CPU_COUNT_S(CPU_ALLOC_SIZE(configuredCpuCount), pCpuSet);

Copilot AI Apr 10, 2026

When sched_getaffinity fails, count is still computed from the (empty) cpu set and can become 0, which conflicts with the later _ASSERTE(count > 0) and can result in g_RhNumberOfProcessors being set to 0 in release builds. Consider falling back to GCToOSInterface::GetTotalProcessorCount() or clamping to 1 when st != 0.

Suggested change
-}
-
-count = (uint32_t)CPU_COUNT_S(CPU_ALLOC_SIZE(configuredCpuCount), pCpuSet);
+    count = GCToOSInterface::GetTotalProcessorCount();
+}
+else
+{
+    count = (uint32_t)CPU_COUNT_S(cpuSetSize, pCpuSet);
+}

Comment on lines +154 to +165
int configuredCpuCount = sysconf(_SC_NPROCESSORS_CONF);
if (configuredCpuCount == -1)
{
return false;
}

g_totalCpuCount = cpuCount;

if (!g_processAffinitySet.Initialize(configuredCpuCount))
{
return false;
}

Copilot AI Apr 10, 2026

GC init now hard-fails if sysconf(_SC_NPROCESSORS_CONF) returns -1. That introduces a new runtime-initialization failure mode; other call sites (e.g., PAL sysinfo/thread affinity) fall back to CPU_SETSIZE instead. Consider using a similar fallback here (and also validate configuredCpuCount >= 1) so GC initialization remains robust if sysconf returns an unexpected value.

#define __GCENV_OS_H__

#include <new>
using std::nothrow;

Copilot AI Apr 10, 2026

using std::nothrow; in a widely-included header injects a name into the global namespace for every includer and can cause subtle conflicts. Prefer removing the using-declaration from the header and writing new (std::nothrow) at the allocation sites instead.

Suggested change
-using std::nothrow;
Comment on lines 154 to 189

 class AffinitySet
 {
     static const size_t BitsPerBitsetEntry = 8 * sizeof(uintptr_t);

-    uintptr_t m_bitset[MAX_SUPPORTED_CPUS / BitsPerBitsetEntry];
+    uintptr_t *m_bitset = nullptr;
+    size_t m_bitsetDataSize = 0;

     static uintptr_t GetBitsetEntryMask(size_t cpuIndex)
     {
         return (uintptr_t)1 << (cpuIndex & (BitsPerBitsetEntry - 1));
     }

     static size_t GetBitsetEntryIndex(size_t cpuIndex)
     {
         return cpuIndex / BitsPerBitsetEntry;
     }

 public:

-    static const size_t BitsetDataSize = MAX_SUPPORTED_CPUS / BitsPerBitsetEntry;
+    bool Initialize(int cpuCount)
+    {
+        m_bitsetDataSize = (cpuCount + BitsPerBitsetEntry - 1) / BitsPerBitsetEntry;
+        m_bitset = new (nothrow) uintptr_t[m_bitsetDataSize];
+        if (m_bitset == nullptr)
+        {
+            return false;
+        }
+
+        memset(m_bitset, 0, sizeof(uintptr_t) * m_bitsetDataSize);
+        return true;
+    }

-    AffinitySet()
+    ~AffinitySet()
     {
-        memset(m_bitset, 0, sizeof(m_bitset));
+        delete[] m_bitset;
     }

Copilot AI Apr 10, 2026


AffinitySet now owns a heap-allocated buffer and has a destructor, but it still has the compiler-generated copy/move operations. Any accidental copy (e.g., pass-by-value or assignment) would double-free m_bitset. Consider explicitly deleting copy/move ctor/assignment (or implementing deep copy / using a smart pointer) and also guarding Initialize against being called twice (to avoid leaking the previous buffer).
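A hedged sketch of what this suggestion could look like. The class shape follows the diff above, while the deleted copy/move operations, the Initialize guard, and the Set/Contains bodies shown here are the reviewer's proposal fleshed out for illustration, not the PR's code:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

class AffinitySet
{
    static const size_t BitsPerBitsetEntry = 8 * sizeof(uintptr_t);

    uintptr_t* m_bitset = nullptr;
    size_t m_bitsetDataSize = 0;

public:
    AffinitySet() = default;

    // Deleting copy/move prevents a double-free of m_bitset on accidental
    // pass-by-value or assignment.
    AffinitySet(const AffinitySet&) = delete;
    AffinitySet& operator=(const AffinitySet&) = delete;

    ~AffinitySet() { delete[] m_bitset; }

    bool Initialize(int cpuCount)
    {
        delete[] m_bitset;   // guard against a second Initialize leaking the old buffer
        m_bitsetDataSize = (cpuCount + BitsPerBitsetEntry - 1) / BitsPerBitsetEntry;
        m_bitset = new (std::nothrow) uintptr_t[m_bitsetDataSize];
        if (m_bitset == nullptr)
            return false;
        memset(m_bitset, 0, sizeof(uintptr_t) * m_bitsetDataSize);
        return true;
    }

    void Set(size_t cpuIndex)
    {
        m_bitset[cpuIndex / BitsPerBitsetEntry] |= (uintptr_t)1 << (cpuIndex % BitsPerBitsetEntry);
    }

    bool Contains(size_t cpuIndex) const
    {
        return (m_bitset[cpuIndex / BitsPerBitsetEntry] & ((uintptr_t)1 << (cpuIndex % BitsPerBitsetEntry))) != 0;
    }
};
```

A std::unique_ptr<uintptr_t[]> member would achieve the same ownership semantics without a hand-written destructor, at the cost of pulling <memory> into the header.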

Comment on lines 1398 to +1416

-    {
-        ASSERT("sched_getaffinity failed!\n");
-        // The sched_getaffinity should never fail for getting affinity of the current process
-        palError = ERROR_INTERNAL_ERROR;
-        goto fail;
-    }
+    int configuredCpuCount = sysconf(_SC_NPROCESSORS_CONF);
+    if (configuredCpuCount == -1)
+    {
+        // In the unlikely event that sysconf(_SC_NPROCESSORS_CONF) fails, just assume
+        // a reasonable default maximum number of CPUs to avoid failing thread creation.
+        configuredCpuCount = CPU_SETSIZE;
+    }

-    st = sched_setaffinity(0, sizeof(cpu_set_t), &cpuSet);
-    if (st != 0)
-    {
-        if (errno == EPERM || errno == EACCES)
-        {
-            // Some sandboxed or restricted environments (snap strict confinement,
-            // vendor-modified Android kernels with strict SELinux policy) block
-            // sched_setaffinity even when passed a mask extracted via sched_getaffinity.
-            // Treat this as non-fatal - the thread will continue running on any
-            // available CPU rather than the originally affinitized one.
-            WARN("sched_setaffinity failed with EPERM/EACCES, ignoring\n");
-        }
-        else
+    cpu_set_t* pCpuSet = CPU_ALLOC(configuredCpuCount);
+    if (pCpuSet == nullptr)
+    {
+        ASSERT("CPU_ALLOC failed!\n");
+        palError = ERROR_OUTOFMEMORY;
+        goto fail;
+    }
+
+    size_t cpuSetSize = CPU_ALLOC_SIZE(configuredCpuCount);
+    CPU_ZERO_S(cpuSetSize, pCpuSet);

Copilot AI Apr 10, 2026

This change introduces a heap allocation (CPU_ALLOC/CPU_FREE) on every thread start to reset affinity. If thread creation is performance-sensitive in some workloads, consider caching the required cpu_set_t size (and possibly reusing a buffer) to avoid repeated malloc/free on the hot path, while still supporting >1024 CPUs.
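One way to cache the size, sketched under the assumption of C++11 thread-safe static-local initialization on a glibc system; GetCachedCpuSetSize is a hypothetical helper, not part of the PR:

```cpp
#include <sched.h>    // CPU_ALLOC_SIZE, CPU_SETSIZE (glibc, _GNU_SOURCE)
#include <unistd.h>   // sysconf
#include <cstddef>

// Compute the required cpu_set_t allocation size once and reuse it on every
// thread start. C++11 guarantees the static-local initializer runs exactly
// once, even with concurrent callers.
static size_t GetCachedCpuSetSize()
{
    static const size_t cachedSize = []() -> size_t {
        long configured = sysconf(_SC_NPROCESSORS_CONF);
        if (configured < 1)
            configured = CPU_SETSIZE;          // same fallback the PR uses
        return CPU_ALLOC_SIZE(configured);     // bytes needed for a mask of that many CPUs
    }();
    return cachedSize;
}
```

This caches only the size computation; the per-thread CPU_ALLOC/CPU_FREE would still remain unless a buffer is also reused, which is a bigger change.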

-        ASSERT("sched_setaffinity failed!\n");
+        ASSERT("sched_getaffinity failed!\n");
+        CPU_FREE(pCpuSet);
         // The sched_getaffinity should never fail for getting affinity of the current process
Member


We gracefully ignore sched_getaffinity failures in the GC in release builds. Should we do the same here?



Development

Successfully merging this pull request may close these issues.

CoreCLR init fails on >1024 CPU Linux due to sched_getaffinity returning EINVAL

3 participants