Enable optimized single-proc allocation helpers for single-proc x86/x64 systems only #27014
Use the maximum number of processors the process may run on to determine whether it is OK to use the single-proc allocation helpers. Depending on the current process affinity is not sufficient, since the affinity can change during the process lifetime. Also, the single-proc allocation helpers work well on x86/x64 systems only, because they depend on an atomic non-interlocked increment instruction for good performance. Such an instruction is available on x86/x64 only. Disable the helpers everywhere else. Fixes #26990
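The decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the runtime's actual code: `ProcessorInfo`, `IsX86OrX64`, and `CanUseSingleProcAllocHelpers` are hypothetical names, and the processor counts are passed in by the caller rather than queried from the OS.

```cpp
#include <cassert>

// Hypothetical inputs: a real runtime would query these from the OS.
struct ProcessorInfo {
    unsigned currentAffinityCount;  // processors the process is affinitized to right now
    unsigned maxProcessorCount;     // most processors the process may ever run on
};

// True when compiling for x86 or x64, detected via common compiler macros.
constexpr bool IsX86OrX64() {
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
    return true;
#else
    return false;
#endif
}

// The single-proc helpers are only safe if the process can never run on more
// than one processor. The current affinity is deliberately ignored because it
// can change during the process lifetime.
bool CanUseSingleProcAllocHelpers(const ProcessorInfo& info) {
    assert(info.currentAffinityCount <= info.maxProcessorCount);
    return IsX86OrX64() && info.maxProcessorCount == 1;
}
```

With this sketch, a process affinitized to one processor on an 8-processor machine (`{1, 8}`) does not get the single-proc helpers, while a process on a true single-processor x86/x64 machine (`{1, 1}`) does.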
I have mixed feelings about this: it stops using the global alloc context even when the process is affinitized to a single proc, which I would think is a far more common scenario than process affinity changing while the process is running.
I have built a small micro-benchmark to compare the allocation rate of the global alloc context against per-thread allocation contexts. The micro-benchmark runs … The average results that I see on my machine (Xeon E5, Windows x64) show:
The helper for the global allocation context is 12 instructions, while the helper for the per-thread allocation context is 14 instructions. However, the global allocation context requires 4 memory writes per allocation (lock, MethodTable*, allocptr, unlock), whereas the per-thread allocation context requires only 2 (MethodTable*, allocptr). I think that explains why the per-thread allocation context is slightly faster even though it executes more instructions. @Maoni0 Do you have any benchmarks for which you would like to keep the global alloc contexts? Ideally, I would love to get rid of them to make everything simpler.
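The write counts above can be illustrated with a minimal, single-threaded sketch of the two helper shapes. All names here (`AllocContext`, `GlobalAllocContext`, `BumpAllocate`, `AllocateGlobal`, `AllocatePerThread`, `t_allocContext`) are hypothetical, and the plain `int` flag only stands in for the lock that the real helper takes with a non-interlocked increment; it is not a working lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

struct MethodTable {};                              // stand-in for the type descriptor
struct Object { const MethodTable* methodTable; };  // object header only

struct AllocContext {
    std::uint8_t* allocPtr;    // next free byte
    std::uint8_t* allocLimit;  // end of the current allocation region
};

struct GlobalAllocContext {
    AllocContext ctx;
    int lock;  // 0 = free, 1 = held (placeholder for the real lock)
};

// Shared bump-pointer step: 2 memory writes per allocation.
inline Object* BumpAllocate(AllocContext& ctx, const MethodTable* mt, std::size_t size) {
    assert(ctx.allocPtr <= ctx.allocLimit);
    if (size > static_cast<std::size_t>(ctx.allocLimit - ctx.allocPtr))
        return nullptr;                          // no room: a real helper takes a slow path
    Object* obj = new (ctx.allocPtr) Object{mt}; // write: MethodTable*
    ctx.allocPtr += size;                        // write: allocptr
    return obj;
}

// Global context: lock + 2 bump writes + unlock = 4 memory writes.
inline Object* AllocateGlobal(GlobalAllocContext& g, const MethodTable* mt, std::size_t size) {
    if (g.lock != 0) return nullptr;             // already held: fall back to a slow path
    g.lock = 1;                                  // write: lock
    Object* obj = BumpAllocate(g.ctx, mt, size);
    g.lock = 0;                                  // write: unlock
    return obj;
}

// Per-thread context: no lock needed, so only the 2 bump writes remain.
thread_local AllocContext t_allocContext{nullptr, nullptr};

inline Object* AllocatePerThread(const MethodTable* mt, std::size_t size) {
    return BumpAllocate(t_allocContext, mt, size);
}
```

The per-thread variant pays for locating its thread-local context instead of taking a lock, which is consistent with it having more instructions but fewer writes.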
@jkotas I would love to make things simpler too, but I'm pretty sure some perf benefit was seen with the global alloc context in this scenario. Of course, that was a long time ago (before my time), so it wouldn't be surprising if things have changed. I would like to see at least some perf investigation done with GCPerfSim; @andy-ms can fill you in on how to run it. Another thing is how this performs on Linux, as getting to the per-thread alloc context is more expensive on Linux. I'll be OOF starting tomorrow for a week. Also CC-ing @sergiy-k and @PeterSolMS, who can assist with the perf investigation in my absence.
Thanks. @andy-ms Could you please send me instructions for how to run GCPerfSim?
We have not implemented the UP allocation helpers at all on Unix. This change affects Windows only.
GCPerfSim has not identified any issues.
…64 systems only (dotnet#27014)
…64 systems only (#27014) (#27080)