Provide x86 CPUID related information #785

damageboy · 2019-12-12T07:27:35Z

Proposal

Rationale

There's a breadth of information available through the CPUID instruction.
Most of this information is not available today from C#, with the exception of information like
support for specific families of HW Intrinsics which is surfaced through the IsSupported JIT time constant/property in each respective group of intrinsics.

I've lightly touched on this in a different issue, but I think this deserves an issue of its own in light with the specific needs that I have in mind.

To name a few types of information that are not currently provided:

Specific CPU Model (Kaby Lake / Skylake etc.)
Manufacturer
Various key HW info like:
- Layout of the caches:
  - cache line sizes
  - levels
  - associativity
  - TLB levels, sizes
- Topology of the system (NUMA Nodes, physical/logical mapping) and it's mapping to CPU affinity masks

This information could be used to select proper code paths or pre-allocate data structures with sizes that relate to the actual HW at hand.

Proposed API

namespace System.Runtime.Intrinsics.X86
{
    public class X86Base
    {
        public static bool IsSupported { get; }

        public static (int Eax, int Ebx, int Ecx, int Edx) CpuId(int functionId, int subFunctionId = 0);
    }
}

Alternatives and Other Considerations

Given that Value Tuples are currently considered AVOID, an alternative is to match the C++ signature of public static void CpuId(int* cpuInfo, int functionId) and public static void CpuId(int* cpuInfo, int functionid, int subFunctionId)

CPUID is not technically a baseline instruction. It requires its own check using the ID bit of the EFLAGS register. However, only CPUs prior to the 486SL (released in 1992) won't have this feature available. In order to minimize the number of ISAs that only contain 1-2 methods (like Lzcnt), it is proposed to simply consider this part of X86Base rather than it be in its own CpuId class.

It is also important to make note that other, cross-platform/architecture methods of obtaining such information are available today, namely the hwloc native library that can provide most if not all of this and beyond through a normalized native API, and ultimately give us this view of our own systems (screenshot of lstopo generated output, part of the hwloc library):

For reference, here's a sample of cpuid generated to show what's up for discussion.

Relevant issues:
https://github.com/dotnet/corefx/issues/22790
https://github.com/dotnet/coreclr/issues/14388

The text was updated successfully, but these errors were encountered:

tannergooding · 2019-12-12T15:52:05Z

I had thought a bit about this previously and believe an appropriate API would look something like:

// casing of the name here is weird but follows the .NET naming guidelines
public abstract class CpuId
{
    internal CpuId() { }

    // We can't use CpuId for the method as it conflicts with the class
    // name. Some other name may be better but `Query` seems to fit

    // Signature matches the C++ convention for __cpuid and __cpuidex
    // we could also return a tuple or an explicitly typed struct and
    // could reorder the parameters so the "out buffer" is last

    // Using `uint` rather than `int` may also be better for C# since most
    // bit manipulations should be unsigned and there are negative "max values"

    public static void QueryCpu(int* cpuInfo, int functionId);
    public static void QueryCpu(int* cpuInfo, int functionId, int subFunctionId);
}

We could then additionally expose helper properties for certain key information. For example:

public static bool IsAuthenticAMD { get; }
public static bool IsGenuineIntel { get; }

public static uint MaxFunctionId { get; }

// etc

damageboy · 2019-12-12T16:31:18Z

I think this would be a great sort of addition.

Some properties are surfaced all the way as static native types, treated as JIT time constants.
I think that the "minimum" set (in my opinion, that is, of course) would have to include some more data about cache layout, processor layout, and topology, while leaving the really nutty stuff to be queried through the QueryCpu() methods you propose.

I'm assuming you meant for the function to either accept (out uint eax, out uint ebx, out uint ecx, out uint edx) or return them as a value tuple (but value tuples are verboten in CoreFX, IIRC)... right?

I even hacked up something like this a long long time ago :), back when I was popping the stack pointer and re-writing the call-site to add intrinsics "dynamically" (Don't ask):
https://github.com/damageboy/ReJit/blob/master/ReJIT/CPUID.cs

The one "big" thing that would be still "missing" is that this would not allow for usage inside
existing/future attributes controlling layout/alignment, since, well, it isn't a compile time constant...

I guess that is more of an issue at the CLR level where certain things have to be constants at compile-time even though in theory they could be expressed as constants at JIT time (Had that notion of late-bound constant was acceptable/supported).

tannergooding · 2019-12-12T18:04:45Z

(but value tuples are verboten in CoreFX, IIRC)

I believe we resolved the naming issues that prevented them from being used and now merely recommend against their use.

@CarolEidt could probably comment on the best way to expose the signature that can also work efficiently with the register allocator. Namely, cpuid returns its result in rax, rbx, rcx, and rdx. C++ takes a int cpuInfo[4] (which is effectively just a int* that is has space for 4 elements), so we need to return 4 results either via some stack allocated space or a specialized struct/value tuple.

CarolEidt · 2019-12-12T18:44:03Z

A struct/value tuple with 4 elements would be quite problematic to do efficiently (given current JIT capabilities) if it were an actual call. As it is, since intrinsics are expanded early in the JIT, it's just a bit less problematic. The remaining issue is that we'd really like to express this as an instruction that defines multiple (register) values, and the register allocator (indeed, the JIT more broadly) currently doesn't handle that really well (even multi-register returns and longs on arm32 are supported as special cases). That's a recurring issue, though, that we probably should just address. It will probably require additional register allocator work to handle a case where not only does it define multiple registers, but they are fixed. Again, doable but probably a bit tricky.

jkotas · 2019-12-12T18:51:22Z

I do not think CPUID instruction needs to be handled as HW intrinsic by the JIT. It can be a regular call to C or assembly helper that the JIT knows nothing about. CPUID is expensive. You want to call it once and cache the result.

damageboy · 2019-12-12T19:37:18Z

I think @jkotas is right.
CPUID is not the sort of thing you don't need to optimize in general, but rather cache.

The only reason to even consider CPUID as an full-blown intrinsic would be if for some reason .NET/CoreCLR programmers would benefit somehow by CPUID's only other saving grace: the fact that it is a serializing instruction, and the easiest one to issue userspace.

Quoting Intel's SDM Volume 3, section 8.3:

Non-privileged serializing instructions — CPUID, IRET, and RSM.
When the processor serializes instruction execution, it ensures that all pending memory transactions are completed (including writes stored in its store buffer) before it executes the next instruction. Nothing can pass a serializing instruction and a serializing instruction cannot pass any other instruction (read, write, instruction fetch, or I/O). For example, CPUID can be executed at any privilege level to serialize instruction execution with no effect on program flow, except that the EAX, EBX, ECX, and EDX registers are modified.

I might be unimaginative, but I'm having a hard time coming up for a reason to call this from userspace/CoreCLR.

Perhaps other people can think of a real use case for having this as an intrinsic for its serializing property...

tannergooding · 2019-12-12T19:49:00Z

It can be a regular call to C or assembly helper that the JIT knows nothing about

Are you indicating that a CpuId API, if exposed, would be more akin to an InternalCall that invokes __cpuid and __cpuidex?

You want to call it once and cache the result.

Right and there are certain key CPUID properties that it would be beneficial to be more broadly exposed and which are x86 specific. For example, IsGenuineIntel and IsAuthenticAMD checks which would be used in conjunction with other hardware intrinsics.

Will the existing JIT constant folding logic around static readonly fields allow these cases to work?

There are also certain hardware properties which would be beneficial to more broadly expose but which may also be beneficial for other platforms (such as cache line sizes) and I think we have an issue tracking those.

jkotas · 2019-12-12T22:43:13Z

Are you indicating that a CpuId API, if exposed, would be more akin to an InternalCall that invokes __cpuid and __cpuidex?

Right.

Will the existing JIT constant folding logic around static readonly fields allow these cases to work?

Yes, with tirered JITing.

AndyAyersMS · 2019-12-12T22:49:30Z

FWIW the native compiler has to play some tricks to safely emit cpuid for x86, since it kills so many registers. Would be nice if we didn't have to propagate those tricks to the jit.

damageboy · 2020-02-04T05:58:33Z

Continuing from #2251:

The SIMD intrinsics are even harder to get right. I believe that there is negative value in trying to teach people how to conditionalize those using CPUIDs.

I agree that this is far from optimal, I just see this is a lesser of all evils.
I think it is also important to acknowledge that there are many common cases where developers could rely on CPUID results, but would be better served by a more high-level/specific way of expressing their intent. One such immediate case that pops to mind would be cache-line alignment, which would benefit more from a targeted fix in the form of some attribute rather than throwing CPUID at developers expecting them to hack it together.

But the world of intrinsics, in the end, is very dynamic, complicated, and uneven, as the PDEP situation exemplifies (BTW, there is a lot of chatter that AMD is not solving this due to a patent situation regarding the underlying implementation inside the CPU, and it might stay so for a long time), which brings me to your next point:

Right. And I do not believe that we want to add APIs like this to the core platform. If somebody believes that they really need them, they can build them on their own.

You are both right and wrong, in my opinion. At the end of the day, the community would be better served from having a complete and wide API that can describe the underlying HW (I mentioned hwloc in #785, which I've personally used from C# just like you suggest, in the past).

My only counterpoint to this would be that the JIT/platform is in a unique position to mark and treat these values as constants, just like some values, gleaned from the system like .IsSupported, VectorNNN<T>.Count, ProcessorCount to name a few, already are.
Maybe if there was a declarative way of providing the JIT with constants knowing for sure they would be employed to elide branches and for constant folding, we could relegate this to third parties while not missing any optimization opportunities.
I know about static read-only members in this regard, but my impression is/was that they are not a failproof way of ensuring the JIT uses them as constant, am I wrong in this regard?

jkotas · 2020-02-04T06:06:52Z

I know about static read-only members in this regard, but my impression is/was that they are not a failproof way of ensuring the JIT uses them as constant, am I wrong in this regard?

They have small gotcha today in that the value gets inlined only if the static constructor has run, so you may need to explicit trigger the static constructor to get them initialized early. We have plans to improve tiered compilation that will make this unnecessary.

jkotas · 2020-02-04T06:09:22Z

The module initializers will make it easy to trigger the static constructors early.

saucecontrol · 2020-03-04T20:18:39Z

Can we consider @tannergooding's basic CpuId wrapper an API proposal? I'm looking at mitigating performance issues with AVX2 gather instructions, which were slow on Intel pre-Skylake models and have always been (relatively) slow on AMD. I'd be looking for vendor and model info for that.

It would, of course, be nice to have a convenience API that pulls out some of the basic properties in a more readable format, but raw access to CPUID would solve the basic need.

tannergooding · 2020-03-04T22:07:10Z

Is the proposed API simply (which matches C++):

public static void QueryCpu(int* cpuInfo, int functionId);
public static void QueryCpu(int* cpuInfo, int functionId, int subFunctionId);

or do we want to be more explicit:

public static void QueryCpu(int* eax, int* ebx, int* ecx, int* edx, int functionId);

And do we want the outputs as the first or last parameter?

Also, do we want to propose any common entries (IsAuthenticAMD, IsGenuineIntel, or the Cache Line Size) or do we want to say getting and caching those is the responsibility of the user?

tannergooding · 2020-03-04T22:10:23Z

My preference is to match C++, which will also make the code fairly trivial to call with a stackalloc.
It would also be to only propose QueryCpu until we have use cases for any "common" entries the framework may want to more generally expose.
But, I thought I'd ask to get other people's opinions as well.

jkotas · 2020-03-04T22:24:47Z

What about matching the style of the always present ARM intrinsics and make it look like this?

namespace System.Runtime.Intrinsics.X86
{
    public class X86Base
    {
        public static bool IsSupported { get; }
        public ValueTuple<int,int,int,int> CpuId(int functionId, int subFunctionId = 0);
    }
}

It would also be to only propose QueryCpu

+1

tannergooding · 2020-03-04T22:43:46Z

of the always present ARM intrinsics

My only concern is that it isn't actually an "always present" intrinsic. CPUID technically requires its own check (attempting to toggle the ID bit in the EFLAGS register) and so far we've followed that hierarchy even for other ISAs (such as SSE/SSE2) that are "baseline" to our own implementation of the framework/runtime.
I don't think that "in practice" this is a concern, but there have been hobby projects to get code running on Win3.1 or DOS and that may prevent certain usages for anyone wanting to expose these intrinsics in their own runtime implementations.

ValueTuple<int,int,int,int>

I'm fine with taking the proposal for a value tuple return to API review, we just haven't had much precedence for it yet 😄
I might suggest (int EAX, int EBX, int ECX, int EDX) (or some other casing) rather than straight ValueTuple though to help clarify which entry is which.

jkotas · 2020-03-04T22:48:21Z

I don't think that "in practice" this is a concern, but there have been hobby projects

These hobby projects can return false from X86Base.IsSupported.

tannergooding · 2020-03-04T22:57:10Z

These hobby projects can return false from X86Base.IsSupported.

Yes. But alternatively we could just make the class named Cpuid and it would preserve the existing layering and matching of IsSupported checks to the underlying hardware.
In either case, both will likely be brought up during API review (one as the suggested API and the other under the typical alternatives/other notes section) and API review should be able to settle on which is better.

saucecontrol · 2020-03-04T22:58:51Z

    public static class CpuId
    {
        public static bool IsSupported { get; }
        public static (int EAX, int EBX, int ECX, int EDX) CpuId(int functionId, int subFunctionId = 0);
    }

LGTM. I'm keen to see if a ValueTuple return makes it through API review as well.

Things like IsAuthenticAMD or IsGenuineIntel would be nice to have, but CPUID returns such a mishmash of stuff, I can't picture what a generic representation would be. And we've got new xArch clones coming out of China now, so it's about to stop being a two-company game again.

jkotas · 2020-03-04T23:28:55Z

My reason for proposing X86Base is that it looks wasteful to have a type with just two methods and there are more methods that are potential candidates forX86Base (e.g. rep movb)

Things like IsAuthenticAMD or IsGenuineIntel would be nice to have

These are one-liners thanks to C# pattern matching, so it is pretty straightforward for everybody to roll their own as necessary.

static bool IsGenuineIntel => X86Base.CpuId(0) is (_, 0x47656e75, 0x6e74656c, 0x696e6549);

tannergooding · 2020-03-04T23:41:31Z

My reason for proposing X86Base is that it looks wasteful to have a type with just two methods

That's understandable. On the other hand, if we get enough requests for some of the common information to be exposed it might become more than just two methods. I think the immediately obvious ones are:

Maximum CPUID Function/SubFunction Number
CPU Vendor (IsAuthenticAMD and IsGenuineIntel)
CPU Model/Family Information
Cache Line Flush Size
Cache Line and TLB Size Information

These are things we are already querying and caching in the runtime for various functionality and so are things that would be trivial to expose to the end user and that they might find useful as well.
If we don't think that these will become common enough to warrant exposing as part of the framework, then I'll make sure to bring that up during the API review.

mjsabby · 2020-03-06T02:46:29Z

Isn't an intrinsic better here because CoreRT will get it for free since it'll be in RyuJit vs if it's not it'll have to be ported to CoreRT? cc @MichalStrehovsky

tannergooding · 2020-03-06T02:54:40Z

CPUID is sufficiently complex and may require quite a bit of handling.

I imagine it is easier to port it to CoreRT than to update things to support an instruction that trashes 4 registers and requires storing that back to a struct.

mjsabby · 2020-03-06T02:59:59Z

Alright, thanks.

jkotas · 2020-03-06T03:37:07Z

Also, the place we need to get to with CoreCLR, Mono and CoreRT is that these sort of things are written once and shared, without need to be ported.

jkotas · 2020-03-06T03:39:04Z

E.g. this can be achieved today by putting it to System.Native that is shared between all 3.

damageboy · 2020-03-09T10:25:29Z

In either case, @damageboy, would you be able to update the original post to contain the following? Once that is done, I can mark this as ready for review.

Done, sorry for the delay.

tannergooding · 2020-03-09T10:40:34Z

Done, sorry for the delay.

No worries, you've been busy writing an amazing blog series and I'm sure many other things as well 😄

BruceForstall · 2020-05-18T19:19:57Z

@tannergooding this is marked ready-for-review. Is it going to be reviewed? should it be marked as milestone 5.0?

tannergooding · 2020-05-18T19:35:28Z

I've marked it future, we don't have any critical need for the API so it can be reviewed as part of the normal backlog.

terrajobst · 2020-07-23T19:00:25Z

Video

Looks good as proposed
All x86 types will extend X86Base (unless it already base type)

namespace System.Runtime.Intrinsics.X86
{
    public class X86Base
    {
        public static bool IsSupported { get; }

        public static (int Eax, int Ebx, int Ecx, int Edx) CpuId(int functionId, int subFunctionId = 0);
    }
}

abelbraaksma · 2020-08-05T23:06:20Z

Great stuff! Thanks for this 💯.

am11 · 2020-08-10T16:47:58Z

Thanks! Is there also a plan for ARM's CPUID info?

tannergooding · 2020-08-10T17:17:47Z

@am11, no. The ARM information is only accessible to the Kernel and is why each OS exposes an API to query it.

am11 · 2020-08-10T19:07:23Z

@tannergooding, thank you, understood. I saw a similar ongoing PR, where we are reading /proc/cpuinfo on Linux for ARM64 to gather CPU capabilities; and was wondering if it makes sense to level this APIs surface for ARM: https://github.com/dotnet/runtime/pull/40597/files#diff-f808f893e0b21b8950f8284dd17cdc08R91? Perhaps such implementation does not make sense under System.Runtime.Intrinsics.* namespace, if the underlying instruction is only available to the kernel.

Dotnet-GitSync-Bot added area-System.Runtime untriaged New issue has not been triaged by the area owner labels Dec 12, 2019

damageboy mentioned this issue Dec 12, 2019

Allow going around PDEP catastrophe on AMD processors #786

Closed

jkotas added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.Intrinsics and removed area-System.Runtime labels Dec 12, 2019

tannergooding mentioned this issue Jan 28, 2020

Performance Regression: UTF8 Span Methods 3x slower on AMD Platforms (NetCore2.1 -> 3.1) #2251

Closed

tannergooding removed the untriaged New issue has not been triaged by the area owner label Feb 10, 2020

tannergooding added api-ready-for-review and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Mar 9, 2020

tannergooding added the arch-x64 label Mar 25, 2020

damageboy mentioned this issue Mar 30, 2020

How to get Common System Hardware information across platforms #22948

Open

saucecontrol mentioned this issue Apr 7, 2020

Add BSF and BSR fallbacks for BitOperations methods #34550

Merged

tannergooding added this to the Future milestone May 18, 2020

carlossanlop added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-ready-for-review labels Jul 6, 2020

jkotas changed the title ~~Provide x86 CPUID related information as JIT time consts~~ Provide x86 CPUID related information Jul 15, 2020

terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 23, 2020

jkotas added the help wanted [up-for-grabs] Good issue for external contributors label Jul 23, 2020

tannergooding mentioned this issue Jul 31, 2020

Adding support for X86Base.CpuId #40167

Merged

tannergooding closed this as completed in #40167 Aug 5, 2020

reflectronic mentioned this issue Aug 12, 2020

[Proposal] Expose cpuid #40716

Closed

ghost locked as resolved and limited conversation to collaborators Dec 11, 2020

Provide x86 CPUID related information #785

Provide x86 CPUID related information #785

Comments

damageboy commented Dec 12, 2019 • edited Loading

Proposal

Rationale

Proposed API

Alternatives and Other Considerations

tannergooding commented Dec 12, 2019

damageboy commented Dec 12, 2019

tannergooding commented Dec 12, 2019

CarolEidt commented Dec 12, 2019

jkotas commented Dec 12, 2019

damageboy commented Dec 12, 2019

tannergooding commented Dec 12, 2019

jkotas commented Dec 12, 2019

AndyAyersMS commented Dec 12, 2019

damageboy commented Feb 4, 2020 • edited Loading

jkotas commented Feb 4, 2020 • edited Loading

jkotas commented Feb 4, 2020

saucecontrol commented Mar 4, 2020

tannergooding commented Mar 4, 2020 • edited Loading

tannergooding commented Mar 4, 2020

jkotas commented Mar 4, 2020 • edited Loading

tannergooding commented Mar 4, 2020

jkotas commented Mar 4, 2020

tannergooding commented Mar 4, 2020

saucecontrol commented Mar 4, 2020 • edited Loading

jkotas commented Mar 4, 2020

tannergooding commented Mar 4, 2020

mjsabby commented Mar 6, 2020

tannergooding commented Mar 6, 2020

mjsabby commented Mar 6, 2020

jkotas commented Mar 6, 2020

jkotas commented Mar 6, 2020

damageboy commented Mar 9, 2020

tannergooding commented Mar 9, 2020

BruceForstall commented May 18, 2020

tannergooding commented May 18, 2020

terrajobst commented Jul 23, 2020 • edited by dotnet-api-review bot Loading

abelbraaksma commented Aug 5, 2020

am11 commented Aug 10, 2020

tannergooding commented Aug 10, 2020

am11 commented Aug 10, 2020

damageboy commented Dec 12, 2019 •

edited

Loading

damageboy commented Feb 4, 2020 •

edited

Loading

jkotas commented Feb 4, 2020 •

edited

Loading

tannergooding commented Mar 4, 2020 •

edited

Loading

jkotas commented Mar 4, 2020 •

edited

Loading

saucecontrol commented Mar 4, 2020 •

edited

Loading

terrajobst commented Jul 23, 2020 •

edited by dotnet-api-review bot

Loading