Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide x86 CPUID related information #785

Closed
damageboy opened this issue Dec 12, 2019 · 42 comments · Fixed by #40167
Closed

Provide x86 CPUID related information #785

damageboy opened this issue Dec 12, 2019 · 42 comments · Fixed by #40167
Labels
api-approved API was approved in API review, it can be implemented arch-x64 area-System.Runtime.Intrinsics help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@damageboy
Copy link
Contributor

damageboy commented Dec 12, 2019

Proposal

Rationale

There's a breadth of information available through the CPUID instruction.
Most of this information is not available today from C#, with the exception of information like
support for specific families of HW Intrinsics which is surfaced through the IsSupported JIT time constant/property in each respective group of intrinsics.

I've lightly touched on this in a different issue, but I think this deserves an issue of its own in light with the specific needs that I have in mind.

To name a few types of information that are not currently provided:

  • Specific CPU Model (Kaby Lake / Skylake etc.)
  • Manufacturer
  • Various key HW info like:
    • Layout of the caches:
      • cache line sizes
      • levels
      • associativity
      • TLB levels, sizes
    • Topology of the system (NUMA Nodes, physical/logical mapping) and it's mapping to CPU affinity masks

This information could be used to select proper code paths or pre-allocate data structures with sizes that relate to the actual HW at hand.

Proposed API

namespace System.Runtime.Intrinsics.X86
{
    public class X86Base
    {
        public static bool IsSupported { get; }

        public static (int Eax, int Ebx, int Ecx, int Edx) CpuId(int functionId, int subFunctionId = 0);
    }
}

Alternatives and Other Considerations

Given that Value Tuples are currently considered AVOID, an alternative is to match the C++ signature of public static void CpuId(int* cpuInfo, int functionId) and public static void CpuId(int* cpuInfo, int functionid, int subFunctionId)

CPUID is not technically a baseline instruction. It requires its own check using the ID bit of the EFLAGS register. However, only CPUs prior to the 486SL (released in 1992) won't have this feature available. In order to minimize the number of ISAs that only contain 1-2 methods (like Lzcnt), it is proposed to simply consider this part of X86Base rather than it be in its own CpuId class.

It is also important to make note that other, cross-platform/architecture methods of obtaining such information are available today, namely the hwloc native library that can provide most if not all of this and beyond through a normalized native API, and ultimately give us this view of our own systems (screenshot of lstopo generated output, part of the hwloc library):

For reference, here's a sample of cpuid generated to show what's up for discussion.

Relevant issues:
https://github.com/dotnet/corefx/issues/22790
https://github.com/dotnet/coreclr/issues/14388

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Runtime untriaged New issue has not been triaged by the area owner labels Dec 12, 2019
@jkotas jkotas added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.Intrinsics and removed area-System.Runtime labels Dec 12, 2019
@tannergooding
Copy link
Member

I had thought a bit about this previously and believe an appropriate API would look something like:

// casing of the name here is weird but follows the .NET naming guidelines
public abstract class CpuId
{
    internal CpuId() { }

    // We can't use CpuId for the method as it conflicts with the class
    // name. Some other name may be better but `Query` seems to fit

    // Signature matches the C++ convention for __cpuid and __cpuidex
    // we could also return a tuple or an explicitly typed struct and
    // could reorder the parameters so the "out buffer" is last

    // Using `uint` rather than `int` may also be better for C# since most
    // bit manipulations should be unsigned and there are negative "max values"

    public static void QueryCpu(int* cpuInfo, int functionId);
    public static void QueryCpu(int* cpuInfo, int functionId, int subFunctionId);
}

We could then additionally expose helper properties for certain key information. For example:

public static bool IsAuthenticAMD { get; }
public static bool IsGenuineIntel { get; }

public static uint MaxFunctionId { get; }

// etc

@damageboy
Copy link
Contributor Author

I think this would be a great sort of addition.

Some properties are surfaced all the way as static native types, treated as JIT time constants.
I think that the "minimum" set (in my opinion, that is, of course) would have to include some more data about cache layout, processor layout, and topology, while leaving the really nutty stuff to be queried through the QueryCpu() methods you propose.

I'm assuming you meant for the function to either accept (out uint eax, out uint ebx, out uint ecx, out uint edx) or return them as a value tuple (but value tuples are verboten in CoreFX, IIRC)... right?

I even hacked up something like this a long long time ago :), back when I was popping the stack pointer and re-writing the call-site to add intrinsics "dynamically" (Don't ask):
https://github.com/damageboy/ReJit/blob/master/ReJIT/CPUID.cs

The one "big" thing that would be still "missing" is that this would not allow for usage inside
existing/future attributes controlling layout/alignment, since, well, it isn't a compile time constant...

I guess that is more of an issue at the CLR level where certain things have to be constants at compile-time even though in theory they could be expressed as constants at JIT time (Had that notion of late-bound constant was acceptable/supported).

@tannergooding
Copy link
Member

(but value tuples are verboten in CoreFX, IIRC)

I believe we resolved the naming issues that prevented them from being used and now merely recommend against their use.

@CarolEidt could probably comment on the best way to expose the signature that can also work efficiently with the register allocator. Namely, cpuid returns its result in rax, rbx, rcx, and rdx. C++ takes a int cpuInfo[4] (which is effectively just a int* that is has space for 4 elements), so we need to return 4 results either via some stack allocated space or a specialized struct/value tuple.

@CarolEidt
Copy link
Contributor

A struct/value tuple with 4 elements would be quite problematic to do efficiently (given current JIT capabilities) if it were an actual call. As it is, since intrinsics are expanded early in the JIT, it's just a bit less problematic. The remaining issue is that we'd really like to express this as an instruction that defines multiple (register) values, and the register allocator (indeed, the JIT more broadly) currently doesn't handle that really well (even multi-register returns and longs on arm32 are supported as special cases). That's a recurring issue, though, that we probably should just address. It will probably require additional register allocator work to handle a case where not only does it define multiple registers, but they are fixed. Again, doable but probably a bit tricky.

@jkotas
Copy link
Member

jkotas commented Dec 12, 2019

I do not think CPUID instruction needs to be handled as HW intrinsic by the JIT. It can be a regular call to C or assembly helper that the JIT knows nothing about. CPUID is expensive. You want to call it once and cache the result.

@damageboy
Copy link
Contributor Author

I think @jkotas is right.
CPUID is not the sort of thing you don't need to optimize in general, but rather cache.

The only reason to even consider CPUID as an full-blown intrinsic would be if for some reason .NET/CoreCLR programmers would benefit somehow by CPUID's only other saving grace: the fact that it is a serializing instruction, and the easiest one to issue userspace.

Quoting Intel's SDM Volume 3, section 8.3:

Non-privileged serializing instructions — CPUID, IRET, and RSM.
When the processor serializes instruction execution, it ensures that all pending memory transactions are completed (including writes stored in its store buffer) before it executes the next instruction. Nothing can pass a serializing instruction and a serializing instruction cannot pass any other instruction (read, write, instruction fetch, or I/O). For example, CPUID can be executed at any privilege level to serialize instruction execution with no effect on program flow, except that the EAX, EBX, ECX, and EDX registers are modified.

I might be unimaginative, but I'm having a hard time coming up for a reason to call this from userspace/CoreCLR.

Perhaps other people can think of a real use case for having this as an intrinsic for its serializing property...

@tannergooding
Copy link
Member

It can be a regular call to C or assembly helper that the JIT knows nothing about

Are you indicating that a CpuId API, if exposed, would be more akin to an InternalCall that invokes __cpuid and __cpuidex?

You want to call it once and cache the result.

Right and there are certain key CPUID properties that it would be beneficial to be more broadly exposed and which are x86 specific. For example, IsGenuineIntel and IsAuthenticAMD checks which would be used in conjunction with other hardware intrinsics.

Will the existing JIT constant folding logic around static readonly fields allow these cases to work?

There are also certain hardware properties which would be beneficial to more broadly expose but which may also be beneficial for other platforms (such as cache line sizes) and I think we have an issue tracking those.

@jkotas
Copy link
Member

jkotas commented Dec 12, 2019

Are you indicating that a CpuId API, if exposed, would be more akin to an InternalCall that invokes __cpuid and __cpuidex?

Right.

Will the existing JIT constant folding logic around static readonly fields allow these cases to work?

Yes, with tirered JITing.

@AndyAyersMS
Copy link
Member

FWIW the native compiler has to play some tricks to safely emit cpuid for x86, since it kills so many registers. Would be nice if we didn't have to propagate those tricks to the jit.

@damageboy
Copy link
Contributor Author

damageboy commented Feb 4, 2020

Continuing from #2251:

The SIMD intrinsics are even harder to get right. I believe that there is negative value in trying to teach people how to conditionalize those using CPUIDs.

I agree that this is far from optimal, I just see this is a lesser of all evils.
I think it is also important to acknowledge that there are many common cases where developers could rely on CPUID results, but would be better served by a more high-level/specific way of expressing their intent. One such immediate case that pops to mind would be cache-line alignment, which would benefit more from a targeted fix in the form of some attribute rather than throwing CPUID at developers expecting them to hack it together.

But the world of intrinsics, in the end, is very dynamic, complicated, and uneven, as the PDEP situation exemplifies (BTW, there is a lot of chatter that AMD is not solving this due to a patent situation regarding the underlying implementation inside the CPU, and it might stay so for a long time), which brings me to your next point:

Right. And I do not believe that we want to add APIs like this to the core platform. If somebody believes that they really need them, they can build them on their own.

You are both right and wrong, in my opinion. At the end of the day, the community would be better served from having a complete and wide API that can describe the underlying HW (I mentioned hwloc in #785, which I've personally used from C# just like you suggest, in the past).

My only counterpoint to this would be that the JIT/platform is in a unique position to mark and treat these values as constants, just like some values, gleaned from the system like .IsSupported, VectorNNN<T>.Count, ProcessorCount to name a few, already are.
Maybe if there was a declarative way of providing the JIT with constants knowing for sure they would be employed to elide branches and for constant folding, we could relegate this to third parties while not missing any optimization opportunities.
I know about static read-only members in this regard, but my impression is/was that they are not a failproof way of ensuring the JIT uses them as constant, am I wrong in this regard?

@jkotas
Copy link
Member

jkotas commented Feb 4, 2020

I know about static read-only members in this regard, but my impression is/was that they are not a failproof way of ensuring the JIT uses them as constant, am I wrong in this regard?

They have small gotcha today in that the value gets inlined only if the static constructor has run, so you may need to explicit trigger the static constructor to get them initialized early. We have plans to improve tiered compilation that will make this unnecessary.

@jkotas
Copy link
Member

jkotas commented Feb 4, 2020

The module initializers will make it easy to trigger the static constructors early.

@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Feb 10, 2020
@saucecontrol
Copy link
Member

Can we consider @tannergooding's basic CpuId wrapper an API proposal? I'm looking at mitigating performance issues with AVX2 gather instructions, which were slow on Intel pre-Skylake models and have always been (relatively) slow on AMD. I'd be looking for vendor and model info for that.

It would, of course, be nice to have a convenience API that pulls out some of the basic properties in a more readable format, but raw access to CPUID would solve the basic need.

@tannergooding
Copy link
Member

tannergooding commented Mar 4, 2020

Is the proposed API simply (which matches C++):

public static void QueryCpu(int* cpuInfo, int functionId);
public static void QueryCpu(int* cpuInfo, int functionId, int subFunctionId);

or do we want to be more explicit:

public static void QueryCpu(int* eax, int* ebx, int* ecx, int* edx, int functionId);

And do we want the outputs as the first or last parameter?

Also, do we want to propose any common entries (IsAuthenticAMD, IsGenuineIntel, or the Cache Line Size) or do we want to say getting and caching those is the responsibility of the user?

@tannergooding
Copy link
Member

My preference is to match C++, which will also make the code fairly trivial to call with a stackalloc.
It would also be to only propose QueryCpu until we have use cases for any "common" entries the framework may want to more generally expose.
But, I thought I'd ask to get other people's opinions as well.

@jkotas
Copy link
Member

jkotas commented Mar 4, 2020

What about matching the style of the always present ARM intrinsics and make it look like this?

namespace System.Runtime.Intrinsics.X86
{
    public class X86Base
    {
        public static bool IsSupported { get; }
        public ValueTuple<int,int,int,int> CpuId(int functionId, int subFunctionId = 0);
    }
}

It would also be to only propose QueryCpu

+1

@tannergooding
Copy link
Member

of the always present ARM intrinsics

My only concern is that it isn't actually an "always present" intrinsic. CPUID technically requires its own check (attempting to toggle the ID bit in the EFLAGS register) and so far we've followed that hierarchy even for other ISAs (such as SSE/SSE2) that are "baseline" to our own implementation of the framework/runtime.
I don't think that "in practice" this is a concern, but there have been hobby projects to get code running on Win3.1 or DOS and that may prevent certain usages for anyone wanting to expose these intrinsics in their own runtime implementations.

ValueTuple<int,int,int,int>

I'm fine with taking the proposal for a value tuple return to API review, we just haven't had much precedence for it yet 😄
I might suggest (int EAX, int EBX, int ECX, int EDX) (or some other casing) rather than straight ValueTuple though to help clarify which entry is which.

@jkotas
Copy link
Member

jkotas commented Mar 4, 2020

I don't think that "in practice" this is a concern, but there have been hobby projects

These hobby projects can return false from X86Base.IsSupported.

@tannergooding
Copy link
Member

These hobby projects can return false from X86Base.IsSupported.

Yes. But alternatively we could just make the class named Cpuid and it would preserve the existing layering and matching of IsSupported checks to the underlying hardware.
In either case, both will likely be brought up during API review (one as the suggested API and the other under the typical alternatives/other notes section) and API review should be able to settle on which is better.

@saucecontrol
Copy link
Member

saucecontrol commented Mar 4, 2020

    public static class CpuId
    {
        public static bool IsSupported { get; }
        public static (int EAX, int EBX, int ECX, int EDX) CpuId(int functionId, int subFunctionId = 0);
    }

LGTM. I'm keen to see if a ValueTuple return makes it through API review as well.

Things like IsAuthenticAMD or IsGenuineIntel would be nice to have, but CPUID returns such a mishmash of stuff, I can't picture what a generic representation would be. And we've got new xArch clones coming out of China now, so it's about to stop being a two-company game again.

@jkotas
Copy link
Member

jkotas commented Mar 4, 2020

My reason for proposing X86Base is that it looks wasteful to have a type with just two methods and there are more methods that are potential candidates forX86Base (e.g. rep movb)

Things like IsAuthenticAMD or IsGenuineIntel would be nice to have

These are one-liners thanks to C# pattern matching, so it is pretty straightforward for everybody to roll their own as necessary.

static bool IsGenuineIntel => X86Base.CpuId(0) is (_, 0x47656e75, 0x6e74656c, 0x696e6549);

@tannergooding
Copy link
Member

My reason for proposing X86Base is that it looks wasteful to have a type with just two methods

That's understandable. On the other hand, if we get enough requests for some of the common information to be exposed it might become more than just two methods. I think the immediately obvious ones are:

  • Maximum CPUID Function/SubFunction Number
  • CPU Vendor (IsAuthenticAMD and IsGenuineIntel)
  • CPU Model/Family Information
  • Cache Line Flush Size
  • Cache Line and TLB Size Information

These are things we are already querying and caching in the runtime for various functionality and so are things that would be trivial to expose to the end user and that they might find useful as well.
If we don't think that these will become common enough to warrant exposing as part of the framework, then I'll make sure to bring that up during the API review.

@mjsabby
Copy link
Contributor

mjsabby commented Mar 6, 2020

Isn't an intrinsic better here because CoreRT will get it for free since it'll be in RyuJit vs if it's not it'll have to be ported to CoreRT? cc @MichalStrehovsky

@tannergooding
Copy link
Member

CPUID is sufficiently complex and may require quite a bit of handling.

I imagine it is easier to port it to CoreRT than to update things to support an instruction that trashes 4 registers and requires storing that back to a struct.

@mjsabby
Copy link
Contributor

mjsabby commented Mar 6, 2020

Alright, thanks.

@jkotas
Copy link
Member

jkotas commented Mar 6, 2020

Also, the place we need to get to with CoreCLR, Mono and CoreRT is that these sort of things are written once and shared, without need to be ported.

@jkotas
Copy link
Member

jkotas commented Mar 6, 2020

E.g. this can be achieved today by putting it to System.Native that is shared between all 3.

@damageboy
Copy link
Contributor Author

In either case, @damageboy, would you be able to update the original post to contain the following? Once that is done, I can mark this as ready for review.

Done, sorry for the delay.

@tannergooding tannergooding added api-ready-for-review and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Mar 9, 2020
@tannergooding
Copy link
Member

Done, sorry for the delay.

No worries, you've been busy writing an amazing blog series and I'm sure many other things as well 😄

@BruceForstall
Copy link
Member

@tannergooding this is marked ready-for-review. Is it going to be reviewed? should it be marked as milestone 5.0?

@tannergooding tannergooding added this to the Future milestone May 18, 2020
@tannergooding
Copy link
Member

I've marked it future, we don't have any critical need for the API so it can be reviewed as part of the normal backlog.

@carlossanlop carlossanlop added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-ready-for-review labels Jul 6, 2020
@jkotas jkotas changed the title Provide x86 CPUID related information as JIT time consts Provide x86 CPUID related information Jul 15, 2020
@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 23, 2020
@terrajobst
Copy link
Member

terrajobst commented Jul 23, 2020

Video

  • Looks good as proposed
  • All x86 types will extend X86Base (unless it already base type)
namespace System.Runtime.Intrinsics.X86
{
    public class X86Base
    {
        public static bool IsSupported { get; }

        public static (int Eax, int Ebx, int Ecx, int Edx) CpuId(int functionId, int subFunctionId = 0);
    }
}

@jkotas jkotas added the help wanted [up-for-grabs] Good issue for external contributors label Jul 23, 2020
@abelbraaksma
Copy link
Contributor

Great stuff! Thanks for this 💯.

@am11
Copy link
Member

am11 commented Aug 10, 2020

Thanks! Is there also a plan for ARM's CPUID info?

@tannergooding
Copy link
Member

@am11, no. The ARM information is only accessible to the Kernel and is why each OS exposes an API to query it.

@am11
Copy link
Member

am11 commented Aug 10, 2020

@tannergooding, thank you, understood. I saw a similar ongoing PR, where we are reading /proc/cpuinfo on Linux for ARM64 to gather CPU capabilities; and was wondering if it makes sense to level this APIs surface for ARM: https://github.com/dotnet/runtime/pull/40597/files#diff-f808f893e0b21b8950f8284dd17cdc08R91? Perhaps such implementation does not make sense under System.Runtime.Intrinsics.* namespace, if the underlying instruction is only available to the kernel.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented arch-x64 area-System.Runtime.Intrinsics help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.