Using UMONITOR, UMWAIT, TPAUSE in CLR and exposing in Intel specific hardware intrinsics #66873

Open
simplejackcoder opened this issue Mar 19, 2022 · 9 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics
@simplejackcoder
Contributor

simplejackcoder commented Mar 19, 2022

Summary

x86-based hardware introduced the waitpkg ISA in 2020, which can be used to better facilitate low-power, low-latency spin loops.

API Suggestion

namespace System.Runtime.Intrinsics.X86;

[Intrinsic]
[CLSCompliant(false)]
public abstract class WaitPkg : X86Base
{
    public static new bool IsSupported { get; }

    // UMONITOR: void _umonitor(void *address);
    public static unsafe void SetUpUserLevelMonitor(void* address);
    
    // UMWAIT: uint8_t _umwait(uint32_t control, uint64_t counter);
    public static bool WaitForUserLevelMonitor(uint control, ulong counter);
    
    // TPAUSE: uint8_t _tpause(uint32_t control, uint64_t counter);
    public static bool TimedPause(uint control, ulong counter);

    [Intrinsic]
    public new abstract class X64 : X86Base.X64
    {
        internal X64() { }

        public static new bool IsSupported { get; }
    }
}

Additional Considerations

There is a model-specific register, IA32_UMWAIT_CONTROL (MSR 0xE1), which provides additional information. However, model-specific registers can only be read from ring 0 (the kernel), so this information is not available to user-mode programs unless the underlying OS exposes an explicit API. It is therefore not surfaced to the end user.

  • IA32_UMWAIT_CONTROL[31:2] — Determines the maximum time in TSC-quanta that the processor can reside in either C0.1 or C0.2. A zero value indicates no maximum time. The maximum time value is a 32-bit value where the upper 30 bits come from this field and the lower two bits are zero.
  • IA32_UMWAIT_CONTROL[1] — Reserved.
  • IA32_UMWAIT_CONTROL[0] — C0.2 is not allowed by the OS. Value of “1” means all C0.2 requests revert to C0.1.

This information is not strictly pertinent to the user either and would not normally influence their use of the APIs. For example, if IA32_UMWAIT_CONTROL[0] is 1, it simply means that a user call of TimedPause where control == 0 will be treated as control == 1:

Bit Value    State Name  Wakeup Time  Power Savings  Other Benefits
bit[0] = 0   C0.2        Slower       Larger         Improves performance of the other SMT thread(s) on the same core
bit[0] = 1   C0.1        Faster       Smaller        N/A
bits[31:1]   N/A         N/A          N/A            Reserved

Likewise, if the user-specified counter would keep the processor waiting longer than the maximum time given by IA32_UMWAIT_CONTROL[31:2], then TimedPause returns true, indicating that the pause ended because the operating system time limit expired; it returns false when the pause ends by reaching or exceeding the specified counter. The same applies to WaitForUserLevelMonitor.
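
As a rough illustration of the two rules above, the following C sketch models how the control operand and IA32_UMWAIT_CONTROL combine. It does not execute any WAITPKG instruction. The names effective_state, max_wait_tsc, and ends_on_os_limit are hypothetical, and the return-value model assumes the OS limit is measured relative to the TSC value at the time of the call, which is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values match bit 0 of the control operand: 0 requests C0.2, 1 requests C0.1. */
typedef enum { STATE_C0_2 = 0, STATE_C0_1 = 1 } wait_state;

/* Bit 0 of the MSR set means the OS disallows C0.2, so every request
 * behaves as if control bit 0 were 1. */
static wait_state effective_state(uint32_t control, uint32_t umwait_control_msr)
{
    bool wants_c0_1   = (control & 1u) != 0;
    bool c0_2_blocked = (umwait_control_msr & 1u) != 0;
    return (wants_c0_1 || c0_2_blocked) ? STATE_C0_1 : STATE_C0_2;
}

/* Maximum wait in TSC quanta: upper 30 bits come from MSR[31:2],
 * lower two bits are zero. A result of 0 means "no maximum". */
static uint32_t max_wait_tsc(uint32_t umwait_control_msr)
{
    return umwait_control_msr & ~UINT32_C(3);
}

/* Simplified model of the boolean result of TimedPause / WaitForUserLevelMonitor:
 * true when the OS limit would expire before the requested deadline. */
static bool ends_on_os_limit(uint64_t now_tsc, uint64_t counter,
                             uint32_t umwait_control_msr)
{
    uint32_t limit = max_wait_tsc(umwait_control_msr);
    if (limit == 0 || counter <= now_tsc)
        return false;                       /* no limit, or deadline already passed */
    return (counter - now_tsc) > limit;
}
```

For example, with a hypothetical MSR value of 0x186A1 (a limit of 100000 TSC quanta, C0.2 disallowed), effective_state(0, 0x186A1) yields STATE_C0_1, and a request to wait 200000 quanta would be reported as ending on the OS limit.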

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Runtime.Intrinsics untriaged New issue has not been triaged by the area owner labels Mar 19, 2022
@ghost

ghost commented Mar 19, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Use cases

  • Expose in the hardware intrinsics library
  • Threads
  • Sleep
  • GC write watch
Author: simplejackcoder
Assignees: -
Labels:

area-System.Runtime.Intrinsics, untriaged

Milestone: -

@tannergooding
Member

At the very least, this proposal needs to be updated to follow the API Proposal outline, similarly to #66467

These instructions are available in user mode and don't appear to have any oddities that would prevent their support in the JIT. waitpkg is a relatively new ISA that I believe is only supported in Tremont, Alder Lake, and Sapphire Rapids at the moment, and it is currently Intel-only.

It might be interesting to see if @stephentoub, @jkotas has anywhere this could be used in-box. Things like working with the GC would likely not be easy to support and, like pause/yield, these are likely to be difficult APIs to use. It might be better to see whether the functionality could be used implicitly where possible, or whether a more general set of "efficient/xplat" APIs covering this functionality is a "better idea".

For reference:

  • tpause is timed pause and lets you basically wait for n cycles in a "power" or "efficiency" mode
  • umonitor is monitor address and lets you set up a hardware trigger that occurs when a given address is written (the range is queried via cpuid)
  • umwait is monitor wait and basically does tpause until the time stamp counter passes or the setup umonitor address is triggered

The C++ signatures for these are:

  • uint8_t _tpause(uint32_t control, uint64_t counter);
  • void _umonitor(void *address);
  • uint8_t _umwait(uint32_t control, uint64_t counter);

Rust provides similarly named APIs.
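
For reference, a minimal C sketch of the detect-then-use pattern with the intrinsics listed above, assuming GCC or Clang on x86 (cpuid.h and the target("waitpkg") attribute are compiler-specific). The helper names waitpkg_supported, tpause_cycles, and short_pause, and the choice of control = 1 (C0.1), are illustrative assumptions and not part of the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang helper) */
#include <x86intrin.h>  /* __rdtsc, _mm_pause, _tpause */
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* WAITPKG is reported in CPUID leaf 7, subleaf 0, ECX bit 5. */
static bool waitpkg_supported(void)
{
#if HAVE_X86
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;                 /* leaf 7 not available */
    return ((ecx >> 5) & 1u) != 0;
#else
    return false;
#endif
}

#if HAVE_X86
/* Only this function is compiled with WAITPKG enabled, so the rest of the
 * program still runs on CPUs without it. Returns true if the pause ended
 * because the OS time limit expired. */
__attribute__((target("waitpkg")))
static bool tpause_cycles(uint64_t cycles)
{
    /* control = 1 requests C0.1; the counter is an absolute TSC deadline. */
    return _tpause(1u, __rdtsc() + cycles) != 0;
}
#endif

/* Hypothetical helper: pauses briefly and reports whether TPAUSE was used. */
static bool short_pause(uint64_t cycles)
{
#if HAVE_X86
    if (waitpkg_supported()) {
        (void)tpause_cycles(cycles);
        return true;
    }
    _mm_pause();                      /* fallback on older x86 hardware */
#else
    (void)cycles;                     /* no-op on other architectures */
#endif
    return false;
}
```

A caller would typically query waitpkg_supported() once and cache the result, since CPUID is relatively expensive.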

@tannergooding tannergooding added needs-author-action An issue or pull request that requires more info or actions from the author. and removed untriaged New issue has not been triaged by the area owner labels Apr 11, 2022
@jkotas
Member

jkotas commented Apr 11, 2022

It might be interesting to see if @stephentoub, @jkotas has anywhere this could be used in-box

It would be interesting to experiment with replacing the lock spin loops using these intrinsics. It should provide better overall performance, especially on machines with many cores.

The common locks are implemented in C/C++ in CoreCLR today, so we would need to reimplement them in C# first before the managed intrinsics can be used for those.
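
A sketch of what such an experiment might look like, written in C against the compiler intrinsics rather than the managed API proposed in this issue. It assumes GCC or Clang; the spin_lock type, the function names, and the 10000-cycle wait budget are all hypothetical, and this is not how the CoreCLR locks are implemented.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang helper) */
#include <x86intrin.h>  /* __rdtsc, _mm_pause, _umonitor, _umwait */
#define HAVE_X86 1
#define WAITPKG_TARGET __attribute__((target("waitpkg")))
#else
#define HAVE_X86 0
#define WAITPKG_TARGET
#endif

typedef struct { atomic_int state; } spin_lock;   /* 0 = free, 1 = held */

static bool waitpkg_supported(void)
{
#if HAVE_X86
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ((ecx >> 5) & 1u) != 0;    /* CPUID leaf 7, ECX bit 5 */
#else
    return false;
#endif
}

/* Wait until the lock word looks free, without writing to it. */
WAITPKG_TARGET
static void wait_until_free(spin_lock *l, bool use_waitpkg)
{
    while (atomic_load_explicit(&l->state, memory_order_relaxed) != 0) {
#if HAVE_X86
        if (use_waitpkg) {
            _umonitor((void *)&l->state);           /* arm the monitor */
            /* Re-check after arming so a release in between is not missed. */
            if (atomic_load_explicit(&l->state, memory_order_relaxed) == 0)
                break;
            (void)_umwait(1u, __rdtsc() + 10000u);  /* C0.1, bounded wait */
        } else {
            _mm_pause();
        }
#else
        (void)use_waitpkg;                          /* plain spin elsewhere */
#endif
    }
}

static bool spin_lock_try_acquire(spin_lock *l)
{
    return atomic_exchange_explicit(&l->state, 1, memory_order_acquire) == 0;
}

static void spin_lock_acquire(spin_lock *l)
{
    if (spin_lock_try_acquire(l))
        return;                                     /* uncontended fast path */
    /* A real implementation would cache this result instead of re-querying. */
    const bool use_waitpkg = waitpkg_supported();
    do {
        wait_until_free(l, use_waitpkg);
    } while (!spin_lock_try_acquire(l));
}

static void spin_lock_release(spin_lock *l)
{
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

The store in spin_lock_release is what triggers the armed monitor on a waiting core, so waiters wake on release rather than polling the cache line in a tight loop.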

@ghost ghost added the no-recent-activity label Apr 25, 2022
@jkotas jkotas removed no-recent-activity needs-author-action An issue or pull request that requires more info or actions from the author. labels Apr 25, 2022
@jeffhandley jeffhandley added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 27, 2022
@jeffhandley jeffhandley added this to the Future milestone Apr 27, 2022
@deeprobin
Contributor

Can somebody create an API-Shape for this Proposal?

@MineCake147E

This comment was marked as resolved.

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jan 2, 2024
@tannergooding
Member

I've updated it loosely based on the above. I made a couple of tweaks and explained why GetMaximumWaitTime and GetIsC02Supported can't be exposed.

@bartonjs
Member

bartonjs commented Jan 9, 2024

Video

Looks good as proposed.

namespace System.Runtime.Intrinsics.X86;

[Intrinsic]
[CLSCompliant(false)]
public abstract class WaitPkg : X86Base
{
    public static new bool IsSupported { get; }

    // UMONITOR: void _umonitor(void *address);
    public static unsafe void SetUpUserLevelMonitor(void* address);
    
    // UMWAIT: uint8_t _umwait(uint32_t control, uint64_t counter);
    public static bool WaitForUserLevelMonitor(uint control, ulong counter);
    
    // TPAUSE: uint8_t _tpause(uint32_t control, uint64_t counter);
    public static bool TimedPause(uint control, ulong counter);

    [Intrinsic]
    public new abstract class X64 : X86Base.X64
    {
        internal X64() { }

        public static new bool IsSupported { get; }
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jan 9, 2024