Add support for utilizing F16C instructions on xarch #127094
tannergooding wants to merge 2 commits into dotnet:main from
@EgorBot -intel -amd
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Benchmarks
{
    static void Main(string[] args)
    {
        BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);
    }

    float dataF32 = float.Pi;
    Half dataF16 = Half.Pi;

    [Benchmark]
    public float HalfToSingle() => (float)dataF16;

    [Benchmark]
    public Half SingleToHalf() => (Half)dataF32;
}
Pull request overview
This PR adds initial xarch JIT support to accelerate System.Half ↔ float explicit conversions by recognizing Half.op_Explicit as a named intrinsic and lowering it to AVX2/F16C conversion instructions where available, without changing the existing ABI.
Changes:
- Mark System.Half and the Half(float)/float(Half) explicit operators as [Intrinsic] so the JIT can recognize them.
- Add a new named intrinsic (NI_System_Half_op_Explicit) and importer expansion that emits AVX2 conversion HW intrinsics (vcvtps2ph / vcvtph2ps) for xarch.
- Update xarch HW intrinsic lists + containment/perf metadata to support the new conversion instructions.
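As an illustration only, the scalar conversion pattern described in this overview (pack the scalar into a SIMD register, convert, extract lane 0) can be approximated outside the JIT with the corresponding C++ compiler intrinsics. This is a sketch under stated assumptions, not the code in this PR: the helper names `CpuHasF16C`, `HalfBitsToSingle`, and `SingleToHalfBits` are hypothetical, the Half value is modeled as its raw 16-bit pattern, and the `target` attribute assumes GCC or Clang on x86/x64.

```cpp
#include <cstdint>
#include <cpuid.h>      // __get_cpuid (GCC/Clang, x86 only)
#include <immintrin.h>  // _mm_cvtph_ps / _mm_cvtps_ph

// Hypothetical helper: simplified availability check using CPUID leaf 1.
// ECX bit 27 = OSXSAVE, bit 28 = AVX, bit 29 = F16C.
// Assumption: a complete check would also query XGETBV to confirm the OS
// saves YMM state; that step is omitted here for brevity.
static bool CpuHasF16C()
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
    {
        return false;
    }
    const unsigned required = (1u << 27) | (1u << 28) | (1u << 29);
    return (ecx & required) == required;
}

// Hypothetical helper: widen one half-precision bit pattern to float.
__attribute__((target("f16c")))
static float HalfBitsToSingle(uint16_t bits)
{
    __m128i packed  = _mm_cvtsi32_si128(bits); // scalar into lane 0, rest zeroed
    __m128  widened = _mm_cvtph_ps(packed);    // vcvtph2ps
    return _mm_cvtss_f32(widened);             // extract lane 0
}

// Hypothetical helper: narrow one float to a half-precision bit pattern,
// rounding to nearest even.
__attribute__((target("f16c")))
static uint16_t SingleToHalfBits(float value)
{
    __m128  packed   = _mm_set_ss(value);      // scalar into lane 0, rest zeroed
    __m128i narrowed = _mm_cvtps_ph(packed, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); // vcvtps2ph
    return static_cast<uint16_t>(_mm_cvtsi128_si32(narrowed));
}
```

Callers should check `CpuHasF16C()` first and fall back to a software conversion otherwise, which mirrors the "where available" condition in the overview. On supporting hardware, the bit pattern 0x3C00 widens to 1.0f, and 0x4248 (the half-precision value nearest to pi) widens to 3.140625f.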
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Half.cs | Marks Half and key explicit operators as [Intrinsic] to enable JIT recognition. |
| src/coreclr/jit/namedintrinsiclist.h | Adds a named intrinsic ID for System.Half.op_Explicit. |
| src/coreclr/jit/importercalls.cpp | Recognizes Half.op_Explicit and expands it to AVX2 conversion intrinsic sequences on xarch. |
| src/coreclr/jit/importer.cpp | Adds helper routines to pack/unpack scalar Half values through SIMD nodes. |
| src/coreclr/jit/compiler.h | Declares helper routines and adds isSystemHalfClass type recognition. |
| src/coreclr/jit/hwintrinsiclistxarch.h | Adds AVX2 conversion intrinsics for half<->single vector conversions. |
| src/coreclr/jit/lowerxarch.cpp | Extends containment logic to support the new conversion/store patterns. |
| src/coreclr/jit/emitxarch.cpp | Adds perf characteristics entries for the new conversion instructions. |
CC. @dotnet/jit-contrib, @EgorBo, @kg for review. @dotnet/intel and @jkotas as an FYI on the alternative approach.
AVX512-FP16 support can be done nearly identically; it's just a bigger PR. I'll pull the changes from #122649 after this lands. We can then look at the ABI handling and ensuring
Benchmark (EgorBot/Benchmarks#132) is too small to get good results... The reality is that
Since #122649 had to be reverted due to the ABI concerns, this is a simpler initial change that works with the existing ABI and on hardware with AVX2 support (not just AVX512-FP16 capable hardware).
This should provide a nice win across most existing hardware, and we can follow up with a PR that does the same for the AVX512-FP16 instructions, which allow directly accelerated arithmetic operations rather than only conversions.
Before
After