Add support for utilizing F16C instructions on xarch by tannergooding · Pull Request #127094 · dotnet/runtime

tannergooding · 2026-04-17T21:36:14Z

Since #122649 had to be reverted due to the ABI concerns, this is a simpler initial change that works with the existing ABI and on hardware with AVX2 support (not just AVX512-FP16 capable hardware).

This should provide a nice win across most existing hardware. We can follow up with a PR that does the same for the AVX512-FP16 instructions, which directly accelerate arithmetic operations rather than only handling conversions.

Before

; Method Program:HalfToSingle(System.Half):float (FullOpts)
G_M16314_IG01:  ;; offset=0x0000
       4883EC28             sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M16314_IG02:  ;; offset=0x0004
       0FB7C9               movzx    rcx, cx
       FF156BA74500         call     [System.Half:op_Explicit(System.Half):float]
       90                   nop      
						;; size=10 bbWeight=1 PerfScore 3.50

G_M16314_IG03:  ;; offset=0x000E
       4883C428             add      rsp, 40
       C3                   ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 19

; Method Program:SingleToHalf(float):System.Half (FullOpts)
G_M32250_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M32250_IG02:  ;; offset=0x0000
       FF2572A74500         tail.jmp [System.Half:op_Explicit(float):System.Half]
						;; size=6 bbWeight=1 PerfScore 2.00
; Total bytes of code: 6

After

; Method Program:HalfToSingle(System.Half):float (FullOpts)
G_M15861_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M15861_IG02:  ;; offset=0x0000
       0FB7C1               movzx    rax, cx
       C5F96EC0             vmovd    xmm0, eax
       C4E27913C0           vcvtph2ps xmm0, xmm0
						;; size=12 bbWeight=1 PerfScore 6.25

G_M15861_IG03:  ;; offset=0x000C
       C3                   ret      
						;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 13

; Method Program:SingleToHalf(float):System.Half (FullOpts)
G_M15413_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M15413_IG02:  ;; offset=0x0000
       C4E3791DC000         vcvtps2ph xmm0, xmm0, 0
       C5F97EC0             vmovd    eax, xmm0
       0FB7C0               movzx    rax, ax
						;; size=13 bbWeight=1 PerfScore 6.25

G_M15413_IG03:  ;; offset=0x000D
       C3                   ret      
						;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 14
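
The `0` immediate on `vcvtps2ph` selects round-to-nearest-even, which matches the rounding of the managed `(Half)float` conversion, so the accelerated path should produce the same bits as the software path. A minimal C# sketch of that rounding behaviour follows. `ToHalfBits` is a hypothetical helper name used only for illustration; it is not part of this PR.

```csharp
using System;

// Hypothetical helper (not part of the PR): converts a float to Half and
// returns the raw IEEE 754 binary16 bit pattern.
static ushort ToHalfBits(float value) => BitConverter.HalfToUInt16Bits((Half)value);

// Half has 10 explicit mantissa bits, so adjacent values near 1.0 are 1/1024 apart.
// Both inputs below sit exactly halfway between two adjacent Half values.
float tieToLower = 1.0f + 1.0f / 2048; // halfway between 0x3C00 and 0x3C01
float tieToUpper = 1.0f + 3.0f / 2048; // halfway between 0x3C01 and 0x3C02

// Round-to-nearest-even picks the neighbour whose last mantissa bit is 0.
Console.WriteLine($"{tieToLower} -> 0x{ToHalfBits(tieToLower):X4}");
Console.WriteLine($"{tieToUpper} -> 0x{ToHalfBits(tieToUpper):X4}");
Console.WriteLine($"{float.Pi} -> 0x{ToHalfBits(float.Pi):X4}");
```

Ties are the interesting case because they are where a different rounding mode would change the resulting bit pattern.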

tannergooding · 2026-04-17T21:40:07Z

@EgorBot -intel -amd

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Benchmarks
{
    static void Main(string[] args)
    {
        BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);
    }

    float dataF32 = float.Pi;
    Half dataF16 = Half.Pi;

    [Benchmark]
    public float HalfToSingle() => (float)dataF16;

    [Benchmark]
    public Half SingleToHalf() => (Half)dataF32;
}

Copilot

Pull request overview

This PR adds initial xarch JIT support to accelerate System.Half ↔ float explicit conversions by recognizing Half.op_Explicit as a named intrinsic and lowering it to AVX2/F16C conversion instructions where available, without changing the existing ABI.

Changes:

Mark System.Half and the Half(float) / float(Half) explicit operators as [Intrinsic] so the JIT can recognize them.
Add a new named intrinsic (NI_System_Half_op_Explicit) and importer expansion that emits AVX2 conversion HW intrinsics (vcvtps2ph / vcvtph2ps) for xarch.
Update xarch HW intrinsic lists + containment/perf metadata to support the new conversion instructions.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
| --- | --- |
| src/libraries/System.Private.CoreLib/src/System/Half.cs | Marks `Half` and key explicit operators as `[Intrinsic]` to enable JIT recognition. |
| src/coreclr/jit/namedintrinsiclist.h | Adds a named intrinsic ID for `System.Half.op_Explicit`. |
| src/coreclr/jit/importercalls.cpp | Recognizes `Half.op_Explicit` and expands it to AVX2 conversion intrinsic sequences on xarch. |
| src/coreclr/jit/importer.cpp | Adds helper routines to pack/unpack scalar `Half` values through SIMD nodes. |
| src/coreclr/jit/compiler.h | Declares helper routines and adds `isSystemHalfClass` type recognition. |
| src/coreclr/jit/hwintrinsiclistxarch.h | Adds AVX2 conversion intrinsics for half<->single vector conversions. |
| src/coreclr/jit/lowerxarch.cpp | Extends containment logic to support the new conversion/store patterns. |
| src/coreclr/jit/emitxarch.cpp | Adds perf characteristics entries for the new conversion instructions. |

tannergooding · 2026-04-17T21:42:35Z

CC. @dotnet/jit-contrib, @EgorBo, @kg for review

@dotnet/intel and @jkotas as an FYI on the alternative approach. AVX512-FP16 support can be done nearly identically; it's just a bigger PR. I'll pull the changes from #122649 after this lands. We can then look at the ABI handling and at ensuring Half is properly passed in a floating-point register in the future.

tannergooding · 2026-04-17T22:16:36Z

Benchmark (EgorBot/Benchmarks#132) is too small to get good results...

The realistic picture is that HalfToSingle is roughly 4.16x faster and SingleToHalf is about 1.76x faster. The code changes from about 28 instructions with 3 memory accesses and about 25 instructions with 0 memory accesses, respectively, to about 2 instructions with 0 memory accesses.
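
To give a sense of why a software conversion needs a couple of dozen instructions, here is a rough C# sketch of a bit-level Half-to-float decode. It is not the actual `Half.op_Explicit` implementation in CoreLib. `HalfBitsToSingle` is a hypothetical name, and NaN handling is simplified: the payload is copied through and signaling NaNs are not quieted.

```csharp
using System;

// Hypothetical helper (not the CoreLib code): decodes an IEEE 754 binary16
// bit pattern into a float using integer operations only.
static float HalfBitsToSingle(ushort bits)
{
    uint sign = (uint)(bits & 0x8000) << 16; // move the sign to bit 31
    int exponent = (bits >> 10) & 0x1F;      // 5-bit biased exponent
    uint mantissa = (uint)(bits & 0x03FF);   // 10-bit mantissa

    uint result;
    if (exponent == 0x1F)
    {
        // Infinity or NaN: all-ones exponent, mantissa payload copied through.
        result = sign | 0x7F800000u | (mantissa << 13);
    }
    else if (exponent != 0)
    {
        // Normal value: rebias the exponent from 15 to 127 (127 - 15 = 112).
        result = sign | ((uint)(exponent + 112) << 23) | (mantissa << 13);
    }
    else if (mantissa != 0)
    {
        // Subnormal Half: normalize it, since every such value is a normal float.
        int e = 113;
        while ((mantissa & 0x0400) == 0)
        {
            mantissa <<= 1;
            e--;
        }
        result = sign | ((uint)e << 23) | ((mantissa & 0x03FF) << 13);
    }
    else
    {
        result = sign; // positive or negative zero
    }

    return BitConverter.UInt32BitsToSingle(result);
}

// Compare against the built-in conversion for every non-NaN bit pattern.
int mismatches = 0;
for (int i = 0; i <= ushort.MaxValue; i++)
{
    float expected = (float)BitConverter.UInt16BitsToHalf((ushort)i);
    float actual = HalfBitsToSingle((ushort)i);
    if (float.IsNaN(expected) && float.IsNaN(actual))
    {
        continue; // NaN payload and quieting may differ; see the note above
    }
    if (BitConverter.SingleToUInt32Bits(expected) != BitConverter.SingleToUInt32Bits(actual))
    {
        mismatches++;
    }
}
Console.WriteLine($"Mismatches against the built-in conversion: {mismatches}");
```

The branches for infinity/NaN, normal, subnormal, and zero inputs are what a single `vcvtph2ps` replaces.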

Add support for utilizing F16C instructions on xarch

e2cd8bc

Copilot AI review requested due to automatic review settings April 17, 2026 21:36

github-actions bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 17, 2026

Copilot started reviewing on behalf of tannergooding April 17, 2026 21:37 View session

dotnet-policy-service bot assigned tannergooding Apr 17, 2026

EgorBot mentioned this pull request Apr 17, 2026

Benchmarks for dotnet/runtime#127094 (for @tannergooding) EgorBot/Benchmarks#132

Open

Copilot AI reviewed Apr 17, 2026


Comment thread src/coreclr/jit/importer.cpp Outdated

Comment thread src/coreclr/jit/importercalls.cpp

Comment thread src/coreclr/jit/importercalls.cpp

Remove a typo

c3ad72c

kg approved these changes Apr 18, 2026


tannergooding enabled auto-merge (squash) April 18, 2026 03:48

tannergooding wants to merge 2 commits into dotnet:main from tannergooding:half-simd


3 participants
