
Potential optimizations creating intrinsic Vector from scalar #86405

@BladeWise

Description


Following a chat with @tannergooding, I am opening this issue to track potential codegen improvements when creating an intrinsic VectorXXX from a scalar value.

Consider the following code, which reinterprets a 2-byte Vector2<byte> as a ushort and loads it into a Vector128<ushort>:

```csharp
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class Extensions
{
    // Element 0 = the two bytes of v; upper elements are zeroed.
    public static Vector128<byte> LoadAsCreateScalar(this in Vector2<byte> v) =>
            Vector128.CreateScalar(Unsafe.As<Vector2<byte>, ushort>(ref Unsafe.AsRef(in v))).AsByte();

    // Zeroed vector with element 0 replaced: same result as CreateScalar.
    public static Vector128<byte> LoadAsSkipInitWithElement(this in Vector2<byte> v) =>
            default(Vector128<ushort>).WithElement(0, Unsafe.As<Vector2<byte>, ushort>(ref Unsafe.AsRef(in v))).AsByte();

    // Element 0 = the two bytes of v; upper elements are undefined.
    public static Vector128<byte> LoadAsCreateScalarUnsafe(this in Vector2<byte> v) =>
            Vector128.CreateScalarUnsafe(Unsafe.As<Vector2<byte>, ushort>(ref Unsafe.AsRef(in v))).AsByte();
}

[StructLayout(LayoutKind.Sequential)]
public readonly struct Vector2<T> where T : unmanaged, INumber<T>
{
    public T X
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get;
    }

    public T Y
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get;
    }

    public Vector2(T x, T y)
    {
        X = x;
        Y = y;
    }
}
```
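All three helpers place the same two bytes in element 0. They differ only in what they guarantee about the remaining elements. The following is a minimal sketch, assuming .NET 8. It works on a plain ushort instead of Vector2<byte>, and the class and method names are hypothetical. It compares the three creation paths:

```csharp
using System;
using System.Runtime.Intrinsics;

public static class ScalarCreation
{
    // Element 0 = value; elements 1..7 are guaranteed to be zero.
    public static Vector128<ushort> ViaCreateScalar(ushort value) =>
        Vector128.CreateScalar(value);

    // Explicitly zeroed vector with element 0 replaced: same contents as CreateScalar.
    public static Vector128<ushort> ViaZeroWithElement(ushort value) =>
        Vector128<ushort>.Zero.WithElement(0, value);

    // Only element 0 is defined; elements 1..7 may hold leftover register
    // contents, so only element 0 is returned here.
    public static ushort LowElementViaCreateScalarUnsafe(ushort value) =>
        Vector128.CreateScalarUnsafe(value).GetElement(0);
}
```

CreateScalar and a zeroed WithElement are observably identical, so any instruction sequence that produces the same 16 bytes is a valid lowering for either. CreateScalarUnsafe only constrains element 0.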

The code generated by the current runtime (.NET 8) for the above extensions is:

```asm
; LoadAsCreateScalar
vzeroupper 
movzx    rax, word  ptr [rdx]
vmovd    xmm0, rax
vmovups  xmmword ptr [rcx], xmm0
mov      rax, rcx
ret
; LoadAsSkipInitWithElement
vzeroupper
vxorps   xmm0, xmm0, xmm0
vpinsrw  xmm0, xmm0, word  ptr [rdx], 0
vmovups  xmmword ptr [rcx], xmm0
mov      rax, rcx
ret
; LoadAsCreateScalarUnsafe
vzeroupper
movzx    rax, word  ptr [rdx]
vmovd    xmm0, eax
vmovups  xmmword ptr [rcx], xmm0
mov      rax, rcx
ret
```
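The movzx/vmovd pair in the first and third listings is correct for CreateScalar for two reasons. First, movd writes a 32-bit value into the low lane and zeroes the rest of the register. Second, the ushort has already been zero-extended. The following sketch models that equivalence in managed code, assuming .NET 8 and a little-endian target such as x64. The names are hypothetical:

```csharp
using System;
using System.Runtime.Intrinsics;

public static class MovzxMovdModel
{
    // Models "movzx eax, word ptr [src]; vmovd xmm0, eax":
    // the ushort is zero-extended to 32 bits and placed in lane 0 of a
    // Vector128<uint> whose other lanes are zero, then viewed as ushort lanes.
    public static Vector128<ushort> ZeroExtendThenMovd(ushort value) =>
        Vector128.CreateScalar((uint)value).AsUInt16();

    // Reference semantics of Vector128.CreateScalar(ushort).
    public static Vector128<ushort> Reference(ushort value) =>
        Vector128.CreateScalar(value);
}
```

On a little-endian machine, the upper 16 bits of the zero-extended uint land in ushort element 1 as zero. That is why both functions yield the same vector.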

There are two possible improvements:

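  • If the destination is a register and the source is in memory, use pinsrw directly (as LoadAsSkipInitWithElement does), but without the preceding zeroing. This only satisfies the contract of CreateScalarUnsafe, whose upper elements are undefined.
  • If both the source and the destination are in memory, it would be possible to simply chain movzx and mov:

```asm
movzx rax, word ptr[rdx]
mov word [rcx], rax
```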
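Any memory-to-memory sequence must still produce the bytes the vector contract requires:

- **CreateScalar:** the 16-byte destination must hold the two source bytes followed by fourteen zero bytes.
- **CreateScalarUnsafe:** only the first two bytes are defined.

The following byte-level sketch of the CreateScalar case assumes .NET 8 and a little-endian target. The helper names are hypothetical and only model the expected memory contents, not the JIT's output:

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.Intrinsics;

public static class MemoryToMemory
{
    // Byte-level model of storing CreateScalar(value) to memory:
    // two bytes of payload followed by 14 zero bytes.
    public static void StoreScalarModel(ushort value, Span<byte> destination)
    {
        Span<byte> vector = destination.Slice(0, 16);            // throws if destination is shorter than 16 bytes
        vector.Clear();                                           // upper lanes of CreateScalar are zero
        BinaryPrimitives.WriteUInt16LittleEndian(vector, value);  // low lane holds the payload
    }

    // Reference: let the runtime build the vector, then copy it to memory.
    public static void StoreScalarReference(ushort value, Span<byte> destination) =>
        Vector128.CreateScalar(value).AsByte().CopyTo(destination);
}
```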

As an addendum, it would be useful to have an API that returns an uninitialized intrinsic VectorXXX from managed code, so that an unsafe WithElement can be performed without paying the cost of zeroing.
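Unsafe.SkipInit can partially approximate this today. It satisfies C#'s definite-assignment rules without writing to the variable. The following is a minimal sketch, assuming .NET 8; the helper name is hypothetical. The JIT may still zero the local, for example because of the method's localsinit flag, so this does not guarantee that the zeroing disappears from the generated code:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

public static class UninitializedVector
{
    // Hypothetical helper: inserts one element into a vector whose other
    // elements are left unspecified. Only the element at 'index' is meaningful.
    public static Vector128<ushort> WithElementUninitialized(int index, ushort value)
    {
        Unsafe.SkipInit(out Vector128<ushort> vector); // no explicit zeroing in IL
        return vector.WithElement(index, value);       // throws ArgumentOutOfRangeException if index is not in [0, 7]
    }
}
```

For index 0, Vector128.CreateScalarUnsafe already provides the same contract without this workaround.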

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)