Following a chat with @tannergooding, I am opening this issue to track potential codegen improvements when creating an intrinsic `VectorXXX` from a scalar value.
Consider the following code, which loads a 2-byte vector (`Vector2<byte>`) as a `ushort` into a `Vector128<ushort>`:
```csharp
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class Extensions
{
    // Reinterpret the two bytes as a ushort and place it in element 0; upper elements are zeroed.
    public static Vector128<byte> LoadAsCreateScalar(this in Vector2<byte> v) =>
        Vector128.CreateScalar(Unsafe.As<Vector2<byte>, ushort>(ref Unsafe.AsRef(in v))).AsByte();

    // Same result, expressed as "start from a zero vector and overwrite element 0".
    public static Vector128<byte> LoadAsSkipInitWithElement(this in Vector2<byte> v) =>
        default(Vector128<ushort>).WithElement(0, Unsafe.As<Vector2<byte>, ushort>(ref Unsafe.AsRef(in v))).AsByte();

    // Element 0 is set; the upper elements are left undefined.
    public static Vector128<byte> LoadAsCreateScalarUnsafe(this in Vector2<byte> v) =>
        Vector128.CreateScalarUnsafe(Unsafe.As<Vector2<byte>, ushort>(ref Unsafe.AsRef(in v))).AsByte();
}

[StructLayout(LayoutKind.Sequential)]
public readonly struct Vector2<T> where T : unmanaged, INumber<T>
{
    public T X
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get;
    }

    public T Y
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get;
    }

    public Vector2(T x, T y)
    {
        X = x;
        Y = y;
    }
}
```
The code generated by the current runtime (.NET 8) for the above extensions is:
```asm
; LoadAsCreateScalar
vzeroupper
movzx rax, word ptr [rdx]
vmovd xmm0, rax
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
ret

; LoadAsSkipInitWithElement
vzeroupper
vxorps xmm0, xmm0, xmm0
vpinsrw xmm0, xmm0, word ptr [rdx], 0
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
ret

; LoadAsCreateScalarUnsafe
vzeroupper
movzx rax, word ptr [rdx]
vmovd xmm0, eax
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
ret
```
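All three variants put the same `ushort` in element 0, but they promise different things about the rest of the vector: `CreateScalar` and `WithElement` on a `default` vector guarantee that the remaining elements are zero, while `CreateScalarUnsafe` documents them as uninitialized. Only in the latter case could the JIT skip the zeroing altogether. The sketch below is a minimal check of those semantics; it rebuilds the vectors from a plain `ushort` instead of `Vector2<byte>` so that it stays self-contained, the class and method names are illustrative only, and the byte checks assume a little-endian (x86/x64) host.

```csharp
using System.Runtime.Intrinsics;

public static class CreateScalarComparison
{
    // Builds the "ushort in element 0" vector the same three ways as the
    // extension methods above, then reinterprets each result as bytes.
    public static (Vector128<byte> Scalar, Vector128<byte> WithElement, Vector128<byte> ScalarUnsafe) Build(ushort value)
    {
        Vector128<byte> scalar = Vector128.CreateScalar(value).AsByte();
        Vector128<byte> withElement = default(Vector128<ushort>).WithElement(0, value).AsByte();
        Vector128<byte> scalarUnsafe = Vector128.CreateScalarUnsafe(value).AsByte();
        return (scalar, withElement, scalarUnsafe);
    }

    // CreateScalarUnsafe only defines element 0 (the low two bytes here),
    // so only those bytes can be meaningfully compared.
    public static bool LowBytesMatch(Vector128<byte> a, Vector128<byte> b) =>
        a.GetElement(0) == b.GetElement(0) && a.GetElement(1) == b.GetElement(1);
}
```

On a little-endian machine, `Build(0x1234).Scalar` holds `0x34, 0x12` followed by fourteen zero bytes, and `Scalar` equals `WithElement` element-for-element.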
There are two possible improvements:
- If the destination is a register and the source is in memory, use `pinsrw` directly (as `LoadAsSkipInitWithElement` does, but without the zeroing).
- If both the source and the destination are in memory, it would be possible to simply chain `movzx`/`mov`:
  ```asm
  movzx rax, word ptr [rdx]
  mov word ptr [rcx], ax
  ```
As an addendum, it could be useful to have an API that returns an uninitialized intrinsic `VectorXXX` from managed code, so that an unsafe `WithElement` can be performed without paying the cost of the zeroing.
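The closest approximation available today is probably to combine `Unsafe.SkipInit` with `WithElement`. The sketch below assumes .NET 8; `WithElement0Unsafe` is a hypothetical helper, not an existing .NET API, and whether the JIT actually drops the `vxorps` for it is not verified here (under the default `localsinit` behavior the local may still be zeroed).

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

public static class UninitializedVectorSketch
{
    // Hypothetical helper (not a .NET API): writes element 0 into a vector
    // whose other elements were never explicitly initialized by C#.
    // Unsafe.SkipInit only satisfies C# definite-assignment rules; it does
    // not by itself guarantee that the JIT omits any zeroing.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Vector128<ushort> WithElement0Unsafe(ushort value)
    {
        Unsafe.SkipInit(out Vector128<ushort> vector);

        // Elements 1..7 have unspecified contents, as with CreateScalarUnsafe.
        return vector.WithElement(0, value);
    }
}
```

`Unsafe.SkipInit` is used here only because it is an existing public API; a dedicated API, or JIT recognition of this pattern, would be needed to make the missing zeroing an actual guarantee.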