Vector128<byte>:get_Zero() doesn't inline (or intrinsicify) at crossgen #32714

@benaadams

Nor do Vector128<byte>.Count or Vector128:AsByte(Vector128`1):Vector128`1.

ASCIIUtility.WidenAsciiToUtf16_Sse2 calls Vector128<byte>.Zero:

// Then perform an unaligned write of the first part of the input buffer.
Vector128<byte> zeroVector = Vector128<byte>.Zero;
utf16FirstHalfVector = Sse2.UnpackLow(asciiVector, zeroVector);

This ends up reserving stack space, making a call, and reading the result back from the stack, all just to zero an xmm register:

G_M55642_IG05:
       lea      rcx, [rsp+20H]
       call     [Vector128`1:get_Zero():Vector128`1]
       movaps   xmm0, xmmword ptr [rsp+20H]
       movaps   xmm1, xmm6
       punpcklbw xmm1, xmm0
       movdqu   xmmword ptr [rsi], xmm1
       mov      rax, rsi
       shr      rax, 1
       and      rax, 7
       mov      edx, 8
       sub      rdx, rax
       mov      rax, rdx
       sub      rbx, 16

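This is quite inefficient: zeroing an xmm register normally takes a single register-to-register instruction (for example, xorps xmm0, xmm0), with no call and no stack traffic.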
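A minimal standalone sketch of the same pattern may help when checking the generated code outside of ASCIIUtility. The class name WidenRepro and the method WidenLow are hypothetical names made up for this illustration; only Vector128<byte>.Zero and Sse2.UnpackLow come from the snippet above.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical repro type; not part of the runtime sources.
public static class WidenRepro
{
    // Interleaves the low 8 bytes of 'ascii' with zero bytes, which widens
    // each ASCII byte into a little-endian UTF-16 code unit.
    // Callers are assumed to have checked Sse2.IsSupported first.
    public static Vector128<byte> WidenLow(Vector128<byte> ascii)
    {
        // The property getter this issue is about: ideally it folds into a
        // single register-zeroing instruction rather than a call.
        Vector128<byte> zeroVector = Vector128<byte>.Zero;
        return Sse2.UnpackLow(ascii, zeroVector);
    }
}
```

Calling WidenRepro.WidenLow(Vector128.Create((byte)'A')) on SSE2-capable hardware and reinterpreting the result with AsUInt16() should give eight code units equal to 'A'. Comparing the disassembly of this method under the JIT and under crossgen is one way to see whether the get_Zero call is still being emitted.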

/cc @tannergooding @GrabYourPitchforks

category:cq
theme:intrinsics
skill-level:expert
cost:medium

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI