
Suboptimal code patterns when using Unsafe methods + intrinsics #12201

Description

@redknightlois

While micro-optimizing a particular piece of code, I found suboptimal codegen that stems from the signatures of the Unsafe class methods and that the JIT could fix.

Say that I need to read data from two different memory locations at the same offset, where offset is an int:

matches = Sse2.MoveMask(Sse2.CompareEqual(LoadVector128(ref first, (IntPtr)offset), LoadVector128(ref second, (IntPtr)offset)));

The generated code performs the same operation, movsxd r8,eax, twice:

movsxd      r8,eax      ; sign extension
vmovupd     xmm0,xmmword ptr [rcx+r8]
movsxd      r8,eax      ; same sign extension again
vmovupd     xmm1,xmmword ptr [rdx+r8]
vpcmpeqb    xmm0,xmm0,xmm1
vpmovmskb   r8d,xmm0
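The issue does not show how LoadVector128 is defined. Below is a minimal, self-contained sketch that reproduces the SSE2 pattern above, assuming a hypothetical LoadVector128(ref byte, IntPtr) helper that does an unaligned 16-byte read via Unsafe.AddByteOffset and Unsafe.ReadUnaligned (targets .NET Core 3.0 or later). The class name MaskCompare, the bounds check and the scalar fallback are illustrative additions, not part of the original report.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class MaskCompare
{
    // Hypothetical helper (assumption): unaligned 16-byte read at a byte offset from `source`.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static Vector128<byte> LoadVector128(ref byte source, IntPtr byteOffset)
        => Unsafe.ReadUnaligned<Vector128<byte>>(ref Unsafe.AddByteOffset(ref source, byteOffset));

    // Returns a 16-bit mask: bit i is set when firstArray[offset + i] == secondArray[offset + i].
    public static int Matches(byte[] firstArray, byte[] secondArray, int offset)
    {
        if (offset < 0 || offset + 16 > firstArray.Length || offset + 16 > secondArray.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        ref byte first = ref firstArray[0];
        ref byte second = ref secondArray[0];

        if (Sse2.IsSupported)
        {
            // The pattern from the issue: (IntPtr)offset is written twice, and each
            // occurrence was reported to produce its own movsxd.
            return Sse2.MoveMask(Sse2.CompareEqual(
                LoadVector128(ref first, (IntPtr)offset),
                LoadVector128(ref second, (IntPtr)offset)));
        }

        // Scalar fallback so the result is the same on hardware without SSE2.
        int mask = 0;
        for (int i = 0; i < 16; i++)
            if (firstArray[offset + i] == secondArray[offset + i])
                mask |= 1 << i;
        return mask;
    }

    public static void Main()
    {
        var a = new byte[32];
        var b = new byte[32];
        b[3] = 1; // the arrays differ only at index 3
        Console.WriteLine(Matches(a, b, 0).ToString("X4")); // prints FFF7
    }
}
```

Inspecting the JIT disassembly of Matches is one way to compare against the listing above; the exact output will depend on the runtime build.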

This has been (partially) solved for AVX2, but that introduces another strange behavior:

matches = Avx2.MoveMask(Avx2.CompareEqual(LoadVector256(ref first, (IntPtr)offset), LoadVector256(ref second, (IntPtr)offset)));

Here, not only do we copy with sign extension, we also copy the result into r9. While at the microarchitectural level that is a simple register rename (cheaper than the duplicated movsxd), we are still issuing an extra instruction.

movsxd      r8,eax      ; sign extension
mov         r9,r8       ; extra copy
vmovupd     ymm0,ymmword ptr [rcx+r9]
vmovupd     ymm1,ymmword ptr [rdx+r8]
vpcmpeqb    ymm0,ymm0,ymm1
vpmovmskb   r8d,ymm0

What I don't understand is why, if eax has been set in the same code (it does not come from anywhere else), the JIT decides to emit an extra mov instead of:

vmovupd     ymm0,ymmword ptr [rcx+eax]  
vmovupd     ymm1,ymmword ptr [rdx+eax]  
vpcmpeqb    ymm0,ymm0,ymm1  
vpmovmskb   r8d,ymm0  

Furthermore, this could be optimized even further to:

vmovupd     ymm0,ymmword ptr [rcx+eax]  
vpcmpeqb    ymm0,ymm0,ymmword ptr [rdx+eax]  
vpmovmskb   r8d,ymm0  
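One caveat on the desired listings: x64 addressing cannot pair a 64-bit base register such as rcx with a 32-bit index such as eax, so the index has to be in a 64-bit register. However, a 32-bit write to eax already zeroes the upper half of rax, so only the repeated widening is avoidable. A minimal sketch of one source-level variation, assuming offset is known to be non-negative: widen it once through uint and reuse a single IntPtr for both loads. Whether the JIT then drops the movsxd/mov pair, or folds the second load into vpcmpeqb, is an assumption, not a measured result. The helper LoadVector256(ref byte, IntPtr), the class MaskCompare256 and the scalar fallback are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class MaskCompare256
{
    // Hypothetical helper (assumption): unaligned 32-byte read at a byte offset from `source`.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static Vector256<byte> LoadVector256(ref byte source, IntPtr byteOffset)
        => Unsafe.ReadUnaligned<Vector256<byte>>(ref Unsafe.AddByteOffset(ref source, byteOffset));

    // Returns a 32-bit mask: bit i is set when firstArray[offset + i] == secondArray[offset + i].
    public static int Matches(byte[] firstArray, byte[] secondArray, int offset)
    {
        if (offset < 0 || offset + 32 > firstArray.Length || offset + 32 > secondArray.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        if (Avx2.IsSupported)
        {
            // offset was checked to be non-negative, so zero extension (via uint)
            // gives the same value as sign extension. The conversion is written
            // once and the IntPtr is reused for both loads.
            IntPtr byteOffset = (IntPtr)(uint)offset;
            return Avx2.MoveMask(Avx2.CompareEqual(
                LoadVector256(ref firstArray[0], byteOffset),
                LoadVector256(ref secondArray[0], byteOffset)));
        }

        // Scalar fallback so the result is the same on hardware without AVX2.
        int mask = 0;
        for (int i = 0; i < 32; i++)
            if (firstArray[offset + i] == secondArray[offset + i])
                mask |= 1 << i;
        return mask;
    }

    public static void Main()
    {
        var p = new byte[64];
        var q = new byte[64];
        q[33] = 1; // differs at position 1 of the window starting at 32
        Console.WriteLine(Matches(p, q, 32).ToString("X8")); // prints FFFFFFFD
    }
}
```

The uint detour only preserves semantics because of the non-negative check; for a possibly negative offset, sign extension is required and the movsxd cannot be dropped.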

I am running today's nightly build (3.0.0-preview4-27506-5).
Any idea how I can achieve the latter code?

category:cq
theme:hardware-intrinsics
skill-level:expert
cost:medium


Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
