Struct copy using movs rather than SSE on x64 #7469

benaadams · 2017-02-19T21:23:47Z

Assigning a struct to a static struct variable (Memory<T> below):

public struct Memory<T> : IEquatable<Memory<T>>, IEquatable<ReadOnlyMemory<T>>
{
    readonly OwnedMemory<T> _owner;
    readonly int _id;
    readonly int _index;
    readonly int _length;
}

can generate a movs copy:

movs        qword ptr [rdi],qword ptr [rsi]  
movs        qword ptr [rdi],qword ptr [rsi]

Would this be better as an SSE copy?

movdqu      xmm0,xmmword ptr [rsi]  
movdqu      xmmword ptr [rdi],xmm0

As seen in dotnet/corefxlab#1227 (comment)

/cc @mikedn
category:cq
theme:block-opts
skill-level:intermediate
cost:small
impact:medium


mikedn · 2017-02-19T22:12:31Z

To be clear the entire copy code looks like this:

; rsi loaded with the source address and rdi loaded with the destination address
; copy _owner field, GC write barrier, rsi += 8, rdi += 8
E89FEE0C5F           call     CORINFO_HELP_ASSIGN_BYREF
; copy 16 bytes: _id, _index, _length and 4 padding bytes
48A5                 movsq
48A5                 movsq

If there's no reference field then the generated code is:

; copy 12 bytes
488B08               mov      rcx, qword ptr [rax]
48890A               mov      qword ptr [rdx], rcx
8B4808               mov      ecx, dword ptr [rax+8]
894A08               mov      dword ptr [rdx+8], ecx

If there's no reference field and another int field is added so the struct has 16 bytes then we get this:

; copy 16 bytes
C4E17A6F00           vmovdqu  xmm0, qword ptr [rax]
C4E17A7F02           vmovdqu  qword ptr [rdx], xmm0

If instead of a reference field we use a long field then we get this:

; copy 24 bytes
C4E17A6F00           vmovdqu  xmm0, qword ptr [rax]
C4E17A7F02           vmovdqu  qword ptr [rdx], xmm0
488B4810             mov      rcx, qword ptr [rax+16]
48894A10             mov      qword ptr [rdx+16], rcx

So movs is used when the struct contains reference fields; it complements CORINFO_HELP_ASSIGN_BYREF, which uses rsi and rdi just like movs does.

The use of movs may be worth investigating: it doesn't have a good reputation performance-wise. For example, on Skylake movs requires 5 uops instead of the 1 or 2 needed by movdqu. That said, its association with helper calls and memory accesses might mean that the overhead is too small to matter.
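
For anyone who wants to reproduce the four layouts discussed above, here is a minimal sketch. The type names (`WithRef`, `NoRef12`, `NoRef16`, `WithLong`) and the `LayoutProbe` helper are hypothetical stand-ins rather than types from corefxlab, and `object` stands in for `OwnedMemory<T>`. `Unsafe.SizeOf<T>()` reports the managed size of each struct, which on x64 should line up with the 24, 12, 16 and 24 byte copies in the listings.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-ins for the layouts discussed in this thread.
public struct WithRef  { public object Owner; public int Id, Index, Length; }  // reference field + 3 ints
public struct NoRef12  { public int Id, Index, Length; }                       // no reference field
public struct NoRef16  { public int Id, Index, Length, Extra; }                // no reference field, extra int
public struct WithLong { public long Owner; public int Id, Index, Length; }    // long instead of a reference

public static class LayoutProbe
{
    // Managed size in bytes of each struct, as reported by the runtime.
    public static int[] Sizes() => new[]
    {
        Unsafe.SizeOf<WithRef>(),
        Unsafe.SizeOf<NoRef12>(),
        Unsafe.SizeOf<NoRef16>(),
        Unsafe.SizeOf<WithLong>(),
    };

    // Prints the sizes on one line for a quick comparison with the listings.
    public static void Print()
    {
        int[] s = Sizes();
        Console.WriteLine($"WithRef={s[0]} NoRef12={s[1]} NoRef16={s[2]} WithLong={s[3]}");
    }
}
```

Calling `LayoutProbe.Print()` from a console app is enough to check the sizes. The instructions the JIT actually picks for each copy still have to be read from a disassembly.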

mikedn · 2017-02-20T18:37:24Z

Some quick testing seems to indicate that when vmovdqu is used the code is ~1.5x faster. It would seem that movs is indeed very slow, especially considering that the test code includes 2 calls.
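
A rough way to repeat this kind of comparison is sketched below. It is not the test that produced the ~1.5x figure: the type and method names (`RefStruct`, `LongStruct`, `CopyTiming`) are hypothetical, and a plain `Stopwatch` loop is crude, so tiered compilation, call overhead and machine noise can all skew the ratio. A benchmarking harness such as BenchmarkDotNet would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical test types: same field shape, with and without a GC reference.
public struct RefStruct  { public object Owner; public int Id, Index, Length; }
public struct LongStruct { public long Owner; public int Id, Index, Length; }

public static class CopyTiming
{
    public static RefStruct SrcRef, DstRef;
    public static LongStruct SrcLong, DstLong;

    // NoInlining keeps each copy in its own method so its code is easy to find in a disassembly.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void CopyRef() => DstRef = SrcRef;

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void CopyLong() => DstLong = SrcLong;

    // Elapsed Stopwatch ticks for n copies of the struct that has a reference field.
    public static long TimeRef(int n)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) CopyRef();
        sw.Stop();
        return sw.ElapsedTicks;
    }

    // Elapsed Stopwatch ticks for n copies of the struct without a reference field.
    public static long TimeLong(int n)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) CopyLong();
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Report(int n = 100_000_000)
    {
        // Short warm-up so both copy methods are jitted before the timed runs.
        TimeRef(1000);
        TimeLong(1000);
        Console.WriteLine($"with reference: {TimeRef(n)} ticks, without: {TimeLong(n)} ticks");
    }
}
```

Both timed loops include a call per copy, so the measured ratio understates any difference between the copy sequences themselves.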

huoyaoyuan · 2024-05-06T06:59:24Z

I believe this can be closed now. Struct copying now uses SSE/AVX when appropriate:

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKBuIGYACXDKAVzA0YFkZ9oAnjQDeNRuMaxsAEwgA7ADYDGASzlcA+hADucmFADcYiVNmLlazSulHqEyTBnylq9Yw1rpMBLfunnFm4aCjByAOYYABa+EpaM2DHiccCJrlxgqXE2NAC+NHRMZIwAwoyidhK8/FDK+KnG4sQoPAAUVYKMAG7YCmwwAJSMALwAfIz4w109fbY5QA

C.M(Memory)
    L0000: vzeroupper
    L0003: vmovdqu ymm0, [rdx]
    L0007: vmovdqu [rcx+8], ymm0
    L000c: vzeroupper
    L000f: ret

Structs with GC references may still use rep movs, though.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Jan 24, 2023

This was referenced May 15, 2023

[Perf] Windows/x64: 4 Regressions on 5/2/2023 10:35:24 AM #85987

Closed

JIT: Fix new helper calls for some block copies involving promoted locals #86246

Merged

jakobbotsch mentioned this issue May 15, 2024

Use SIMD for block inits with GC fields #102132

Merged
