Span.Fill(0) vs Span.Clear() #84126

Open
EgorBo opened this issue Mar 30, 2023 · 2 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@EgorBo
Member

EgorBo commented Mar 30, 2023

static readonly int[] data = new int[1024];

[Benchmark]
[Arguments(1)]
[Arguments(8)]
[Arguments(32)]
[Arguments(100)]
public void Clear(int len) => data.AsSpan(0, len).Clear();

[Benchmark]
[Arguments(1)]
[Arguments(8)]
[Arguments(32)]
[Arguments(100)]
public void Fill(int len) => data.AsSpan(0, len).Fill(0);
| Method |  len |      Mean |
|------- |----- |----------:|
|  Clear |    1 |  1.898 ns |
|   Fill |    1 |  1.351 ns |
|  Clear |    8 |  1.614 ns |
|   Fill |    8 |  1.108 ns |
|  Clear |   32 |  2.949 ns |
|   Fill |   32 |  1.319 ns |
|  Clear |  100 |  3.099 ns |
|   Fill |  100 |  8.019 ns |

A couple of notes:

  1. Clear() either calls InitBlockUnaligned (which is lowered to a memset call when the span's size is not known at JIT time) or P/Invokes into memset (paying for the P/Invoke machinery as well). So even for small buffers we end up calling memset.
  2. Fill(0), despite not making any external calls, is about 2.5 times slower than memset for len=100.
  3. Fill(0) is never unrolled for constant-length spans, e.g.:
void Clear1(int[] data, int len) => data.AsSpan(0, 20).Clear();
void Clear2(int[] data, int len) => data.AsSpan(0, 20).Fill(0);
; Method Program:Clear1(int[],int):this
       4883EC28             sub      rsp, 40
       C5F877               vzeroupper 
       4885D2               test     rdx, rdx
       7421                 je       SHORT G_M58891_IG04
       837A0814             cmp      dword ptr [rdx+08H], 20
       721B                 jb       SHORT G_M58891_IG04
       4883C210             add      rdx, 16
       C5FC57C0             vxorps   ymm0, ymm0
       C5FE7F02             vmovdqu  ymmword ptr[rdx], ymm0 ;; unrolled & vectorized with AVX !
       C5FE7F4220           vmovdqu  ymmword ptr[rdx+20H], ymm0
       C5FA7F4240           vmovdqu  xmmword ptr [rdx+40H], xmm0
       4883C428             add      rsp, 40
       C3                   ret      
G_M58891_IG04:              
       FF15AD4D0E00         call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
       CC                   int3     
; Total bytes of code: 52


; Method Program:Clear2(int[],int):this
       4883EC28             sub      rsp, 40
       4885D2               test     rdx, rdx
       741E                 je       SHORT G_M47240_IG04
       837A0814             cmp      dword ptr [rdx+08H], 20
       7218                 jb       SHORT G_M47240_IG04
       488D4A10             lea      rcx, bword ptr [rdx+10H]
       BA14000000           mov      edx, 20
       4533C0               xor      r8d, r8d
       FF15B7968000         call     [System.SpanHelpers:Fill[int](byref,ulong,int)]
       90                   nop      
       4883C428             add      rsp, 40
       C3                   ret      
G_M47240_IG04:              
       FF15634D0E00         call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
       CC                   int3     
; Total bytes of code: 46
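
To make the Clear1 listing easier to read: 20 ints are 80 bytes, which matches the two 32-byte ymm stores at offsets 0 and 20H plus one 16-byte xmm store at 40H. The arithmetic can be sketched as below. `StoreBreakdown.Split` is a hypothetical helper written only for illustration; it is not how the JIT actually chooses its stores (the real unroller may, for example, use overlapping stores for odd remainders).

```csharp
using System;

public static class StoreBreakdown
{
    // Hypothetical illustration: greedily split a buffer of
    // elementCount * elementSize bytes into 32-byte (ymm) stores,
    // then at most one 16-byte (xmm) store, then leftover bytes.
    public static (int Ymm, int Xmm, int Rest) Split(int elementCount, int elementSize)
    {
        int bytes = elementCount * elementSize;
        int ymm = bytes / 32;          // full 32-byte stores
        int xmm = (bytes % 32) / 16;   // one 16-byte store if 16+ bytes remain
        int rest = bytes % 16;         // bytes not covered by vector stores
        return (ymm, xmm, rest);
    }
}
```

For `Split(20, sizeof(int))` this gives two ymm stores, one xmm store, and no leftover bytes, which lines up with the three `vmovdqu` instructions in Clear1.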
[Benchmark()]
public void Clear_32() => data.AsSpan(0, 32).Clear();

[Benchmark()]
public void Fill_32() => data.AsSpan(0, 32).Fill(0);
|   Method |      Mean |
|--------- |----------:|
| Clear_32 | 0.3312 ns |
|  Fill_32 | 1.2964 ns |
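
Until the JIT handles this itself, the numbers above suggest a possible caller-side workaround for int spans: send zero fills through Clear() and everything else through Fill(). The following is a minimal sketch under that assumption. `SpanZeroing.FillOrClear` is a hypothetical helper name, not part of the BCL and not something proposed in this issue.

```csharp
using System;

public static class SpanZeroing
{
    // Hypothetical helper: for int, Fill(0) and Clear() leave the span in
    // the same state, so a zero fill can be routed to Clear().
    public static void FillOrClear(Span<int> span, int value)
    {
        if (value == 0)
        {
            span.Clear();       // zeroing path (memset or unrolled stores)
        }
        else
        {
            span.Fill(value);   // general fill path
        }
    }
}
```

Going by the first table, this would likely not help for short spans whose length is not a JIT-time constant, where Fill(0) measured faster than Clear(), so it is worth benchmarking on the target runtime before relying on it.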
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 30, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 30, 2023
@ghost

ghost commented Mar 30, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.


@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone Apr 4, 2023
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2023
@JulieLeeMSFT
Member

Pushing out to Future for this optimization work.
