[mono] ReadOnlySpan<byte> hack doesn't work in Mono #37449

EgorBo · 2020-06-04T21:18:06Z

A while ago, Roslyn team implemented a ~~hack~~feature: dotnet/roslyn#24621 Refer directly to static data when ReadOnlySpan wraps arrays of bytes.. Technically, it allows to access static read-only arrays without allocations / static initialization, the syntax looks like this:

static ReadOnlySpan<byte> Arr => new byte[] {1, 2, 3, 4};

// access example:
byte GetByte(int i) => Arr[i];

CoreCLR:

; Method Program:GetByte(int):ubyte:this
       cmp      edx, 4   ;; bound check 
       jae      SHORT G_M24515_IG04
       movsxd   rax, edx
       mov      rdx, 0xD1FFAB1E
       movzx    rax, byte  ptr [rax+rdx]
       ret      

;; bound check exception:
G_M24515_IG04:
       call     CORINFO_HELP_RNGCHKFAIL
       int3

As you can see, the access is just a few mov's + bound check (can be eliminated in some cases).
For some reason Mono doesn't inline it get_Arr() call

Mono-LLVM:

0000000000000000 <gram_GetByte__int_>:
<BB>:1
   0:   53                      push   %rbx
   1:   48 83 ec 10             sub    $0x10,%rsp
   5:   89 fb                   mov    %edi,%ebx
   7:   c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
   b:   c5 f8 29 04 24          vmovaps %xmm0,(%rsp)
  10:   48 b8 10 c1 44 b2 52    movabs $0x5652b244c110,%rax
  17:   56 00 00
  1a:   48 89 e7                mov    %rsp,%rdi
  1d:   ff 10                   callq  *(%rax)   ;;; get_Arr() call
  1f:   39 5c 24 08             cmp    %ebx,0x8(%rsp)
  23:   76 10                   jbe    35 <gram_GetByte__int_+0x35>
  25:   48 8b 04 24             mov    (%rsp),%rax
  29:   48 63 cb                movslq %ebx,%rcx
  2c:   8a 04 08                mov    (%rax,%rcx,1),%al
  2f:   48 83 c4 10             add    $0x10,%rsp
  33:   5b                      pop    %rbx
  34:   c3                      retq
  35:   48 b8 28 c1 44 b2 52    movabs $0x5652b244c128,%rax
  3c:   56 00 00
  3f:   bf 02 01 00 00          mov    $0x102,%edi
  44:   ff 10                   callq  *(%rax)

Another example:

static ReadOnlySpan<byte> Arr => new byte[] {1, 2, 3, 4};

byte GetByte(int i) => Arr[0];

CoreCLR:

; Method Program:GetByte(int):ubyte:this
       mov      rax, 0xD1FFAB1E
       movzx    rax, byte  ptr [rax]
       ret

In this example CoreCLR didn't even emit any bound check!
Mono still emits get_Arr() and doesn't eliminate the bound check

Mono:

0000000000000000 <gram_GetByte__int_>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
   8:   c5 f8 29 04 24          vmovaps %xmm0,(%rsp)
   d:   48 b8 60 e9 d0 d6 54    movabs $0x5654d6d0e960,%rax
  14:   56 00 00
  17:   48 89 e7                mov    %rsp,%rdi
  1a:   ff 10                   callq  *(%rax)  ;; get_Arr() call
  1c:   83 7c 24 08 00          cmpl   $0x0,0x8(%rsp)
  21:   74 0b                   je     2e <gram_GetByte__int_+0x2e>
  23:   48 8b 04 24             mov    (%rsp),%rax
  27:   8a 00                   mov    (%rax),%al
  29:   48 83 c4 18             add    $0x18,%rsp
  2d:   c3                      retq
  2e:   48 b8 78 e9 d0 d6 54    movabs $0x5654d6d0e978,%rax
  35:   56 00 00
  38:   bf 02 01 00 00          mov    $0x102,%edi
  3d:   ff 10                   callq  *(%rax)

This is quite important since this hack is heavily used across the BCL in hot methods:

Note those "// uses C# compiler's optimization for static byte[] data" comments

The text was updated successfully, but these errors were encountered:

ghost · 2020-06-04T21:18:07Z

Tagging subscribers to this area: @lewurm
Notify danmosemsft if you want to be subscribed.

marek-safar · 2020-06-04T21:22:51Z

/cc @SamMonoRT

Partially fixes dotnet/runtime#37449 ```csharp static ReadOnlySpan<byte> Arr { [MethodImpl(MethodImplOptions.AggressiveInlining)] get => new byte[] {1, 2, 3, 4}; } [MethodImpl(MethodImplOptions.NoInlining)] public static byte GetByte(int i) => Arr[0]; ``` #### Codegen for GetByte() in LLVM-JIT mode: ```asm 0000000000000000 <gram_GetByte__int_>: <BB>:1 0: 48 83 ec 28 sub $0x28,%rsp 4: c5 f8 57 c0 vxorps %xmm0,%xmm0,%xmm0 8: c5 f8 29 04 24 vmovaps %xmm0,(%rsp) d: 48 b8 b0 bd 55 d2 3f movabs $0x563fd255bdb0,%rax 14: 56 00 00 17: 48 89 04 24 mov %rax,(%rsp) 1b: c7 44 24 08 04 00 00 movl $0x4,0x8(%rsp) 22: 00 23: 48 8b 04 24 mov (%rsp),%rax 27: 48 89 44 24 10 mov %rax,0x10(%rsp) 2c: 8b 44 24 08 mov 0x8(%rsp),%eax 30: 89 44 24 18 mov %eax,0x18(%rsp) 34: 8b 44 24 0c mov 0xc(%rsp),%eax 38: 89 44 24 1c mov %eax,0x1c(%rsp) 3c: 83 7c 24 18 00 cmpl $0x0,0x18(%rsp) 41: 74 0c je 4f <gram_GetByte__int_+0x4f> 43: 48 8b 44 24 10 mov 0x10(%rsp),%rax 48: 8a 00 mov (%rax),%al 4a: 48 83 c4 28 add $0x28,%rsp 4e: c3 retq 4f: 48 b8 d0 ae 42 d2 3f movabs $0x563fd242aed0,%rax 56: 56 00 00 59: bf 02 01 00 00 mov $0x102,%edi 5e: ff 10 callq *(%rax) ``` Codegen after I appended `-sroa -instcombine` to the end of LLVM passes list: ```asm 0000000000000000 <gram_GetByte__int_>: <BB>:1 0: 48 b8 d0 ef a6 ba 31 movabs $0x5631baa6efd0,%rax 7: 56 00 00 a: 8a 00 mov (%rax),%al c: c3 retq ``` This https://godbolt.org/z/YKjzsV link explains motivation (on the left is the "final" LLVM IR our llvm-jit produces after the default optimizations). Zoltan noticed that LLVM-AOT where we use `opt -O2` instead of custom pass order optimized that code perfectly. The other issue remains: for some reason we don't inline `get_Arr()` without AggressiveInlining, coreclr does inline it: ``` Successfully inlined Program:get_Arr():System.ReadOnlySpan`1[Byte] (12 IL bytes) (depth 1) [below ALWAYS_INLINE size] ```

lambdageek · 2020-06-05T15:10:19Z

Is this the same issue as mono/mono#18572 (comment) ?

EgorBo · 2020-06-06T14:42:21Z

Is this the same issue as mono/mono#18572 (comment) ?

Turns out this one is different, it's bad codegen after LLVM-JIT + inliner refuses to inline get_Arr
both problems are fixed in #37458 and #37544

Partially fixes dotnet/runtime#37449 ```csharp static ReadOnlySpan<byte> Arr { [MethodImpl(MethodImplOptions.AggressiveInlining)] get => new byte[] {1, 2, 3, 4}; } [MethodImpl(MethodImplOptions.NoInlining)] public static byte GetByte(int i) => Arr[0]; ``` #### Codegen for GetByte() in LLVM-JIT mode: ```asm 0000000000000000 <gram_GetByte__int_>: <BB>:1 0: 48 83 ec 28 sub $0x28,%rsp 4: c5 f8 57 c0 vxorps %xmm0,%xmm0,%xmm0 8: c5 f8 29 04 24 vmovaps %xmm0,(%rsp) d: 48 b8 b0 bd 55 d2 3f movabs $0x563fd255bdb0,%rax 14: 56 00 00 17: 48 89 04 24 mov %rax,(%rsp) 1b: c7 44 24 08 04 00 00 movl $0x4,0x8(%rsp) 22: 00 23: 48 8b 04 24 mov (%rsp),%rax 27: 48 89 44 24 10 mov %rax,0x10(%rsp) 2c: 8b 44 24 08 mov 0x8(%rsp),%eax 30: 89 44 24 18 mov %eax,0x18(%rsp) 34: 8b 44 24 0c mov 0xc(%rsp),%eax 38: 89 44 24 1c mov %eax,0x1c(%rsp) 3c: 83 7c 24 18 00 cmpl $0x0,0x18(%rsp) 41: 74 0c je 4f <gram_GetByte__int_+0x4f> 43: 48 8b 44 24 10 mov 0x10(%rsp),%rax 48: 8a 00 mov (%rax),%al 4a: 48 83 c4 28 add $0x28,%rsp 4e: c3 retq 4f: 48 b8 d0 ae 42 d2 3f movabs $0x563fd242aed0,%rax 56: 56 00 00 59: bf 02 01 00 00 mov $0x102,%edi 5e: ff 10 callq *(%rax) ``` Codegen after I appended `-sroa -instcombine` to the end of LLVM passes list: ```asm 0000000000000000 <gram_GetByte__int_>: <BB>:1 0: 48 b8 d0 ef a6 ba 31 movabs $0x5631baa6efd0,%rax 7: 56 00 00 a: 8a 00 mov (%rax),%al c: c3 retq ``` This https://godbolt.org/z/YKjzsV link explains motivation (on the left is the "final" LLVM IR our llvm-jit produces after the default optimizations). Zoltan noticed that LLVM-AOT where we use `opt -O2` instead of custom pass order optimized that code perfectly. The other issue remains: for some reason we don't inline `get_Arr()` without AggressiveInlining, coreclr does inline it: ``` Successfully inlined Program:get_Arr():System.ReadOnlySpan`1[Byte] (12 IL bytes) (depth 1) [below ALWAYS_INLINE size] ``` Co-authored-by: EgorBo <EgorBo@users.noreply.github.com>

EgorBo added the area-Codegen-JIT-mono label Jun 4, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jun 4, 2020

SamMonoRT assigned imhameed Jun 4, 2020

EgorBo mentioned this issue Jun 4, 2020

[mono] Append -sroa and -instcombine to LLVM-JIT pass manager #37458

Merged

monojenkins mentioned this issue Jun 4, 2020

[mono] Append -sroa and -instcombine to LLVM-JIT pass manager mono/mono#19916

Merged

EgorBo closed this as completed in mono/mono#19916 Jun 8, 2020

ghost locked as resolved and limited conversation to collaborators Dec 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[mono] ReadOnlySpan<byte> hack doesn't work in Mono #37449

[mono] ReadOnlySpan<byte> hack doesn't work in Mono #37449

EgorBo commented Jun 4, 2020 •

edited

Loading

ghost commented Jun 4, 2020

marek-safar commented Jun 4, 2020

lambdageek commented Jun 5, 2020

EgorBo commented Jun 6, 2020

[mono] ReadOnlySpan<byte> hack doesn't work in Mono #37449

[mono] ReadOnlySpan<byte> hack doesn't work in Mono #37449

Comments

EgorBo commented Jun 4, 2020 • edited Loading

CoreCLR:

Mono-LLVM:

CoreCLR:

Mono:

ghost commented Jun 4, 2020

marek-safar commented Jun 4, 2020

lambdageek commented Jun 5, 2020

EgorBo commented Jun 6, 2020

EgorBo commented Jun 4, 2020 •

edited

Loading