Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono] ReadOnlySpan<byte> hack doesn't work in Mono #37449

Closed
EgorBo opened this issue Jun 4, 2020 · 4 comments · Fixed by #37458 or mono/mono#19916
Closed

[mono] ReadOnlySpan<byte> hack doesn't work in Mono #37449

EgorBo opened this issue Jun 4, 2020 · 4 comments · Fixed by #37458 or mono/mono#19916
Assignees
Labels
area-Codegen-JIT-mono untriaged New issue has not been triaged by the area owner

Comments

@EgorBo
Copy link
Member

EgorBo commented Jun 4, 2020

A while ago, Roslyn team implemented a hackfeature: dotnet/roslyn#24621 Refer directly to static data when ReadOnlySpan wraps arrays of bytes.. Technically, it allows to access static read-only arrays without allocations / static initialization, the syntax looks like this:

static ReadOnlySpan<byte> Arr => new byte[] {1, 2, 3, 4};

// access example:
byte GetByte(int i) => Arr[i];

CoreCLR:

; Method Program:GetByte(int):ubyte:this
       cmp      edx, 4   ;; bound check 
       jae      SHORT G_M24515_IG04
       movsxd   rax, edx
       mov      rdx, 0xD1FFAB1E
       movzx    rax, byte  ptr [rax+rdx]
       ret      

;; bound check exception:
G_M24515_IG04:
       call     CORINFO_HELP_RNGCHKFAIL
       int3     

As you can see, the access is just a few mov's + bound check (can be eliminated in some cases).
For some reason Mono doesn't inline it get_Arr() call

Mono-LLVM:

0000000000000000 <gram_GetByte__int_>:
<BB>:1
   0:   53                      push   %rbx
   1:   48 83 ec 10             sub    $0x10,%rsp
   5:   89 fb                   mov    %edi,%ebx
   7:   c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
   b:   c5 f8 29 04 24          vmovaps %xmm0,(%rsp)
  10:   48 b8 10 c1 44 b2 52    movabs $0x5652b244c110,%rax
  17:   56 00 00
  1a:   48 89 e7                mov    %rsp,%rdi
  1d:   ff 10                   callq  *(%rax)   ;;; get_Arr() call
  1f:   39 5c 24 08             cmp    %ebx,0x8(%rsp)
  23:   76 10                   jbe    35 <gram_GetByte__int_+0x35>
  25:   48 8b 04 24             mov    (%rsp),%rax
  29:   48 63 cb                movslq %ebx,%rcx
  2c:   8a 04 08                mov    (%rax,%rcx,1),%al
  2f:   48 83 c4 10             add    $0x10,%rsp
  33:   5b                      pop    %rbx
  34:   c3                      retq
  35:   48 b8 28 c1 44 b2 52    movabs $0x5652b244c128,%rax
  3c:   56 00 00
  3f:   bf 02 01 00 00          mov    $0x102,%edi
  44:   ff 10                   callq  *(%rax)

Another example:

static ReadOnlySpan<byte> Arr => new byte[] {1, 2, 3, 4};

byte GetByte(int i) => Arr[0];

CoreCLR:

; Method Program:GetByte(int):ubyte:this
       mov      rax, 0xD1FFAB1E
       movzx    rax, byte  ptr [rax]
       ret 

In this example CoreCLR didn't even emit any bound check!
Mono still emits get_Arr() and doesn't eliminate the bound check

Mono:

0000000000000000 <gram_GetByte__int_>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
   8:   c5 f8 29 04 24          vmovaps %xmm0,(%rsp)
   d:   48 b8 60 e9 d0 d6 54    movabs $0x5654d6d0e960,%rax
  14:   56 00 00
  17:   48 89 e7                mov    %rsp,%rdi
  1a:   ff 10                   callq  *(%rax)  ;; get_Arr() call
  1c:   83 7c 24 08 00          cmpl   $0x0,0x8(%rsp)
  21:   74 0b                   je     2e <gram_GetByte__int_+0x2e>
  23:   48 8b 04 24             mov    (%rsp),%rax
  27:   8a 00                   mov    (%rax),%al
  29:   48 83 c4 18             add    $0x18,%rsp
  2d:   c3                      retq
  2e:   48 b8 78 e9 d0 d6 54    movabs $0x5654d6d0e978,%rax
  35:   56 00 00
  38:   bf 02 01 00 00          mov    $0x102,%edi
  3d:   ff 10                   callq  *(%rax)

This is quite important since this hack is heavily used across the BCL in hot methods:
image
Note those "// uses C# compiler's optimization for static byte[] data" comments

@ghost
Copy link

ghost commented Jun 4, 2020

Tagging subscribers to this area: @lewurm
Notify danmosemsft if you want to be subscribed.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jun 4, 2020
@marek-safar
Copy link
Contributor

/cc @SamMonoRT

monojenkins pushed a commit to monojenkins/mono that referenced this issue Jun 4, 2020
Partially fixes dotnet/runtime#37449
```csharp
static ReadOnlySpan<byte> Arr
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    get => new byte[] {1, 2, 3, 4};
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static byte GetByte(int i) => Arr[0];
```
#### Codegen for GetByte() in LLVM-JIT mode:
```asm
0000000000000000 <gram_GetByte__int_>:
<BB>:1
   0:   48 83 ec 28             sub    $0x28,%rsp
   4:   c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
   8:   c5 f8 29 04 24          vmovaps %xmm0,(%rsp)
   d:   48 b8 b0 bd 55 d2 3f    movabs $0x563fd255bdb0,%rax
  14:   56 00 00
  17:   48 89 04 24             mov    %rax,(%rsp)
  1b:   c7 44 24 08 04 00 00    movl   $0x4,0x8(%rsp)
  22:   00
  23:   48 8b 04 24             mov    (%rsp),%rax
  27:   48 89 44 24 10          mov    %rax,0x10(%rsp)
  2c:   8b 44 24 08             mov    0x8(%rsp),%eax
  30:   89 44 24 18             mov    %eax,0x18(%rsp)
  34:   8b 44 24 0c             mov    0xc(%rsp),%eax
  38:   89 44 24 1c             mov    %eax,0x1c(%rsp)
  3c:   83 7c 24 18 00          cmpl   $0x0,0x18(%rsp)
  41:   74 0c                   je     4f <gram_GetByte__int_+0x4f>
  43:   48 8b 44 24 10          mov    0x10(%rsp),%rax
  48:   8a 00                   mov    (%rax),%al
  4a:   48 83 c4 28             add    $0x28,%rsp
  4e:   c3                      retq
  4f:   48 b8 d0 ae 42 d2 3f    movabs $0x563fd242aed0,%rax
  56:   56 00 00
  59:   bf 02 01 00 00          mov    $0x102,%edi
  5e:   ff 10                   callq  *(%rax)
```

Codegen after I appended `-sroa -instcombine` to the end of LLVM passes list:
```asm
0000000000000000 <gram_GetByte__int_>:
<BB>:1
   0:   48 b8 d0 ef a6 ba 31    movabs $0x5631baa6efd0,%rax
   7:   56 00 00
   a:   8a 00                   mov    (%rax),%al
   c:   c3                      retq
```

This https://godbolt.org/z/YKjzsV link explains motivation (on the left is the "final" LLVM IR our llvm-jit produces after the default optimizations). Zoltan noticed that LLVM-AOT where we use `opt -O2` instead of custom pass order optimized that code perfectly.

The other issue remains: for some reason we don't inline `get_Arr()` without AggressiveInlining, coreclr does inline it:
```
Successfully inlined Program:get_Arr():System.ReadOnlySpan`1[Byte] (12 IL bytes) (depth 1) [below ALWAYS_INLINE size]
```
@lambdageek
Copy link
Member

Is this the same issue as mono/mono#18572 (comment) ?

@EgorBo
Copy link
Member Author

EgorBo commented Jun 6, 2020

Is this the same issue as mono/mono#18572 (comment) ?

Turns out this one is different, it's bad codegen after LLVM-JIT + inliner refuses to inline get_Arr
both problems are fixed in #37458 and #37544

EgorBo added a commit to mono/mono that referenced this issue Jun 8, 2020
Partially fixes dotnet/runtime#37449
```csharp
static ReadOnlySpan<byte> Arr
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    get => new byte[] {1, 2, 3, 4};
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static byte GetByte(int i) => Arr[0];
```
#### Codegen for GetByte() in LLVM-JIT mode:
```asm
0000000000000000 <gram_GetByte__int_>:
<BB>:1
   0:   48 83 ec 28             sub    $0x28,%rsp
   4:   c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
   8:   c5 f8 29 04 24          vmovaps %xmm0,(%rsp)
   d:   48 b8 b0 bd 55 d2 3f    movabs $0x563fd255bdb0,%rax
  14:   56 00 00
  17:   48 89 04 24             mov    %rax,(%rsp)
  1b:   c7 44 24 08 04 00 00    movl   $0x4,0x8(%rsp)
  22:   00
  23:   48 8b 04 24             mov    (%rsp),%rax
  27:   48 89 44 24 10          mov    %rax,0x10(%rsp)
  2c:   8b 44 24 08             mov    0x8(%rsp),%eax
  30:   89 44 24 18             mov    %eax,0x18(%rsp)
  34:   8b 44 24 0c             mov    0xc(%rsp),%eax
  38:   89 44 24 1c             mov    %eax,0x1c(%rsp)
  3c:   83 7c 24 18 00          cmpl   $0x0,0x18(%rsp)
  41:   74 0c                   je     4f <gram_GetByte__int_+0x4f>
  43:   48 8b 44 24 10          mov    0x10(%rsp),%rax
  48:   8a 00                   mov    (%rax),%al
  4a:   48 83 c4 28             add    $0x28,%rsp
  4e:   c3                      retq
  4f:   48 b8 d0 ae 42 d2 3f    movabs $0x563fd242aed0,%rax
  56:   56 00 00
  59:   bf 02 01 00 00          mov    $0x102,%edi
  5e:   ff 10                   callq  *(%rax)
```

Codegen after I appended `-sroa -instcombine` to the end of LLVM passes list:
```asm
0000000000000000 <gram_GetByte__int_>:
<BB>:1
   0:   48 b8 d0 ef a6 ba 31    movabs $0x5631baa6efd0,%rax
   7:   56 00 00
   a:   8a 00                   mov    (%rax),%al
   c:   c3                      retq
```

This https://godbolt.org/z/YKjzsV link explains motivation (on the left is the "final" LLVM IR our llvm-jit produces after the default optimizations). Zoltan noticed that LLVM-AOT where we use `opt -O2` instead of custom pass order optimized that code perfectly.

The other issue remains: for some reason we don't inline `get_Arr()` without AggressiveInlining, coreclr does inline it:
```
Successfully inlined Program:get_Arr():System.ReadOnlySpan`1[Byte] (12 IL bytes) (depth 1) [below ALWAYS_INLINE size]
```

Co-authored-by: EgorBo <EgorBo@users.noreply.github.com>
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Codegen-JIT-mono untriaged New issue has not been triaged by the area owner
Projects
None yet
5 participants