Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Odd side-effect check code generation with fixed keyword #10325

Open
ArtBlnd opened this issue May 14, 2018 · 14 comments
Open

Odd side-effect check code generation with fixed keyword #10325

ArtBlnd opened this issue May 14, 2018 · 14 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI question Answer questions and provide assistance, not an issue with source code or documentation.
Milestone

Comments

@ArtBlnd
Copy link

ArtBlnd commented May 14, 2018

There is something wired side-effect check code generation on fixed pointers.
I don't know its because Roslyn generates wrong MSIL. but first. I am writing here.

Example here.

static unsafe int Test(int[] arr)
{
    int total = 0;

    fixed (int* pArr = arr)
    {
        for (int i = 0; i < 4; ++i)
        {
            total += pArr[i];
        }
    }

    return total;
}

This code generates some odd check codes. here is disassembles.

    12:             int total = 0;
00007FFC545304E2 EC                   in          al,dx  
00007FFC545304E3 28 33                sub         byte ptr [rbx],dh  
00007FFC545304E5 C0 48 89 44          ror         byte ptr [rax-77h],44h  
00007FFC545304E9 24 20                and         al,20h  
00007FFC545304EB 33 C0                xor         eax,eax  
    13: 
    14:             fixed (int* pArr = arr)
00007FFC545304ED 48 89 4C 24 20       mov         qword ptr [rsp+20h],rcx  
00007FFC545304F2 48 85 C9             test        rcx,rcx  
00007FFC545304F5 74 0B                je          00007FFC54530502  
00007FFC545304F7 48 8B 54 24 20       mov         rdx,qword ptr [rsp+20h]  
00007FFC545304FC 83 7A 08 00          cmp         dword ptr [rdx+8],0  
00007FFC54530500 75 04                jne         00007FFC54530506  
00007FFC54530502 33 D2                xor         edx,edx  
00007FFC54530504 EB 14                jmp         00007FFC5453051A  
00007FFC54530506 48 8B 54 24 20       mov         rdx,qword ptr [rsp+20h]  
00007FFC5453050B 83 7A 08 00          cmp         dword ptr [rdx+8],0  
00007FFC5453050F 76 25                jbe         00007FFC54530536  
00007FFC54530511 48 8B 54 24 20       mov         rdx,qword ptr [rsp+20h]  
00007FFC54530516 48 83 C2 10          add         rdx,10h  
    15:             {
    16:                 for (int i = 0; i < 4; ++i)
00007FFC5453051A 33 C9                xor         ecx,ecx  
    17:                 {
    18:                     total += pArr[i];
00007FFC5453051C 4C 63 C1             movsxd      r8,ecx  
00007FFC5453051F 42 03 04 82          add         eax,dword ptr [rdx+r8*4]  
    16:                 for (int i = 0; i < 4; ++i)
00007FFC54530523 FF C1                inc         ecx  
00007FFC54530525 83 F9 04             cmp         ecx,4  
00007FFC54530528 7C F2                jl          00007FFC5453051C  
00007FFC5453052A 33 D2                xor         edx,edx  
00007FFC5453052C 48 89 54 24 20       mov         qword ptr [rsp+20h],rdx  
00007FFC54530531 48 83 C4 28          add         rsp,28h  
00007FFC54530535 C3                   ret  
00007FFC54530536 E8 05 50 C4 5F       call        00007FFCB4175540  
00007FFC5453053B CC                   int         3  

As you can see. its double checking object.

But this code. generates as I excepted.

static unsafe int Test(int[] arr)
{
    int total = 0;

    fixed (int* pArr = &arr[0])
    {
        for (int i = 0; i < 4; ++i)
        {
            total += pArr[i];
        }
    }

    return total;
}
    12:             int total = 0;
00007FFC545604E2 EC                   in          al,dx  
00007FFC545604E3 28 33                sub         byte ptr [rbx],dh  
00007FFC545604E5 C0 48 89 44          ror         byte ptr [rax-77h],44h  
00007FFC545604E9 24 20                and         al,20h  
00007FFC545604EB 33 C0                xor         eax,eax  
00007FFC545604ED 83 79 08 00          cmp         dword ptr [rcx+8],0  
00007FFC545604F1 76 2A                jbe         00007FFC5456051D  
00007FFC545604F3 48 83 C1 10          add         rcx,10h  
00007FFC545604F7 48 89 4C 24 20       mov         qword ptr [rsp+20h],rcx  
    13: 
    14:             fixed (int* pArr = &arr[0])
00007FFC545604FC 48 8B 54 24 20       mov         rdx,qword ptr [rsp+20h]  
    15:             {
    16:                 for (int i = 0; i < 4; ++i)
00007FFC54560501 33 C9                xor         ecx,ecx  
    17:                 {
    18:                     total += pArr[i];
00007FFC54560503 4C 63 C1             movsxd      r8,ecx  
00007FFC54560506 42 03 04 82          add         eax,dword ptr [rdx+r8*4]  
    16:                 for (int i = 0; i < 4; ++i)
00007FFC5456050A FF C1                inc         ecx  
00007FFC5456050C 83 F9 04             cmp         ecx,4  
00007FFC5456050F 7C F2                jl          00007FFC54560503  
00007FFC54560511 33 D2                xor         edx,edx  
00007FFC54560513 48 89 54 24 20       mov         qword ptr [rsp+20h],rdx  
00007FFC54560518 48 83 C4 28          add         rsp,28h  
00007FFC5456051C C3                   ret  
00007FFC5456051D E8 1E 50 C1 5F       call        00007FFCB4175540  
00007FFC54560522 CC                   int         3  

category:cq
theme:basic-cq
skill-level:intermediate
cost:small
impact:medium

@ArtBlnd
Copy link
Author

ArtBlnd commented May 14, 2018

@mikedn

@ArtBlnd ArtBlnd changed the title Bad side-effect check code generation with fixed pointers Odd side-effect check code generation with fixed pointers May 14, 2018
@ArtBlnd ArtBlnd changed the title Odd side-effect check code generation with fixed pointers Odd side-effect check code generation with fixed keyword May 14, 2018
@mikedn
Copy link
Contributor

mikedn commented May 14, 2018

Right, fixed (int* pArr = arr) has special semantics - if the array is null or empty it produces a null pointer rather than throwing NullReferenceException/IndexOutOfRangeException. So the generated IL first checks for null, then checks if the array length is 0 and then uses ldelema to take the address of the first element in the array. This ldelema generates a range check for index 0 and it seems that the JIT doesn't know how to eliminate it (have to check why, most likely because the previous check for length 0 is a !=0 comparison that the range check elimination code ignores).

The fixed (int* pArr = &arr[0]) doesn't have this problem since it simply tries to get the address of the first array element, this one will produce a single range check (again from ldelema).

BTW - the disassembly you get by using COMPLUS_JitDisasm (requires a Checked build of CoreCLR) is usually preferable to the disassembly from VS:

G_M55886_IG01:
       4883EC28             sub      rsp, 40
       33C0                 xor      rax, rax
       4889442420           mov      qword ptr [rsp+20H], rax
G_M55886_IG02:
       33C0                 xor      eax, eax
       48894C2420           mov      gword ptr [rsp+20H], rcx
       4885C9               test     rcx, rcx
       740B                 je       SHORT G_M55886_IG03
       488B542420           mov      rdx, gword ptr [rsp+20H]
       837A0800             cmp      dword ptr [rdx+8], 0
       7504                 jne      SHORT G_M55886_IG04
G_M55886_IG03:
       33D2                 xor      rdx, rdx
       EB14                 jmp      SHORT G_M55886_IG05
G_M55886_IG04:
       488B542420           mov      rdx, gword ptr [rsp+20H]
       837A0800             cmp      dword ptr [rdx+8], 0
       7625                 jbe      SHORT G_M55886_IG09
       488B542420           mov      rdx, gword ptr [rsp+20H]
       4883C210             add      rdx, 16
G_M55886_IG05:
       33C9                 xor      ecx, ecx
G_M55886_IG06:
       4C63C1               movsxd   r8, ecx
       42030482             add      eax, dword ptr [rdx+4*r8]
       FFC1                 inc      ecx
       83F904               cmp      ecx, 4
       7CF2                 jl       SHORT G_M55886_IG06
G_M55886_IG07:
       33D2                 xor      rdx, rdx
       4889542420           mov      gword ptr [rsp+20H], rdx
G_M55886_IG08:
       4883C428             add      rsp, 40
       C3                   ret
G_M55886_IG09:
       E805C4385F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

@ArtBlnd
Copy link
Author

ArtBlnd commented May 14, 2018

Sorry, I was fixing my coreclr repository(its emitting unrolled loop so), so I couldn't use for it. I will dump it with COMPLUS_JitDisasm next time. :>

@ArtBlnd
Copy link
Author

ArtBlnd commented May 15, 2018

I will figure it out after fixing basic loop unrolling issues.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 16, 2018

@mikedn Wait. there is another issues here.

       837A0800             cmp      dword ptr [rdx+8], 0
       7625                 jbe      SHORT G_M55886_IG09

pointers cannot be below. why isn't this code use test instead of cmp?

@mikedn
Copy link
Contributor

mikedn commented May 16, 2018

test would require loading the value at [rdx + 8] in a register so the code would be larger and require a temporary register.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 16, 2018

G_M46070_IG01:
       4883EC28             sub      rsp, 40
       33C0                 xor      rax, rax
       4889442420           mov      qword ptr [rsp+20H], rax

G_M46070_IG02:
       83790800             cmp      dword ptr [rcx+8], 0
       7627                 jbe      SHORT G_M46070_IG04
       4883C110             add      rcx, 16
       48894C2420           mov      bword ptr [rsp+20H], rcx
       488B442420           mov      rax, bword ptr [rsp+20H]
       8B10                 mov      edx, dword ptr [rax]
       035004               add      edx, dword ptr [rax+4]
       035008               add      edx, dword ptr [rax+8]
       03500C               add      edx, dword ptr [rax+12]
       33C0                 xor      rax, rax
       4889442420           mov      bword ptr [rsp+20H], rax
       8BC2                 mov      eax, edx

G_M46070_IG03:
       4883C428             add      rsp, 40
       C3                   ret      

G_M46070_IG04:
       E88391BB5E           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3 

How about this?

       488B442420           mov      rax, bword ptr [rsp+20H]

rax will be reinitialized here.
we can just use rax register for temp register.

@mikedn
Copy link
Contributor

mikedn commented May 16, 2018

That's a side effect of using fixed - for GC to know that the object (array in this case) must not be moved the pointer must be stored in a stack slot marked as pinned. You can see this in the list of local variable that appears in the JitDisasm output:

;  V03 loc2         [V03    ] (  4,  4   )   byref  ->  [rsp+0x20]   must-init pinned

That's also why [rsp+20] is zeroed out in the function prolog and again zeroed out after the loop, at the end of the fixed block. Zeroing out the variable "unpins" the object allowing the GC to move it as needed.

It may be possible to improve this but it's not very clear how. I suppose the register allocator might be able to figure out that rcx was just stored at [rsp+20h] and then somehow change mov rax, [rsp+20h] into mov rax, rcx. Or better, allocate rcx instead of rax and avoid the mov completely.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 16, 2018

figuring out either after fixup loop-unrolling issues.

@mikedn
Copy link
Contributor

mikedn commented May 16, 2018

This reminds me that today there are better ways to avoid range checks than fixed - using ref locals and System.Runtime.CompilerServices.Unsafe methods:

int s = 0;
ref int p = ref a[0];
for (int i = 0; i < a.Length; i++)
    s += Unsafe.Add(ref p, i);
return s;

This generates:

G_M55887_IG01:
       4883EC28             sub      rsp, 40
G_M55887_IG02:
       33C0                 xor      eax, eax
       8B5108               mov      edx, dword ptr [rcx+8]
       83FA00               cmp      edx, 0
       761F                 jbe      SHORT G_M55887_IG05
       4883C110             add      rcx, 16
       4533C0               xor      r8d, r8d
       85D2                 test     edx, edx
       7E0F                 jle      SHORT G_M55887_IG04
G_M55887_IG03:
       4D63C8               movsxd   r9, r8d
       42030489             add      eax, dword ptr [rcx+4*r9]
       41FFC0               inc      r8d
       413BD0               cmp      edx, r8d
       7FF1                 jg       SHORT G_M55887_IG03
G_M55887_IG04:
       4883C428             add      rsp, 40
       C3                   ret
G_M55887_IG05:
       E8EE462E5F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

@ArtBlnd
Copy link
Author

ArtBlnd commented May 16, 2018

Seems not really simple, friendly code for programmers. not for front-line programmers.
If CoreCLR's goal is just avoiding range checks for performance.
I mean for who well-learned that how compiler optimizes or works. they don't need like this.

the goal should be more simpler for front-line programmers.

this is my opinion

for example, if there is such soo many arithmetical process using Unsafe.*** on code.
it will be not really fun for everyone.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 16, 2018

And also. this is not only for this problem.
as you can see.

       8B5108               mov      edx, dword ptr [rcx+8]
       83FA00               cmp      edx, 0
       761F                 jbe      SHORT G_M55887_IG05

this doesn't uses test instruction for checking. pointers are not signed.
I can't sure its pointer, index or something else. but it seems something is wrong with it.

And here does correctly.

       85D2                 test     edx, edx
       7E0F                 jle      SHORT G_M55887_IG04

@mikedn
Copy link
Contributor

mikedn commented May 16, 2018

Seems not really simple, friendly code for programmers. not for front-line programmers.

Certainly, but the whole point of this was to give you something that's more easy to unroll. Attempting to unroll a similar loop but with range checks is problematic due the increase in code size and number of branches. At the end of the day the difficulty is not to unroll the loop but to decide when it's beneficial to do so. And array range checks complicate that.

@mikedn
Copy link
Contributor

mikedn commented May 16, 2018

this doesn't uses test instruction for checking

Yeah, I know. I've made a fix for this a while ago but didn't bother pushing it. a[0] isn't that common.

this doesn't uses test instruction for checking

I'm not sure what you're trying to say here. This is an array length, not a pointer.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@TIHan TIHan removed the JitUntriaged CLR JIT issues needing additional triage label Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests

5 participants