Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: dead stores are not eliminated #13727

Closed
EgorBo opened this issue Nov 5, 2019 · 10 comments
Closed

JIT: dead stores are not eliminated #13727

EgorBo opened this issue Nov 5, 2019 · 10 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Nov 5, 2019

Couldn't find any related issue and decided to file it:

unsafe void DeadStore1(int* array)
{
    *array = 1;
    *array = 2;
    *array = 3;
}

void DeadStore2(ref int x)
{
    x = 1;
    x = 2;
    x = 3;
}

void DeadStore3(int[] array)
{
    if (0 < (uint)array.Length)
    {
        array[0] = 1;
        array[0] = 2;
        array[0] = 3;
    }
}

Codegen:

; DeadStore1(long):this
G_M36731_IG01:
G_M36731_IG02:
       mov      dword ptr [rdx], 1
       mov      dword ptr [rdx], 2
       mov      dword ptr [rdx], 3
G_M36731_IG03:
       ret      
; Total bytes of code: 19


; DeadStore2(byref):this
G_M42680_IG01:
G_M42680_IG02:
       mov      dword ptr [rdx], 1
       mov      dword ptr [rdx], 2
       mov      dword ptr [rdx], 3
G_M42680_IG03:
       ret      
; Total bytes of code: 19


; DeadStore3(System.Int32[]):this
G_M18892_IG01:
G_M18892_IG02:
       mov      eax, dword ptr [rdx+8]
       test     eax, eax
       je       SHORT G_M18892_IG04
G_M18892_IG03:		;; bbWeight=0.50
       mov      dword ptr [rdx+16], 1
       mov      dword ptr [rdx+16], 2
       mov      dword ptr [rdx+16], 3
G_M18892_IG04:
       ret      
; Total bytes of code: 29

In none of these cases dead stores were eliminated.
Is it a known Value Numbering limitation?

Mono-LLVM:

00000000000b10 <Program_DeadStore1_int_>:
 b10:   c7 06 03 00 00 00       mov    DWORD PTR [rsi],0x3
 b16:   c3                      ret

0000000000000b20 <Program_DeadStore2_int_>:
 b20:   c7 06 03 00 00 00       mov    DWORD PTR [rsi],0x3
 b26:   c3                      ret

0000000000000b30 <Program_DeadStore3_int__>:
 b30:   83 7e 18 00             cmp    DWORD PTR [rsi+0x18],0x0
 b34:   74 07                   je     b3d <Program_DeadStore3_int__+0xd>
 b36:   c7 46 20 03 00 00 00    mov    DWORD PTR [rsi+0x20],0x3
 b3d:   c3                      ret

category:cq
theme:optimization
skill-level:intermediate
cost:small
impact:small

@omariom
Copy link
Contributor

omariom commented Nov 5, 2019

How often people write such code? 👀

@gfoidl
Copy link
Member

gfoidl commented Nov 5, 2019

Aside question: will the cpu eliminate the dead stores?
(So the cost would be fetching and decoding the redundant instructions).

@ArtBlnd
Copy link

ArtBlnd commented Nov 7, 2019

@gfoidl I don't think CPU can eliminate those codes.
storing data directly into memory will not be eliminated because CPU does not know that memory has side-effects.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@kunalspathak
Copy link
Member

Another case to note:

public class MyType<T>
{
    T value;
    public MyType(T _value)
    {
        value = _value;
    }
    public T GetValue()
    {
        return value;
    }
}
[MethodImpl(MethodImplOptions.NoInlining)]
uint deadStore()
{
    char result;
    result = myType.GetValue(); result = myType.GetValue(); result = myType.GetValue(); result = myType.GetValue();
    result = myType.GetValue(); result = myType.GetValue(); result = myType.GetValue(); result = myType.GetValue();
    result = myType.GetValue(); result = myType.GetValue(); result = myType.GetValue(); result = myType.GetValue();
    result = myType.GetValue(); result = myType.GetValue(); result = myType.GetValue(); result = myType.GetValue();
    return result;
}

Code generated for x64:

G_M30105_IG02:
       488B4108             mov      rax, gword ptr [rcx+8]
       488BD0               mov      rdx, rax
       3912                 cmp      dword ptr [rdx], edx  # <-- Not sure what this cmp are for?
       488BD0               mov      rdx, rax
       3912                 cmp      dword ptr [rdx], edx
       488BD0               mov      rdx, rax
       3912                 cmp      dword ptr [rdx], edx
       488BD0               mov      rdx, rax
       3912                 cmp      dword ptr [rdx], edx
       488BD0               mov      rdx, rax
       3912                 cmp      dword ptr [rdx], edx
       488BD0               mov      rdx, rax

Code generated for arm64:

G_M5313_IG02:
        F9400400          ldr     x0, [x0,#8]
        AA0003E1          mov     x1, x0
        B940003F          ldr     wzr, [x1]
        AA0003E1          mov     x1, x0
        B940003F          ldr     wzr, [x1]
        AA0003E1          mov     x1, x0
        B940003F          ldr     wzr, [x1]
        AA0003E1          mov     x1, x0
        B940003F          ldr     wzr, [x1]
        AA0003E1          mov     x1, x0
        B940003F          ldr     wzr, [x1]
        AA0003E1          mov     x1, x0
        B940003F          ldr     wzr, [x1]

@erozenfeld erozenfeld self-assigned this Apr 14, 2020
@erozenfeld erozenfeld modified the milestones: Future, 5.0.0 Jul 6, 2020
@erozenfeld
Copy link
Member

@EgorBo I looked at your examples in #13727 (comment)
The jit currently can't track dead assignments to the heap. Adding a general optimization to handle them would be expensive in terms of jit throughput since it will require heap alias analysis. We could handle these trivial examples (when there are no trees between the stores) as peaphole optimizations but I suspect these cases don't happen often enough to justify it.

@erozenfeld
Copy link
Member

The example in #13727 (comment) is different since these are redundant null checks. We should be able to handle it.

@erozenfeld
Copy link
Member

Analysis of the example in #13727 (comment).

For simplicity I reduced the number of

result = myType.GetValue();

from 16 to 3.

Before null check folding optimization (in earlyProp) we have:

tmp1 = [this+8]
nullcheck tmp1
tmp2 = [this+8]
nullcheck tmp2
tmp3 = [this+8]
nullcheck tmp3
return [tmp3+8]

The optimization gets rid of the last nullcheck but can't do anything with the earlier nullchecks. It doesn't understand that tmp1, tmp2, and tmp3 all have the same value.

tmp1 = [this+8]
nullcheck tmp1
tmp2 = [this+8]
nullcheck tmp2
tmp3 = [this+8]
return [tmp3+8]

After CSE we end up with:

tmp1 = comma(cse0 = [this+8], cse0)
nullcheck tmp1
tmp2 = cse0
nullcheck tmp2
tmp3 = cse0
return [tmp3+8]

Here occurrences of cse0 and tmp2 and tmp3 have the same liberal value number but different conservative value numbers. In theory we could propagate conservative value numbers everywhere we can after CSE, although not having def->use links in SSA might be a problem. If we do propagate conservative value numbers, assertion propagation should be able to eliminate the second nullcheck and the assignment to tmp2. We would still be left with the first nullcheck and the assignment to tmp1.

Another idea is to run null check folding after CSE either instead or in addition to the existing pass. It would still have to changed in a non-trivial way to get rid of the nullcheck's.

Since there is no easy solution here, moving this issue to future.

@erozenfeld erozenfeld modified the milestones: 5.0.0, Future Jul 16, 2020
@erozenfeld erozenfeld removed their assignment Oct 12, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020
@RikkiGibson
Copy link
Contributor

The design is not confirmed, but it's possible Roslyn will begin inserting implicit 'default' initialization of some struct fields in constructors as part of the "semi-auto properties" feature (dotnet/roslyn#57012). SharpLab

public struct S {
    public int Item { get => field; set => field = value; }

    public S(int item)
    {
        // 'field' is only accessible by users within accessors of 'Item', so users cannot definitely assign it in the constructor.
        // To ensure 'field' is definitely assigned, we implicitly assign 'default' before any user code in the constructor.
        Item = item;
    }
}

If the IL contains 'field = default' before any user code, and the JIT inlines access of 'Item.set', it would ideally be able to then eliminate the dead store 'field = default'.

cc @Youssef1313 @333fred

@EgorBo
Copy link
Member Author

EgorBo commented Jan 27, 2022

@RikkiGibson hm.. good to know! Thanks

@EgorBo
Copy link
Member Author

EgorBo commented Nov 10, 2023

All examples from the description are now optimized in .net 9.0

@EgorBo EgorBo closed this as completed Nov 10, 2023
@EgorBo EgorBo modified the milestones: Future, 9.0.0 Nov 10, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

9 participants