Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmove is not emitted for a simple expression #90017

Closed
EgorBo opened this issue Aug 4, 2023 · 5 comments · Fixed by #90468
Closed

cmove is not emitted for a simple expression #90017

EgorBo opened this issue Aug 4, 2023 · 5 comments · Fixed by #90468
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Aug 4, 2023

int foo(bool cond, int a, int b) => 1 + (cond ? a : b); // or e.g. "cond ? 1 : 2"

Current codegen:

; Method Program:foo(bool,int,int):int:this (FullOpts)
       test     dl, dl
       jne      SHORT G_M62184_IG04
       mov      eax, r9d
       jmp      SHORT G_M62184_IG05
G_M62184_IG04:  ;; offset=0x0009
       mov      eax, r8d
G_M62184_IG05:  ;; offset=0x000C
       inc      eax
       ret      
; Total bytes of code: 15

Expected: branchless code (cmove)

cc @jakobbotsch @dotnet/jit-contrib

@EgorBo EgorBo added the tenet-performance Performance related issue label Aug 4, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 4, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 4, 2023
@ghost
Copy link

ghost commented Aug 4, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details
int foo(bool cond, int a, int b) => 1 + (cond ? a : b); // or e.g. "cond ? 1 : 2"

Current codegen:

; Method Program:foo(bool,int,int):int:this (FullOpts)
       test     dl, dl
       jne      SHORT G_M62184_IG04
       mov      eax, r9d
       jmp      SHORT G_M62184_IG05
G_M62184_IG04:  ;; offset=0x0009
       mov      eax, r8d
G_M62184_IG05:  ;; offset=0x000C
       inc      eax
       ret      
; Total bytes of code: 15

Expected: branchless code (cmove)

cc @jakobbotsch @dotnet/jit-contrib

Author: EgorBo
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Aug 4, 2023

Works if I move + 1 to the right side:

 int foo(bool cond, int a, int b) => (cond ? a : b) + 1;
; Method Program:foo(bool,int,int):int:this (FullOpts)
       test     dl, dl
       cmove    r8d, r9d
       mov      eax, r8d
       inc      eax
       ret      
; Total bytes of code: 12

@stephentoub
Copy link
Member

Likely a separate issue, but as we discussed offline, it'd also be nice if:

int foo(bool cond, int a, int b) => (cond ? 1 : 2) + 1;

could become the equivalent of:

G_M000_IG02:
       mov      eax, 2
       mov      edx, 3
       test     cl, cl
       cmove    eax, edx

G_M000_IG03:
       ret      

instead of:

G_M000_IG02:
       mov      eax, 1
       mov      edx, 2
       test     cl, cl
       cmove    eax, edx
       inc      eax

G_M000_IG03:
       ret

@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Aug 4, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Aug 4, 2023
@jakobbotsch
Copy link
Member

This is presumably the same underlying issue as #87072, where Roslyn leaves things on the IL stack across control flow. I'm not at my PC, but I would guess that in this case that introduces additional stores in each basic block that disqualifies it from if conversion.

If that's right then we can likely teach if conversion about some of these cases, but it might fit better as an additional opportunity handled by tail merging.

@jakobbotsch
Copy link
Member

Likely a separate issue, but as we discussed offline, it'd also be nice if:

int foo(bool cond, int a, int b) => (cond ? 1 : 2) + 1;

could become the equivalent of:

G_M000_IG02:
       mov      eax, 2
       mov      edx, 3
       test     cl, cl
       cmove    eax, edx

G_M000_IG03:
       ret      

instead of:

G_M000_IG02:
       mov      eax, 1
       mov      edx, 2
       test     cl, cl
       cmove    eax, edx
       inc      eax

G_M000_IG03:
       ret

It might be worth it to invoke forward sub on the new statements created by if-conversion, and then run a folding pass over potentially resulting combined trees. For example, it would optimize ADD(SELECT(cond, cns1, cns2), cns3) into SELECT(cond, cns1 + cns3, cns2 + cns3). That would be an easy way to handle some of the simple cases.

It would require a bit of work to make forward sub work with the node threading we have at the point of if-conversion.

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Aug 12, 2023
Add a pass that does head merging to compliment the existing tail
merging pass. Unlike tail merging this requires reordering the first
statement with the terminator node of the predecessor, which requires
some interference checking.

Fix dotnet#90017
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 12, 2023
jakobbotsch added a commit that referenced this issue Aug 18, 2023
…nsformation (#90468)

Add a pass that does head merging to compliment the existing tail
merging pass. Unlike tail merging this requires reordering the first
statement with the terminator node of the predecessor, which requires
some interference checking.

The head merging is enabled only for BBJ_COND since the CQ benefit
for enabling it more generally does not seem to pay off. Additionally
we mainly try to target the IR produced by the importer due to spill
cliques, which usually happen with the IL that results from using
ternaries.

Fix #90017
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 18, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Sep 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants