Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT won't CSE Math.Min with integers #101920

Open
stephentoub opened this issue May 6, 2024 · 10 comments
Open

JIT won't CSE Math.Min with integers #101920

stephentoub opened this issue May 6, 2024 · 10 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@stephentoub
Copy link
Member

I was expecting multiple Math.Min calls with the same Int32 arguments to be elided via CSE, but that's not happening.
SharpLab

using System;
public class C
{
    int M1(int length) =>
        Math.Min(length, 42) +
        Math.Min(length, 42) +
        Math.Min(length, 42) +
        Math.Min(length, 42) +
        Math.Min(length, 42);
}

resulting in:

C.M1(Int32)
    L0000: cmp edx, 0x2a
    L0003: jle short L000c
    L0005: mov eax, 0x2a
    L000a: jmp short L0010
    L000c: mov eax, edx
    L000e: jmp short L003f
    L0010: mov ecx, 0x2a
    L0015: add eax, ecx
    L0017: mov ecx, 0x2a
    L001c: cmp edx, 0x2a
    L001f: cmovle ecx, edx
    L0022: add eax, ecx
    L0024: mov ecx, 0x2a
    L0029: cmp edx, 0x2a
    L002c: cmovle ecx, edx
    L002f: add eax, ecx
    L0031: mov ecx, 0x2a
    L0036: cmp edx, 0x2a
    L0039: cmovg edx, ecx
    L003c: add eax, edx
    L003e: ret
    L003f: mov ecx, edx
    L0041: jmp short L0015

It's not a big deal, but the actual case where I noticed this came from the regex source generator, where it was emitting code like this:

int iteration = span.Slice(0, Math.Min(span.Length, 100)).IndexOfAnyExcept('a');
if (iteration < 0)
{
    iteration = Math.Min(span.Length, 100);
}

in order to search a span but only up to a specific length; if it doesn't find what it's looking for within that length, it needs to update the position to the end of what it did examine. It's doing so with multiple Math.Min(span.Length, int32limit) calls, and each is resulting in the relevant comparisons rather than being CSE'd, even though the inputs provably don't change between calls.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 6, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 6, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo
Copy link
Member

EgorBo commented May 6, 2024

@dotnet/jit-contrib
The Math.Min is effectively just:

        public static int Min(int a, int b)
        {
            return (a <= b) ? a : b;
        }

I am curious if we can recognize GT_SELECT early in importer too , so then it's naturally CSE'd, etc. (and folded for constant input). Currently, GT_SELECT is recognized only during late phases.

@jakobbotsch
Copy link
Member

jakobbotsch commented May 6, 2024

I don't think anything stops us from running if-conversion earlier. One strategy would be to canonicalize into GT_SELECT form much more aggressively and then expand back to control flow later for cases where we think it's too expensive to compute the arms unconditionally. Probably the most practical solution here (vs something more general that teaches CSE to handle the hammocks in the flow graph).

@a74nh I'm curious if you have any thoughts about running if-conversion earlier. I recall you ended up moving the phase a few times when you added it.

@AndyAyersMS
Copy link
Member

Jump threading should be able to clean this sort of thing up too, but likely is currently blocked by side effects.

@EgorBo EgorBo added this to the Future milestone May 6, 2024
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label May 6, 2024
@EgorBo
Copy link
Member

EgorBo commented May 6, 2024

Jump threading should be able to clean this sort of thing up too, but likely is currently blocked by side effects.

Looks like RBO only optimizes just one condition:

image

(Before RBO - After RBO)

@AndyAyersMS
Copy link
Member

That looks like what we want, BB04 is now bypassed... is there something below there that doesn't get fixed?

@jakobbotsch
Copy link
Member

Would jump threading be expected to handle the real world case that @stephentoub showed above, in regex? That one has more interposing stuff.

@a74nh
Copy link
Contributor

a74nh commented May 7, 2024

@a74nh I'm curious if you have any thoughts about running if-conversion earlier. I recall you ended up moving the phase a few times when you added it.

An earlier version of If Conversion did have it much earlier (before SSA I think). It was eventually moved so that it was directly after Optimize Bools, because Optimise Bools needs to combine multiple conditions within an if into a single AND chain before the If Conversion.

You could move them both earlier. I think as long as you've detected loops it should be ok. I'm not sure if it would interfere with some of the loop optimisations (hoisting, inverting etc).

@AndyAyersMS
Copy link
Member

That looks like what we want, BB04 is now bypassed... is there something below there that doesn't get fixed?

Egor you probably checked but indeed the lower compares are recognized as redundant, but the chain of adds blocks jump threading.

Dominator BB10 of BB13 has relop with same liberal VN
N003 (  3,  3) [000064] J------N---                         *  LE        int    $100
N001 (  1,  1) [000018] -----------                         +--*  LCL_VAR   int    V01 arg1         u:1 $80
N002 (  1,  1) [000063] -----------                         \--*  CNS_INT   int    42 $42
 Redundant compare; current relop:
N003 (  3,  3) [000074] J------N---                         *  LE        int    $100
N001 (  1,  1) [000025] -----------                         +--*  LCL_VAR   int    V01 arg1         u:1 $80
N002 (  1,  1) [000073] -----------                         \--*  CNS_INT   int    42 $42
Both successors of idom BB10 reach BB13 -- attempting jump threading
BB13 has side effects; no threading

Either we need to be willing to destroy SSA or we need some way to update it by creating duplicate defs.

Before:
image

After: we need to copy the PHI/ADD code in BB13 into BB11 and BB12. The PHI resolves to either V9.1 or V9.2 depending on pred, but we now need two defs for V6, and we need to add a PHI for V6 down in BB16, and update that use to the PHI def.

Currently we know that V6.4 has one use, but we don't know where it is, and we don't know all the places we might need to add phis.

@AndyAyersMS
Copy link
Member

We could I suppose have RBO just build up a list of blocks where jump threading was blocked like this, and remember how to reroute each pred, and then at the end of the optimizer phases, run through that list... provided the flow structure still matches we could then just duplicate IR and rewire flow. It might be a bit fragile though.

Or if we moved RBO to the end like we've been envisioning if/when we can pay for opt repeat, it can just do this as a second action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

5 participants