Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IComparer<T> constraint leads to worse codegen than IComparisonOperators<T,T,bool> constraint #78383

Open
stephentoub opened this issue Nov 15, 2022 · 6 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:3 Work that is nice to have
Milestone

Comments

@stephentoub
Copy link
Member

We introduced the new IComparisonOperators<,,> interface as part of the generic math APIs work in .NET 7, but for general-purpose APIs that need to compare and aren't constrained to numerical types, it's preferable from a design perspective to use the longer-standing IComparable<T>. Unfortunately, the codegen that's produced with IComparable<T> can end up being worse than that for IComparisonOperators<,,>, forcing a hard choice. Can we improve the JIT here?

Example:
SharpLab

#nullable disable
using System;
using System.Numerics;
using SharpLab.Runtime;

public class C
{
    [JitGeneric(typeof(int))]
    public static bool M1<T>(T value, T other) where T : IComparable<T>, IComparisonOperators<T, T, bool> =>
        value.CompareTo(other) > 0;
    
    [JitGeneric(typeof(int))]
    public static bool M2<T>(T value, T other) where T : IComparable<T>, IComparisonOperators<T, T, bool> =>
        value > other;
    
    public static bool M3(int value, int other) =>
        value.CompareTo(other) > 0;
    
    public static bool M4(int value, int other) =>
        value > other;
}

Both M1 (generic) and M3 (non-generic) that use CompareTo end up producing:

    L0000: cmp ecx, edx
    L0002: jl short L0013
    L0004: cmp ecx, edx
    L0006: jg short L001a
    L0008: xor eax, eax
    L000a: test eax, eax
    L000c: setg al
    L000f: movzx eax, al
    L0012: ret
    L0013: mov eax, 0xffffffff
    L0018: jmp short L000a
    L001a: mov eax, 1
    L001f: jmp short L000a

whereas both M2 (generic) and M4 (non-generic) that use IComparisonOperators<T,T,bool> end up producing:

    L0000: xor eax, eax
    L0002: cmp ecx, edx
    L0004: setg al
    L0007: ret

Related to #78222
cc: @EgorBo, @tannergooding

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 15, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 15, 2022
@ghost
Copy link

ghost commented Nov 15, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

We introduced the new IComparisonOperators<,,> interface as part of the generic math APIs work in .NET 7, but for general-purpose APIs that need to compare and aren't constrained to numerical types, it's preferable from a design perspective to use the longer-standing IComparable<T>. Unfortunately, the codegen that's produced with IComparable<T> can end up being worse than that for IComparisonOperators<,,>, forcing a hard choice. Can we improve the JIT here?

Example:
SharpLab

#nullable disable
using System;
using System.Numerics;
using SharpLab.Runtime;

public class C
{
    [JitGeneric(typeof(int))]
    public static bool M1<T>(T value, T other) where T : IComparable<T>, IComparisonOperators<T, T, bool> =>
        value.CompareTo(other) > 0;
    
    [JitGeneric(typeof(int))]
    public static bool M2<T>(T value, T other) where T : IComparable<T>, IComparisonOperators<T, T, bool> =>
        value > other;
    
    public static bool M3(int value, int other) =>
        value.CompareTo(other) > 0;
    
    public static bool M4(int value, int other) =>
        value > other;
}

Both M1 (generic) and M3 (non-generic) that use CompareTo end up producing:

    L0000: cmp ecx, edx
    L0002: jl short L0013
    L0004: cmp ecx, edx
    L0006: jg short L001a
    L0008: xor eax, eax
    L000a: test eax, eax
    L000c: setg al
    L000f: movzx eax, al
    L0012: ret
    L0013: mov eax, 0xffffffff
    L0018: jmp short L000a
    L001a: mov eax, 1
    L001f: jmp short L000a

whereas both M2 (generic) and M4 (non-generic) that use IComparisonOperators<T,T,bool> end up producing:

    L0000: xor eax, eax
    L0002: cmp ecx, edx
    L0004: setg al
    L0007: ret

Related to #78222
cc: @EgorBo, @tannergooding

Author: stephentoub
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member

Looks like the exact case of #70616. I'm not sure what @EgorBo's conclusion ended up being around this (there is the classic ? true : false workaround in the C# source that we reverted to in #70631)

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Nov 17, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone Nov 17, 2022
@EgorBo
Copy link
Member

EgorBo commented Mar 27, 2023

Likely the same as #78383, assigning Andy who was looking at it in #83859

@EgorBo EgorBo assigned AndyAyersMS and unassigned EgorBo Mar 27, 2023
@AndyAyersMS AndyAyersMS added the Priority:3 Work that is nice to have label Jun 9, 2023
@AndyAyersMS
Copy link
Member

AndyAyersMS commented Jun 9, 2023

This still feels somewhat beyond our grasp.For M3, during RBO, we have

image (31)

And we want to recognize that the predicate value returned in BB06 is also a function of the two predicate operands in BB01, so that the entire graph collapses to a single compare.

RBO could handle it something like the following:

  • At BB06, realize that we have a return value that depends on PHI inputs
  • At BB06, realize that each PHI input resolves the return value to a constant
    • path (a) T1 = -1 ==> rv false
    • path (b) T1 = 0 ==> rv false
    • path (c) T1 = 1 ==> rv true
  • Discover that there is a control-equivalent branch (dominating BB06 and post-dominated by BB06) in BB01
  • Determine that BB01's operands can be compared in way that establishes the return value.
    • we can do this by computing the conditions under which RV is true (path c):
    • AND(GE(V00,V01), NOT(LE(V00,V01)) ==> GT(V00,V01)
  • Determine that there are no important side effects in between BB01's compare and BB06's return
  • Create a return at end of BB01 replacing the branch, with comparison above, and delete the other blocks.

The prototype changes for #81220 have a parts of this logic, but that analysis won't get triggered by returns, and does not have a sufficiently powerful side-effect analysis (currently tripped up by the assignments to T1). We should be able to argue that the only uses of T1 are in this subgraph (or perhaps, the only uses of the defs of T1 in this subgraph are also in this subgraph) and the only side effects are assignments to T1, so these assignment side effects can be disregarded as we will be removing all the uses too.

@AndyAyersMS
Copy link
Member

This is going to take more work so it's not going to happen in .NET 8.

There is a decent draft PR with some of the necessary bits here: #88527. It doesn't actually find that many cases and introduces a fair amount of new logic, so I'm going to hold off merging this until after .NET 8 is done.

@AndyAyersMS AndyAyersMS modified the milestones: 8.0.0, 9.0.0 Jul 21, 2023
@AndyAyersMS
Copy link
Member

AndyAyersMS commented Jul 8, 2024

Still work needed on the approaches taken above -- would be nice if the overall opt was less pattern matchy somehow.

For the return cases perhaps postdominance can play a role? Starting at a return (relop), if we walk up the postdominator tree and find a postdominated BBJ_COND, we can do the same sort of symbolic analysis to see if the decision made there determines (or could be modified to determine) the value that will be returned, and if so, perhaps we can short-circuit all the logic in between.

For the more general case we may need something like collective postdominance... say two blocks collectively postdominate some nest of computation, we can do similar analysis with the collective postdominator tree, to see if some decision made earlier can be modified to bypass all the nest and arrive at the right partial postdominator.

The other missing bit is the ability to recognize when the within-nest side effects' effects are contained to the nest, so if the nest goes away nothing of importance is lost.

I'm going to move this out of .NET 9 as it needs more time and thought.

@AndyAyersMS AndyAyersMS modified the milestones: 9.0.0, Future Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:3 Work that is nice to have
Projects
None yet
Development

No branches or pull requests

5 participants