Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cast to ushort is dropped in release #10435

Closed
jakobbotsch opened this issue Jun 1, 2018 · 14 comments · Fixed by dotnet/coreclr#18816
Closed

Cast to ushort is dropped in release #10435

jakobbotsch opened this issue Jun 1, 2018 · 14 comments · Fixed by dotnet/coreclr#18816
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug
Milestone

Comments

@jakobbotsch
Copy link
Member

The following program gives different outputs for release and debug:

public class Program
{
    static short s_2 = -1000;

    public static void Main()
    {
        ulong var1 = (ushort)(1U ^ s_2);
        Console.WriteLine(var1);
    }
}

In release, the result is 4294966297. In debug, the result is 64537.
@mikedn has analyzed the cause here.

This issue repros on .NET framework as well with 64-bit JIT (it does not repro with 32-bit JIT).

@jkotas
Copy link
Member

jkotas commented Jun 1, 2018

cc @dotnet/jit-contrib

@mikedn
Copy link
Contributor

mikedn commented Jun 11, 2018

I took a closer look at this and this caused by optNarrowTree. It's handling of GT_CAST pays attention only to type sizes and concludes that the tree can be narrowed because short and ushort have the same size:

https://github.com/dotnet/coreclr/blob/422b0e81797d5428812c6ef825e22565e2dc63f3/src/jit/optimizer.cpp#L5830

This is correct with respect to the parent XOR node that can be narrowed from long to int. However, it is not correct to also eliminate the top level cast to ushort.

Ironically, the same optNarrowTree and fgMorphCast combination is also responsible for failing to eliminate casts that are truly redundant in other situations (e.g. #10337).

optNarrowTree could probably use a rewrite. As is now, the code is difficult to follow, buggy, misses various optimization opportunities and is likely inefficient (does it really need to traverse the tree twice?).

@jakobbotsch
Copy link
Member Author

This one is quite interesting:

// Generated by Fuzzlyn on 2018-06-11 20:55:08
// Seed: 6332377919370969218
// Reduced from 80.2 KiB to 0.6 KiB
// Debug: Outputs 65487
// Release: Outputs -49
struct S0
{
    public byte F0;
    public sbyte F1;
    public sbyte F2;
    public S0(byte f0, sbyte f1, sbyte f2)
    {
        F0 = f0;
        F1 = f1;
        F2 = f2;
    }
}

public class Program
{
    static bool[][] s_9 = new bool[][]{new bool[]{true}};
    public static void Main()
    {
        S0 vr45 = new S0(0, 0, 0);
        M5(vr45);
    }

    static bool M5(S0 arg1)
    {
        arg1 = new S0(0, -50, 0);
        char var0 = (char)(1U ^ arg1.F1);
        System.Console.WriteLine((int)var0);
        return s_9[0][0];
    }
}

In 32-bit and 64-bit .NET Core, the results are as described by the top comments. However in 32-bit .NET Framework (using JIT32), the results are the opposite: In debug, the program prints -49 and in release, the program prints 65487 (the correct result). Should I open a separate issue for this? I am unsure where to report JIT32 bugs.

@mikedn
Copy link
Contributor

mikedn commented Jun 12, 2018

However in 32-bit .NET Framework (using JIT32), the results are the opposite: In debug, the program prints -49 and in release, the program prints 65487 (the correct result).

Yep, the same optNarrowTree bug exists in JIT32 as well. It happens so that JIT32 manages to do constant folding (correctly) in release mode and hides the bug. Which then brings the question - why does JIT32 (and probably RyuJIT as well) attempts to do narrowing in debug mode...

I am unsure where to report JIT32 bugs.

Developer Community I'd guess - https://developercommunity.visualstudio.com/spaces/61/index.html

@jakobbotsch
Copy link
Member Author

This can cause problems in debug too:

public class Program
{
    static int s_1 = 0xff;
    public static void Main()
    {
        int vr14 = (ushort)(sbyte)s_1;
        System.Console.WriteLine(vr14);
    }
}

This program prints -1 in both debug and release, even though vr14 cannot be negative. The culprit is again optNarrowTree; this tree is morphed in these steps:

               [000014] --CXG-------              *  CAST      int <- ushort <- int
               [000013] --CXG-------              \--*  CAST      int <- byte <- int
               [000005] ----G-------                 |  /--*  FIELD     int    s_1
               [000012] --CXG-------                 \--*  COMMA     int
               [000011] H-CXG-------                    \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000007] ------------ arg0                  +--*  CNS_INT   long   0x7ffe0f4845d0
               [000008] ------------ arg1                  \--*  CNS_INT   int    1

to

               [000014] --CXG-------              *  CAST      int <- ushort <- int
               [000013] --CXG-------              \--*  CAST      int <- byte <- int
               [000005] ----G+------                 |  /--*  CLS_VAR   int    Hnd=0xf485288 Fseq[s_1]
               [000012] --CXG+------                 \--*  COMMA     int
               [000011] H-CXG+------                    \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000007] -----+------ arg0 in rcx           +--*  CNS_INT   long   0x7ffe0f4845d0
               [000008] -----+------ arg1 in rdx           \--*  CNS_INT   int    1

then the inner cast is narrowed and removed (correct):

               [000014] --CXG-------              *  CAST      int <- ushort <- int
               [000005] ----G+------              |  /--*  CLS_VAR   byte   Hnd=0xf485288 Fseq[s_1]
               [000012] --CXG+------              \--*  COMMA     int
               [000011] H-CXG+------                 \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000007] -----+------ arg0 in rcx        +--*  CNS_INT   long   0x7ffe0f4845d0
               [000008] -----+------ arg1 in rdx        \--*  CNS_INT   int    1

However, the outer-cast has no way to know that it should not narrow this tree further, because of the comma-node (which has type int vs its second oper, which has type byte). So the widening cast is removed and we just end up with:

               [000005] ----G+------                 /--*  CLS_VAR   byte   Hnd=0xf485288 Fseq[s_1]
               [000012] --CXG+------              /--*  COMMA     int
               [000011] H-CXG+------              |  \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000007] -----+------ arg0 in rcx  |     +--*  CNS_INT   long   0x7ffe0f4845d0
               [000008] -----+------ arg1 in rdx  |     \--*  CNS_INT   int    1
               [000016] -ACXG-------              *  ASG       int
               [000015] D----+-N----              \--*  LCL_VAR   int    V00 loc0

So as far as I can tell, the culprit is this:
https://github.com/dotnet/coreclr/blob/86ca8b5e282ee0b42e27c69cb42c159dfc6e5a4b/src/jit/optimizer.cpp#L5873
which should not be using genActualType. However I am not too sure why the JIT tries so hard to keep nodes wide? Does it ever make sense for a comma-node to have a different type than its right operand?

@mikedn
Copy link
Contributor

mikedn commented Jun 25, 2018

So as far as I can tell, the culprit is this:

That might be but it's not the only problem with narrow tree, at least because you can reproduce the bug without a COMMA node. It's somewhat difficult to pinpoint the actual problem because it's not really clear what optNarrowTree is trying to do and what a true result really means. My impression is that optNarrowTree's main intention was to deal with long->int narrowing (an useful optimization on 32 bit targets for which this code was originally written) but it strayed, perhaps by accident, into cast folding territory and it's ill designed for that.

I wrote a new one (originally with the intention to address some CQ issues) but it's still pretty much WIP. And it's probably better if a more surgical fix is found for this particular issue. Though I'm a bit skeptical that such a fix exists, one or two I tried resulted in unrelated CQ regressions.

However I am not too sure why the JIT tries so hard to keep nodes wide?

JIT's IR tends to follow IL model (and that one follows typical high level language models) where small integer operations (add, sub, mul etc.) do not exist. Supporting small integer ops would likely add significant complexity and have questionable benefit.

Does it ever make sense for a comma-node to have a different type than its right operand?

Sure, because normally there's no such thing as a small int typed operand. The few nodes that do use small int types (most commonly indirs) actually produce int values by implicit widening.

@AndyAyersMS
Copy link
Member

One way to think about narrow types is that when optimizing we'd prefer that they appear only at the boundaries of trees -- widening at leaves (~loads) and narrowing at the root (~stores). As @mikedn says this more or less mirrors IL and machine semantics where narrow types are a storage concept (args, locals, fields), not an operation concept.

When there are casts in the middle of a tree the general idea is to migrate them either up to the root or down to the leaves of the tree (or sometimes both, in complex trees with multiple casts), as this can often lead to simplifications.

But as should be clear, this is something that needs to be done carefully...

@jakobbotsch
Copy link
Member Author

I see. Thanks to both of you for the explanations, that helps. So to sum up: we would (almost?) never want to actually relabel internal nodes to smaller types?
I assume this is also the reason for the notation on cast-nodes: int <- ushort <- int, etc., to indicate that it is just used to discard bits but the nodes are left the same size?

@mikedn
Copy link
Contributor

mikedn commented Jun 25, 2018

So to sum up: we would (almost?) never want to actually relabel internal nodes to smaller types?

Yeah, almost. There are some places where the JIT manages to use small types in less usual circumstances. 2 or 3 cases I happen to know about:

  • XARCH lowering sometimes narrows relops to small int and in doing so may also narrow some binary ops (AFAIR AND/OR/XOR). That's intended to take advantage of the fact that XARCH does have byte/word instructions and it happens to work reasonably well because not much happens after lowering (e.g. no more optimizations).
  • fgMorphSmpOp sometimes produces TYP_BYTE relop nodes. That's also an optimization intended for XARCH that doesn't have means to produce a 32 bit integer 0/1 value from condition codes (its SETcc instruction is byte only). This particular optimization is a hack that really should be in lowering, you can tell that from the fact that the same code also needs to undo the optimization to prevent storing byte values into lclvars.
  • AFAIR it may also be possible to see TYP_BOOL relops though I don't remember the exact details.

I assume this is also the reason for the notation on cast-nodes: int <- ushort <- int, etc., to indicate that it is just used to discard bits but the nodes are left the same size?

Yep. These casts are fun, you can classify them as narrowing or widening depending on how you look at them.

@erozenfeld
Copy link
Member

I have surgical fixes for these three cases. I verified no diffs on x64 crossgen frameworks. I will verify diffs on ({x64, x86} x {crossgen, pmi} x {frameworks + tests}) and will create a PR next week.

@erozenfeld
Copy link
Member

erozenfeld commented Jul 6, 2018

@jakobbotsch I can repro and have fixes for your first and third examples above. However, I can't repro the second example (that uses struct S0). I get the correct result on core in debug and release x64. Can you double check that you get -49 in release?

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Jul 6, 2018

Can you double check that you get -49 in release?

I think I must have made a mistake in my comment previously. That example only reproduces with 32-bit JIT. I cannot reproduce it with 64-bit JIT.
I have already reported it over at Developer Community -- except that the result is the opposite of .NET core on the full .NET framework.
If you want I can open another issue for it, if it is not the same bug after all.

EDIT: To be clear it reproduces with RyuJIT for 32-bit code. With JIT32 (on full .NET framework), the behavior is flipped: in debug it prints the wrong result, and in release the correct result.

@erozenfeld
Copy link
Member

Thank you for clarification. I reproduced that example with x86 RyuJIT and verified that my fixes cover it.

@jakobbotsch
Copy link
Member Author

No problem. Once you open the PR I can try to look at the rest of my examples and see if I can find some other corner cases not covered.

erozenfeld referenced this issue in erozenfeld/coreclr Jul 6, 2018
The fix under NARROW_IND prevents transformation of, e.g.,
CAST      int <- ushort <- int
    CLS_VAR byte

into

CLS_VAR byte.

With the fix the CAST is not removed.

The fix under GT_CAST prevents transformation of, e.g.,

CAST      int <- ushort <- long
    CAST      long <- int
       expr short

into

expt short.

With the fix it gets transformed into

CAST      int <- ushort <- int
    expr short

Fixes #18238.

No diffs in frameworks and tests (pmi and crossgen, x64 and x86), except for the added test cases.
erozenfeld referenced this issue in erozenfeld/coreclr Jul 11, 2018
The fix under NARROW_IND prevents transformation of, e.g.,
CAST      int <- ushort <- int
    CLS_VAR byte

into

CLS_VAR byte.

With the fix the CAST is not removed.

The fix under GT_CAST prevents transformation of, e.g.,

CAST      int <- ushort <- long
    CAST      long <- int
       expr short

into

expt short.

With the fix it gets transformed into

CAST      int <- ushort <- int
    expr short

Block cast optimizations in fgMorphCast if the cast exression is an
active CSE candidate.

Update cast exression value numbers when a cast is removed.

Fixes #18238, #18850.

No diffs in frameworks and tests (pmi and crossgen, x64 and x86), except for the added test cases.
erozenfeld referenced this issue in erozenfeld/coreclr Jul 11, 2018
The fix under NARROW_IND prevents transformation of, e.g.,
CAST      int <- ushort <- int
    CLS_VAR byte

into

CLS_VAR byte.

With the fix the CAST is not removed.

The fix under GT_CAST prevents transformation of, e.g.,

CAST      int <- ushort <- long
    CAST      long <- int
       expr short

into

expt short.

With the fix it gets transformed into

CAST      int <- ushort <- int
    expr short

Block cast optimizations in fgMorphCast if the cast expression is an
active CSE candidate.

Update cast expression value numbers when a cast is removed.

Fixes #18238, #18850.

No diffs in frameworks and tests (pmi and crossgen, x64 and x86), except for the added test cases.
erozenfeld referenced this issue in erozenfeld/coreclr Jul 12, 2018
The fix under NARROW_IND prevents transformation of, e.g.,
CAST      int <- ushort <- int
    CLS_VAR byte

into

CLS_VAR byte.

With the fix the CAST is not removed.

The fix under GT_CAST prevents transformation of, e.g.,

CAST      int <- ushort <- long
    CAST      long <- int
       expr short

into

expt short.

With the fix it gets transformed into

CAST      int <- ushort <- int
    expr short

Block cast optimizations in fgMorphCast if the cast expression is an
active CSE candidate.

Update cast expression value numbers when a cast is removed.

Fixes #18238, #18850.

No diffs in frameworks and tests (pmi and crossgen, x64 and x86), except for the added test cases.
erozenfeld referenced this issue in dotnet/coreclr Jul 12, 2018
The fix under NARROW_IND prevents transformation of, e.g.,
CAST      int <- ushort <- int
    CLS_VAR byte

into

CLS_VAR byte.

With the fix the CAST is not removed.

The fix under GT_CAST prevents transformation of, e.g.,

CAST      int <- ushort <- long
    CAST      long <- int
       expr short

into

expt short.

With the fix it gets transformed into

CAST      int <- ushort <- int
    expr short

Block cast optimizations in fgMorphCast if the cast expression is an
active CSE candidate.

Update cast expression value numbers when a cast is removed.

Fixes #18238, #18850.

No diffs in frameworks and tests (pmi and crossgen, x64 and x86), except for the added test cases.
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants