Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT bounds checks not eliminated for readonly arrays #5990

Closed
benaadams opened this issue Jun 1, 2016 · 21 comments
Closed

RyuJIT bounds checks not eliminated for readonly arrays #5990

benaadams opened this issue Jun 1, 2016 · 21 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@benaadams
Copy link
Member

Range analysis doesn't take into account readonly arrays and you need to make a function local reference to eliminate the range check.

Analysis https://gist.github.com/benaadams/11123e9785481e07d8ca483bef798689

Method Median StdDev Scaled
ForLoopArray 2.5021 us 0.0591 us 1.00
ForLoopOneBoundsCheck 2.3430 us 0.0296 us 0.94
ForLoopReadOnlyArray 2.5047 us 0.0334 us 1.00
ForLoopAddPointer 2.3360 us 0.0059 us 0.93
ForEachArray 2.3432 us 0.0051 us 0.94
@omariom
Copy link
Contributor

omariom commented Jun 1, 2016

Range check can't be eliminated for readonly instance fields because ctor's can call other methods.
But if a static readonly field has already been initialized JIT could skip generatin range checks.

@GSPP
Copy link

GSPP commented Jun 1, 2016

What's the CLR's policy on changing readonly fields through reflection? Is this undefined behavior? I believe so for the static case. I have seen the JIT inline static readonly constants at JIT time.

@omariom
Copy link
Contributor

omariom commented Jun 1, 2016

I have seen the JIT inline static readonly constants at JIT time.

It currently works for fields of primitive types.
would be interesting to extend it further to arrays and dynamic dispatch

@GSPP
Copy link

GSPP commented Jun 1, 2016

I was more interested in what formal guarantees there are. I guess for static readonly there are none. But what about:

var x = this.readonlyField;
SetXThroughReflection();
Assert(x == this.readonlyField);

Can this ever throw? The JIT might assume the field cannot change.

For a static readonly field this would always fail.

@omariom
Copy link
Contributor

omariom commented Jun 1, 2016

wow! I've been sure Reflection doesn't allow setting readonly fields.
But it does!

void Main()
{
// Static
    var staticField = typeof(TestClass).GetField("StaticField", BindingFlags.Static | BindingFlags.Public);

    staticField.SetValue(null, 6);

    Console.WriteLine(staticField.GetValue(null));

// Instance
    var field = typeof(TestClass).GetField("Field", BindingFlags.Instance | BindingFlags.Public);
    var instance = new TestClass();
    field.SetValue(instance, 6);

    Console.WriteLine(field.GetValue(instance));
}

class TestClass
{
    public readonly static int StaticField = 7;
    public readonly int Field = 10;
}

@RussKeldorph
Copy link
Contributor

@dotnet/jit-contrib FYI

@mikedn
Copy link
Contributor

mikedn commented Dec 18, 2016

This has nothing to do with readonly, constructors calling other methods, reflection and whatnot.

If you look at the code generated for something like

for (int i = 0; i < p.a.Length; i++)
    s += p.a[i];

you'll see something like this

G_M56240_IG03:
       4C8BC9               mov      r9, rcx
       413BD0               cmp      edx, r8d
       7316                 jae      SHORT G_M56240_IG05
       4C63D2               movsxd   r10, edx
       4303449110           add      eax, dword ptr [r9+4*r10+16]
       FFC2                 inc      edx
       443BC2               cmp      r8d, edx
       7FE9                 jg       SHORT G_M56240_IG03

The array field ended up in a register so the JIT doesn't care that the field is readonly or not. As far as it is concerned the field isn't changed inside the loop so it can load it only once, before the loop. I don't know why the range check isn't eliminated but obviously it isn't because the field may change during iteration.

And a trivial workaround - use foreach when possible. It generates this

G_M56240_IG03:
       4C63C9               movsxd   r9, ecx
       468B4C8A10           mov      r9d, dword ptr [rdx+4*r9+16]
       4103C1               add      eax, r9d
       FFC1                 inc      ecx
       443BC1               cmp      r8d, ecx
       7FEE                 jg       SHORT G_M56240_IG03

That's because foreach itself loads the field into a local variable before the loop.

@benaadams
Copy link
Member Author

@mikedn interesting; your first example is for a regular non-readonly array? So it should either be eliminating the range check or confirming the array is the same anyway?

@omariom
Copy link
Contributor

omariom commented Dec 19, 2016

According to not very well specified .net memory model JIT could do that even without explicit/implicit local copies.

@mikedn
Copy link
Contributor

mikedn commented Dec 19, 2016

@benaadams I'm not quite sure what your question is. What's for sure is that in the example p.a may be readonly or not, the same code is generated. readonly would be useful in cases where the loop contains code that might modify the array field and the JIT can't figure it out.

@mikedn
Copy link
Contributor

mikedn commented Dec 19, 2016

I took a quick look at the IR dump for that example and I think the JIT shot itself in the foot.

At the start of the analysis p.a.Length and p.a[i] considered the same in some contexts (CSE) and different in other contexts (range check optimization). That's normal, CSE of a load is allowed by the memory model. But you can't eliminate a range check if another thread may change the array from under you, that could have dire consequences.

Then CSE takes place and the array length ends up in a local variable that's used by both p.a.Length and the range check implied by p.a[i].

But after CSE the JIT still "remember" that those expressions were different as far as threading is concerned. And range check elimination fails.

Seems to me that CSE should fix up JIT's knowledge about multi-threaded access.

@benaadams
Copy link
Member Author

benaadams commented Dec 19, 2016

I'm not quite sure what your question is.

It seems the either the range check should be eliminated as its a gc safe object that's not changing length; regardless of whether the instance variable is changed; or it should reacquire the array on each loop - but its doing neither? (I'd prefer the former 😉 )

@mikedn
Copy link
Contributor

mikedn commented Dec 19, 2016

regardless of whether the instance variable is changed

Well, you can't safely eliminate the range check if the instance variable changes. The only way to do this is to copy the instance variable to a local variable and only use the local variable within the loop. The JIT does make this copy but then fails to eliminate the range check due to another problem (described above).

@JosephTremoulet
Copy link
Contributor

Seems to me that CSE should fix up JIT's knowledge about multi-threaded access.

Absolutely, yes, it should. What makes this tricky with the current set-up is that the knowledge you're referring to is embedded in the value numbers (specifically the divergence between the liberal ones and the conservative ones), but value numbering runs before CSE and doesn't leave behind enough dependence information to quickly tell which conservative value numbers could be refined (and to what) in response to performing a given CSE. So we'd have to do something akin to re-running value-numbering (and making that incremental enough to not explode the compile-time cost would be the trick), or change how the knowledge is recorded in a way that makes the incremental update more straightforward (e.g. fold CSE and value-numbering together, then let subsequent phases like range-checking use same-SSA-def as a proxy for value-equal; the hurdles here are how to update all that downstream code without upsetting the apple cart, and time to implement).

Note that checked/debug builds of the jit have "opt repeat" functionality (enabled by setting COMPlus_JitOptRepeat/COMPlus_JitOptRepeatCount), which will literally re-run value-numbering (and other optimization passes), to make it easier to gauge the severity of this sort of issue; it would certainly be interesting to know if someone's important code is running into this. Experiments so far haven't shown it to be a top issue.

@mikedn
Copy link
Contributor

mikedn commented Dec 19, 2016

but value numbering runs before CSE and doesn't leave behind enough dependence information to quickly tell which conservative value numbers could be refined (and to what) in response to performing a given CSE

But what would happen if CSE simply copies the conservative value number from the definition? Could that break something else? As is now it looks weird that we end up with IR where the def and use of a lcl var have different conservative VNs:
def:

N011 (  7,  7) [000106] -A-XG---R---                      \--*  ASG       int    $VN.Void
N010 (  1,  1) [000105] D------N----                         \--*  LCL_VAR   int    V11 cse0          <l:$105, c:$104>

use:

N006 (  6,  9) [000046] ---X--------             |  |  |  |  \--*  ARR_BOUNDS_CHECK_Rng void   <l:$2c7, c:$2c6>
N004 (  1,  1) [000110] ------------             |  |  |  |     +--*  LCL_VAR   int    V11 cse0          <l:$105, c:$106>
N005 (  1,  1) [000022] ------------             |  |  |  |     \--*  LCL_VAR   int    V08 loc1         u:4 $381

Sure, simply changing the VN of the use won't propagate up the tree but maybe it can be special cased for GT_ARR_BOUNDS_CHECK?

@JosephTremoulet
Copy link
Contributor

But what would happen if CSE simply copies the conservative value number from the definition?

In general there could be multiple defs, and CSE isn't computing which reach each use. But single def is likely the common case and could be special-cased...

Could that break something else?

I think it would be sound... some possible gotchas:

  • I'd worry a little about losing the congruence between a rewritten use and another tree with the same conservative value number, but you'd hope that two trees with the same conservative number would also have the same liberal number and therefore CSE would rewrite both of them.
  • Hopefully we don't have code that somehow demands that an OpFoo node whose value number is FooFunc have IR arguments whose value numbers are the arguments of the FooFunc expression, as that would break. I'd guess we don't have that, so would be ok.
  • Any data structures we have that key off conservative value number or track congruence classes by conservative value number or etc. would of course also need scrutiny. I'm not aware of any off the top of my head. When CSE is running, it of course has grouped exprs, but by liberal value number...

Sure, simply changing the VN of the use won't propagate up the tree but maybe it can be special cased for GT_ARR_BOUNDS_CHECK?

Right, my biggest concern was about the lack of propagation up the tree and then through the SSA graph. So e.g. the same issue could happen in the index expression and we'd still miss it. That said, adding a special case for GT_ARR_BOUNDS_CHECK as you suggest seems like it would be reasonably cheap (throughput-wise) and sound and could certainly catch some cases, so I like the idea.

@benaadams
Copy link
Member Author

benaadams commented Dec 19, 2016

it would certainly be interesting to know if someone's important code is running into this. Experiments so far haven't shown it to be a top issue.

It effects larger loops

Loop size 16 4000
ForLoopArray 13.4183 ns 2.4165 us
ForLoopOneBoundsCheck 13.4226 ns 1.7440 us

Framework code normally takes a function local copy of the array reference before working on it; probably for that reason?

So would likely be non-framework code that would be effected.

@mikedn
Copy link
Contributor

mikedn commented Dec 19, 2016

It effects larger loops

Somehow I doubt that a micro-benchmark is "someone's important code" 😄. Besides, foreach works fine and it is the most common way to iterate over an array.

@JosephTremoulet
Copy link
Contributor

if CSE simply copies the conservative value number from the definition?

I have the change to do that up in PR dotnet/coreclr#9451 now.

changing the VN of the use won't propagate up the tree but maybe it can be special cased for GT_ARR_BOUNDS_CHECK

I coded that up too, only to discover that anywhere we can/do CSE the array, we also can/do CSE the array length, so propagating up the tree to arraylens in bounds checks wasn't giving me any diffs over just modifying the CSEd uses themselves, and I pulled that part back out.

dotnet/coreclr#9451 still doesn't get the case above, I'll investigate further...

@JosephTremoulet
Copy link
Contributor

Looks like the reason dotnet/coreclr#9451 doesn't get this case is that range check removal operates on the assertions that assertion prop generated when processing compare/branches, which in turn were pulled from the value numbers annotated on the compare nodes (<, >, etc.). So possibly it's worth special-casing propagating new conservative VNs up the tree to compare nodes when an array length expr gets CSEd...

JosephTremoulet referenced this issue in JosephTremoulet/coreclr Feb 15, 2017
Modify CSE to identify compares which are functions of CSE candidate array
length nodes; when those length nodes get CSEd and consequently have their
value numbers updated, also update the value number on the compare
consuming the array length; this way the assertions generated in assertion
prop for the different compares on the CSEd array will use the same value
numbers, and range check elimination becomes more effective.

Resolves #5371
JosephTremoulet referenced this issue in JosephTremoulet/coreclr Feb 15, 2017
Modify CSE to identify compares which are functions of CSE candidate array
length nodes; when those length nodes get CSEd and consequently have their
value numbers updated, also update the value number on the compare
consuming the array length; this way the assertions generated in assertion
prop for the different compares on the CSEd array will use the same value
numbers, and range check elimination becomes more effective.

Resolves #5371
jorive referenced this issue in guhuro/coreclr May 4, 2017
Modify CSE to identify compares which are functions of CSE candidate array
length nodes; when those length nodes get CSEd and consequently have their
value numbers updated, also update the value number on the compare
consuming the array length; this way the assertions generated in assertion
prop for the different compares on the CSEd array will use the same value
numbers, and range check elimination becomes more effective.

Resolves #5371
@mikedn
Copy link
Contributor

mikedn commented Jan 15, 2018

@dotnet/jit-contrib FYI this has regressed due to dotnet/coreclr#15756

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 31, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

7 participants