RyuJIT bounds checks not eliminated for readonly arrays #5990

benaadams · 2016-06-01T16:18:00Z

Range analysis doesn't take into account readonly arrays and you need to make a function local reference to eliminate the range check.

Analysis https://gist.github.com/benaadams/11123e9785481e07d8ca483bef798689

Method	Median	StdDev	Scaled
ForLoopArray	2.5021 us	0.0591 us	1.00
ForLoopOneBoundsCheck	2.3430 us	0.0296 us	0.94
ForLoopReadOnlyArray	2.5047 us	0.0334 us	1.00
ForLoopAddPointer	2.3360 us	0.0059 us	0.93
ForEachArray	2.3432 us	0.0051 us	0.94

omariom · 2016-06-01T17:00:03Z

Range check can't be eliminated for readonly instance fields because ctor's can call other methods.
But if a static readonly field has already been initialized JIT could skip generatin range checks.

GSPP · 2016-06-01T21:57:17Z

What's the CLR's policy on changing readonly fields through reflection? Is this undefined behavior? I believe so for the static case. I have seen the JIT inline static readonly constants at JIT time.

omariom · 2016-06-01T22:04:16Z

I have seen the JIT inline static readonly constants at JIT time.

It currently works for fields of primitive types.
would be interesting to extend it further to arrays and dynamic dispatch

GSPP · 2016-06-01T22:09:57Z

I was more interested in what formal guarantees there are. I guess for static readonly there are none. But what about:

var x = this.readonlyField;
SetXThroughReflection();
Assert(x == this.readonlyField);

Can this ever throw? The JIT might assume the field cannot change.

For a static readonly field this would always fail.

omariom · 2016-06-01T22:26:08Z

wow! I've been sure Reflection doesn't allow setting readonly fields.
But it does!

void Main()
{
// Static
    var staticField = typeof(TestClass).GetField("StaticField", BindingFlags.Static | BindingFlags.Public);

    staticField.SetValue(null, 6);

    Console.WriteLine(staticField.GetValue(null));

// Instance
    var field = typeof(TestClass).GetField("Field", BindingFlags.Instance | BindingFlags.Public);
    var instance = new TestClass();
    field.SetValue(instance, 6);

    Console.WriteLine(field.GetValue(instance));
}

class TestClass
{
    public readonly static int StaticField = 7;
    public readonly int Field = 10;
}

RussKeldorph · 2016-06-02T17:21:48Z

@dotnet/jit-contrib FYI

mikedn · 2016-12-18T17:49:53Z

This has nothing to do with readonly, constructors calling other methods, reflection and whatnot.

If you look at the code generated for something like

for (int i = 0; i < p.a.Length; i++)
    s += p.a[i];

you'll see something like this

G_M56240_IG03:
       4C8BC9               mov      r9, rcx
       413BD0               cmp      edx, r8d
       7316                 jae      SHORT G_M56240_IG05
       4C63D2               movsxd   r10, edx
       4303449110           add      eax, dword ptr [r9+4*r10+16]
       FFC2                 inc      edx
       443BC2               cmp      r8d, edx
       7FE9                 jg       SHORT G_M56240_IG03

The array field ended up in a register so the JIT doesn't care that the field is readonly or not. As far as it is concerned the field isn't changed inside the loop so it can load it only once, before the loop. I don't know why the range check isn't eliminated but obviously it isn't because the field may change during iteration.

And a trivial workaround - use foreach when possible. It generates this

G_M56240_IG03:
       4C63C9               movsxd   r9, ecx
       468B4C8A10           mov      r9d, dword ptr [rdx+4*r9+16]
       4103C1               add      eax, r9d
       FFC1                 inc      ecx
       443BC1               cmp      r8d, ecx
       7FEE                 jg       SHORT G_M56240_IG03

That's because foreach itself loads the field into a local variable before the loop.

benaadams · 2016-12-19T11:48:19Z

@mikedn interesting; your first example is for a regular non-readonly array? So it should either be eliminating the range check or confirming the array is the same anyway?

omariom · 2016-12-19T12:05:30Z

According to not very well specified .net memory model JIT could do that even without explicit/implicit local copies.

mikedn · 2016-12-19T12:12:44Z

@benaadams I'm not quite sure what your question is. What's for sure is that in the example p.a may be readonly or not, the same code is generated. readonly would be useful in cases where the loop contains code that might modify the array field and the JIT can't figure it out.

mikedn · 2016-12-19T12:56:52Z

I took a quick look at the IR dump for that example and I think the JIT shot itself in the foot.

At the start of the analysis p.a.Length and p.a[i] considered the same in some contexts (CSE) and different in other contexts (range check optimization). That's normal, CSE of a load is allowed by the memory model. But you can't eliminate a range check if another thread may change the array from under you, that could have dire consequences.

Then CSE takes place and the array length ends up in a local variable that's used by both p.a.Length and the range check implied by p.a[i].

But after CSE the JIT still "remember" that those expressions were different as far as threading is concerned. And range check elimination fails.

Seems to me that CSE should fix up JIT's knowledge about multi-threaded access.

benaadams · 2016-12-19T13:13:11Z

I'm not quite sure what your question is.

It seems the either the range check should be eliminated as its a gc safe object that's not changing length; regardless of whether the instance variable is changed; or it should reacquire the array on each loop - but its doing neither? (I'd prefer the former 😉 )

mikedn · 2016-12-19T13:28:34Z

regardless of whether the instance variable is changed

Well, you can't safely eliminate the range check if the instance variable changes. The only way to do this is to copy the instance variable to a local variable and only use the local variable within the loop. The JIT does make this copy but then fails to eliminate the range check due to another problem (described above).

JosephTremoulet · 2016-12-19T16:42:39Z

Seems to me that CSE should fix up JIT's knowledge about multi-threaded access.

Absolutely, yes, it should. What makes this tricky with the current set-up is that the knowledge you're referring to is embedded in the value numbers (specifically the divergence between the liberal ones and the conservative ones), but value numbering runs before CSE and doesn't leave behind enough dependence information to quickly tell which conservative value numbers could be refined (and to what) in response to performing a given CSE. So we'd have to do something akin to re-running value-numbering (and making that incremental enough to not explode the compile-time cost would be the trick), or change how the knowledge is recorded in a way that makes the incremental update more straightforward (e.g. fold CSE and value-numbering together, then let subsequent phases like range-checking use same-SSA-def as a proxy for value-equal; the hurdles here are how to update all that downstream code without upsetting the apple cart, and time to implement).

Note that checked/debug builds of the jit have "opt repeat" functionality (enabled by setting COMPlus_JitOptRepeat/COMPlus_JitOptRepeatCount), which will literally re-run value-numbering (and other optimization passes), to make it easier to gauge the severity of this sort of issue; it would certainly be interesting to know if someone's important code is running into this. Experiments so far haven't shown it to be a top issue.

mikedn · 2016-12-19T16:59:38Z

but value numbering runs before CSE and doesn't leave behind enough dependence information to quickly tell which conservative value numbers could be refined (and to what) in response to performing a given CSE

But what would happen if CSE simply copies the conservative value number from the definition? Could that break something else? As is now it looks weird that we end up with IR where the def and use of a lcl var have different conservative VNs:
def:

N011 (  7,  7) [000106] -A-XG---R---                      \--*  ASG       int    $VN.Void
N010 (  1,  1) [000105] D------N----                         \--*  LCL_VAR   int    V11 cse0          <l:$105, c:$104>

use:

N006 (  6,  9) [000046] ---X--------             |  |  |  |  \--*  ARR_BOUNDS_CHECK_Rng void   <l:$2c7, c:$2c6>
N004 (  1,  1) [000110] ------------             |  |  |  |     +--*  LCL_VAR   int    V11 cse0          <l:$105, c:$106>
N005 (  1,  1) [000022] ------------             |  |  |  |     \--*  LCL_VAR   int    V08 loc1         u:4 $381

Sure, simply changing the VN of the use won't propagate up the tree but maybe it can be special cased for GT_ARR_BOUNDS_CHECK?

JosephTremoulet · 2016-12-19T18:33:23Z

But what would happen if CSE simply copies the conservative value number from the definition?

In general there could be multiple defs, and CSE isn't computing which reach each use. But single def is likely the common case and could be special-cased...

Could that break something else?

I think it would be sound... some possible gotchas:

I'd worry a little about losing the congruence between a rewritten use and another tree with the same conservative value number, but you'd hope that two trees with the same conservative number would also have the same liberal number and therefore CSE would rewrite both of them.
Hopefully we don't have code that somehow demands that an OpFoo node whose value number is FooFunc have IR arguments whose value numbers are the arguments of the FooFunc expression, as that would break. I'd guess we don't have that, so would be ok.
Any data structures we have that key off conservative value number or track congruence classes by conservative value number or etc. would of course also need scrutiny. I'm not aware of any off the top of my head. When CSE is running, it of course has grouped exprs, but by liberal value number...

Sure, simply changing the VN of the use won't propagate up the tree but maybe it can be special cased for GT_ARR_BOUNDS_CHECK?

Right, my biggest concern was about the lack of propagation up the tree and then through the SSA graph. So e.g. the same issue could happen in the index expression and we'd still miss it. That said, adding a special case for GT_ARR_BOUNDS_CHECK as you suggest seems like it would be reasonably cheap (throughput-wise) and sound and could certainly catch some cases, so I like the idea.

benaadams · 2016-12-19T22:02:20Z

it would certainly be interesting to know if someone's important code is running into this. Experiments so far haven't shown it to be a top issue.

It effects larger loops

Loop size	16	4000
ForLoopArray	13.4183 ns	2.4165 us
ForLoopOneBoundsCheck	13.4226 ns	1.7440 us

Framework code normally takes a function local copy of the array reference before working on it; probably for that reason?

So would likely be non-framework code that would be effected.

mikedn · 2016-12-19T22:21:11Z

It effects larger loops

Somehow I doubt that a micro-benchmark is "someone's important code" 😄. Besides, foreach works fine and it is the most common way to iterate over an array.

JosephTremoulet · 2017-02-09T18:08:27Z

if CSE simply copies the conservative value number from the definition?

I have the change to do that up in PR dotnet/coreclr#9451 now.

changing the VN of the use won't propagate up the tree but maybe it can be special cased for GT_ARR_BOUNDS_CHECK

I coded that up too, only to discover that anywhere we can/do CSE the array, we also can/do CSE the array length, so propagating up the tree to arraylens in bounds checks wasn't giving me any diffs over just modifying the CSEd uses themselves, and I pulled that part back out.

dotnet/coreclr#9451 still doesn't get the case above, I'll investigate further...

JosephTremoulet · 2017-02-10T22:19:01Z

Looks like the reason dotnet/coreclr#9451 doesn't get this case is that range check removal operates on the assertions that assertion prop generated when processing compare/branches, which in turn were pulled from the value numbers annotated on the compare nodes (<, >, etc.). So possibly it's worth special-casing propagating new conservative VNs up the tree to compare nodes when an array length expr gets CSEd...

Modify CSE to identify compares which are functions of CSE candidate array length nodes; when those length nodes get CSEd and consequently have their value numbers updated, also update the value number on the compare consuming the array length; this way the assertions generated in assertion prop for the different compares on the CSEd array will use the same value numbers, and range check elimination becomes more effective. Resolves #5371

mikedn · 2018-01-15T21:30:36Z

@dotnet/jit-contrib FYI this has regressed due to dotnet/coreclr#15756

JosephTremoulet closed this as completed in dotnet/coreclr#9615 Feb 17, 2017

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

msftgits added this to the Future milestone Jan 30, 2020

dotnet locked as resolved and limited conversation to collaborators Dec 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RyuJIT bounds checks not eliminated for readonly arrays #5990

RyuJIT bounds checks not eliminated for readonly arrays #5990

benaadams commented Jun 1, 2016

omariom commented Jun 1, 2016 •

edited

GSPP commented Jun 1, 2016

omariom commented Jun 1, 2016

GSPP commented Jun 1, 2016 •

edited

omariom commented Jun 1, 2016 •

edited

RussKeldorph commented Jun 2, 2016

mikedn commented Dec 18, 2016

benaadams commented Dec 19, 2016

omariom commented Dec 19, 2016

mikedn commented Dec 19, 2016

mikedn commented Dec 19, 2016

benaadams commented Dec 19, 2016 •

edited

mikedn commented Dec 19, 2016

JosephTremoulet commented Dec 19, 2016

mikedn commented Dec 19, 2016 •

edited

JosephTremoulet commented Dec 19, 2016

benaadams commented Dec 19, 2016 •

edited

mikedn commented Dec 19, 2016

JosephTremoulet commented Feb 9, 2017

JosephTremoulet commented Feb 10, 2017

mikedn commented Jan 15, 2018

RyuJIT bounds checks not eliminated for readonly arrays #5990

RyuJIT bounds checks not eliminated for readonly arrays #5990

Comments

benaadams commented Jun 1, 2016

omariom commented Jun 1, 2016 • edited

GSPP commented Jun 1, 2016

omariom commented Jun 1, 2016

GSPP commented Jun 1, 2016 • edited

omariom commented Jun 1, 2016 • edited

RussKeldorph commented Jun 2, 2016

mikedn commented Dec 18, 2016

benaadams commented Dec 19, 2016

omariom commented Dec 19, 2016

mikedn commented Dec 19, 2016

mikedn commented Dec 19, 2016

benaadams commented Dec 19, 2016 • edited

mikedn commented Dec 19, 2016

JosephTremoulet commented Dec 19, 2016

mikedn commented Dec 19, 2016 • edited

JosephTremoulet commented Dec 19, 2016

benaadams commented Dec 19, 2016 • edited

mikedn commented Dec 19, 2016

JosephTremoulet commented Feb 9, 2017

JosephTremoulet commented Feb 10, 2017

mikedn commented Jan 15, 2018

omariom commented Jun 1, 2016 •

edited

GSPP commented Jun 1, 2016 •

edited

omariom commented Jun 1, 2016 •

edited

benaadams commented Dec 19, 2016 •

edited

mikedn commented Dec 19, 2016 •

edited

benaadams commented Dec 19, 2016 •

edited