RyuJIT bounds checks not eliminated for readonly arrays #5990
Comments
Range checks can't be eliminated for readonly instance fields because constructors can call other methods. |
What's the CLR's policy on changing readonly fields through reflection? Is this undefined behavior? I believe so for the static case. I have seen the JIT inline static readonly constants at JIT time. |
It currently works for fields of primitive types. |
I was more interested in what formal guarantees there are. I guess for static readonly there are none. But what about:
Can this ever throw? The JIT might assume the field cannot change. For a static readonly field this would always fail. |
Wow! I was sure reflection didn't allow setting readonly fields.
void Main()
{
// Static
var staticField = typeof(TestClass).GetField("StaticField", BindingFlags.Static | BindingFlags.Public);
staticField.SetValue(null, 6);
Console.WriteLine(staticField.GetValue(null));
// Instance
var field = typeof(TestClass).GetField("Field", BindingFlags.Instance | BindingFlags.Public);
var instance = new TestClass();
field.SetValue(instance, 6);
Console.WriteLine(field.GetValue(instance));
}
class TestClass
{
public readonly static int StaticField = 7;
public readonly int Field = 10;
} |
@dotnet/jit-contrib FYI |
This has nothing to do with readonly, constructors calling other methods, reflection and whatnot. If you look at the code generated for something like
for (int i = 0; i < p.a.Length; i++)
    s += p.a[i];
you'll see something like this:
G_M56240_IG03:
4C8BC9 mov r9, rcx
413BD0 cmp edx, r8d
7316 jae SHORT G_M56240_IG05
4C63D2 movsxd r10, edx
4303449110 add eax, dword ptr [r9+4*r10+16]
FFC2 inc edx
443BC2 cmp r8d, edx
7FE9 jg SHORT G_M56240_IG03
The array field ended up in a register, so the JIT doesn't care whether the field is readonly or not. As far as it is concerned, the field isn't changed inside the loop, so it can load it only once, before the loop. I don't know why the range check isn't eliminated, but the reason obviously isn't that the field may change during iteration. And a trivial workaround - use
G_M56240_IG03:
4C63C9 movsxd r9, ecx
468B4C8A10 mov r9d, dword ptr [rdx+4*r9+16]
4103C1 add eax, r9d
FFC1 inc ecx
443BC1 cmp r8d, ecx
7FEE jg SHORT G_M56240_IG03
That's because |
@mikedn interesting; your first example is for a regular non-readonly array? So it should either be eliminating the range check or confirming the array is the same anyway? |
According to the not very well specified .NET memory model, the JIT could do that even without explicit/implicit local copies. |
@benaadams I'm not quite sure what your question is. What's for sure is that in the example |
I took a quick look at the IR dump for that example and I think the JIT shot itself in the foot. At the start of the analysis Then CSE takes place and the array length ends up in a local variable that's used by both But after CSE the JIT still "remembers" that those expressions were different as far as threading is concerned, and range check elimination fails. It seems to me that CSE should fix up the JIT's knowledge about multi-threaded access. |
It seems that either the range check should be eliminated, as it's a GC-safe object that's not changing length regardless of whether the instance variable is changed, or it should reacquire the array on each loop iteration, but it's doing neither? (I'd prefer the former 😉 ) |
Well, you can't safely eliminate the range check if the instance variable changes. The only way to do this is to copy the instance variable to a local variable and only use the local variable within the loop. The JIT does make this copy but then fails to eliminate the range check due to another problem (described above). |
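For readers following along, here is a minimal sketch of the two loop shapes being compared. The `Holder` type and the method names are hypothetical, made up for illustration; only the loop over `p.a` comes from the thread. Whether the range check is actually removed in either method depends on the JIT version, so the comments describe what the thread reports rather than a guarantee.

```csharp
using System;

// Hypothetical type standing in for the `p` object in the loop quoted earlier.
class Holder
{
    public readonly int[] a;
    public Holder(int[] values) { a = values; }
}

static class RangeCheckDemo
{
    // Indexes through the field on every iteration. Per the thread, the JIT
    // hoists the field load into a register but may still emit a range check.
    public static int SumViaField(Holder p)
    {
        int s = 0;
        for (int i = 0; i < p.a.Length; i++)
            s += p.a[i];
        return s;
    }

    // Workaround described in the thread: copy the array reference to a local
    // so the loop bound and the indexing visibly use the same array.
    public static int SumViaLocal(Holder p)
    {
        int[] a = p.a;
        int s = 0;
        for (int i = 0; i < a.Length; i++)
            s += a[i];
        return s;
    }

    static void Main()
    {
        var p = new Holder(new[] { 1, 2, 3, 4 });
        Console.WriteLine(SumViaField(p));
        Console.WriteLine(SumViaLocal(p));
    }
}
```

Both methods compute the same sum; any difference would only show up in the generated machine code, which can be inspected with a JIT disassembly dump.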
Absolutely, yes, it should. What makes this tricky with the current set-up is that the knowledge you're referring to is embedded in the value numbers (specifically the divergence between the liberal ones and the conservative ones), but value numbering runs before CSE and doesn't leave behind enough dependence information to quickly tell which conservative value numbers could be refined (and to what) in response to performing a given CSE. So we'd have to do something akin to re-running value-numbering (and making that incremental enough to not explode the compile-time cost would be the trick), or change how the knowledge is recorded in a way that makes the incremental update more straightforward (e.g. fold CSE and value-numbering together, then let subsequent phases like range-checking use same-SSA-def as a proxy for value-equal; the hurdles here are how to update all that downstream code without upsetting the apple cart, and time to implement). Note that checked/debug builds of the jit have "opt repeat" functionality (enabled by setting COMPlus_JitOptRepeat/COMPlus_JitOptRepeatCount), which will literally re-run value-numbering (and other optimization passes), to make it easier to gauge the severity of this sort of issue; it would certainly be interesting to know if someone's important code is running into this. Experiments so far haven't shown it to be a top issue. |
But what would happen if CSE simply copied the conservative value number from the definition? Could that break something else? As it is now, it looks weird that we end up with IR where the def and use of a lcl var have different conservative VNs:
use:
Sure, simply changing the VN of the use won't propagate up the tree but maybe it can be special cased for GT_ARR_BOUNDS_CHECK? |
In general there could be multiple defs, and CSE isn't computing which reach each use. But single def is likely the common case and could be special-cased...
I think it would be sound... some possible gotchas:
Right, my biggest concern was about the lack of propagation up the tree and then through the SSA graph. So e.g. the same issue could happen in the index expression and we'd still miss it. That said, adding a special case for GT_ARR_BOUNDS_CHECK as you suggest seems like it would be reasonably cheap (throughput-wise) and sound and could certainly catch some cases, so I like the idea. |
Framework code normally takes a function-local copy of the array reference before working on it, probably for that reason? So it would likely be non-framework code that is affected. |
Somehow I doubt that a micro-benchmark is "someone's important code" 😄. Besides, |
I have the change to do that up in PR dotnet/coreclr#9451 now.
I coded that up too, only to discover that anywhere we can/do CSE the array, we also can/do CSE the array length, so propagating up the tree to arraylens in bounds checks wasn't giving me any diffs over just modifying the CSEd uses themselves, and I pulled that part back out. dotnet/coreclr#9451 still doesn't get the case above, I'll investigate further... |
Looks like the reason dotnet/coreclr#9451 doesn't get this case is that range check removal operates on the assertions that assertion prop generated when processing compare/branches, which in turn were pulled from the value numbers annotated on the compare nodes ( |
Modify CSE to identify compares which are functions of CSE candidate array length nodes; when those length nodes get CSEd and consequently have their value numbers updated, also update the value number on the compare consuming the array length; this way the assertions generated in assertion prop for the different compares on the CSEd array will use the same value numbers, and range check elimination becomes more effective. Resolves #5371
@dotnet/jit-contrib FYI this has regressed due to dotnet/coreclr#15756 |
Range analysis doesn't take readonly arrays into account, so you need to make a function-local reference to eliminate the range check.
Analysis https://gist.github.com/benaadams/11123e9785481e07d8ca483bef798689