-
Notifications
You must be signed in to change notification settings - Fork 92
Description
Describe the bug
The current spec concerning simple assignment is inconsistent with the current Microsoft implementation of C#. In particular, various null checks and index range checks happen in inconsistent order w.r.t. other side effects.
Example See this GitHub Gist.
Expected behavior See below for analysis. All the tests by me were done on Windows 11 version 24H2 with C# compiler that comes with VS 2026 (18.0.1), targeting a few runtimes.
Cross references #169, #401, dotnet/csharplang#9836.
Field assignment
Same issue with #401.
- Spec requires
E.M = F(for reference-typeEand fieldM) to evaluateEtoe, then checkefornull, then evaluateFtof, then storefine.M. - Implementation is to evaluate
Etoe, then evaluateFtof, then checkefornull, then storefine.M.
Note the timing of null check is different!
Property setter (not bug, but we should know for a comprehensive view) (wording bug in spec, see comment 2)
Indexer setter (not bug, for comprehensive view) (wording bug in spec similar to property setter, see comment 2)
Array element assignment, single-dimensional zero-based (SZArray)
This is the same as #169. Last time it was discussed, the resolution was to put the deviation into documentation, instead of changing the spec. I think this should be reconsidered for two reasons:
- It's still not documented after 3 years of resolution.
- Is there really an implementation that doesn't agree with Roslyn? (What's the behavior of the old native compiler?) If all implementations agree, the spec should just be updated.
- Spec requires
A[I] = F(forAof typeT[]andIof appropriate type) to be done by evaluatingAtoa, thenItoi, then checkafornull, then checka,ifor range, then evaluateFtof, then check for variance, then storefina[i]. - Implementation evaluates
Atoa, thenItoi, thenFtof, then checks fornulland range and variance, then storesfina[i].
Array element assignment, multi-dimensional
- Spec requires
A[I1, ..., In] = F(forAof typeT[,...]andIs of appropriate type) to be done by evaluatingAtoa, thenI1toi1, ...,Intoin, then checkafornull, then checka,i1,...,infor range, then evaluateFtof, then check for variance, then storefina[i1, ..., in]. - Implementation (when combined with JIT in .NET Framework 2.0-4.5) agrees with the spec.
- Implementation (when combined with JIT in .NET 8.0) doesn't agree with the spec. It evaluates
Fbefore checking fornulland bounds.
The difference with SZArray is because the C# compiler uses stelem for SZArray, but Set for MDArray. (There's also single-dimensional non-zero-based arrays, but those are unspeakable in C#, so not of concern.)
This is very worth of being documented, because behavior changes between .NET Framework and .NET (Core), which could trip people. Moreover, C# compiler emits Set method call for both .NET Framework and .NET (Core), so it's the JIT doing something differently. Consider cc'ing the JIT people?
In fact, this is very surprising, because C# compiler emits call (not callvirt) to Set, which means as far as JIT is concerned, there's no null check until control has entered the Set method:
// Suppose we're storing into int[,].
// load a onto stack
// load i1 onto stack
// ...
// load in onto stack
// load f onto stack
call instance void int32[,]::Set(int32, int32, int32)
Therefore, the only place NullReferenceException for a can be thrown is inside Set, at which point F has been evaluated into f.
- The current C# compiler always emits incorrect IL as far as IL and (specked) C# semantics are concerned.
- The JIT of .NET Framework compiles IL into incorrect native code as far as IL semantics are concerned. This (sometimes?) make the C# semantics correctly reflected in native code. (
(-1)*(-1)=+1.) - The JIT of .NET (Core) compiles IL into correct native code as far as IL semantics are concerned, so the native code does not reflect C# semantics.
Byref property and byref indexer
It's unclear what the spec says here, because the spec text does not deal with property/indexer access without a setter. (See "12.23.2 Simple assignment", where if the left-hand side is a property/indexer access, it requires a setter.)
This should be specified. The current behavior is as follows:
- Evaluate the receiver.
- Evaluate all the indices (possible 0) from left to right.
- Invoke the byref getter method, which checks for
nullreceiver and surfaces all the side effects of the getter. (We have now obtained the location.) - Evaluate the right-hand side. (We have not obtained the value.)
- Store the value into the location.
What now?
I would advocate rewording of spec to reflect the behavior on .NET 8.0. Moreover, it's based on a very simple principle:
For all constructs prior to C# 7.0,
E.M = FandE[I1, ..., In] = Fshould have the same effect as if it's someE.setter(I1, ..., In, F), regardless of the nature ofMor the indexer. In other words, things that visually appear similar should have the same order of side effects.Since C# 7.0, byref property/indexer assignments translate to
E.getter(I1, ..., In) = F.
From this we deduce
// E.Field = F; // field
[ref] var e = E;
[ref] var f = F; // [ref] in case it's ref reassign into ref field in byref value type
e.Field = [ref] f; // null check for `e` here
// == current impl
// != current spec
// E.SP = F; // property setter
E.set_SP(F); // = current impl = current spec
// E[I1, ..., In] = F; // index setter
E.set_Items(I1, ..., In, F); // = current impl = current spec
// E[I] = F; // SZArray
var e = E;
var i = I;
var f = F;
e[i] = f; // null check, range check, variance check
// == current impl
// != current spec
// E[I1, ..., In] = F; // MDArray
var e = E;
var i1 = I1;
// ...
var in = In;
var f = F;
e[i1, ..., in] = f; // null check, range check, variance check
// == IL emitted by current MS C# compiler
// == JIT'd code in .NET Core 8.0 (JIT is correct)
// != JIT'd code in .NET Framework 2.0-4.5 (JIT "misoptimizes")
// != current spec
// E.BRP = F; // byref property
[ref] var e = E;
ref var target = ref e.get_BRP();
// ^^^ null check, getter side effects
var f = F;
target = f;
// == current impl
// no current spec
// E[I1, ..., In] = F; // byref indexer
[ref] var e = E;
var i1 = I1;
// ...
var in = In;
ref var target = ref e.get_Items(i1, ..., in);
// ^^^ null check, getter side effects
var f = F;
target = f; // null check, getter effects
// == current impl
// no current specSome Considerations. An argument against my proposal is that array element access are closer to byref indexers, because it's possible to say ref E[I1, ..., In] for array type E of rank n. However, that is not true and that is an unfortunate evil due to the worst copycat feature from Java --- array variance (second next paragraph).
For the upsides, my proposal keeps the existing behavior in .NET 8.0 (I suspect it's the behavior in all .NET Core and .NET 5.0+), and the discrepancy between SZ and MD arrays in .NET Framework is not something one would want.
Arrays are really special constructs.
object[] array;
int index;
object item;
array[index] = item;
// not the same as
ref object target = ref array[index];
target = item;
because taking the address (ldelema) requires the static type of array to be exact (no variance allowed since once the ref is obtained, there's no way to control how it's used).
Why the spec should be changed. Due to byref properties and indexers, the meaning of assignment anyway has to be updated to take care of them. This is a good chance to clean up some historical issues.
If the working group is unwilling to change the spec... At least, truly document the implementation deviations. Actually, do also document the behavior of .NET Framework for multi-dimensional arrays (it's very, very surprising to me).