Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C# Compiler is not consistent with the CLR for the underlying representation of bool #24652

Closed
tannergooding opened this issue Feb 5, 2018 · 23 comments

Comments

@tannergooding
Copy link
Member

Issue

Today the C# compiler does not emit code handling for bool that is consistent with the underlying representation of bool in the CLR.

The C# specification states:

The bool type represents boolean logical quantities. The possible values of type bool are true and false.

The spec also indicates that it is 1-byte. Specifically that the result of the sizeof(bool) expression is 1.

While the ECMA 335 specification states:

A CLI Boolean type occupies 1 byte in memory. A bit pattern of all zeroes denotes a value of
false. A bit pattern with any one or more bits set (analogous to a non-zero integer) denotes a
value of true. For the purpose of stack operations boolean values are treated as unsigned 1-byte
integers (§III.1.1.1).

This can lead to confusion about the handling and cause various inconsistencies in various edge cases (see dotnet/coreclr#16138 (comment), for such a thread).

Proposal

It would be good to determine:

  • Should the spec be updated?
    • The spec would explicitly list the expected values of true/false so that two implementations don't behave differently
  • Can the compiler be updated?
    • Given that we are emitting for the CLR, we would make our handling match the expected representations for the two boolean values (0/not 0) of the underlying platform

Current Behavior

The majority of the behaviors below actually match the CLR expectation that a bool can be more than just 0 or 1. However some of the behaviors (such as &&) does not match this expectation and can cause issues when interoping with any code that would assume otherwise (generally this is some kind of interop or unsafe code).

  • !value
ldarg.1
ldc.i4.0
ceq
  • left == right
ldarg.1
ldarg.2
ceq
  • left != right
ldarg.1
ldarg.2
ceq
ldc.i4.0
ceq
  • left & right
ldarg.1
ldarg.2
and
  • left | right
ldarg.1
ldarg.2
or
  • left ^ right
ldarg.1
ldarg.2
xor
  • left && right
ldarg.1
ldarg.2
and
  • left || right
ldarg.1
ldarg.2
or
  • value ? "true" : "false"
ldarg.1
brtrue.s
@tannergooding
Copy link
Member Author

Talked with @MadsTorgersen briefly and it was determined logging an issue for this would be good.

There is the normal concern of this being a breaking change, since any change to the handling of the bools will cause code reliant on that behavior to go down a different code path (this, at the very least, impacts the && code pattern, which treats any number with the first bit set as true, and any other value as false).

Also FYI. @jaredpar, @mikedn, @jkotas

@gafter
Copy link
Member

gafter commented Feb 6, 2018

The C# language specification is not specific to any particular runtime platform. The only observable consequence of placing something like that in the specification would be the behavior you get when you overlay a bool and a byte in unsafe code. Specifying that would be a peculiar departure from the normal situation in which the specification avoids telling you in detail the way unsafe code works.

@tannergooding
Copy link
Member Author

The only observable consequence of placing something like that in the specification would be the behavior you get when you overlay a bool and a byte in unsafe code.

Only if you limited such code to C#.

It is entirely possible for C# to be consuming code produced by another language where the bool value conforms to the CLR definition but doesn't line up with the C# implementation. In which case updating the spec would cause the behavior in that scenario to be well-defined instead of undefined or implementation-specific.

@gafter
Copy link
Member

gafter commented Feb 6, 2018

See also #12211 #14262 and https://web.archive.org/web/20160318193105/https://roslyn.codeplex.com/workitem/224

@tannergooding What languages do you have in mind that use a different representation for bool?

@tannergooding
Copy link
Member Author

ILASM, in general. C++/CLI for desktop, although it could realistically be any language compiled to IL (such as C or C++). There are several programs for converting between LLVM bytecode and IL bytecode, for example. extern methods are not unsafe and can also return a bool value that is not 0 or 1.

As a hypothetical example:

.method public hidebysig static 
        bool MyTrue () cil managed 
{
    .maxstack 8

    IL_0000: ldc.i4.2
    IL_0001: ret
}

The C# language would see this as static bool MyTrue() and the value returned would be true, according to the ECMA-335 spec (Common Language Infrastructure specification). However, C# would handle it incorrectly for the case of &&.

@gafter
Copy link
Member

gafter commented Feb 6, 2018

See the last entry in https://web.archive.org/web/20160318193105/https://roslyn.codeplex.com/workitem/224 for how we resolved this in 2014. The resolution was that we would document it (presumably, in https://github.com/dotnet/roslyn/tree/master/docs/compilers) but we do not appear to have done so.

@gafter gafter self-assigned this Feb 6, 2018
@gafter gafter added this to the Unknown milestone Feb 6, 2018
@tannergooding
Copy link
Member Author

tannergooding commented Feb 6, 2018

I don't believe the linked resolution is exactly correct.

The last comment is making a statement that the runtime bools have more than two states. However, it could easily be argued that they do in fact only have two states (true, which is a bit pattern with any one or more bits set; or false, which is a bit pattern of all zeros).

For the runtime, a boolean with a bit pattern of 10 is equivalent to a bit pattern of 01, they represent the same state.

@gafter gafter added this to Documentation in Compiler: Gafter Feb 6, 2018
@tannergooding
Copy link
Member Author

tannergooding commented Feb 6, 2018

I'm also not sure it will be clear what a "non-standard" bool is without clearly defining the underlying bit representation of true and false. Otherwise, the exact definition of a "standard" bool is still dependent on the underlying runtime and the implementation of a given C# compiler.

For example, the current Roslyn implementation is generally inline with the runtime definition, except for a couple of weird scenarios with the short-circuit comparisons where it treats it as true has a bit representation with the least significant big set and false has a representation with the least significant bit cleared.

@HaloFour
Copy link

HaloFour commented Feb 6, 2018

So wouldn't this also mean that any C# assembly exposing a bool wouldn't be CLS compliant because of violations of the ECMA spec?

The argument that C# is independent of the CLR is a copout. C# exists in an ecosystem with other languages bound by the common contract that is the CLS/CLR. More than once has the argument been made against various C# proposals given that they would not play well across that ecosystem. It's ludicrous to expect that every other compiler that might want to target the CLR will have to know about this backwards behavior in order for it to produce assemblies that will work as expected.

@jaredpar
Copy link
Member

jaredpar commented Feb 6, 2018

Can the compiler be updated?
Given that we are emitting for the CLR, we would make our handling match the expected representations for the two boolean values (0/not 0) of the underlying platform

The compiler is operating as designed here and has been for the entirety of its existence. This issue has been brought up several times throughout the history of the compiler. We considered the arguments but ultimately have decided to continue on with this approach.

This issues should be moved to the C# language repository. The compiler is operating as designed here. If there is a call for further clarity in the spec then csharplang is the right place to drive that.

@jaredpar jaredpar closed this as completed Feb 6, 2018
@tannergooding
Copy link
Member Author

tannergooding commented Feb 6, 2018

@jaredpar, I had opened it here, rather than csharplang, at the request of @MadsTorgersen.

This issue has been brought up several times throughout the history of the compiler. We considered the arguments but ultimately have decided to continue on with this approach.

It also continues being brought up as a pain point every few months. I believe it would help if the compiler was at least consistent, but I'm not sure you could call it that.

The current behavior can currently be described as:

  • value, !value, left | right, and left || right
    • false is 0; true is not 0
  • left == right and left != right
    • false is 0; true depends on the underlying bit values being the same
  • left & right, left && right, and left ^ right
    • result is dependent on the underlying bit representation of each value

@tannergooding
Copy link
Member Author

Basically, what I'm seeing is that the C# spec states there are two values, but the implementation is handling it as if there are more.

That is

  • it is not handling it as if it were 0 and not 0
  • it is not handling it as if it were 0 and 1

Instead, it is handling it as 0 and depends on the underlying bit pattern, which I would view as inconsistent with both the C# and the runtime specifications.

@gafter
Copy link
Member

gafter commented Feb 6, 2018

@jaredpar I'm reopening this and assigning it to myself to document the compiler behavior.

@gafter gafter reopened this Feb 6, 2018
@jaredpar
Copy link
Member

jaredpar commented Feb 6, 2018

It also continues being brought up as a pain point every few months. I believe it would help if the compiler was at least consistent, but I'm not sure you could call it that.

The job of the compiler is to emit code that will behave correctly in the face of boolean values that are 1 or 0. The behavior of other values is inconsequential.

Instead, it is handling it as 0 and depends on the underlying bit pattern, which I would view as inconsistent with both the C# and the runtime specifications.

Disagree. The compiler correctly handles the cases that it supports: 0 and 1.

@tannergooding
Copy link
Member Author

The compiler correctly handles the cases that it supports: 0 and 1.

This needs to be documented, if not in the language spec, then in a compiler implementation details spec.

In my opinion, The language spec should explicitly state that handling of values with a bit representation different than true or false is implementation defined. It should also either explicitly list the bit representation of true and false or also call out that they are implementation defined, but non-identical, values.

Additionally, it would be a nice to have to have something (probably in a compiler implementation details specification) explicitly stating that:

  • This behavior differs from the ECMA-335 runtime definition
  • We cannot change this behavior to match the ECMA-335 runtime definition as it would be considered a breaking-change
    • It would also be nice to give examples (they currently fall around ==, !=, &, &&, and ^)

@jaredpar
Copy link
Member

jaredpar commented Feb 6, 2018

This needs to be documented, if not in the language spec, then in a compiler implementation details spec.

This has been documented several times, some in this various repo. Unfortunately searching for terms like "bool" and "true" is meaningless given our issue size. I'm fine with this being documented as behavior of the compiler. Saying our code gen is inconsistent though is incorrect.

@tannergooding
Copy link
Member Author

Unfortunately searching for terms like "bool" and "true" is meaningless given our issue size.

Would it be reasonable to do the following in the language spec?

  • In the relevant sections of the spec, add an explicit Roslyn Implementation Detail annotation that calls out Roslyn specific behavior
    • Such an annotation should be called out as informative text, rather than normative text
  • Add an explicit Roslyn Implementation Details section as an annex to the spec that cross-references the individual implementation details

Saying our code gen is inconsistent though is incorrect.

I was saying it is inconsistent with the spec, which I still think it is, at least until the spec is updated. The compiler is implementing true (1), false (0), and implementation defined handling (2-255). I would view this as explicitly handling three states rather than 2 (which would have to be implemented as true is x, false is not x, as anything else results in a third state, even if that state is undefined or implementation defined)

@gafter
Copy link
Member

gafter commented Feb 6, 2018

Roslyn documentation belongs in the Roslyn repo. I will add some appropriate documentation for this issue in docs/compilers.

@Joe4evr
Copy link

Joe4evr commented Feb 7, 2018

The only observable consequence of placing something like that in the specification would be the behavior you get when you overlay a bool and a byte in unsafe code.

You wouldn't even need any unsafe, explicit layout can do that trick just fine already.

@gafter
Copy link
Member

gafter commented Feb 7, 2018

@Joe4evr Although LayoutKind.Explicit does not require unsafe, it is unsafe. There is no other way to explain the behavior of explicit layout by reference to the C# language specification. It is too late for the C# compiler to require it only be used in an unsafe context.

@GrabYourPitchforks
Copy link
Member

@Joe4evr @gafter I don't think it would make sense to require the unsafe keyword anyway. Explicit layouts are completely verifiable IL as long as you're not overlaying references (type safety violation) or private fields of structs (visibility violation). Blitting simple primitives on top of each other doesn't violate either of these conditions.

@gafter
Copy link
Member

gafter commented Feb 8, 2018

@GrabYourPitchforks Verifiable perhaps, but such code can undermine the compiler's implementation invariants.

@benaadams
Copy link
Member

benaadams commented Feb 18, 2018

ceq says 0 and 1

Compares two values. If they are equal, the integer value 1 (int32) is pushed onto the evaluation stack; otherwise 0 (int32) is pushed onto the evaluation stack.

Though ~0 (i.e. 0xFF) is more useful for bit manipulation; 2-254 are dubious values

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants