Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLR bug with static variable in struct could be warned by the compiler #10126

Open
peter-perot opened this issue Mar 26, 2016 · 27 comments
Open
Labels
Area-Compilers Feature Request help wanted The issue is "up for grabs" - add a comment if you are interested in working on it New Feature - Warning Waves Warning Waves
Milestone

Comments

@peter-perot
Copy link

peter-perot commented Mar 26, 2016

The following code compiles, but throws a TypeLoadException when executed.

using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var x = MyStruct.Empty;
        }

        public struct MyStruct
        {
            public static readonly MyStruct? Empty = null;
        }
    }
}

Since Empty is static, this is not a cyclic struct layout issue. It would be nice if the compiler produced warnings that tells you that you are likely to run into this CLR bug at runtime

[Note from @gafter this is due to CLR bug https://github.com/dotnet/coreclr/issues/4049. See also https://github.com/dotnet/coreclr/issues/7957 and https://github.com/dotnet/coreclr/issues/22553 for other situations where the warning would be helpful]

@peter-perot
Copy link
Author

@coderealm: null is not a type; it simply means "no reference". The dot operator is also called dereferencing operator. You can have something like public static class MyClass { public static readonly MyStruct? Empty = null; }, and writing MyClass.Empty works. If it was true what you wrote, this should not work either.

@mikedn
Copy link

mikedn commented Mar 31, 2016

null is not a type; it simply means "no reference".

That's for reference types. For value types null is a Nullable<T> in disguise. That means that the static ctor of MyStruct attempts to initialize Empty and that triggers the instantiation of Nullable<MyStruct>. I don't know what happens from here on but attempting to access a type may lead to its static ctor being run. But that's what the runtime was trying to do when Nullable<MyStruct> was instantiated...

If anything this is a runtime bug/limitation, not a C# compiler bug. This issue likely belongs in the coreclr repository.

@peter-perot
Copy link
Author

You are right, @mikedn, a nullable value type is still a value type (Nullable<T>) pretending to be sort of null.

I think the runtime tries to initialize the type MyStruct, then tries to initialize the Empty field during static construction, but before it can do this the generic type Nullable<MyStruct> must be initialized. And here something goes wrong.

On the machine of my colleage the behavior is even more weird: When he is running the code with .NET 4.6.1, an AccessViolationException is thrown, in contrast to the behavior of 4.6 where he gets the TypeLoadException.

@Corey-M
Copy link

Corey-M commented Mar 31, 2016

@coderealm

Because of the dot operator, this is invoking a static field on the type MyStruct. Which is invoking null on a type.

This makes no sense. The dot operator does not "invoke" the value of the static field, whatever that means, it simply returns it. If this were not true we would never be able to store null values in static fields, which we certainly can do.

Regardless, MyStruct.Empty is a value type and so therefore cannot store the null value. Instead the Nullable<MyStruct> structure uses an internal flag to indicate whether the Value member is valid or not. Assigning null to the structure sets the flag, assigning any other value to it resets the flag. No null values are stored, referenced or "invoked."

@Corey-M
Copy link

Corey-M commented Mar 31, 2016

The compiler appears to accept this and produce an Assembly, but on load the CLR looks to be having problems loading the types. Possibly a CLR bug rather than compiler?

@peter-perot
Copy link
Author

@coderealm: Sorry, but what you say is simply not true. Evidence:

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var x = ItWorks.Null;
        }
    }

    class ItWorks
    {
        public static readonly ItWorks Null = null;
    }
}

BTW: If it was true that you are claiming then it would be also impossible to access a simple instance member variable which is null.

Look at the stack overflow discussion and see, what Eric Lippert says about this. He is THE C# language guru with a profound knowledge concerning the .NET runtime and compilers.

@peter-perot
Copy link
Author

@Corey-M: Maybe. I think there is a bug in the type initialization sequence which has to do with the cyclic reference between Nullable<MyStruct>and MyStruct.

@coderealm
Copy link

@peter-perot Now you are mixing a struct with a class with your example above. I presume the CLR will handle this example from your previous example differently because class is a reference type while struct is avalue type.

@coderealm
Copy link

My explanation may not be exactly how the CLR works, but it is not a bug in the CLR, Compiler or C# Language as @peter-perot purports.

@peter-perot
Copy link
Author

@coderealm: Of course I mixed it. When your claim was true, it would even fail with a class. Let's wait for Eric Lippert's analysis (if he finds the time to do that).

@pdelvo
Copy link
Contributor

pdelvo commented Mar 31, 2016

First of all it should never be possible (without reflection or manually messing around with the binaries) to get a TypeLoadException. So there is definitely something wrong here and its not on the end user side.

The static field access is not what is throwing the exception, it is the cctor. This can be seen by replacing var x = MyStruct.Empty; with var x = new MyStruct();. It still throws.

Edit: peverify reports the problem too.

The old compiler reports CS0523.

@coderealm
Copy link

@peter-perot, It is not called dereferencing operator in C#. It is call dot operator in C#. MSDN

@pdelvo
Copy link
Contributor

pdelvo commented Mar 31, 2016

I think this check should not be there

@mikedn
Copy link

mikedn commented Mar 31, 2016

The static field access is not what is throwing the exception, it is the cctor. This can be seen by replacing var x = MyStruct.Empty; with var x = new MyStruct(); . It still throws.

It's not the static ctor that throws, it's the runtime loader that does that. And it turns out that it's not related to type initialization as I suspected, if you don't initialize Empty (which is not needed in this case anyway) you still get an exception.

The old compiler reports CS0523

Yeah, and the error message is incorrect. It says that there is a cycle in the struct layout but such a cycle does not exist. Static fields do not contribute to the type layout.

I think this check should not be there

That chck is correct, the following struct is valid

struct A { static A a; }

@ericlippert
Copy link

There are numerous errors in much of the analysis so far; I will not address those errors in detail. Going forward, it would be better to analyse a more minimal repro:

struct N<T> {}
struct M { public N<M> E; }
class P { static void Main() { var x = default(M); } }

This compiles without error, but crashes with a type load exception at both runtime and when running PEVerify on it.

From this more minimal repro we can see:

(1) The exception is not triggered by use of E. It is triggered by any attempt to access the type M. (As one would expect in the case of a type load exception.)

(2) The exception reproduces whether the field is static or instance, readonly or not; this has nothing to do with the nature of the field.

(3) The exception has nothing whatsoever to do with "invocation"; nothing is being "invoked" in the minimal repro.

(4) The exception has nothing whatsoever to do with the member access operator ".". It does not appear in the minimal repro.

(5) The exception has nothing whatsoever to do with nullables; nothing is nullable in the minimal repro.

So, where then can we lay the blame for the bug? One or more of:

  • The CLR type loader, for crashing.
  • The C# specification, for allowing a program that crashes
  • The C# implementation, for emitting a program that crashes, or
  • The actual program itself

must be wrong, but which one? Were I asked to guess, I would guess that the CLR type loader is at fault here; it seems to me that it ought to be able to handle this type topology. However it may be the case that the loader is correct to reject this program at runtime, and it is the C# specification and its implementation which ought to detect this error. This needs more investigation and I don't have the time today to go spec diving in the CLR.

@coderealm
Copy link

@ericlippert This is more complex than anyone thought. Thanks for your input. Will be looking forward to read further from you on this subject when you can.

@coderealm
Copy link

@peter-perot @Corey-M Having read the blog from @jaredpar Are private members a part of the API surface?. It is a desired implementation of the CLR by Microsoft and your code behaviour is expected. It is not a bug in the CLR. Going back to @ericlippert options for the cause of the exception. The last option is the correct answer. "The actual program itself".

One thing is for sure, my explanation is not how the CLR handled this code. But I knew from the onset that the code was the issue and not the CLR.

As per Jared's blog "This was done in part to ease the porting from C programs."

@ericlippert
Copy link

@coderealm, your analysis continues to be specious.

First, Jared's article does not ever say that the behaviour of the CLR's type loader is as specified or by design; rather, he notes that he did an experiment -- as I did -- to determine its behaviour. The question of whether it is a bug in the CLR must be answered by examination of the specification of the CLR in order to determine whether this behaviour is within the by-design parameters of the runtime, or an error. You cannot determine what was intended by examining behaviour! The definition of a bug is that it is unintended behaviour.

Second, the quote you make from Jared's blog is a footnote on the subject of why definite assignment rules for structs are as they are. It has nothing to do with the issue at hand.

Third, let's suppose for the sake of argument that the behaviour of the CLR is correct. It is deeply unfortunate when the C# language permits an unverifiable, crashing program to be compiled cleanly. Ideally we would like the C# language to match the by-design semantics of the underlying type system implementation when possible at a reasonable cost. If the CLR behaviour is correct -- and I have no evidence from any specification either way -- then the question immediately raised is whether this program compiling cleanly is a flaw in the C# specification.

I would argue that if the CLR is correct here -- and again, I have no evidence one way or the other -- then that motivates adding the same rule to C#, and making this program formally illegal, so that the compiler catches the error rather than the runtime.

You are reasoning from lack of evidence, and that is bad reasoning. The evidence that we have is that a particular program compiles but crashes at runtime. Plainly this is a bad situation to be in. Now, most of the time, when a program compiles but crashes at runtime, it is the fault of a programmer making an unwarranted assumption that is not policed by the type system -- like a bad cast, or a null dereference. In this case, the program is simple, it seems entirely plausible, and there is no clear rule in the C# language that would imply that this program is not verifiable; there's no "unsafe" anywhere, or any such thing.

C# was not designed to be a "gotcha" language; let us consider the possibilities from the language design perspective. The possibilities are:

  1. This plausible program is correct; the CLR has a bug in its type loader.
  2. The CLR type loader is correct; the C# specification and its compiler could be making this an error at compile time with a clear diagnostic error message that informs the developer of the subtle error they've stumbled upon.
  3. The CLR type loader is correct and the C# language contains a "gotcha". A perfectly plausible program is unreliable, and moreover unpredictably unreliable. Remember, this happens the first time a type is used; a mission-critical program might have been running for a day before that type happens to be loaded and the program fails.

The C# team tries very hard to stay out of that third bucket, as they should. You can't simply make the argument that the programmer was wrong to write code like this. Either the code is right, in which case the CLR should be fixed, or the code is wrong, in which case the compiler shouldn't allow it. Allowing plausible code to crash horribly is a terrible situation to be in.

@peter-perot
Copy link
Author

@ericlippert:

Third, let's suppose for the sake of argument that the behaviour of the CLR is correct. It is deeply unfortunate when the C# language permits an unverifiable, crashing program to be compiled cleanly.

100% agreement. And it's even more weird at the setup of my colleage: When he compiles my code against .NET 4.6.1 (I used 4.6), an AccessViolationException is thrown instead of the TypeLoadException. I think, an AccessViolationException in managed code is a much more bad thing.

The C# team tries very hard to stay out of that third bucket, as they should. You can't simply make the argument that the programmer was wrong to write code like this.

Nothing more to add.

I simply don't know why @coderealm insists on his claim that my code is the mistake here.

@gafter gafter added the Blocked label Apr 1, 2016
@gafter
Copy link
Member

gafter commented Apr 1, 2016

The type load error is an implementation limitation of some CLR implementations. It is not required by the specification, and some CLR implementations do not have this bug. We are considering whether to add a warning for this case so that you can suppress the warning if you happen to be targeting a runtime that does not have this bug.

@jaredpar
Copy link
Member

jaredpar commented Apr 1, 2016

Had a chat with the CLR team this morning about this issue and gained clarity on the following items:

  • This program is indeed legal according to the CLI spec.
  • The Desktop CLR and CoreCLR implementations have a limitation that prevent them from allowing this correct program. Other runtimes, such as ProjectN, correctly handle this program.

This is a tricky type of issue to handle in the compiler. This program is correct to spec and has known runtimes where it can execute successfully. Yet Desktop / CoreClr programs are far more prolific than ProjectN ones at this time. Hence the compiler is allowing code that will fail to execute in the majority of cases. That's not a position we like to put ourselves in.

There are a couple option available to us:

  1. Make it an error (go back to 5.0 behavior). This is a back compat break which gives it an incredibly high bar and hence almost certainly won't be done.
  2. Make it a warning. Still a chance of back compat break here but can be mitigated by disabling the warning. Even so new warning still have a moderate bar to meet.
  3. Do nothing.

At this point I think we're going to wait and see how many more instances of this behavior show up. If it's limited to a single case, namely this bug, then I'd be inclined to go with 3. If more instances pop up though we'll look into doing 2.

@gafter gafter added this to the Unknown milestone Apr 1, 2016
@coderealm
Copy link

@peter-perot Well done. Excellent find. I have thrown in the towel. I personally emailed @jaredpar and I appreciate very much for his swift response and clearing the air. Well done you Peter. Thank you @jaredpar

@Suchiman
Copy link
Contributor

Suchiman commented Apr 1, 2016

@jaredpar does option 3 mean that the *CLR will fix this limitation or that it will be broken forever?

@bbarry
Copy link

bbarry commented Apr 1, 2016

Option 3 means the Roslyn project compiler will continue to build this source and emit the expected IL according to the specification.

It says nothing about changes possible in the CLR or CoreCLR or other .NET implementations.

@Suchiman
Copy link
Contributor

Suchiman commented Apr 1, 2016

@bbarry that's not what i asked, i asked specifically what this means to the CLR, SINCE it was not said.

@jaredpar
Copy link
Member

jaredpar commented Apr 1, 2016

Here is the bug tracking the underlying CLR issue:

https://github.com/dotnet/coreclr/issues/4049

@Suchiman
Copy link
Contributor

Suchiman commented Apr 1, 2016

@jaredpar Thanks!

@gafter gafter changed the title Compiler bug in VS 2015 with static variable in struct CLR bug with static variable in struct could be warned by the compiler in VS 2015 Apr 4, 2016
@gafter gafter changed the title CLR bug with static variable in struct could be warned by the compiler in VS 2015 CLR bug with static variable in struct could be warned by the compiler Apr 4, 2016
@jaredpar jaredpar modified the milestone: Unknown Sep 9, 2016
@gafter gafter added New Feature - Warning Waves Warning Waves help wanted The issue is "up for grabs" - add a comment if you are interested in working on it and removed Blocked labels Feb 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Feature Request help wanted The issue is "up for grabs" - add a comment if you are interested in working on it New Feature - Warning Waves Warning Waves
Projects
None yet
Development

No branches or pull requests