Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function returned by Microsoft.FSharp.Reflection.FSharpValue.PreComputeUnionConstructor throwing FatalExecutionEngineError when invoked #566

Closed
mjmckp opened this issue Aug 3, 2015 · 19 comments

Comments

Projects
None yet
7 participants
@mjmckp
Copy link
Contributor

commented Aug 3, 2015

The function returned by Microsoft.FSharp.Reflection.FSharpValue.PreComputeUnionConstructor is throwing the following error very occasionally when used to create a value of type float option via reflection.

FatalExecutionEngineError occurred
Message: Managed Debugging Assistant 'FatalExecutionEngineError' has detected a problem in 'C:\dev\vs2015\src\repro\bin\Debug\repro.exe'.
Additional information: The runtime has encountered a fatal error. The address of the error was at 0xf582d7f4, on thread 0xdaa0. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.

This seems to have started happening after I installed VS2015, however targeting earlier versions of either FSharp.Core or the .NET runtime makes no difference. I've also tried setting <useLegacyJit enabled="1"/> in the app.config due to the known bug in RyuJIT (http://nickcraver.com/blog/2015/07/27/why-you-should-wait-on-dotnet-46/), however this also makes no difference. I've only tried this on Windows 7, so I don't know if it's OS specific. It happens in both Debug and Release build.

I've managed to create a reproduction of the error. Compile and run the project linked to below with the conditional compilation symbol 'BUG' defined, and it should fail with the above error. Then compile and run without 'BUG' defined to replace the use of FSharpValue.PreComputeUnionConstructor with equivalent non reflection based code, and the program will complete without error.

The project can be downloaded from here: https://www.dropbox.com/l/BXokpag4b4pFtBzeTJpion Note that it's a 20Mb file as it contains 2 embedded data files which contain serialised arrays that trigger the error.

@sivarv

This comment has been minimized.

Copy link
Member

commented Aug 4, 2015

This bug repros even with

Complus_useLegacyJIT=1 ; use legacy Jit64
complus_zapDisable=1 ; don’t load ngen images, instead jit

The point of failure (or call stack) is the same even against RyuJIT.

Here is the call stack for AV under debugger:

# Child-SP          RetAddr           Call Site
00 000000dd`36c8de28 00007fff`15983bad clr!SVR::memcopy+0x4d [f:\dd\ndp\clr\src\vm\gc.cpp @ 1870]
01 (Inline Function) --------`-------- clr!WKS::gc_heap::gcmemcopy+0x23 [f:\dd\ndp\clr\src\vm\gc.cpp @ 23963]
02 000000dd`36c8de30 00007fff`15983a3f clr!WKS::gc_heap::compact_plug+0x144 [f:\dd\ndp\clr\src\vm\gc.cpp @ 24051]
03 000000dd`36c8de80 00007fff`159839c4 clr!WKS::gc_heap::compact_in_brick+0x97 [f:\dd\ndp\clr\src\vm\gc.cpp @ 24191]
04 000000dd`36c8dec0 00007fff`1598716b clr!WKS::gc_heap::compact_in_brick+0x37 [f:\dd\ndp\clr\src\vm\gc.cpp @ 24156]
05 000000dd`36c8df00 00007fff`15982df7 clr!WKS::gc_heap::compact_phase+0x1f4 [f:\dd\ndp\clr\src\vm\gc.cpp @ 24332]
06 000000dd`36c8dfa0 00007fff`15987f7a clr!WKS::gc_heap::plan_phase+0x11c1 [f:\dd\ndp\clr\src\vm\gc.cpp @ 22129]
07 000000dd`36c8e2f0 00007fff`15987e92 clr!WKS::gc_heap::gc1+0xe2 [f:\dd\ndp\clr\src\vm\gc.cpp @ 14977]
08 000000dd`36c8e340 00007fff`15988597 clr!WKS::gc_heap::garbage_collect+0x6c7 [f:\dd\ndp\clr\src\vm\gc.cpp @ 16502]
09 000000dd`36c8e3d0 00007fff`15b798ca clr!WKS::GCHeap::GarbageCollectGeneration+0x2a0 [f:\dd\ndp\clr\src\vm\gc.cpp @ 34805]
0a (Inline Function) --------`-------- clr!WKS::gc_heap::try_allocate_more_space+0x26af [f:\dd\ndp\clr\src\vm\gc.cpp @ 12795]
0b 000000dd`36c8e420 00007fff`1598e45f clr!WKS::gc_heap::allocate_more_space+0x26de [f:\dd\ndp\clr\src\vm\gc.cpp @ 13103]
0c (Inline Function) --------`-------- clr!WKS::gc_heap::allocate+0x2a86 [f:\dd\ndp\clr\src\vm\gc.cpp @ 13134]
0d 000000dd`36c8e460 00007fff`1588ab6c clr!WKS::GCHeap::Alloc+0x121 [f:\dd\ndp\clr\src\vm\gc.cpp @ 34247]
0e (Inline Function) --------`-------- clr!Alloc+0x4c [f:\dd\ndp\clr\src\vm\gcscan.cpp @ 126]
0f 000000dd`36c8e4b0 00007fff`159dc42c clr!AllocateArrayEx+0x1e7 [f:\dd\ndp\clr\src\vm\gcscan.cpp @ 1369]
10 000000dd`36c8e5e0 00007ffe`b62c4409 clr!JIT_NewArr1+0x174 [f:\dd\ndp\clr\src\vm\jithelpers.cpp @ 3062]
11 000000dd`36c8e7b0 00007ffe`b63fda82 mscorlib_dd52860000!System.Reflection.MethodBase.CheckArguments(System.Object[], System.Reflection.Binder, System.Reflection.BindingFlags, System.Globalization.CultureInfo, System.Signature)+0x39 [f:\dd\ndp\clr\src\BCL\system\reflection\methodbase.cs @ 342]
12 000000dd`36c8e860 00007ffe`b63fd75e mscorlib_dd52860000!System.Reflection.RuntimeMethodInfo.InvokeArgumentsCheck(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)+0xd2 [f:\dd\ndp\clr\src\BCL\system\reflection\methodinfo.cs @ 796]
13 000000dd`36c8e8e0 00007fff`158815c6 mscorlib_dd52860000!System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)+0x7e [f:\dd\ndp\clr\src\BCL\system\reflection\methodinfo.cs @ 712]
14 000000dd`36c8ea00 00007ffe`b654e102 clr!JIT_TailCallHelperStub_ReturnAddress [f:\dd\ndp\clr\src\vm\amd64\JitHelpers_Fast.asm @ 1057]
15 000000dd`36c8eaa0 00007ffe`b654e102 repro!Program+Program+readValue@21-1.Invoke(System.IO.BinaryReader)+0x52*** WARNING: Unable to verify checksum for repro.exe
[D:\bugs\fsrepro\Program.fs @ 24]
16 000000dd`36c8eaf0 00007ffe`b65470dd repro!Program+Program+readValue@21-1.Invoke(System.IO.BinaryReader)+0x52 [D:\bugs\fsrepro\Program.fs @ 24]
17 000000dd`36c8eb40 00007ffe`b62c8553 repro!Program+Program.readFile[[System.__Canon, mscorlib]](System.String, Microsoft.FSharp.Core.FSharpFunc`2<System.IO.BinaryReader,System.__Canon>)+0x12d [D:\bugs\fsrepro\Program.fs @ 66]
18 000000dd`36c8ebf0 00007fff`15883373 repro!Program+Program.main(System.String[])+0x133 [D:\bugs\fsrepro\Program.fs @ 73]
19 000000dd`36c8ec60 00007fff`1588325c clr!CallDescrWorkerInternal+0x83 [f:\dd\ndp\clr\src\vm\amd64\CallDescrWorkerAMD64.asm @ 97]
1a 000000dd`36c8eca0 00007fff`15884047 clr!CallDescrWorkerWithHandler+0x4e [f:\dd\ndp\clr\src\vm\callhelpers.cpp @ 89]
1b 000000dd`36c8ece0 00007fff`15999481 clr!MethodDescCallSite::CallTargetWorker+0x2af [f:\dd\ndp\clr\src\vm\callhelpers.cpp @ 649]
1c (Inline Function) --------`-------- clr!MethodDescCallSite::Call_RetArgSlot+0x5 [f:\dd\ndp\clr\src\vm\callhelpers.h @ 423]
1d 000000dd`36c8ee70 00007fff`1599926a clr!RunMain+0x20d [f:\dd\ndp\clr\src\vm\assembly.cpp @ 2598]
1e 000000dd`36c8f050 00007fff`159991a5 clr!Assembly::ExecuteMainMethod+0xba [f:\dd\ndp\clr\src\vm\assembly.cpp @ 2719]
1f 000000dd`36c8f340 00007fff`1599a192 clr!SystemDomain::ExecuteMainMethod+0x6b9 [f:\dd\ndp\clr\src\vm\appdomain.cpp @ 3738]
20 000000dd`36c8f980 00007fff`1599a146 clr!ExecuteEXE+0x43 [f:\dd\ndp\clr\src\vm\ceemain.cpp @ 3016]
21 000000dd`36c8f9f0 00007fff`1599a088 clr!_CorExeMainInternal+0xb2 [f:\dd\ndp\clr\src\vm\ceemain.cpp @ 2850]
22 000000dd`36c8fa80 00007fff`22137049 clr!_CorExeMain+0x14 [f:\dd\ndp\clr\src\vm\ceemain.cpp @ 2777]
23 000000dd`36c8fac0 00007fff`22261184 mscoreei!_CorExeMain+0xe0 [f:\dd\ndp\clr\src\dlls\shim\shim.cpp @ 6420]
24 000000dd`36c8fb10 00007fff`222610ab MSCOREE!ShellShim__CorExeMain+0xb8 [d:\th\com\netfx\windowsbuilt\shell_shim\v2api.cpp @ 284]
25 000000dd`36c8fb40 00007fff`304d2d92 MSCOREE!_CorExeMain_Exported+0xb [d:\th\com\netfx\windowsbuilt\shell_shim\v2api.cpp @ 1223]
26 000000dd`36c8fb70 00007fff`32df9f64 KERNEL32!BaseThreadInitThunk+0x22 [d:\th\base\win32\client\thread.c @ 64]
27 000000dd`36c8fba0 00000000`00000000 ntdll!RtlUserThreadStart+0x34 [d:\th\minkernel\ntdll\rtlstrt.c @ 994]
@mjmckp

This comment has been minimized.

Copy link
Contributor Author

commented Aug 12, 2015

I've found that using server GC instead of workstation GC works, at least in this case. That is, the problem goes away by adding the following to the app.config file:

  <runtime>
    <gcServer enabled="true" />
  </runtime>

This might seem like a bit of an esoteric bug, but it is happening to me quite frequently at the moment whenever I try and process large amounts of data with F#.

I only found this workaround because I was getting absolutely terrible workstation GC performance since updating to VS2105. Things that used to take seconds now take many minutes, as is being reported by others:
http://blogs.microsoft.co.il/dorony/2015/08/08/beware-of-net-4-6-workstation-gc/
http://stackoverflow.com/questions/31747992/garbage-collection-and-parallel-foreach-issue-after-vs2015-upgrade

@Maoni0

This comment has been minimized.

Copy link
Member

commented Aug 13, 2015

For the performance issue, please see my blog entry: http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx. You could use Server GC for now (or continue to use it if it works well for you) and try the fix when it's available (should be very soon).

@mjmckp

This comment has been minimized.

Copy link
Contributor Author

commented Aug 13, 2015

That's great news, thanks Maoni! What about the original error? From sivarv's stack trace above, it seems like the exception is actually thrown inside the GC. I presume you are on the GC team at MS, perhaps you could help give this issue a little oxygen?

@karelz

This comment has been minimized.

Copy link
Member

commented Aug 13, 2015

I am trying to repro your original issue (without success so far).
The issue looks like typical managed heap corruption from the callstack (which is typically not GC problem). I am working with Siva to get my hands on repro environment (+ I am running it in parallel on my machine in a loop) -- once I have a repro somewhere, I plan to try GCStress (or iDNA tracing) to get something actionable.

@mjmckp

This comment has been minimized.

Copy link
Contributor Author

commented Sep 14, 2015

@karelz Any progress on this issue?

@karelz

This comment has been minimized.

Copy link
Member

commented Sep 14, 2015

I routed it for further investigation last week (sorry I didn't have time to dig deeper myself earlier).
It is most likely a bug in Runtime. We have some partial results (component likely to blame), but not final results yet. I'll keep you posted.

@Maoni0

This comment has been minimized.

Copy link
Member

commented Sep 14, 2015

I looked at this today. And indeed this is a bug in GC. Thanks for discovering it!

You are hitting an extreme case (as in the survivors in your case were the smallest objects possible that were all separated from each other by objects that died) and my calculation was a tiny bit off which showed up in this extreme case. I will fix this ASAP. I'll need to check with our release folks when/how it will be shipped (that part is out of my control).

Thanks again for finding this. I will make sure this case is covered in our test suite.

@mjmckp

This comment has been minimized.

Copy link
Contributor Author

commented Sep 15, 2015

No worries, thanks for the prompt attention, I hope a fix can be released soon!

@bmenees

This comment has been minimized.

Copy link

commented Sep 18, 2015

Maoni, I'm pretty sure I need this GC fix too. I can reproduce an ExecutionEngineException in a multi-threaded unit test running in vstest.executionengine.exe (from a command line call to vstest.console.exe). It's all managed code on my end, and this test has worked for years in VS 2012 and 2013. But now with .NET 4.6 and VS 2015, the same unit test will fail about half the time.

I used DebugDiag to create a full dump whenever ntdll!NtTerminateProcess was called in vstest.executionengine.exe, and the call stack matched the important parts above (where things go wrong in clr!WKS::gc_heap::compact_plug). Looking at things in WinDbg, there doesn't appear to be any heap corruption. It appears to match this GC bug most closely except I end up with an ExecutionEngineException.

I'd appreciate it if you could post here (or on your blog) any release date info and other details so I could track this. I'm also anxiously awaiting the perf fix you blogged about last month.

Thanks!

ntdll!NtTerminateProcess
ntdll!RtlExitUserProcess+0x5b
kernel32!ExitProcessImplementation+0xa
mscoreei!RuntimeDesc::ShutdownAllActiveRuntimes+0x26c
mscoreei!CLRRuntimeHostInternalImpl::ShutdownAllRuntimesThenExit+0x14
clr!EEPolicy::ExitProcessViaShim+0x69
clr!SafeExitProcess+0x9d
clr!EEPolicy::HandleFatalError+0x123
clr!CLRVectoredExceptionHandlerPhase3+0x305292
clr!CLRVectoredExceptionHandlerPhase2+0x2d
clr!CLRVectoredExceptionHandler+0x94
clr!CLRVectoredExceptionHandlerShim+0x89
ntdll!RtlpCallVectoredHandlers+0x113
ntdll!RtlDispatchException+0x6e
ntdll!KiUserExceptionDispatch+0x3a
clr!WKS::gc_heap::compact_plug+0xb7
clr!WKS::gc_heap::compact_in_brick+0x6a
clr!WKS::gc_heap::compact_in_brick+0x98
clr!WKS::gc_heap::compact_phase+0x1c5
clr!WKS::gc_heap::plan_phase+0xc32
clr!WKS::gc_heap::gc1+0x9d
clr!WKS::gc_heap::garbage_collect+0x264
clr!WKS::GCHeap::GarbageCollectGeneration+0x13a
clr!WKS::gc_heap::allocate_more_space+0x1c5
clr!JIT_New+0x38c
@Maoni0

This comment has been minimized.

Copy link
Member

commented Sep 18, 2015

@bmenees, thank you for reporting this. I would really love to get the fix out to you guys ASAP but unfortunately that part is out of my control. I am sure our servicing engineers are working as fast as they can in order to get fixes out but it takes time to go through the normal hotfix process. I have asked our release manager Lee to comment. I'll check again to see if he could comment soon.

@karelz

This comment has been minimized.

Copy link
Member

commented Sep 18, 2015

@bmenees Why do you think you are hitting exactly the same bug as the original report? The callstack is pretty generic and does not identify root cause, just a symptom, so it is not sufficient for confirming if your problem is a dupe or not.

@bmenees

This comment has been minimized.

Copy link

commented Sep 18, 2015

@karelz It's an educated guess based on working with .NET since 2002 using a mix of managed and unmanaged code. I've run into ExecutionEngineExceptions in the past, and usually they were due to heap corruption issues in my (team's) unmanaged code. But once I was affected by an ExecutionEngineException that was due to a bug in the CLR.

I have a lot of experience analyzing user dumps in WinDbg and looking at valid and corrupt unmanaged heaps as well as the GC data. I'm also familiar with what this particular unit test is doing and the huge amount of pressure it puts on the GC. Everything I see in the behavior of this test now, both interactively and in the user dump, points to a GC bug and not heap corruption.

I also know it does not happen when I run it on a machine with .NET 4.5.2. It only happens on machines with .NET 4.6. Also, the performance of this unit test bombs when it runs on .NET 4.6. And when I run the same code in a process where I've forced server GC usage, it runs fine. And the RyuJit flags have no effect on this test. So all signs, including the call stack and my Magic 8 Ball, point me to this thread.

On Monday, I'll try to open an MSDN Support incident to escalate this to someone at Microsoft so they can take an official look at it. But that's usually a long and involved process that takes me multiple days to get to the right escalation engineer, and then it takes many more days of corresponding about repros, generating the right dumps, handling dump uploads with matching binaries, etc. It was much nicer to end my work week with some hope that the lead developer on the GC has already found and fixed the perf and crash bugs in .NET 4.6 that are affecting me. :-)

@karelz

This comment has been minimized.

Copy link
Member

commented Sep 18, 2015

@bmenees It's always good to see someone educated about heap corruptions and GC. Sadly it's quite rare.
I believe in your experience and that your issue is likely a GC bug. The question remaining is: Is it the same as the one reproed on this thread. Would it be possible to provide a repro for us to verify on our private bits with a fix?
If you go through support and it is indeed a GC issue, it will end up on our plate eventually anyway ... consider this as a short cut :)

@leecow

This comment has been minimized.

Copy link
Member

commented Sep 19, 2015

Hey folks,

Thanks for the report and additional feedback. Maoni has put together a fix for this and it will be making it's way through testing and the rest of the servicing machinery. We just missed the fix window for October so this will release in mid November.

@mjmckp

This comment has been minimized.

Copy link
Contributor Author

commented Nov 14, 2015

Hi all, any info on the progress of the release of the fix for this?

@leecow

This comment has been minimized.

Copy link
Member

commented Nov 16, 2015

Hey folks,

By way of background, hotfixes for the Framework can include changes from any of the different teams which contribute to the Framework. In the case of the November release, a problem was found with an unrelated fix leading to the entire set of fixes being delayed until around the 2nd week of December.

@bmenees

This comment has been minimized.

Copy link

commented Dec 1, 2015

@leecow, is this fix in .NET 4.6.1 (since it was released on November 30)?

@leecow

This comment has been minimized.

Copy link
Member

commented Dec 1, 2015

Hi @bmenees , yes, the fix is in yesterday's 4.6.1 release.

@mjmckp mjmckp closed this Dec 8, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.