Test failure JIT\\opt\\ObjectStackAllocation\\ObjectStackAllocationTests\\ObjectStackAllocationTests.cmd #81103

Closed
JulieLeeMSFT opened this issue Jan 24, 2023 · 26 comments · Fixed by #81192
Labels: area-CodeGen-coreclr, blocking-outerloop

@JulieLeeMSFT
Member

Failed in Run: runtime-coreclr outerloop 20230124.1 and many other outerloop runs

Failed tests:

R2R-CG2 windows x64 Checked @ Windows.10.Amd64.Open
R2R-CG2 windows x64 Checked no_tiered_compilation @ Windows.10.Amd64.Open
R2R-CG2 windows arm64 Checked no_tiered_compilation @ Windows.11.Arm64.Open
R2R-CG2 windows arm64 Checked @ Windows.11.Arm64.Open
- JIT\\opt\\ObjectStackAllocation\\ObjectStackAllocationTests\\ObjectStackAllocationTests.cmd

Error message:

      Fatal error. Internal CLR error. (0x80131506)
      
      Return code:      1
      Raw output file:      C:\h\w\950D07E7\w\B73309EE\uploads\Reports\JIT.opt\ObjectStackAllocation\ObjectStackAllocationTests\ObjectStackAllocationTests.output.txt
      Raw output:
      BEGIN EXECUTION
      ObjectStackAllocationTests.dll
              1 file(s) copied.
      11:01:24.34
      Response file: C:\h\w\950D07E7\w\B73309EE\e\JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests\ObjectStackAllocationTests.dll.rsp
      C:\h\w\950D07E7\w\B73309EE\e\JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests\IL-CG2\ObjectStackAllocationTests.dll
      -o:C:\h\w\950D07E7\w\B73309EE\e\JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests\ObjectStackAllocationTests.dll
      --targetarch:arm64
      --targetos:windows
      --verify-type-and-field-layout
      --method-layout:random
      -r:C:\h\w\950D07E7\p\System.*.dll
      -r:C:\h\w\950D07E7\p\Microsoft.*.dll
      -r:C:\h\w\950D07E7\p\mscorlib.dll
      -r:C:\h\w\950D07E7\p\netstandard.dll
      -O
      " "dotnet" "C:\h\w\950D07E7\p\crossgen2\crossgen2.dll" @"C:\h\w\950D07E7\w\B73309EE\e\JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests\ObjectStackAllocationTests.dll.rsp"  --codegenopt JitObjectStackAllocation=1 -r:C:\h\w\950D07E7\w\B73309EE\e\JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests\IL-CG2\*.dll"
      11:01:25.66
      Crossgen2 failed with exitcode - -1073741819

Stack trace:

           at JIT_opt._ObjectStackAllocation_ObjectStackAllocationTests_ObjectStackAllocationTests_._ObjectStackAllocation_ObjectStackAllocationTests_ObjectStackAllocationTests_cmd()
           at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
           at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
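The crossgen2 exit code in the log above is printed as a signed decimal. Reinterpreting the same 32 bits as unsigned gives the Windows NTSTATUS value, which is easier to recognize. This is a small illustrative sketch, assuming the test wrapper passes through the crashed process's raw exit status; `toNtStatus`, `kStatusAccessViolation` and `isAccessViolation` are hypothetical names, not part of the test tooling.

```cpp
#include <cassert>
#include <cstdint>

// Reinterpret a signed 32-bit exit code as the unsigned NTSTATUS value
// Windows reports for a process killed by an unhandled exception.
constexpr std::uint32_t toNtStatus(std::int32_t exitCode) {
    return static_cast<std::uint32_t>(exitCode);
}

// STATUS_ACCESS_VIOLATION in the Windows NTSTATUS space.
constexpr std::uint32_t kStatusAccessViolation = 0xC0000005u;

constexpr bool isAccessViolation(std::int32_t exitCode) {
    return toNtStatus(exitCode) == kStatusAccessViolation;
}

// "Crossgen2 failed with exitcode - -1073741819" from the log above.
static_assert(isAccessViolation(-1073741819), "-1073741819 == 0xC0000005");
```

Read this way, the crossgen2 failure is an access violation, which is consistent with the fault on an uncommitted page that the dumps later in the thread show.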
@JulieLeeMSFT JulieLeeMSFT added area-CodeGen-coreclr blocking-outerloop labels Jan 24, 2023
@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone Jan 24, 2023
@ghost

ghost commented Jan 24, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Author: JulieLeeMSFT
Assignees: AndyAyersMS
Labels: area-CodeGen-coreclr, blocking-outerloop

Milestone: 8.0.0

@JulieLeeMSFT
Member Author

@AndyAyersMS, it is blocking outerloop for many days. Please take a look.

@AndyAyersMS
Member

Looks like this started around 1/12.

@AndyAyersMS
Member

AndyAyersMS commented Jan 25, 2023

Failure history above seems to implicate this commit range but nothing there looks all that likely:

[Note: we have since determined that the issue is with the crossgen2 CLR host, which is using older bits]

2038514 Fix Configuration to ensure calling the property setters. (#80438)
00596e5 Adding the Vector512<T> and Vector512 types (#76642)
5a48c62 Invalidate buffered data when bypassing the FileStream cache (#80432)
59649c3 Get entry assembly name from host for GetProcessInfo2 command during suspension (#80301)
8960ab9 Fix small buffer handling with the Platform Crypto Provider
1d0a69f Fix timeout in regex source generated test (#80459)
9084883 Fix NetFramework name in Directory.Build.targets (#80440)
b0eee49 Address feedback from #79828. (#80509)
3f03037 Fix build invocation on Windows without os param (#80544)
a40d411 [main] Update dependencies from dotnet/arcade (#80234)
25e84d3 [mono] Fixed getting namespace names of array types. (#80426)
d4a59b3 Remove two string.Replace calls from CreditCardAttribute.IsValid (#80523)
d87f7b2 (upstream/dnceng-wasm-test-reporting-experiment) Disable ExposedLocalsNumbering (#80522)

@AndyAyersMS
Member

Can't repro this locally or with runfo bits. From a recent core dump it looks like it is a crash in the dotnet used to run crossgen2, which is

8.0.0-alpha.1.23058.2\coreclr.dll from hash 5da4a9e

I can't find symbols for this yet...

@AndyAyersMS
Member

AndyAyersMS commented Jan 25, 2023

Ok, found them (thanks dotnet-symbols). This seems to be crashing in the runtime.

0:000> ~kb
 # RetAddr               : Args to Child                                                           : Call Site
00 (Inline Function)     : --------`-------- --------`-------- --------`-------- --------`-------- : coreclr!FixupPrecode::Init+0xf [D:\a\_work\1\s\src\coreclr\vm\precode.cpp @ 676] 
01 (Inline Function)     : --------`-------- --------`-------- --------`-------- --------`-------- : coreclr!Precode::Init+0x26 [D:\a\_work\1\s\src\coreclr\vm\precode.cpp @ 284] 
02 (Inline Function)     : --------`-------- --------`-------- --------`-------- --------`-------- : coreclr!Precode::AllocateTemporaryEntryPoints+0x6bc [D:\a\_work\1\s\src\coreclr\vm\precode.cpp @ 482] 
03 (Inline Function)     : --------`-------- --------`-------- --------`-------- --------`-------- : coreclr!MethodDescChunk::CreateTemporaryEntryPoints+0x6bc [D:\a\_work\1\s\src\coreclr\vm\method.cpp @ 2945] 
04 (Inline Function)     : --------`-------- --------`-------- --------`-------- --------`-------- : coreclr!MethodDescChunk::EnsureTemporaryEntryPointsCreated+0x6bc [D:\a\_work\1\s\src\coreclr\vm\method.hpp @ 2201] 
05 00007ff9`44dfb0da     : 00000000`000007f8 00000000`000007f8 00000000`000007f8 0000002b`2697ace0 : coreclr!MethodTableBuilder::SetupMethodTable2+0xe1e [D:\a\_work\1\s\src\coreclr\vm\methodtablebuilder.cpp @ 10660] 
06 00007ff9`44d87a53     : 00000000`00000000 0000014c`a3853580 00007ff9`4519bed8 0000014c`a37d5210 : coreclr!MethodTableBuilder::BuildMethodTableThrowing+0x2e5a [D:\a\_work\1\s\src\coreclr\vm\methodtablebuilder.cpp @ 1769] 
07 00007ff9`44d4816d     : 00000000`00000000 00000000`00000005 00000000`00000004 00000000`00000000 : coreclr!ClassLoader::CreateTypeHandleForTypeDefThrowing+0x1413 [D:\a\_work\1\s\src\coreclr\vm\methodtablebuilder.cpp @ 12419] 
08 00007ff9`44d490ff     : 00000000`0202b50f 00000000`00000018 00000000`00000000 00000000`00000000 : coreclr!ClassLoader::CreateTypeHandleForTypeKey+0x195 [D:\a\_work\1\s\src\coreclr\vm\clsload.cpp @ 2943] 
09 (Inline Function)     : --------`-------- --------`-------- --------`-------- --------`-------- : coreclr!ClassLoader::DoIncrementalLoad+0x47 [D:\a\_work\1\s\src\coreclr\vm\clsload.cpp @ 2883] 
0a 00007ff9`44d485cb     : 00007ff8`00000012 0000014c`a380d800 00007ff8`e51b9cf8 00000000`00000000 : coreclr!ClassLoader::LoadTypeHandleForTypeKey_Body+0x50f [D:\a\_work\1\s\src\coreclr\vm\clsload.cpp @ 3559] 
0b 00007ff9`44d5caff     : 0000002b`2697e6d8 0000014c`00000000 ffffffff`00000000 00007ff8`e51b9cf8 : coreclr!ClassLoader::LoadTypeHandleForTypeKey+0xdb [D:\a\_work\1\s\src\coreclr\vm\clsload.cpp @ 3278] 
0c 00007ff9`44d58d65     : 00007ff9`4519bee0 00000000`00000000 00000000`00000190 00000000`00000000 : coreclr!ClassLoader::LoadTypeDefThrowing+0x1ef [D:\a\_work\1\s\src\coreclr\vm\clsload.cpp @ 2258] 
0d (Inline Function)     : --------`-------- --------`-------- --------`-------- --------`-------- : coreclr!TypeHandle::IsNull+0x30 [D:\a\_work\1\s\src\coreclr\vm\typehandle.h @ 179] 
0e 00007ff9`44dba0bf     : 0000014c`00000001 00007ff8`020000aa 0000002b`2697e800 00000000`00000006 : coreclr!ClassLoader::LoadTypeHandleThrowing+0x23d [D:\a\_work\1\s\src\coreclr\vm\clsload.cpp @ 1573] 
0f 00007ff9`44dba080     : 0000002b`2697e880 00000000`00000031 00000000`00000000 00007ff8`e51b9530 : coreclr!ClassLoader::LoadTypeHandleThrowIfFailed+0x27 [D:\a\_work\1\s\src\coreclr\vm\clsload.cpp @ 366] 
10 00007ff9`44dba029     : 00000009`00000000 00000009`00000009 00007ff9`00000011 00007ff9`450f2368 : coreclr!ClassLoader::LoadTypeByNameThrowing+0x2c [D:\a\_work\1\s\src\coreclr\vm\clsload.cpp @ 328] 
11 00007ff9`44db97e5     : 00000000`00000000 00007ff8`e51b4000 00000000`00000008 00000000`00000008 : coreclr!CoreLibBinder::LookupClassLocal+0x99 [D:\a\_work\1\s\src\coreclr\vm\binder.cpp @ 71] 
12 (Inline Function)     : --------`-------- --------`-------- --------`-------- --------`-------- : coreclr!CoreLibBinder::GetClass+0x2d [D:\a\_work\1\s\src\coreclr\vm\binder.h @ 341] 
13 00007ff9`44e14254     : 00007ff9`4519c648 00007ff9`4519c648 00007ff9`4519ba30 0000002b`2697ee58 : coreclr!SystemDomain::LoadBaseSystemClasses+0x29d [D:\a\_work\1\s\src\coreclr\vm\appdomain.cpp @ 1302] 
14 00007ff9`44e1037a     : 00000000`00000001 00000000`00000051 00000000`00000000 00000000`00000001 : coreclr!SystemDomain::Init+0x154 [D:\a\_work\1\s\src\coreclr\vm\appdomain.cpp @ 1151] 
15 00007ff9`44e65d23     : 00000000`00000001 0000014c`a380f790 0000014c`a380edd0 0000014c`a3811701 : coreclr!EEStartupHelper+0x95a [D:\a\_work\1\s\src\coreclr\vm\ceemain.cpp @ 926] 
16 00007ff9`44e65cca     : 0000002b`26970063 0000002b`26970064 00000000`00000001 00000000`00000000 : coreclr!EEStartup+0x27 [D:\a\_work\1\s\src\coreclr\vm\ceemain.cpp @ 1059] 
17 00007ff9`44e65c08     : 0000014c`a3780490 00007ff9`44e10d5f 00000000`00801003 00000000`00000000 : coreclr!EnsureEEStarted+0x92 [D:\a\_work\1\s\src\coreclr\vm\ceemain.cpp @ 304] 
18 00007ff9`44de174e     : 00000000`00000000 00000000`00000001 00007ff9`45a9c220 0000002b`2697f290 : coreclr!CorHost2::Start+0x58 [D:\a\_work\1\s\src\coreclr\vm\corhost.cpp @ 101] 
19 00007ff9`45a535c0     : 0000014c`a378b780 00007ff9`45a9c220 0000014c`a378b780 00007ff9`45a9c220 : coreclr!coreclr_initialize+0x19e [D:\a\_work\1\s\src\coreclr\dlls\mscoree\exports.cpp @ 296] 
1a 00007ff9`45a716d8     : 00007ff9`45a9c230 0000014c`a3793a90 0000014c`a37e9070 0000014c`a378b590 : hostpolicy!coreclr_t::create+0x2b0 [D:\a\_work\1\s\src\native\corehost\hostpolicy\coreclr.cpp @ 73] 
1b 00007ff9`45a73427     : 00007ff9`45aac940 00000000`00000000 0000014c`a377f8f0 0000014c`a377f8f0 : hostpolicy!`anonymous namespace'::create_coreclr+0x158 [D:\a\_work\1\s\src\native\corehost\hostpolicy\hostpolicy.cpp @ 82] 
1c 00007ff9`45e3b43c     : 00000000`00000000 00000000`00000001 00000000`00000000 00000000`00000000 : hostpolicy!corehost_main+0x187 [D:\a\_work\1\s\src\native\corehost\hostpolicy\hostpolicy.cpp @ 427] 
1d 00007ff9`45e3dfd6     : 0000014c`a3788d40 0000002b`2697f9d0 00000000`00000000 00000000`00000000 : hostfxr!execute_app+0x2ac [D:\a\_work\1\s\src\native\corehost\fxr\fx_muxer.cpp @ 145] 
1e 00007ff9`45e401a6     : 00007ff9`45e77818 0000014c`a3785470 0000002b`2697f910 0000002b`2697f8c0 : hostfxr!`anonymous namespace'::read_config_and_execute+0xa6 [D:\a\_work\1\s\src\native\corehost\fxr\fx_muxer.cpp @ 532] 
1f 00007ff9`45e3e5b4     : 0000002b`2697f9d0 0000002b`2697f9f0 0000002b`2697f941 0000014c`a3770310 : hostfxr!fx_muxer_t::handle_exec_host_command+0x166 [D:\a\_work\1\s\src\native\corehost\fxr\fx_muxer.cpp @ 1017] 
20 00007ff9`45e38503     : 0000002b`2697f9f0 0000014c`a37853d0 00000000`00000006 00000000`000000cb : hostfxr!fx_muxer_t::execute+0x494 [D:\a\_work\1\s\src\native\corehost\fxr\fx_muxer.cpp @ 578] 
21 00007ff7`38ace792     : 00007ff9`4d6e44b8 00007ff9`45e399d0 0000002b`2697fb90 0000014c`a37825a0 : hostfxr!hostfxr_main_startupinfo+0xb3 [D:\a\_work\1\s\src\native\corehost\fxr\hostfxr.cpp @ 61] 
22 00007ff7`38aceb73     : 0000014c`a377f8f0 00000000`00000000 0000014c`a377f8f0 00000000`00000000 : dotnet!exe_start+0x842 [D:\a\_work\1\s\src\native\corehost\corehost.cpp @ 251] 
23 00007ff7`38ad00b8     : 00000000`00000000 00000000`00000000 0000014c`a377f8f0 00000000`00000000 : dotnet!wmain+0x83 [D:\a\_work\1\s\src\native\corehost\corehost.cpp @ 322] 
24 (Inline Function)     : --------`-------- --------`-------- --------`-------- --------`-------- : dotnet!invoke_main+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90] 
25 00007ff9`4e8584d4     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : dotnet!__scrt_common_main_seh+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
26 00007ff9`50dd1791     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
27 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

@mangod9 @janvorli does this seem at all familiar? I can look at a few more dumps to see if they're all consistent.

@janvorli
Member

I am looking into it.

@AndyAyersMS
Member

Added more stack trace to the above -- this is happening right at startup:

        // Load System.RuntimeType
        g_pRuntimeTypeClass = CoreLibBinder::GetClass(CLASS__CLASS);

@janvorli
Member

janvorli commented Jan 25, 2023

There is something weird going on in this loop:

        TADDR entryPoint = temporaryEntryPoints;
        for (int i = 0; i < count; i++)
        {
            ((Precode *)entryPoint)->Init((Precode *)entryPoint, t, pMD, pLoaderAllocator);
            _ASSERTE((Precode *)entryPoint == GetPrecodeForTemporaryEntryPoint(temporaryEntryPoints, i));
            entryPoint += oneSize;
            pMD = (MethodDesc *)(dac_cast<TADDR>(pMD) + pMD->SizeOf());
        }

temporaryEntryPoints is 0x00007ffd86116000, count should be 0xaa (I dug that out of pChunk in the coreclr!MethodTableBuilder::SetupMethodTable2 frame), and oneSize is 0x18. FixupPrecode::Init accesses the data of those entry points, which lives one page above that address, so the data starts at 0x00007ffd86117000. That means that at the end of the loop we should reach 0x00007ffd86117ff0. Instead we crash because we went to 0x00007ffd86118000, which is on the next page, and that page is not mapped (which by itself is expected).
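The bounds in the comment above can be checked with a little arithmetic. This sketch assumes 4 KiB pages, a page-aligned first entry point, and (per the later comment on the stub heap layout) that each precode's data sits exactly one page above its code with the same 0x18 stride. `TADDR` here is a local stand-in, and `dataEnd` and `fitsInDataPage` are hypothetical helpers, not runtime APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime's target-address type.
using TADDR = std::uint64_t;

constexpr TADDR kPageSize = 0x1000;   // assumption: 4 KiB pages
constexpr TADDR kPrecodeSize = 0x18;  // oneSize observed in the dump

// First temporary entry point observed in the dump.
constexpr TADDR kFirst = 0x00007ffd86116000;

// One past the data of `count` precodes whose code starts at `firstEntryPoint`,
// assuming the data area begins exactly one page above the code.
constexpr TADDR dataEnd(TADDR firstEntryPoint, TADDR count) {
    return firstEntryPoint + kPageSize + count * kPrecodeSize;
}

// True if all `count` data slots stay inside the single committed RW page
// (assumes `firstEntryPoint` is page-aligned).
constexpr bool fitsInDataPage(TADDR firstEntryPoint, TADDR count) {
    return dataEnd(firstEntryPoint, count) <= firstEntryPoint + 2 * kPageSize;
}

// 4096 / 0x18 == 170 == 0xaa: at most 0xaa precodes fit in one page.
static_assert(kPageSize / kPrecodeSize == 0xaa, "0xaa precodes per page");
// A 0xaa-entry chunk ends exactly at ...17ff0, as expected above.
static_assert(dataEnd(kFirst, 0xaa) == 0x00007ffd86117ff0, "ends at ...17ff0");
// One more entry spills its data into the uncommitted page at ...18000.
static_assert(!fitsInDataPage(kFirst, 0xab), "0xab entries overflow");
```

Under these assumptions a page holds at most 0xaa precodes, so any chunk longer than that makes the loop touch 0x00007ffd86118000.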

@janvorli
Member

The issue is not in the state of the source tree being tested, but in the state of the source tree of the runtime used to run crossgen2, which is 8.0.0-alpha.1.23058.2\coreclr.dll from hash 5da4a9e, as Andy has mentioned.

@AndyAyersMS
Member

> the next page and it is not mapped

Not sure if this is what you meant -- in the dump I'm looking at the faulting address is mapped/reserved but not committed:

0:000> !vprot 0x00007ff8e52e8000
BaseAddress:       00007ff8e52e8000
AllocationBase:    00007ff8e52e0000
AllocationProtect: 00000040  PAGE_EXECUTE_READWRITE
RegionSize:        0000000000008000
State:             00002000  MEM_RESERVE
Type:              00040000  MEM_MAPPED

From what I can see, all the failures are on Windows, mostly x64 but some on arm64. Maybe some kind of off-by-one when setting up initial allocations that (for reasons unknown) only hits this particular test? Seems quite odd, because crossgen2 has to run hundreds of times during this test pass, and only this invocation fails, and (I suspect) doesn't always fail.

@janvorli
Member

Yes, the page should be reserved but not committed. When we allocate something from the stub heap, we always commit two pages: one RX page followed by one RW page. The RX page holds the stubs' code, the RW page their data, so a stub's data is always at its code address + page_size. Here we run past the end of those two pages, which we should not. It is also not clear to me why it fails with just one test, especially at init time, when nothing app-specific has been loaded yet.

@janvorli
Member

@AndyAyersMS this specific test run actually differs from others - the --codegenopt JitObjectStackAllocation=1 is passed to crossgen2. So that could be the trigger.

@janvorli
Member

Although that doesn't make sense: the option would be processed only after crossgen2's own code starts executing, and the crash happens during coreclr initialization.

@jakobbotsch
Member

The test sets several DOTNET_ environment variables:

<CLRTestEnvironmentVariable Include="DOTNET_TieredCompilation" Value="0" />
<CLRTestEnvironmentVariable Include="DOTNET_ProfApi_RejitOnAttach" Value="0" />
<CLRTestEnvironmentVariable Include="DOTNET_JITMinOpts" Value="0" />
<CLRTestEnvironmentVariable Include="DOTNET_JitNoForceFallback" Value="1" />
<CLRTestEnvironmentVariable Include="DOTNET_JitDebuggable" Value="0" />
<CLRTestEnvironmentVariable Include="DOTNET_JitStressModeNamesNot" Value="STRESS_RANDOM_INLINE,STRESS_MIN_OPTS" />
<CLRTestEnvironmentVariable Include="DOTNET_JitObjectStackAllocation" Value="1" />

Maybe these are being set for the crossgen2 invocation as part of the test wrapper?

@janvorli
Member

Ah, so these would likely be set for the crossgen2 execution too.

@AndyAyersMS
Member

AndyAyersMS commented Jan 25, 2023

Ah right -- thanks Jakob. With those set I can repro. Problem seems to be with the pair

DOTNET_TieredCompilation=0
DOTNET_ProfApi_RejitOnAttach=0

@janvorli
Member

Thanks @AndyAyersMS and @jakobbotsch, let me try to debug it live.

@AndyAyersMS
Member

With just those two set, you can't launch crossgen2 at all (it fails even with no command-line args):

set DOTNET_TieredCompilation=0
set DOTNET_ProfApi_RejitOnAttach=0

C:\repos\runtime4\.dotnet\dotnet.exe c:\repos\runtime4\artifacts\tests\coreclr\Windows.x64.checked\\tests\Core_Root\crossgen2\crossgen2.dll
Fatal error. Internal CLR error. (0x80131506)

@janvorli
Member

I understand what's going on. MethodTableBuilder::AllocAndInitMethodDescChunk creates a chunk whose size is not capped at the maximum possible, which is the number of precodes that fit into one page of the precode heap. When I implemented the new form of stub heaps, I applied that limit in another place where chunks are created (MethodDescChunk::CreateChunk), but I missed that there is one more.
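The fix described above amounts to capping every chunk so that its temporary entry points fit in one precode page. A minimal sketch of that idea, assuming 4 KiB pages and 0x18-byte precodes as seen in the dump; `maxPrecodesPerPage` and `splitIntoChunks` are hypothetical helpers, not the runtime's code, and they ignore any other chunk-size limits the runtime may apply.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: how many precodes of `precodeSize` bytes fit in one page.
constexpr std::size_t maxPrecodesPerPage(std::size_t pageSize, std::size_t precodeSize) {
    return pageSize / precodeSize;
}

// Hypothetical helper: split `methodCount` methods into chunk sizes, none of
// which exceeds `maxPerChunk`, so each chunk's entry points fit in one page.
std::vector<std::size_t> splitIntoChunks(std::size_t methodCount, std::size_t maxPerChunk) {
    assert(maxPerChunk > 0);  // a zero cap would never make progress
    std::vector<std::size_t> chunks;
    while (methodCount > 0) {
        // Take a full chunk if possible, otherwise whatever is left.
        std::size_t n = methodCount < maxPerChunk ? methodCount : maxPerChunk;
        chunks.push_back(n);
        methodCount -= n;
    }
    return chunks;
}
```

For example, with a 0x1000-byte page and 0x18-byte precodes the cap is 170, so a type with 400 methods would be split into chunks of 170, 170 and 60.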

janvorli added a commit to janvorli/runtime that referenced this issue Jan 25, 2023
When the new stub heaps were implemented, the chunk size needed to be
limited so that all precodes for a chunk fit into a single memory page.
That was done in `MethodDescChunk::CreateChunk`. But it was discovered
now that there is another place where it needs to be limited, the
`MethodTableBuilder::AllocAndInitMethodDescChunk`. The
JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests started to fail
in the outerloop because of a chunk that was too long. The failure happens
in crossgen2 because it runs on a separate release build of the runtime, and
only that build uncovers the problem. And only when DOTNET_TieredCompilation=0
and DOTNET_ProfApi_RejitOnAttach=0.

This change fixes the problem. With this change applied to the dotnet
runtime version used by crossgen2, patched in place in the
.dotnet/shared/.... directory, the issue doesn't occur.

Close dotnet#81103
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 25, 2023
@AndyAyersMS
Member

@janvorli thanks for getting to the bottom of this one.

@AndyAyersMS
Member

Seems like we'll also need to disable this test and wait for a dotnet update to re-enable, right...? Fixing this in main won't completely resolve the issue as crossgen2 will still be using an older runtime.

@janvorli
Member

Ah, that's right.

janvorli added a commit to janvorli/runtime that referenced this issue Jan 25, 2023
Disable the test until a dotnet with a fix of the issue dotnet#81103 is used
in the runtime repo to build / run tests.
@MichalStrehovsky
Member

An alternative to just disabling the test would be to add the problematic environment variables to the ad hoc list that crossgen2 testing already suppresses:

REM Suppress some DOTNET and COMPlus variables for the duration of Crossgen2 execution
setlocal
set "DOTNET_GCName="
set "DOTNET_GCStress="
set "DOTNET_HeapVerify="
set "DOTNET_ReadyToRun="

@janvorli
Member

That's something we should probably do too; it doesn't seem to make sense to run crossgen2 with settings that are meant for the actual test run. But let's keep it as is for now: it will allow us to double-check that the issue is gone after my fix propagates to the dotnet runtime used for build / crossgen execution.

@MichalStrehovsky
Member

> That's something that we should probably do too, it doesn't seem to make sense to run crossgen with the settings that are meant for the actual test run. But let's keep it as is for now, it will allow us to double check that the issue is gone after my fix propagates to the dotnet runtime used for build / crossgen execution.

I don't know how long we're going to keep running crossgen2 on top of the JIT-based CoreCLR in testing. There have been thoughts of using the shipping configuration, which is crossgen2 compiled with NativeAOT. So we're eventually going to lose this as a "regression test".

jkotas pushed a commit that referenced this issue Jan 26, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 26, 2023
janvorli added a commit that referenced this issue Jan 26, 2023
…#81195)

* Disable the /JIT/opt/ObjectStackAllocation/ObjectStackAllocationTests

Disable the test until a dotnet with a fix of the issue #81103 is used
in the runtime repo to build / run tests.

* Moving the disabling to crossgen2 specific section
github-actions bot pushed a commit that referenced this issue Feb 3, 2023
carlossanlop pushed a commit that referenced this issue Feb 10, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 25, 2023
carlossanlop pushed a commit that referenced this issue Oct 12, 2023
Co-authored-by: Jan Vorlicek <janvorli@microsoft.com>