Test Loader/classloader/generics/Layout/General/Base01b_seq_ser/Base01b_seq_ser.sh failed with Assert failure(PID 1006 [0x000003ee], Thread: 1006 [0x03ee]): (GetComponentSize() <= 2) || IsArray() #33366

Closed
AriNuer opened this issue Mar 9, 2020 · 15 comments · Fixed by #34613
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI GCStress os-linux Linux OS (any supported distro)

Comments

@AriNuer

AriNuer commented Mar 9, 2020

Job:
runtime-coreclr gcstress0x3-gcstress0xc:20200308.1

Error message:

Assert failure(PID 1006 [0x000003ee], Thread: 1006 [0x03ee]): (GetComponentSize() <= 2) || IsArray()
File: /__w/18/s/src/coreclr/src/vm/methodtable.cpp Line: 7837
 Image: /root/helix/work/correlation/corerun

/root/helix/work/workitem/Loader/classloader/generics/Layout/General/Base01b_seq_ser/Base01b_seq_ser.sh: line 275: 1006 Aborted $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"

Return code: 1
Raw output file: /root/helix/work/workitem/Loader/classloader/Reports/Loader.classloader/generics/Layout/General/Base01b_seq_ser/Base01b_seq_ser.output.txt
Raw output:
BEGIN EXECUTION
/root/helix/work/correlation/corerun Base01b_seq_ser.dll ''
Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.1006.dmp
Written 49307648 bytes (12038 pages) to core file
Expected: 100
Actual: 134
END EXECUTION - FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=/root/helix/work/correlation
> /root/helix/work/workitem/Loader/classloader/generics/Layout/General/Base01b_seq_ser/Base01b_seq_ser.sh
Expected: True
Actual: False

Stack trace:
at Loader_classloader._generics_Layout_General_Base01b_seq_ser_Base01b_seq_ser_._generics_Layout_General_Base01b_seq_ser_Base01b_seq_ser_sh() in /__w/17/s/artifacts/tests/coreclr/Linux.arm.Checked/TestWrappers/Loader.classloader/Loader.classloader.XUnitWrapper.cs:line 10713
Details:
https://dev.azure.com/dnceng/public/_build/results?buildId=551522&view=ms.vss-test-web.build-test-results-tab&runId=17415558&paneView=debug&resultId=102366

category:correctness
theme:testing
skill-level:expert
cost:medium

@AriNuer AriNuer added arch-arm32 os-linux Linux OS (any supported distro) GCStress labels Mar 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-Infrastructure-coreclr untriaged New issue has not been triaged by the area owner labels Mar 9, 2020
@BruceForstall
Member

@janvorli Can you triage?

@janvorli janvorli added area-GC-coreclr and removed untriaged New issue has not been triaged by the area owner labels Mar 9, 2020
@janvorli
Member

janvorli commented Mar 9, 2020

Looks like a GC hole - my guess is that the MethodTable is corrupted.

@AriNuer
Author

AriNuer commented Mar 30, 2020

Test Loader/classloader/generics/Layout/General/Base01b_seq_ser/Base01b_seq_ser.sh failed again here runtime-coreclr gcstress0x3-gcstress0xc:20200329.1
Error message:

Assert failure(PID 215 [0x000000d7], Thread: 215 [0x00d7]): !CREATE_CHECK_STRING(pMT && pMT->Validate())
File: /__w/1/s/src/coreclr/src/vm/object.cpp Line: 557
Image: /root/helix/work/correlation/corerun

/root/helix/work/workitem/Loader/classloader/generics/Layout/General/Base01b_seq_ser/Base01b_seq_ser.sh: line 275: 215 Aborted $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"
Return code: 1
Raw output file: /root/helix/work/workitem/Loader/classloader/Reports/Loader.classloader/generics/Layout/General/Base01b_seq_ser/Base01b_seq_ser.output.txt
Raw output:
BEGIN EXECUTION
/root/helix/work/correlation/corerun Base01b_seq_ser.dll ''
Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.215.dmp
Written 49332224 bytes (12044 pages) to core file
Expected: 100
Actual: 134
END EXECUTION - FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=/root/helix/work/correlation
> /root/helix/work/workitem/Loader/classloader/generics/Layout/General/Base01b_seq_ser/Base01b_seq_ser.sh
Expected: True
Actual: False

Stack trace:

at Loader_classloader._generics_Layout_General_Base01b_seq_ser_Base01b_seq_ser_._generics_Layout_General_Base01b_seq_ser_Base01b_seq_ser_sh() in /__w/1/s/artifacts/tests/coreclr/Linux.arm.Checked/TestWrappers/Loader.classloader/Loader.classloader.XUnitWrapper.cs:line 10713

Details:
https://dev.azure.com/dnceng/public/_build/results?buildId=578587&view=ms.vss-test-web.build-test-results-tab&runId=18178846&resultId=100875&paneView=debug

@BruceForstall BruceForstall added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-GC-coreclr arch-arm32 labels Mar 30, 2020
@BruceForstall
Member

Failing in GCStress=C for Linux x64/arm64/arm32:
https://dev.azure.com/dnceng/public/_build/results?buildId=578587&view=ms.vss-test-web.build-test-results-tab

for these tests:

Loader/classloader/generics/Layout/General/Base01b_seq/Base01b_seq.sh
Loader/classloader/generics/Layout/General/Base01d_seq_ser/Base01d_seq_ser.sh
Loader/classloader/generics/Layout/General/Base01c_seq_ser/Base01c_seq_ser.sh
Loader/classloader/generics/Layout/General/Base01b_seq_ser/Base01b_seq_ser.sh

@AndyAyersMS
Member

I can repro the failure in Base01b_seq_ser.sh (0xC gc stress, no other env settings) on Linux x64.

Assert failure(PID 22990 [0x000059ce], Thread: 22990 [0x59ce]): !CREATE_CHECK_STRING(pMT && pMT->Validate())
    File: /home/andy/repos/runtime/src/coreclr/src/vm/object.cpp Line: 557
    Image: /home/andy/repos/runtime/artifacts/tests/coreclr/Linux.x64.Checked/Tests/Core_Root/corerun

At first look this seems to go bad during a gc in Test..cctor when the stack walk looks to be incomplete. Logical stack at this point is

Test..cctor
Test.EvalBoolean
GenInt.VerifyLayout   ;; untracked "this" gc ref at rbp-8
Test.Main

From the stresslog, GC only seems to enumerate references in the topmost managed frame. There are some transition/helper frames below that so perhaps stackwalking gets thrown off there.
Note without GC stress we would normally not gc from this cctor.

The assertion failure is a bit downstream of that: VerifyLayout virtually calls string equals on a field of its this, and that reference fails validation in the prestub. Presumably failure to scan the refs in VerifyLayout is the proximate cause.

Will debug the stackwalk next.

@AndyAyersMS
Member

Also fails at the same spot with

COMPlus_JITMinOpts=1
COMPlus_GCStress=0x1
COMPlus_TieredCompilation=0

which might be a bit easier to debug.

@AndyAyersMS
Member

AndyAyersMS commented Apr 2, 2020

More detailed logging shows the stack walk is ok, and we're reporting the this reference in GenInt.VerifyLayout to the GC, but it looks almost as if the GC is not scanning the GC refs in the object, so they become invalid at some point.

Wondering if there's something amiss in the class loader for this case. From what I can tell, ContainsPointers is not true for the GenInt class but is true for its parent class. Seems like that should never happen.

Using the minopts settings just above:

(lldb) target create "/home/andy/repos/runtime/artifacts/tests/coreclr/Linux.x64.Checked/Tests/Core_Root/corerun"
Current executable set to '/home/andy/repos/runtime/artifacts/tests/coreclr/Linux.x64.Checked/Tests/Core_Root/corerun' (x86_64).
(lldb) settings set -- target.run-args  "Base01b_seq_ser.dll"
(lldb) process launch
Process 28095 launched: '/home/andy/repos/runtime/artifacts/tests/coreclr/Linux.x64.Checked/Tests/Core_Root/corerun' (x86_64)
Process 28095 stopped
* thread #1, name = 'corerun', stop reason = signal SIGSEGV: invalid address (fault address: 0x30)
    frame #0: 0x00007ffff5f6bded libcoreclr.so`MethodTable::SanityCheck() [inlined] MethodTable::GetFlag(this=0x0000000000000030, flag=enum_flag_HasComponentSize) const at methodtable.h:3797
   3794     __forceinline DWORD GetFlag(WFLAGS_HIGH_ENUM flag) const
   3795     {
   3796         LIMITED_METHOD_DAC_CONTRACT;
-> 3797         return m_dwFlags & flag;
   3798     }
   3799     __forceinline BOOL TestFlagWithMask(WFLAGS_HIGH_ENUM mask, WFLAGS_HIGH_ENUM flag) const
   3800     {
(lldb) dso
OS Thread Id: 0x6dbf (1)
RSP/REG          Object           Name
00007FFFFFFFD590 00007fff500de3b0 System.String    string0
00007FFFFFFFD608 00007fff500de348 GenInt
00007FFFFFFFD658 00007fff500de348 GenInt
00007FFFFFFFD6B8 00007fff500de348 GenInt
00007FFFFFFFDDA8 00007fff500b34f8 System.String[]
00007FFFFFFFDE80 00007fff500b34f8 System.String[]
00007FFFFFFFDEC8 00007fff500b34f8 System.String[]
00007FFFFFFFDED8 00007fff500b34f8 System.String[]
(lldb) dumpobj 00007fff500de348
Name:        GenInt
MethodTable: 00007fff7cfa7300
EEClass:     00007fff7cfb79d0
Size:        104(0x68) bytes
File:        /home/andy/bugs/r33366/Base01b_seq_ser.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007fff7cdc90f8  4000001       28         System.Int32  1 instance                0 Fld10
00007fff7cdc90f8  4000002       2c         System.Int32  1 instance                0 _int0
00007fff7cdf0cf8  4000003       18        System.Double  1 instance 0.000000 _double0
00007fff7ce22e30  4000004        8        System.String  0 instance 00007fff500de3c8 _string0
00007fff7cf51440  4000005       40          System.Guid  1 instance 00007fff500de388 _Guid0
00007fff7cdc90f8  4000006       30         System.Int32  1 instance                0 Fld11
00007fff7cdc90f8  4000007       34         System.Int32  1 instance       2147483647 _int1
00007fff7cdf0cf8  4000008       20        System.Double  1 instance 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000 _double1
00007fff7ce22e30  4000009       10        System.String  0 instance 00007fff500de408 _string1
00007fff7cf51440  400000a       50          System.Guid  1 instance 00007fff500de398 _Guid1
00007fff7cdc90f8  400000b       38         System.Int32  1 instance                0 Fld12

// GenInt 

(lldb) exp -f hex -- *(MethodTable*)0x00007fff7cfa7300
(MethodTable) $4 = {
  m_dwFlags = 0x00000200      // ContainsPointers not set
  m_BaseSize = 0x00000068
  m_wFlags2 = 0x4089
  m_wToken = 0x0009
  m_wNumVirtuals = 0x0004
  m_wNumInterfaces = 0x0000
  debug_m_szClassName = 0x00007fff7cfa7220 "GenInt"
  m_pParentMethodTable = (m_addr = 0x00007fff7cfa7140)
  m_pLoaderModule = (m_ptr = 0x00007fff7cf1a978)
  m_pWriteableData = (m_ptr = 0x00007fff7cfa73a0)
   = {
    m_pEEClass = (m_ptr = 0x00007fff7cfb79d0)
    m_pCanonMT = (m_ptr = 0x00007fff7cfb79d0)
  }
   = {
    m_pPerInstInfo = (m_ptr = 0x00007fff7cfa7358)
    m_ElementTypeHnd = 0x00007fff7cfa7358
    m_pMultipurposeSlot1 = 0x00007fff7cfa7358
  }
   = {
    m_pInterfaceMap = (m_ptr = 0x00007fff7cd34070)
    m_pMultipurposeSlot2 = 0x00007fff7cd34070
  }
}

// GenBase`1[Int32]

(lldb) exp -f hex -- *(MethodTable*)0x00007fff7cfa7140
(MethodTable) $5 = {
  m_dwFlags = 0x01000210      // ContainsPointers set
  m_BaseSize = 0x00000068
  m_wFlags2 = 0x4089
  m_wToken = 0x0008
  m_wNumVirtuals = 0x0004
  m_wNumInterfaces = 0x0000
  debug_m_szClassName = 0x00007fff7cfb7900 "GenBase`1[Int32]"
  m_pParentMethodTable = (m_addr = 0x00007fff7cce0e48)
  m_pLoaderModule = (m_ptr = 0x00007fff7cf1a978)
  m_pWriteableData = (m_ptr = 0x00007fff7cfa71e8)
   = {
    m_pEEClass = (m_ptr = 0x00007fff7cfb7828)
    m_pCanonMT = (m_ptr = 0x00007fff7cfb7828)
  }
   = {
    m_pPerInstInfo = (m_ptr = 0x00007fff7cfa7198)
    m_ElementTypeHnd = 0x00007fff7cfa7198
    m_pMultipurposeSlot1 = 0x00007fff7cfa7198
  }
   = {
    m_pInterfaceMap = (m_ptr = 0x00007fff7cd34058)
    m_pMultipurposeSlot2 = 0x00007fff7cd34058
  }
}
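
As a reading aid for the two dumps above, here is a minimal sketch of the check being done by eye. The bit value 0x01000000 is inferred from the dumps themselves (0x01000210 vs 0x00000200), and `TypeInfo`, `containsPointers` and `inheritanceInvariantHolds` are hypothetical names for illustration, not runtime APIs.

```cpp
#include <cstdint>

// Assumption: bit position inferred from the m_dwFlags values in the dumps;
// the constant name is illustrative, not copied from methodtable.h.
constexpr std::uint32_t kContainsPointersBit = 0x01000000;

// Hypothetical stand-in for the two MethodTable fields that matter here.
struct TypeInfo {
    std::uint32_t dwFlags;
    const TypeInfo* parent;   // nullptr when there is no base type to check
};

constexpr bool containsPointers(const TypeInfo& t) {
    return (t.dwFlags & kContainsPointersBit) != 0;
}

// A derived class inherits its parent's instance fields, so if the parent
// holds GC references the derived class must be flagged as holding them too.
constexpr bool inheritanceInvariantHolds(const TypeInfo& t) {
    return t.parent == nullptr || !containsPointers(*t.parent) || containsPointers(t);
}

// m_dwFlags values taken from the lldb output above.
constexpr TypeInfo genBase{0x01000210, nullptr};
constexpr TypeInfo genInt{0x00000200, &genBase};

static_assert(containsPointers(genBase), "parent is flagged");
static_assert(!containsPointers(genInt), "derived type is not flagged");
static_assert(!inheritanceInvariantHolds(genInt), "the bad state seen in this issue");
```

The checks run at compile time, so the snippet only compiles if the flag values from the dump really do violate the parent/child invariant.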

@AndyAyersMS
Member

For future reference, to dump the GCDesc on Linux you can call a method in the debugger, but you need to have logging enabled first.

(lldb) exp -- (MethodTable*)  0x00007fff7cfb7300
(MethodTable *) $1 = 0x00007fff7cfb7300

(lldb) exp -- $1->DebugDumpGCDesc("GenInt", 0)
GC description for 'GenInt':

(lldb) exp -- (MethodTable*) 0x00007fff7cfb7140
(MethodTable *) $3 = 0x00007fff7cfb7140

(lldb) exp -- $3->DebugDumpGCDesc("GenBase`1[Int32]", 0)
TID 09c2: GC description for 'GenBase`1[Int32]':

TID 09c2: GCDesc:
TID 09c2:    offset     8 (0 w/o Object), size   -88 (   16 w/o BaseSize subtr)
TID 09c2:
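
The "size -88 (16 w/o BaseSize subtr)" line can be read as simple arithmetic: the series size is printed with the base size subtracted, so adding m_BaseSize (0x68 = 104) back gives 16 bytes, i.e. two pointer-sized slots starting at offset 8. That lines up with _string0 at offset 8 and _string1 at offset 0x10 in the dumpobj output. A small sketch of that calculation, using a hypothetical `Series` struct rather than the runtime's own GCDesc types:

```cpp
#include <cstddef>

// Hypothetical mirror of one series as printed by the dump above.
struct Series {
    std::size_t offset;          // byte offset of the first reference from the object start
    std::ptrdiff_t storedSize;   // series length with BaseSize already subtracted
};

// Undo the BaseSize subtraction to get the series length in bytes.
constexpr std::ptrdiff_t seriesBytes(Series s, std::size_t baseSize) {
    return s.storedSize + static_cast<std::ptrdiff_t>(baseSize);
}

// Number of pointer-sized reference slots the series covers.
constexpr std::ptrdiff_t referenceCount(Series s, std::size_t baseSize, std::size_t pointerSize) {
    return seriesBytes(s, baseSize) / static_cast<std::ptrdiff_t>(pointerSize);
}

constexpr Series genBaseSeries{8, -88};   // "offset 8 ... size -88" from the dump
constexpr std::size_t kBaseSize = 0x68;   // m_BaseSize from the MethodTable dump
constexpr std::size_t kPointerSize = 8;   // x64

static_assert(seriesBytes(genBaseSeries, kBaseSize) == 16, "16 bytes of references");
static_assert(referenceCount(genBaseSeries, kBaseSize, kPointerSize) == 2, "two string fields");
```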

If I remove the SequentialLayout attribute from GenInt and its base class, then the method table flags for GenInt show it has pointers: 0x1000200. So it would seem something about that attribute is messing up propagation of gc descriptors during class loading.

This doesn't look like a codegen issue. @jkotas can you help route this to the right person?

@davidwrighton
Member

@AndyAyersMS This bug is a basic typesystem bug. We should send it to @fadimounir

@AndyAyersMS
Member

Same underlying issue is there on Windows but the test doesn't fail; guess we get lucky.

@fadimounir
Contributor

I guess the GC needs to relocate the object in question in order for the process to hit the AV. I've seen cases like this before.
To make sure I understand your analysis @AndyAyersMS, we're reporting some root object, but when GC traverses the fields, it misses one of them due to a bug somewhere (either gcinfo or field layout), correct?

@AndyAyersMS
Member

Right, the jit GC info seems correct, and GC scans the GenInt object, but apparently does not think that object contains GC refs -- while in actuality it has two string fields. So those fields eventually become bogus.
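
To make the failure mode concrete, here is a deliberately tiny toy model, not the real GC: a tracer that trusts a per-type "contains pointers" flag and never visits the fields of an unflagged object. `ToyObject` and `marks` are hypothetical names, each object has at most one reference, and the toy heap is assumed to be acyclic.

```cpp
// Toy object: at most one outgoing reference, stored as an index into the heap array.
struct ToyObject {
    bool typeContainsPointers;  // what the type's metadata claims
    int field;                  // index of the referenced object, or -1 for none
};

// Returns true if a trace starting at `from` would mark `target`.
// The tracer trusts the type flag: an unflagged object is itself marked,
// but its field is never visited. Assumes the toy heap has no cycles.
constexpr bool marks(const ToyObject* heap, int from, int target) {
    return from < 0 ? false
         : from == target ? true
         : !heap[from].typeContainsPointers ? false
         : marks(heap, heap[from].field, target);
}

// Index 0 plays GenInt, index 1 plays the string held in _string0.
constexpr ToyObject correctHeap[] = {{true, 1}, {false, -1}};
constexpr ToyObject buggyHeap[]   = {{false, 1}, {false, -1}};

static_assert(marks(correctHeap, 0, 1), "string is kept alive via GenInt");
static_assert(!marks(buggyHeap, 0, 1), "string is missed when the flag is wrong");
```

In the buggy case the root object is still reported and marked, but the string it points to is never seen, so the field can end up stale once the string is moved or collected.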

@davidwrighton
Member

In particular, @AndyAyersMS told me offline that the ContainsPointers bit wasn't set on the MethodTable, so possibly it's just a bug around ContainsPointers inheritance, or a problem with that and GCDesc creation.

@fadimounir
Contributor

OK, thank you for the context and the information you've gathered so far. I will take a closer look at why the ContainsPointers bit wasn't set.

@AndyAyersMS
Member

Same details above in the LLDB spew...

(lldb) exp -f hex -- *(MethodTable*)0x00007fff7cfa7300    // GenInt 
(MethodTable) $4 = {
  m_dwFlags = 0x00000200      // ContainsPointers not set

(lldb) exp -f hex -- *(MethodTable*)0x00007fff7cfa7140    // GenInt's parent class
(MethodTable) $5 = {
  m_dwFlags = 0x01000210      // ContainsPointers set
