GC/Scenarios/DoublinkList/doublinkgen fails occasionally in CI #6435

Closed
adityamandaleeka opened this issue Aug 2, 2016 · 18 comments
Labels
area-GC-coreclr test-bug Problem in test source code (most likely)

@adityamandaleeka
Member

Example output:

GC/Scenarios/DoublinkList/doublinkgen/doublinkgen.sh
11:24:48                BEGIN EXECUTION
11:24:48                /mnt/j/workspace/dotnet_coreclr/master/checked_ubuntu_tst_prtest/bin/tests/Windows_NT.x64.Checked/Tests/coreoverlay/corerun doublinkgen.exe
11:24:48                Test should return with ExitCode 100 ...
11:24:48                999 DLinkNodes finalized
11:24:48                Test Failed
11:24:48                Expected: 100
11:24:48                Actual: 1
11:24:48                END EXECUTION - FAILED
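
The pass/fail rule visible in this log (the process must exit with code 100, and anything else is reported as a failure) can be sketched as below. This is only an illustration of the convention: `run_and_check` is a hypothetical helper, the child processes are Python stand-ins for `corerun doublinkgen.exe`, and the real wrapper script is not shown in this thread.

```python
import subprocess
import sys

EXPECTED_EXIT_CODE = 100  # value taken from the "Expected: 100" line in the log


def run_and_check(cmd, expected=EXPECTED_EXIT_CODE):
    """Run cmd and report whether its exit code matches `expected`.

    Hypothetical helper, not the actual test wrapper.
    """
    actual = subprocess.run(cmd).returncode
    return actual == expected, actual


# Stand-in child processes; the real command would be `corerun doublinkgen.exe`.
passing = [sys.executable, "-c", "import sys; sys.exit(100)"]
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]

ok_pass, code_pass = run_and_check(passing)
ok_fail, code_fail = run_and_check(failing)
print(ok_pass, code_pass)
print(ok_fail, code_fail)
```

The second stand-in mirrors the failure above, where the test itself returned 1 after counting only 999 finalized nodes.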

Examples of failing jobs:

http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/checked_ubuntu_tst_prtest/3728/
http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/checked_ubuntu_tst_prtest/3747/
http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/checked_ubuntu_tst_prtest/3768/

@adityamandaleeka
Member Author

@swgillespie Can you take a look at this please?

@swgillespie
Contributor

swgillespie commented Aug 2, 2016

This seems to be the same as https://github.com/dotnet/coreclr/issues/4093. This failure hasn't shown up in a while, but it looks the same as what I observed a few months ago.

It's not obvious what's happening here. I've been unable to repro this locally, and it only seems to repro on checked Ubuntu builds. Now that it's showing up again, I'll see if I can get a repro.

@adityamandaleeka
Member Author

Added another failure to the list above.

@adityamandaleeka
Member Author

@stephentoub
Member

stephentoub commented Nov 15, 2016

Another, this time with an abort:
https://ci.dot.net/job/dotnet_coreclr/job/master/job/checked_ubuntu_tst_prtest/82/consoleText

FAILED   - GC/Scenarios/DoublinkList/doublinkgen/doublinkgen.sh
               BEGIN EXECUTION
               /mnt/j/workspace/dotnet_coreclr/master/checked_ubuntu_tst_prtest/bin/tests/Windows_NT.x64.Checked/Tests/coreoverlay/corerun doublinkgen.exe
               Test should return with ExitCode 100 ...
               ./doublinkgen.sh: line 124: 23104 Aborted                 (core dumped) $_DebuggerFullPath "$CORE_ROOT/corerun" doublinkgen.exe $CLRTestExecutionArguments
               Expected: 100
               Actual: 134
               END EXECUTION - FAILED
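
For anyone triaging similar logs: the `Actual: 134` here is consistent with the `Aborted (core dumped)` line, because POSIX shells report a process killed by signal N as exit status 128 + N, and SIGABRT is signal 6 on Linux. A minimal sketch of that decoding follows; `describe_exit_code` is a hypothetical helper, not part of the test infrastructure.

```python
import signal


def describe_exit_code(code):
    """Interpret an exit status as reported by a POSIX shell.

    Hypothetical helper for reading CI logs; statuses above 128 are
    treated as "killed by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(134))
print(describe_exit_code(1))
```

So this run differs from the earlier ones: the process aborted instead of returning the test's own failure code of 1.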

@swgillespie
Contributor

@stephentoub got a crash dump from that, thanks!

@mikedn
Contributor

mikedn commented Nov 20, 2016

Here's another instance in case you need it: https://ci.dot.net/job/dotnet_coreclr/job/master/job/checked_ubuntu_flow_prtest/238/

@janvorli
Member

janvorli commented Dec 5, 2016

Yet another instance:
https://ci.dot.net/job/dotnet_coreclr/job/master/job/checked_ubuntu_flow_prtest/668/
@swgillespie - this time on a checked Ubuntu build; you might want to look at the crash dump:
http://dotnetrp.azurewebsites.net/dumpling/download/2168

@swgillespie
Contributor

thanks!

@swgillespie
Contributor

swgillespie commented Dec 6, 2016

What all of these dumps have in common is that, after the second induced GC following WaitForPendingFinalizers, there are still live objects in the finalizer queue at the time of the test failure.

In Stephen's dump:

  • DLinkNode at 0x7fb6a00335b0 is rooted by the finalizer queue:
(lldb) dumpobj 00007fb6a00335b0
Name:        DoubLink.DLinkNode
MethodTable: 00007fb7e791b860
EEClass:     00007fb7e792d048
Size:        40(0x28) bytes
File:        /mnt/j/workspace/dotnet_coreclr/master/checked_ubuntu_tst_prtest/bin/tests/Windows_NT.x64.Checked/GC/Scenarios/DoublinkList/doublinkgen/doublinkgen.exe
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007fb7e791b860  4000003        8   DoubLink.DLinkNode  0 instance 0000000000000000 Last
00007fb7e791b860  4000004       10   DoubLink.DLinkNode  0 instance 0000000000000000 Next
00007fb7e782b410  4000005       18       System.Int32[]  0 instance 00007fb39ffff048 Size
00007fb7e7702a50  4000006       3c         System.Int32  1   static              999 FinalCount
(lldb) gcroot 00007fb6a00335b0
Finalizer Queue:
    00007FB6A00335B0
    -> 00007FB6A00335B0 DoubLink.DLinkNode

Found 1 unique roots (run '!GCRoot -all' to see all roots).
  • Looking at the order of messages in the stress log, it looks like this object was marked through the finalize queue since it is the last promoted object of the final GC.
  • The final DLinkNode is not finalized by the time of the abort.

Jan's dump:

  • Two DLinkNodes are waiting to be collected, having been finalized:
(lldb) dumpheap -mt 00007f25953f8078
         Address               MT     Size
00007f21500342d0 00007f25953f8078       40
00007f2150035310 00007f25953f8078       40

Statistics:
              MT    Count    TotalSize Class Name
00007f25953f8078        2           80 DoubLink.DLinkNode
Total 2 objects
(lldb) gcroot 00007f21500342d0
Found 0 unique roots (run '!GCRoot -all' to see all roots).
(lldb) gcroot 00007f2150035310
Found 0 unique roots (run '!GCRoot -all' to see all roots).
(lldb) dumpobj 00007f2150035310
Name:        DoubLink.DLinkNode
MethodTable: 00007f25953f8078
EEClass:     00007f2595409f48
Size:        40(0x28) bytes
File:        /mnt/j/workspace/dotnet_coreclr/master/checked_ubuntu_tst_prtest/bin/tests/Windows_NT.x64.Checked/GC/Scenarios/DoublinkList/doublinkgen/doublinkgen.exe
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007f25953f8078  4000003        8   DoubLink.DLinkNode  0 instance 00007f21500342d0 Last
00007f25953f8078  4000004       10   DoubLink.DLinkNode  0 instance 0000000000000000 Next
00007f25952f49b0  4000005       18       System.Int32[]  0 instance 00007f244ffff048 Size
00007f25951bfa50  4000006       3c         System.Int32  1   static             1000 FinalCount
(lldb) dumpobj 00007f21500342d0
Name:        DoubLink.DLinkNode
MethodTable: 00007f25953f8078
EEClass:     00007f2595409f48
Size:        40(0x28) bytes
File:        /mnt/j/workspace/dotnet_coreclr/master/checked_ubuntu_tst_prtest/bin/tests/Windows_NT.x64.Checked/GC/Scenarios/DoublinkList/doublinkgen/doublinkgen.exe
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007f25953f8078  4000003        8   DoubLink.DLinkNode  0 instance 0000000000000000 Last
00007f25953f8078  4000004       10   DoubLink.DLinkNode  0 instance 0000000000000000 Next
00007f25952f49b0  4000005       18       System.Int32[]  0 instance 00007f21500342f8 Size
00007f25951bfa50  4000006       3c         System.Int32  1   static             1000 FinalCount
  • These two objects have been finalized already, but were finalized after the check that caused the test to fail. One of the nodes points to the other, while the other has two null pointers.

I'm trying to think of a feasible sequence of events that can result in things remaining in the finalizer queue after FinalizerThread::FinalizeAllObjects returns, or having things be added to the finalizer queue after FinalizerThread::FinalizeAllObjects returns but before FinalizerThread::SignalFinalizationDone is called. I'm drawing a blank, though. My understanding of the handshake between the GC and the finalizer (at least in this case, where the GCs are induced) is that GC.Collect signals the finalizer thread upon completion and GC.WaitForPendingFinalizers waits for FinalizerThread::SignalFinalizationDone.
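
To make the second hypothesized sequence concrete, here is a toy, single-threaded model of the handshake as described above. It is not the CoreCLR implementation and does not claim to be the root cause: the class, its methods, and the `late` parameter are all illustrative, and the total of 1000 nodes is an assumption inferred from the `999 DLinkNodes finalized` log line and the `FinalCount` of 1000 in Jan's dump.

```python
from collections import deque


class FinalizerModel:
    """Toy model of the GC/finalizer handshake (illustrative names only)."""

    def __init__(self):
        self.queue = deque()   # objects whose finalizers are ready to run
        self.final_count = 0   # stands in for the test's static FinalCount

    def collect(self, ready):
        # Induced GC: hand `ready` finalizable objects to the finalizer.
        self.queue.extend(range(ready))

    def wait_for_pending_finalizers(self, late=0):
        # Stands in for FinalizeAllObjects: drain everything queued so far.
        while self.queue:
            self.queue.popleft()
            self.final_count += 1
        # Hypothesized window: `late` objects are queued after the drain
        # but before the "finalization done" signal reaches the waiter.
        self.queue.extend(range(late))
        # Returning here stands in for SignalFinalizationDone.


def run_test(total, late):
    model = FinalizerModel()
    model.collect(total - late)
    model.wait_for_pending_finalizers(late=late)
    # The test's check happens now; anything still queued is not yet counted.
    exit_code = 100 if model.final_count == total else 1
    return exit_code, model.final_count, len(model.queue)


print(run_test(total=1000, late=0))  # (100, 1000, 0)
print(run_test(total=1000, late=1))  # (1, 999, 1)
```

With one late arrival the model reproduces the observed symptom (999 counted, exit code 1, one object still in the queue), which is all it is meant to show; it says nothing about how such a window could actually open in the runtime.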

@mellinoe
Contributor

mellinoe commented Jan 3, 2017

@swgillespie
Contributor

@AndyAyersMS
Member

@AndyAyersMS
Member

@janvorli
Member

@janvorli
Member

swgillespie referenced this issue in swgillespie/coreclr Jan 23, 2017
swgillespie referenced this issue in dotnet/coreclr Jan 24, 2017
* Disable DoublinkGen (see #6574)

* Add test to testsFailingOutsideWindows.txt
@swgillespie
Contributor

Closing in favor of #5496, which tracks this particular issue.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 29, 2020