Frequent test failures in CI on Windows.Nano.1803.Amd64 #30017

Closed
tmat opened this issue Jun 25, 2019 · 19 comments · Fixed by dotnet/corefx#39734

@tmat
Member

tmat commented Jun 25, 2019

https://dev.azure.com/dnceng/public/_build/results?buildId=237302&view=ms.vss-test-web.build-test-results-tab&runId=5884720&resultId=459291&paneView=debug

Exit code was -1073740940 but it should have been 42
Expected: True
Actual: False
   at Microsoft.DotNet.RemoteExecutor.RemoteInvokeHandle.Dispose(Boolean disposing) in /_/src/Microsoft.DotNet.RemoteExecutor/src/Microsoft.DotNet.RemoteExecutor/RemoteInvokeHandle.cs:line 143
   at Microsoft.DotNet.RemoteExecutor.RemoteInvokeHandle.Dispose() in /_/src/Microsoft.DotNet.RemoteExecutor/src/Microsoft.DotNet.RemoteExecutor/RemoteInvokeHandle.cs:line 55
   at System.Tests.AppDomainTests.SetShadowCopyFiles() in /_/src/System.Runtime.Extensions/tests/System/AppDomainTests.cs:line 464
@ViktorHofer
Member

I see more AppDomainTests failing randomly in CI, e.g. System.Tests.AppDomainTests.Unload:

Exit code was -1073740940 but it should have been 42
Expected: True
Actual: False

   at Microsoft.DotNet.RemoteExecutor.RemoteInvokeHandle.Dispose(Boolean disposing) in /_/src/Microsoft.DotNet.RemoteExecutor/src/Microsoft.DotNet.RemoteExecutor/RemoteInvokeHandle.cs:line 143
   at System.Tests.AppDomainTests.Unload() in /_/src/System.Runtime.Extensions/tests/System/AppDomainTests.cs:line 356

https://dev.azure.com/dnceng/public/_build/results?buildId=237626&view=ms.vss-test-web.build-test-results-tab&runId=5895846&resultId=451049&paneView=debug

@ViktorHofer
Member

And two other failing tests: https://mc.dot.net/#/user/dotnet-bot/pr~2Fdotnet~2Fcorefx~2Frefs~2Fpull~2F38867~2Fmerge/test~2Ffunctional~2Fcli~2Finnerloop~2F/20190625.5/workItem/System.Runtime.Extensions.Tests

All failures happened on Nano. @MattGal this is one of the cases where a dump (on a docker instance) would be needed. Should we instead run tests manually on Nano instances somewhere?

@ViktorHofer ViktorHofer changed the title from "SetShadowCopyFiles failed in CI" to "Frequent test failures in CI on Windows.Nano.1803.Amd64" Jun 25, 2019
@vcsjones
Member

This issue has the same exit code as dotnet/corefx#38872. I ran into similar failures over at dotnet/corefx#38845 (comment). The exit code is 0xC0000374, which I think is usually an indicator of the process having heap corruption.
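
For readers following along: the decimal exit code in the test output and the hex code mentioned here are the same 32-bit value, viewed as signed versus unsigned. A minimal sketch of that conversion follows; the helper names `to_ntstatus` and `describe` and the small lookup table are illustrative assumptions, not part of RemoteExecutor or any .NET tooling.

```python
# Hypothetical helpers for illustration only; not part of any .NET tool.

def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex NTSTATUS string."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"

# A tiny, non-exhaustive lookup of well-known NTSTATUS values.
KNOWN_STATUS = {
    "0xC0000374": "STATUS_HEAP_CORRUPTION",
    "0xC0000005": "STATUS_ACCESS_VIOLATION",
}

def describe(exit_code: int) -> str:
    """Return the hex form plus a symbolic name when the value is in the table."""
    code = to_ntstatus(exit_code)
    return f"{code} ({KNOWN_STATUS.get(code, 'unknown')})"

print(describe(-1073740940))  # prints: 0xC0000374 (STATUS_HEAP_CORRUPTION)
```

Masking with `0xFFFFFFFF` is enough here because process exit codes on Windows are 32-bit values.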

@stephentoub
Member

I've disabled the tests on Nano until this can be sorted out; almost every PR was failing as a result.

@ericstj
Member

ericstj commented Jun 25, 2019

Yes, this is heap corruption. When did it start? Can we trace it back to a change in CoreFx or a CoreCLR ingestion?

@ViktorHofer
Member

The first time it shows up in a rolling-build in master is https://dnceng.visualstudio.com/public/_build/results?buildId=237532 but that can't be the cause.

@stephentoub
Member

The first time it shows up

What was the nearest coreclr ingestion prior to that?

What was the first PR it showed up in?

@ericstj
Member

ericstj commented Jun 25, 2019

We can look in the neighborhood of that change to find culprits.

@ericstj
Member

ericstj commented Jun 25, 2019

dotnet/corefx#38782 is the nearest CoreCLR ingestion that picks up changes between 6/20 and 6/23. Here's the diff: https://github.com/dotnet/coreclr/compare/9bd2787a9dd2aa4d2b7d4f72afebc3dbe896e896..8974a699899bdc2cc5687504e1ada606ac803e9b

@stephentoub
Member

cc: @jkotas

@MattGal
Member

MattGal commented Jun 25, 2019

To reiterate stuff from other threads, I think this may have started (albeit less frequently) just when moving from 1709 to 1803. I got some debugging instructions from a team member and will check them out when I have time; if I get a debugging setup I will share it.

@AriNuer

AriNuer commented Jun 27, 2019

Test System.Tests.AppDomainTests/ProcessExit_Add_Remove has failed with the same exit code:
Message :

Exit code was -1073740940 but it should have been 42
Expected: True
Actual:   False

Stack Trace :

   at Microsoft.DotNet.RemoteExecutor.RemoteInvokeHandle.Dispose(Boolean disposing) in /_/src/Microsoft.DotNet.RemoteExecutor/src/Microsoft.DotNet.RemoteExecutor/RemoteInvokeHandle.cs:line 143
   at Microsoft.DotNet.RemoteExecutor.RemoteInvokeHandle.Dispose() in /_/src/Microsoft.DotNet.RemoteExecutor/src/Microsoft.DotNet.RemoteExecutor/RemoteInvokeHandle.cs:line 55
   at System.Tests.AppDomainTests.ProcessExit_Add_Remove() in /_/src/System.Runtime.Extensions/tests/System/AppDomainTests.cs:line 213

Build: -20190626.107(Master)
Failing configuration:

@ericstj
Member

ericstj commented Jun 28, 2019

@MattGal any update on this? We should probably re-enable Nano testing in 3.0, so I'm keeping this in the 3.0 milestone.

@MattGal
Member

MattGal commented Jul 1, 2019

@ericstj I haven't had free time to try debugging in Nano, but there's a good chance I'll have some spare time this week to do this for you guys.

@ericstj
Member

ericstj commented Jul 1, 2019

@ViktorHofer let me know that we should be able to repro this using docker for windows, running one of these simple EXEs in the container on the test shared framework: https://github.com/dotnet/corefx/blob/7094a5b94f4046abccfa2c1e6c9ab36243d8b3da/eng/pipelines/windows.yml#L110

@MattGal
Member

MattGal commented Jul 1, 2019

@ericstj Yes, I've told many people who asked about this. I'd be happy to stop by and help you do this locally as well; you just need any RS4+ machine with Docker installed. Ping me if you'd like the publicly available debugging instructions too.

@wtgodbe
Member

wtgodbe commented Jul 9, 2019

@safern I'm looking at this now

@danmoseley
Member

@safern
Member

safern commented Jul 24, 2019

@ViktorHofer put up a PR to update it 😄 dotnet/corefx#39734

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 3.0 milestone Feb 1, 2020
@stephentoub stephentoub removed the disabled-test ("The test is disabled in source code against the issue") label Feb 4, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020