[Linux/arm64] Test Failed: System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide Failure #11177

Closed
echesakov opened this issue Oct 2, 2018 · 36 comments
Labels: arch-arm64, area-PAL-coreclr, area-System.Threading, disabled-test (the test is disabled in source code against the issue), os-linux (Linux OS, any supported distro)

Comments

@echesakov
Contributor

/home/robox/echesako/git/coreclr/_/fx/bin/testhost/netcoreapp-Linux-Release-arm64/dotnet xunit.console.dll System.Threading.Tests.dll  -xml testResults.xml -notrait category=nonnetcoreapptests -notrait category=nonlinuxtests  -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing
===========================================================================================================
xUnit.net Console Runner v2.4.1-pre.build.4059 (64-bit .NET Core 4.6.26927.0)
  Discovering: System.Threading.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Threading.Tests (found 178 of 206 test cases)
  Starting:    System.Threading.Tests (parallel test collections = on, max threads = 46)
    System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide [FAIL]
      Assert.Equal() Failure
      Expected: 1000000
      Actual:   999999
  Finished:    System.Threading.Tests
=== TEST EXECUTION SUMMARY ===
   System.Threading.Tests  Total: 396, Errors: 0, Failed: 1, Skipped: 0, Time: 44.019s

It took me a while to reproduce this, but I got at least five identical failures.

@jkotas
Member

jkotas commented Oct 2, 2018

This is most likely caused by FlushProcessWriteBuffers in PAL not being reliable. We may want to try to fix this by implementing https://github.com/dotnet/coreclr/issues/1563.

@danmoseley
Member

kouvel referenced this issue in kouvel/corefx Oct 30, 2018
- Temporarily disabling on arm64 due to https://github.com/dotnet/coreclr/issues/20215
- The issue may exist on other architectures, but we have only seen failures on arm64 so far
stephentoub referenced this issue in dotnet/corefx Oct 31, 2018
* Disable failing test InterlockedTests.MemoryBarrierProcessWide

- Temporarily disabling on arm64 due to https://github.com/dotnet/coreclr/issues/20215
- The issue may exist on other architectures, but we have only seen failures on arm64 so far

* Move property to PlatformDetection

* Fix
@sandreenko
Contributor

The issue was marked as fixed in dotnet/coreclr#20949.

VSadov referenced this issue in dotnet/corefx Jun 9, 2019
* Enable test for MemoryBarrierProcessWide on ARM64 since corresponding bug is closed.

Re: https://github.com/dotnet/coreclr/issues/20215
@AriNuer

AriNuer commented Jun 10, 2019

Same test System.Threading.Tests.InterlockedTests/MemoryBarrierProcessWide has failed.

Message :

Assert.Equal() Failure
Expected: 1000000
Actual:   999999

Stack Trace :

 at System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide() in /_/src/System.Threading/tests/InterlockedTests.netcoreapp.cs:line 122

Details:
https://mc.dot.net/#/user/dotnet-bot/pr~2Fdotnet~2Fcorefx~2Frefs~2Fheads~2Fmaster/test~2Ffunctional~2Fcli~2Finnerloop~2F/20190609.21/workItem/System.Threading.Tests/analysis/xunit/System.Threading.Tests.InterlockedTests~2FMemoryBarrierProcessWide

@stephentoub stephentoub reopened this Jun 10, 2019
@danmoseley
Member

When this is fixed, please re-enable the test that was disabled in the linked issue: https://github.com/dotnet/corefx/issues/38400

@sergiy-k
Contributor

sergiy-k commented Jul 3, 2019

This requires Linux kernel 4.14 or later to run.

@sergiy-k
Contributor

sergiy-k commented Jul 3, 2019

Our lab machines need to be updated to have this version of Linux kernel before we can enable this test.
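
As a rough illustration of the kernel requirement mentioned above, here is a minimal Python sketch that compares a kernel release string against 4.14. The function names are hypothetical and not part of any runtime or test infrastructure. A version comparison is only a heuristic: distribution kernels can carry backported features under an older version number.

```python
import platform
import re
from typing import Tuple

# Minimum version taken from the comment above ("Linux kernel 4.14 or later").
MIN_KERNEL = (4, 14)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '4.9.0-8-arm64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release: str, minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    """Return True if the release is at or above the minimum (major, minor)."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # platform.release() reports the running kernel on Linux (like `uname -r`).
    if platform.system() == "Linux":
        release = platform.release()
        print(release, "meets 4.14:", kernel_is_new_enough(release))
```

A lab machine inventory script could call `kernel_is_new_enough` on each reported release string to decide where the test can be enabled.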

@VSadov
Member

VSadov commented Jul 3, 2019

Just to mention - the test is actually enabled. It is only disabled for the Linux/ARM64 config.

@VincentBu

Test System.Threading.Tests.InterlockedTests/MemoryBarrierProcessWide failed again in:
https://mc.dot.net/#/user/coreclr-corefx/ci~2Fdotnet~2Fcoreclr~2Frefs~2Fheads~2Fmaster/test~2Ffunctional~2Fcorefx~2F/20190723.1/workItem/System.Threading.Tests/analysis/xunit/System.Threading.Tests.InterlockedTests~2FMemoryBarrierProcessWide

message:

Assert.Equal() Failure
Expected: 1000000
Actual:   999999

Stack Trace:

at System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide() in /_/src/System.Threading/tests/InterlockedTests.netcoreapp.cs:line 122

@VincentBu

Same test failed in pipeline coreclr-corefx-jitstressregs

Log:

System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide [FAIL]
      Assert.Equal() Failure
      Expected: 1000000
      Actual:   999999
      Stack Trace:
        /_/src/System.Threading/tests/InterlockedTests.netcoreapp.cs(122,0): at System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide()

Details:
https://helix.dot.net/api/2019-06-17/jobs/ee569cda-26bb-44b1-b0f0-0df5bd9942e0/workitems/System.Threading.Tests/console

@jeffschwMSFT changed the title from "[Linux/arm64] System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide Failure" to "[Linux/arm64] Test Failed: System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide Failure" on Aug 22, 2019
@gkarabin

gkarabin commented Nov 3, 2019

Apologies for the hijack, but can anyone on this thread help me identify the kernel patches in 4.14 that are required to resolve at least this issue? I am developing a .NET Core 3.0 app on ARM64 with kernel 4.9. It will be a while before my device can update to 4.14, but backports of specific patch sets may be an option for me. After reviewing the 4.14 changelog, I don't see any obvious ARM64 patches that point to this issue. I would appreciate any advice that can be offered.

@janvorli
Member

janvorli commented Nov 6, 2019

@gkarabin the change was not ARM64 specific. See the related commit here: torvalds/linux@22e4ebb

@gkarabin

gkarabin commented Nov 6, 2019

Thanks! I appreciate the pointer.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@v-haren

v-haren commented May 7, 2020

failed again in job: runtime 20200506.84

failed test: System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide

Error message
Assert.Equal() Failure
Expected: 1000000
Actual: 999999

Stack trace
at System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide() in /_/src/libraries/System.Threading/tests/InterlockedTests.cs:line 326

@VSadov
Member

VSadov commented May 7, 2020

This should no longer be failing, assuming the failure was caused by an outdated Linux kernel.

@BruceForstall
Member

@VSadov This is failing in our CI in every run. So if the reason is an "outdated" Linux kernel, that doesn't really matter, if our CI machines are running such a kernel.

@VSadov
Member

VSadov commented May 7, 2020

I thought the runs used a new enough kernel now. It looked that way from the logs.

@VSadov
Member

VSadov commented May 7, 2020

Reverting: #35974

@VSadov
Member

VSadov commented May 7, 2020

Was the failure actually on Linux?

@BruceForstall
Member

Ah, right, looks like Windows: net5.0-Windows_NT-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open

@VSadov
Member

VSadov commented May 7, 2020

Are other failures on Windows too?
I wonder if there could be a testcase issue, since Windows implementation is supposed to be reliable.

@BruceForstall
Member

Here's another failure, on mono:

net5.0-Windows_NT-Release-x64-Mono_release-Windows.81.Amd64.Open

https://dev.azure.com/dnceng/public/_build/results?buildId=634699&view=ms.vss-test-web.build-test-results-tab&runId=19761034&paneView=debug

@BruceForstall
Member

The arm64 one: net5.0-Windows_NT-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open

https://dev.azure.com/dnceng/public/_build/results?buildId=634482&view=ms.vss-test-web.build-test-results-tab&runId=19756224&paneView=debug

I take back what I said: it doesn't look like the failure is consistent in every run: it has failed on different platforms/architectures.

@BruceForstall
Member

Wow, I really need some coffee or something -- those links I gave are for a whole separate test failure :-(

@BruceForstall
Member

However, the issue opened by v-haren is this test, for Windows arm64.

@VSadov
Member

VSadov commented May 7, 2020

Yes, the original failure is real, and on Win10, which is a concern

@VSadov
Member

VSadov commented May 7, 2020

I wonder whether there are more failures, so that we can see if there is some pattern.
These kinds of failures often do not repro when you want them, so any extra information would help to narrow the search.

@BruceForstall
Member

Ok, looking at the Kusto data, this test failed only twice in the last 30 days, both on Windows arm64, both this morning.

@BruceForstall
Member

@VSadov I've been working to enable JIT stress testing of libraries test assets with checked CoreCLR. This means many, many more runs of the tests, with various stress modes, and also with a checked CoreCLR (which probably changes the timing compared to most runs). I've seen this failure several times. Multiple times last night for Windows arm64:

https://helix.dot.net/api/2019-06-17/jobs/fab27305-28ce-45fe-bb93-668d48d26279/workitems/System.Threading.Tests/console
with:

COMPlus_ReadyToRun=0
COMPlus_TieredCompilation=0
COMPlus_ZapDisable=1

    System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide [FAIL]
      Assert.Equal() Failure
      Expected: 1000000
      Actual:   999999

https://helix.dot.net/api/2019-06-17/jobs/7546e93d-1c70-495f-aab4-157f47307c71/workitems/System.Threading.Tests/console
with:

COMPlus_JITMinOpts=1
COMPlus_TieredCompilation=0

    System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide [FAIL]
      Assert.Equal() Failure
      Expected: 1000000
      Actual:   999978

https://helix.dot.net/api/2019-06-17/jobs/daf295dc-bd71-4b8b-941f-09ff6b6b4775/workitems/System.Threading.Tests/console
with:

COMPlus_JitStress=2
COMPlus_JitStressRegs=0x1000
COMPlus_TieredCompilation=0

    System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide [FAIL]
      Assert.Equal() Failure
      Expected: 1000000
      Actual:   999993

https://helix.dot.net/api/2019-06-17/jobs/1b78b81e-6003-4a99-85b7-f50485901120/workitems/System.Threading.Tests/console
with:

COMPlus_JitStress=2
COMPlus_JitStressRegs=0x80
COMPlus_TieredCompilation=0

    System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide [FAIL]
      Assert.Equal() Failure
      Expected: 1000000
      Actual:   999997

https://helix.dot.net/api/2019-06-17/jobs/ac761b83-2f06-4ca4-b1ae-4d0fcc72e5dd/workitems/System.Threading.Tests/console
with:

COMPlus_JitStress=2
COMPlus_JitStressRegs=8
COMPlus_TieredCompilation=0

    System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide [FAIL]
      Assert.Equal() Failure
      Expected: 1000000
      Actual:   999998

https://helix.dot.net/api/2019-06-17/jobs/62be07e6-d52f-4b6f-b7d0-9d377024bb20/workitems/System.Threading.Tests/console
with:

COMPlus_JitStress=2
COMPlus_JitStressRegs=4
COMPlus_TieredCompilation=0

    System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide [FAIL]
      Assert.Equal() Failure
      Expected: 1000000
      Actual:   999998

I'm going to disable this test for JIT stress modes, for now.

@VSadov
Member

VSadov commented Jun 1, 2020

I have an open PR to re-disable the test - #35974
We can merge that for now.

There is clearly an issue. Now on Windows...

@VSadov
Member

VSadov commented Jun 1, 2020

@BruceForstall I have reverted the change that enabled this test

@BruceForstall
Member

This just failed on Linux arm (JitStress=2, JitStressRegs=3):

https://dev.azure.com/dnceng/public/_build/results?buildId=736732&view=ms.vss-test-web.build-test-results-tab&runId=22818246&resultId=174249&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab

COMPlus_TieredCompilation=0
COMPlus_JitStress=2
COMPlus_DbgMiniDumpName=/home/helixbot/dotnetbuild/dumps/coredump.%d.dmp
COMPlus_JitStressRegs=3
COMPlus_DbgEnableMiniDump=1
+ ./RunTests.sh --runtime-path /root/helix/work/correlation
----- start Sat Jul 18 07:27:01 UTC 2020 =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Threading.Tests.runtimeconfig.json --depsfile System.Threading.Tests.deps.json xunit.console.dll System.Threading.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem /root/helix/work/workitem
  Discovering: System.Threading.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Threading.Tests (found 210 of 236 test cases)
  Starting:    System.Threading.Tests (parallel test collections = on, max threads = 4)
    System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide [FAIL]
      Assert.Equal() Failure
      Expected: 1000000
      Actual:   999999
      Stack Trace:
        /_/src/coreclr/src/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.CoreCLR.cs(319,0): at System.Runtime.CompilerServices.RuntimeHelpers.DispatchTailCalls(IntPtr callersRetAddrSlot, IntPtr callTarget, IntPtr retVal)
        /_/src/libraries/System.Threading/tests/InterlockedTests.cs(325,0): at System.Threading.Tests.InterlockedTests.MemoryBarrierProcessWide()
  Finished:    System.Threading.Tests
=== TEST EXECUTION SUMMARY ===

@VSadov Looks like you only disabled it for arm64, not for arm32.
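
For anyone trying to reproduce this locally, the log above can be turned into a small driver. The following is a minimal Python sketch, assuming the test binaries sit in a working directory laid out like the Helix work item. The helper names (`build_repro`, `run_repro`) and the paths passed to them are hypothetical. The environment variables and command-line arguments are copied from the log. Because the failure is intermittent, the sketch repeats the run and counts non-zero exit codes.

```python
import os
import subprocess

# JIT stress settings copied from the failing Linux arm run quoted above.
STRESS_ENV = {
    "COMPlus_TieredCompilation": "0",
    "COMPlus_JitStress": "2",
    "COMPlus_JitStressRegs": "3",
}


def build_repro(dotnet_path, extra_env=None):
    """Return (command, environment) for the 'To repro directly' line in the log."""
    env = dict(os.environ)
    env.update(STRESS_ENV if extra_env is None else extra_env)
    cmd = [
        dotnet_path, "exec",
        "--runtimeconfig", "System.Threading.Tests.runtimeconfig.json",
        "--depsfile", "System.Threading.Tests.deps.json",
        "xunit.console.dll", "System.Threading.Tests.dll",
        "-xml", "testResults.xml", "-nologo", "-nocolor",
        "-notrait", "category=IgnoreForCI",
        "-notrait", "category=OuterLoop",
        "-notrait", "category=failing",
    ]
    return cmd, env


def run_repro(dotnet_path, workdir, attempts=10):
    """Run the test assembly several times; return how many runs exited non-zero."""
    cmd, env = build_repro(dotnet_path)
    failures = 0
    for _ in range(attempts):
        result = subprocess.run(cmd, env=env, cwd=workdir)
        if result.returncode != 0:
            failures += 1
    return failures


# Example (hypothetical paths):
# run_repro("/root/helix/work/correlation/dotnet", "/root/helix/work/workitem")
```

Passing a different dictionary as `extra_env` makes it easy to try the other stress combinations listed earlier in the thread.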

@VSadov
Member

VSadov commented Jul 20, 2020

@BruceForstall - the failure was on Windows ARM64. This is something new. Is there only one hit?

@BruceForstall
Member

@VSadov I've only seen one failure of this Linux arm32 case. However, I don't trust the Kusto data because it looks to me like the scripts to upload failures from Linux arm32 to Kusto have been broken.

@VSadov
Member

VSadov commented Jul 20, 2020

@BruceForstall - looked at the test again. I think there is a memory ordering issue in the test code.

@VSadov
Member

VSadov commented Aug 3, 2020

I think this was fixed in #39668

@VSadov VSadov closed this as completed Aug 3, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020