[ppc64le] System.Collections.Concurrent.Tests timeout #71079

Closed
directhex opened this issue Jun 21, 2022 · 12 comments

@directhex
Member

directhex commented Jun 21, 2022

Description

Console log: 'System.Collections.Concurrent.Tests' from job 0704ee99-5984-4856-b01e-ac29d37e7872 workitem 59a0e8dd-96e5-4601-adc4-686d2c70de70 (ubuntu.2004.ppc64le.experimental.open) executed on machine dotnet-4
+ ./RunTests.sh --runtime-path /home/helixbot/work/9F6908CF/p
----- start Tue 21 Jun 2022 06:35:03 PM UTC =============== To repro directly: =====================================================
pushd .
/home/helixbot/work/9F6908CF/p/dotnet exec --runtimeconfig System.Collections.Concurrent.Tests.runtimeconfig.json --depsfile System.Collections.Concurrent.Tests.deps.json xunit.console.dll System.Collections.Concurrent.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
~/work/9F6908CF/w/AFB2097D/e ~/work/9F6908CF/w/AFB2097D/e
  Discovering: System.Collections.Concurrent.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Collections.Concurrent.Tests (found 884 of 893 test cases)
  Starting:    System.Collections.Concurrent.Tests (parallel test collections = on, max threads = 2)
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_int_int.ICollection_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_int_int.IDictionary_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_ulong_ulong.ICollection_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_ulong_ulong.IDictionary_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
Killed
['System.Collections.Concurrent.Tests' END OF WORK ITEM LOG: Command timed out, and was killed]

Reproduction Steps

Run the test suite as normal.

Configuration

git master on ppc64el (community port)

@ghost added the untriaged label Jun 21, 2022
@ghost

ghost commented Jun 21, 2022

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Expected behavior

Don't time out

Actual behavior

Times out

Regression?

No response

Known Workarounds

No response

Other information

No response

Author: directhex
Assignees: -
Labels:

area-System.Collections, arch-ppc64le

Milestone: -

@Sapana-Khemkar
Contributor

Yes, I can reproduce the issue. This test case sometimes passes on ppc64le and sometimes fails.

Here are the results from a run where it passed:

ubuntu@dotnet:~/runtime_preview3_test/artifacts/bin/System.Collections.Concurrent.Tests/Release/net7.0$ ../../../testhost/net7.0-Linux-Release-ppc64le/dotnet exec --runtimeconfig System.Collections.Concurrent.Tests.runtimeconfig.json --depsfile System.Collections.Concurrent.Tests.deps.json xunit.console.dll System.Collections.Concurrent.Tests.dll -xml testResults.xml -nologo -notrait category=OuterLoop -notrait category=failing
  Discovering: System.Collections.Concurrent.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Collections.Concurrent.Tests (found 883 of 892 test cases)
  Starting:    System.Collections.Concurrent.Tests (parallel test collections = on, max threads = 8)
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_ulong_ulong.ICollection_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_int_int.ICollection_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_int_int.IDictionary_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_string_string.IDictionary_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_ulong_ulong.IDictionary_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_string_string.ICollection_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_enum_enum.ICollection_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
    System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_enum_enum.IDictionary_Generic_Remove_ReferenceRemovedFromCollection [SKIP]
      Condition(s) not met: "IsPreciseGcSupported"
  Finished:    System.Collections.Concurrent.Tests
=== TEST EXECUTION SUMMARY ===
   System.Collections.Concurrent.Tests  Total: 2379, Errors: 0, Failed: 0, Skipped: 8, Time: 21.539s
The issue is also observed on the preview3 build. We will be looking into this.

@alhad-deshpande
Contributor

We could reproduce this on Power with a small test program that simultaneously pushes items onto and pops items from a ConcurrentStack. Only the first item gets pushed onto the stack, and then nothing happens. This is why the test case times out in the CI/CD pipeline. We are debugging this to find the root cause; our current suspicion is that something goes wrong when generating Power assembly code for the OP_ATOMIC_CAS_I8 opcode.
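Some context on why a 64-bit compare-and-swap was the first suspect: as far as we understand it, `ConcurrentStack<T>` is a lock-free linked stack whose push and pop retry a CAS on a single head reference, and on a 64-bit target that CAS is presumably what OP_ATOMIC_CAS_I8 is lowered from (an assumption, not confirmed in this thread). The C11 sketch below shows that pattern (a Treiber stack) with one thread pushing while another pops. It is not Mono or .NET code; `lf_stack`, `lf_push`, `lf_try_pop` and `run_push_pop_demo` are hypothetical names. A miscompiled CAS inside these retry loops would be enough to stall both sides. (The root cause later turned out to be elsewhere.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Caller-owned node. Nodes are never freed while the stack is in use and
   each is pushed at most once per run, which sidesteps the ABA and memory
   reclamation problems a general-purpose C version would have to solve. */
typedef struct node {
    struct node *next;
    int64_t value;
} node_t;

/* The whole stack is one pointer-sized (8-byte on ppc64le) atomic word. */
typedef struct {
    _Atomic(node_t *) head;
} lf_stack;

/* Push: link the node to the current head, then CAS it in. On failure the
   CAS writes the current head into `old`, so the loop simply retries. */
static void lf_push(lf_stack *s, node_t *n) {
    node_t *old = atomic_load(&s->head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&s->head, &old, n));
}

/* TryPop: swing the head to head->next with a CAS; NULL if empty. */
static node_t *lf_try_pop(lf_stack *s) {
    node_t *old = atomic_load(&s->head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&s->head, &old, old->next)) {
        /* `old` now holds the current head; try again. */
    }
    return old;
}

enum { N_ITEMS = 10000 };
static node_t nodes[N_ITEMS];
static lf_stack shared; /* zero-initialized: empty stack */

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < N_ITEMS; i++) {
        nodes[i].value = i + 1;
        lf_push(&shared, &nodes[i]);
    }
    return NULL;
}

/* One thread pushes while this thread pops until every item has been seen.
   Returns the sum of the popped values (1 + 2 + ... + N_ITEMS). */
static int64_t run_push_pop_demo(void) {
    pthread_t thread;
    int64_t sum = 0;
    int seen = 0;
    pthread_create(&thread, NULL, producer, NULL);
    while (seen < N_ITEMS) {
        node_t *n = lf_try_pop(&shared);
        if (n != NULL) {
            sum += n->value;
            seen++;
        }
    }
    pthread_join(thread, NULL);
    return sum;
}
```

With a working CAS, `run_push_pop_demo()` returns 50005000 (the sum 1..10000). The single-consumer design keeps the sketch ABA-free; in .NET, to our understanding, the GC provides the equivalent guarantee by keeping popped nodes alive while any thread still references them.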

@alhad-deshpande
Contributor

alhad-deshpande commented Jul 25, 2022

We debugged this and found that the hang is related to the call to GC.GetTotalAllocatedBytes(precise: true); in the test case Concurrent_Push_TryPop_WithSuspensions within ConcurrentStackTests.cs: without that call, the test case passes. The consumer thread waits for the GC to complete, but the GC is never able to complete. Below is the stack trace of the GC thread:

(gdb) thread 7
[Switching to thread 7 (Thread 0x7fffef9ef150 (LWP 939975))]
#0  0x00007ffff7f374c0 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7ffff76da078 <suspend_semaphore>)
    at ../sysdeps/nptl/futex-internal.h:320
320     ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0  0x00007ffff7f374c0 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7ffff76da078 <suspend_semaphore>)
    at ../sysdeps/nptl/futex-internal.h:320
#1  do_futex_wait (sem=sem@entry=0x7ffff76da078 <suspend_semaphore>, abstime=0x0, clockid=0) at sem_waitcommon.c:112
#2  0x00007ffff7f37668 in __new_sem_wait_slow (sem=0x7ffff76da078 <suspend_semaphore>, abstime=0x0, clockid=0) at sem_waitcommon.c:184
#3  0x00007ffff72cb18c in mono_os_sem_wait (sem=0x7ffff76da078 <suspend_semaphore>, flags=MONO_SEM_FLAGS_NONE)
    at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/utils/mono-os-semaphore.h:204
#4  0x00007ffff72cb8c4 in mono_os_sem_timedwait (sem=0x7ffff76da078 <suspend_semaphore>, timeout_ms=4294967295, flags=MONO_SEM_FLAGS_NONE)
    at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/utils/mono-os-semaphore.h:237
#5  0x00007ffff72cb5d4 in mono_threads_wait_pending_operations () at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/utils/mono-threads.c:323
#6  0x00007ffff727da04 in unified_suspend_stop_world (flags=MONO_THREAD_INFO_FLAGS_NO_GC,
    thread_stopped_callback=0x7ffff727df40 <sgen_client_stop_world_thread_stopped_callback>)
    at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/metadata/sgen-stw.c:345
#7  0x00007ffff727d368 in sgen_client_stop_world (generation=0, serial_collection=0) at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/metadata/sgen-stw.c:155
#8  0x00007ffff72fb8c0 in sgen_stop_world (generation=0, serial_collection=0) at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/sgen/sgen-gc.c:3995
#9  0x00007ffff72e2b58 in sgen_get_total_allocated_bytes (precise=1 '\001') at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/sgen/sgen-alloc.c:566
#10 0x00007ffff72858b8 in mono_gc_get_total_allocated_bytes (precise=1 '\001') at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/metadata/sgen-mono.c:2692
#11 0x00007ffff718a19c in ves_icall_System_GC_GetTotalAllocatedBytes (precise=1 '\001', error=0x7fffef9ed5a8)
    at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/metadata/icall.c:7159
#12 0x00007ffff718b814 in ves_icall_System_GC_GetTotalAllocatedBytes_raw (a0=1 '\001')
    at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/mini/../metadata/icall-def.h:208
#13 0x00007ffff6911a7c in ?? ()
#14 0x00007ffff68dc5e8 in ?? ()
#15 0x00007ffff68da568 in ?? ()
#16 0x00007ffff68da470 in ?? ()
#17 0x00007ffff68d9fc4 in ?? ()
#18 0x00007ffff68d92c4 in ?? ()
#19 0x00007ffff68d7bc8 in ?? ()
#20 0x00007ffff68d7ab0 in ?? ()
#21 0x00007ffff68d7974 in ?? ()
#22 0x00007ffff68caf40 in ?? ()
#23 0x00007ffff68c5818 in ?? ()
#24 0x00007ffff68c1e78 in ?? ()
#25 0x00007ffff68c1a88 in ?? ()
#26 0x00007ffff68c14f8 in ?? ()
#27 0x00007ffff68c1760 in ?? ()
#28 0x00007ffff73a540c in mono_jit_runtime_invoke (method=0x0, obj=0x7ffff68d92c4, params=0x3fdffb122a0037e9, exc=0x7fffef9edc40, error=0x7fffef9edac0)
    at /root/alhad/dotnet-ppc64le/runtime/src/mono/mono/mini/mini-runtime.c:3570

Backtrace stopped: previous frame inner to this frame (corrupt stack?)
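Frame #4 shows `timeout_ms=4294967295`. That is 2^32 - 1, the all-ones 32-bit value, i.e. `(uint32_t)-1`. The frames suggest the runtime treats it as a "wait forever" sentinel: `mono_os_sem_timedwait` (frame #4) falls through to the untimed `mono_os_sem_wait` (frame #3), which is why the GC thread blocks indefinitely instead of timing out. A tiny C check of that arithmetic, where `WAIT_FOREVER_MS` and `is_infinite_wait` are hypothetical names rather than Mono's:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for an all-ones 32-bit "no deadline" timeout. */
#define WAIT_FOREVER_MS ((uint32_t)-1)

/* Returns nonzero when a millisecond timeout means "block indefinitely". */
static int is_infinite_wait(uint32_t timeout_ms) {
    return timeout_ms == WAIT_FOREVER_MS;
}
```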

@janani66

After enabling the debug code for thread suspensions, we found that the runtime does not actually use signals to interrupt the other threads; it uses cooperative suspension instead. This works by the JIT inserting "safe points" into every JITed method, which regularly check a flag to see whether the thread has been asked to suspend itself.

The opcode that handles this on Power, OP_GC_SAFE_POINT, does nothing!
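To make the failure mode concrete, here is a minimal Linux/POSIX sketch (C11 atomics, pthreads, unnamed semaphores) of a cooperative stop-the-world handshake. It is not Mono's code: every name is hypothetical, and `suspend_ack` only plays the role of the `suspend_semaphore` seen in the backtrace. The "GC" sets a flag and waits for the worker to acknowledge from a safe point. With a polling safe point the acknowledgement arrives; with an empty placeholder it never does. The sketch uses a timed wait so it returns `false` instead of hanging, whereas the real GC waits with no deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_bool suspend_requested;  /* set by the "GC" to stop mutators */
static atomic_bool worker_should_exit; /* ends the worker loop */
static sem_t suspend_ack;   /* worker posts once parked (cf. suspend_semaphore) */
static sem_t resume_signal; /* "GC" posts to restart the world */

/* A real safe point: cheap flag test, slow path only when requested. */
static void safe_point_polling(void) {
    if (atomic_load(&suspend_requested)) {
        sem_post(&suspend_ack);   /* acknowledge: this thread is stopped */
        sem_wait(&resume_signal); /* park until the world restarts */
    }
}

/* An empty placeholder safe point: the request is never noticed. */
static void safe_point_empty(void) {}

typedef struct {
    void (*safe_point)(void);
} worker_cfg;

/* Stands in for a JITed loop with a safe point on its back edge. */
static void *worker(void *arg) {
    const worker_cfg *cfg = arg;
    while (!atomic_load(&worker_should_exit)) {
        cfg->safe_point();
    }
    return NULL;
}

/* Requests a stop-the-world and waits up to timeout_ms for the worker to
   acknowledge. Returns true if it stopped in time; false corresponds to the
   hang in this issue. */
static bool stop_the_world_demo(void (*safe_point)(void), long timeout_ms) {
    worker_cfg cfg = { safe_point };
    pthread_t thread;
    struct timespec deadline;
    int rc;

    atomic_store(&suspend_requested, false);
    atomic_store(&worker_should_exit, false);
    sem_init(&suspend_ack, 0, 0);
    sem_init(&resume_signal, 0, 0);
    pthread_create(&thread, NULL, worker, &cfg);

    atomic_store(&suspend_requested, true);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    do {
        rc = sem_timedwait(&suspend_ack, &deadline);
    } while (rc == -1 && errno == EINTR);

    /* Restart the world and shut the worker down. Posting resume_signal
       unconditionally also releases a worker that acknowledged late. */
    atomic_store(&suspend_requested, false);
    atomic_store(&worker_should_exit, true);
    sem_post(&resume_signal);
    pthread_join(thread, NULL);
    sem_destroy(&suspend_ack);
    sem_destroy(&resume_signal);
    return rc == 0;
}
```

`stop_the_world_demo(safe_point_polling, 2000)` should return `true`, while `stop_the_world_demo(safe_point_empty, 200)` returns `false`. The design keeps the fast path to a single load and branch, so safe points stay cheap enough to place in every loop and method.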

@directhex
Member Author

It looks like an empty placeholder for OP_GC_SAFE_POINT was added to various architectures (MIPS, Itanium, etc.) in 2015 to keep the API consistent across architectures, but the implementation was never filled in.

f5a4164

@janani66

Yes, we are working on implementing the OP_GC_SAFE_POINT opcode for ppc64le.

@alhad-deshpande
Contributor

@directhex
I have implemented the OP_GC_SAFE_POINT opcode for ppc64le, and the tests now pass.

@SamMonoRT
Member

@alhad-deshpande @directhex: Is there any more work remaining here, or can this be closed?

@SamMonoRT removed the untriaged label Aug 3, 2022
@SamMonoRT added this to the 7.0.0 milestone Aug 3, 2022
@directhex
Member Author

directhex commented Aug 3, 2022 via email

@SamMonoRT
Member

Closing the issue - please reach out if issue isn't resolved.

@alhad-deshpande
Contributor

@SamMonoRT
Yes, we are good to close this issue. Thanks.

@ghost locked as resolved and limited conversation to collaborators Sep 2, 2022