[libraries-pgo] segfault in System.Linq.Tests #98292

Closed
jakobbotsch opened this issue Feb 12, 2024 · 7 comments · Fixed by #101709

@jakobbotsch
Member

Example pipeline run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=561955&view=results
Example console log: https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-heads-main-a462ac17c3a4407a9b/System.Linq.Tests/1/console.9c49f47e.log?helixlogtype=result

DOTNET_TC_QuickJitForLoops=1
DOTNET_EnableCrashReport=1
DOTNET_TieredCompilation=1
DOTNET_JitRandomOnStackReplacement=15
DOTNET_TC_OnStackReplacement=1
DOTNET_OSR_HitLimit=2
DOTNET_DbgMiniDumpName=/home/helixbot/dotnetbuild/dumps/coredump.%d.dmp
DOTNET_DbgEnableMiniDump=1
DOTNET_TC_OnStackReplacement_InitialCounter=1
+ ./RunTests.sh --runtime-path /datadisks/disk1/work/AE710998/p
========================= Begin custom configuration settings ==============================
export __IsXUnitLogCheckerSupported=1
========================== End custom configuration settings ===============================
----- start Sun Feb 11 04:54:09 PM UTC 2024 =============== To repro directly: =====================================================
pushd .
/datadisks/disk1/work/AE710998/p/dotnet exec --runtimeconfig System.Linq.Tests.runtimeconfig.json --depsfile System.Linq.Tests.deps.json xunit.console.dll System.Linq.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/datadisks/disk1/work/AE710998/w/BB410A69/e /datadisks/disk1/work/AE710998/w/BB410A69/e
  Discovering: System.Linq.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Linq.Tests (found 1754 of 1758 test cases)
  Starting:    System.Linq.Tests (parallel test collections = on [2 threads], stop on fail = off)
    System.Linq.Tests.WhereTests.IndexOverflows [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
./RunTests.sh: line 180:  2568 Segmentation fault      (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Linq.Tests.runtimeconfig.json --depsfile System.Linq.Tests.deps.json xunit.console.dll System.Linq.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/datadisks/disk1/work/AE710998/w/BB410A69/e
----- end Sun Feb 11 04:54:30 PM UTC 2024 ----- exit code 139 ----------------------------------------------------------
exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.
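
The last log line follows the usual shell convention that a process killed by signal N is reported with exit status 128 + N, so 139 corresponds to signal 11. A minimal sketch of that arithmetic in Python follows; the helper name `describe_shell_exit_code` is made up for illustration and is not part of the test scripts.

```python
import signal


def describe_shell_exit_code(code: int) -> str:
    """Explain a shell exit status using the 128 + N convention for signals."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the platform's symbolic name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited normally with status {code}"


# 139 is the exit code reported by RunTests.sh in the log above.
print(describe_shell_exit_code(139))
```
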

cc @dotnet/jit-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 12, 2024
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 12, 2024
@ghost

ghost commented Feb 12, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@jakobbotsch jakobbotsch added this to the 9.0.0 milestone Feb 12, 2024
@jakobbotsch jakobbotsch removed the untriaged New issue has not been triaged by the area owner label Feb 12, 2024
@AndyAyersMS AndyAyersMS self-assigned this Feb 26, 2024
@JulieLeeMSFT
Member

CC @AndyAyersMS, this also happened in the runtime-coreclr libraries-jitstress2-jitstressregs pipeline:

----- start Sat Mar 9 09:47:31 UTC 2024 =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Linq.Tests.runtimeconfig.json --depsfile System.Linq.Tests.deps.json xunit.console.dll System.Linq.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Linq.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Linq.Tests (found 1796 of 1800 test cases)
  Starting:    System.Linq.Tests (parallel test collections = on [2 threads], stop on fail = off)
    System.Linq.Tests.SelectTests.Overflow [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.ToArrayTests.ToArray_FailOnExtremelyLargeCollection [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.IndexTests.LargeEnumerable [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.IndexTests.LargeEnumerable_ThrowsOverflowException [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
[createdump] Gathering state for process 28 dotnet
[createdump] Crashing thread 002a signal 11 (000b)
[createdump] Writing crash report to file /home/helixbot/dotnetbuild/dumps/coredump.28.dmp.crashreport.json
[createdump] Crash report successfully written
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.28.dmp
[createdump] Written 183283712 bytes (44747 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 294ms
waitpid() returned successfully (wstatus 00000000) WEXITSTATUS 0 WTERMSIG 0
./RunTests.sh: line 180:    28 Segmentation fault      (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Linq.Tests.runtimeconfig.json --depsfile System.Linq.Tests.deps.json xunit.console.dll System.Linq.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Sat Mar 9 09:47:57 UTC 2024 ----- exit code 139 ----------------------------------------------------------
exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.
ulimit -c value: unlimited
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 0
cat: /proc/sys/kernel/coredump_filter: No such file or directory
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dumps...
dmesg: read kernel buffer failed: Operation not permitted
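
The `[createdump]` lines above can be summarized mechanically when comparing several crashes. The sketch below assumes the thread id and the parenthesized signal value are hexadecimal (`000b` is 11, which matches the decimal signal number printed next to it); the function and pattern names are hypothetical, and the sample text is copied from the log above.

```python
import re

# Patterns for the two createdump lines of interest.
CRASH_RE = re.compile(
    r"\[createdump\] Crashing thread ([0-9a-f]+) signal (\d+) \(([0-9a-f]+)\)"
)
SIZE_RE = re.compile(r"\[createdump\] Written (\d+) bytes \((\d+) pages\)")


def summarize_createdump(log: str) -> dict:
    """Extract the crashing thread, signal, and dump size from a console log."""
    summary = {}
    crash = CRASH_RE.search(log)
    if crash:
        # Assumption: the thread id is printed in hexadecimal.
        summary["thread_id"] = int(crash.group(1), 16)
        summary["signal"] = int(crash.group(2))
    size = SIZE_RE.search(log)
    if size:
        nbytes, pages = int(size.group(1)), int(size.group(2))
        summary["bytes"] = nbytes
        summary["pages"] = pages
        if pages:
            # Bytes divided by pages gives the page size used for the dump.
            summary["bytes_per_page"] = nbytes // pages
    return summary


sample = """\
[createdump] Crashing thread 002a signal 11 (000b)
[createdump] Written 183283712 bytes (44747 pages) to core file
"""
print(summarize_createdump(sample))
```

For this sample the byte and page counts are consistent with 4096-byte pages (44747 × 4096 = 183283712).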

@BruceForstall
Member

More segfaults in runtime-coreclr libraries-pgo pipeline, net9.0-linux-Release-x64-jitosr_stress_random-Ubuntu.2204.Amd64.Open job:

System.Linq.Tests Work Item
System.Private.Uri.Functional.Tests Work Item
System.Runtime.Numerics.Tests Work Item

DOTNET_TC_QuickJitForLoops=1
DOTNET_EnableCrashReport=1
DOTNET_TieredCompilation=1
DOTNET_JitRandomOnStackReplacement=15
DOTNET_TC_OnStackReplacement=1
DOTNET_OSR_HitLimit=2
DOTNET_DbgMiniDumpName=/home/helixbot/dotnetbuild/dumps/coredump.%d.dmp
DOTNET_DbgEnableMiniDump=1
DOTNET_TC_OnStackReplacement_InitialCounter=1
+ ./RunTests.sh --runtime-path /datadisks/disk1/work/9F7A0969/p
========================= Begin custom configuration settings ==============================
export __IsXUnitLogCheckerSupported=1
========================== End custom configuration settings ===============================
----- start Mon Mar 18 05:52:48 PM UTC 2024 =============== To repro directly: =====================================================
pushd .
/datadisks/disk1/work/9F7A0969/p/dotnet exec --runtimeconfig System.Linq.Tests.runtimeconfig.json --depsfile System.Linq.Tests.deps.json xunit.console.dll System.Linq.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/datadisks/disk1/work/9F7A0969/w/B55D0983/e /datadisks/disk1/work/9F7A0969/w/B55D0983/e
  Discovering: System.Linq.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Linq.Tests (found 1796 of 1800 test cases)
  Starting:    System.Linq.Tests (parallel test collections = on [2 threads], stop on fail = off)
    System.Linq.Tests.ToArrayTests.ToArray_FailOnExtremelyLargeCollection [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.SelectTests.Overflow [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.WhereTests.IndexOverflows [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.SkipWhileTests.IndexSkipWhileOverflowBeyondIntMaxValueElements [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.IndexTests.LargeEnumerable [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.IndexTests.LargeEnumerable_ThrowsOverflowException [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.SelectManyTests.IndexOverflow [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
    System.Linq.Tests.TakeWhileTests.IndexTakeWhileOverflowBeyondIntMaxValueElements [SKIP]
      Condition(s) not met: "IsStressModeEnabled"
[createdump] Gathering state for process 35471 dotnet
[createdump] Crashing thread 8a9f signal 11 (000b)
[createdump] Writing crash report to file /home/helixbot/dotnetbuild/dumps/coredump.35471.dmp.crashreport.json
[createdump] Crash report successfully written
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.35471.dmp
[createdump] Written 478232576 bytes (116756 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 871ms
./RunTests.sh: line 180: 35471 Segmentation fault      (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Linq.Tests.runtimeconfig.json --depsfile System.Linq.Tests.deps.json xunit.console.dll System.Linq.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/datadisks/disk1/work/9F7A0969/w/B55D0983/e
----- end Mon Mar 18 05:54:03 PM UTC 2024 ----- exit code 139 ----------------------------------------------------------
+ grep DOTNET
DOTNET_TC_QuickJitForLoops=1
DOTNET_EnableCrashReport=1
DOTNET_TieredCompilation=1
DOTNET_JitRandomOnStackReplacement=15
DOTNET_TC_OnStackReplacement=1
DOTNET_OSR_HitLimit=2
DOTNET_DbgMiniDumpName=/home/helixbot/dotnetbuild/dumps/coredump.%d.dmp
DOTNET_DbgEnableMiniDump=1
DOTNET_TC_OnStackReplacement_InitialCounter=1
+ ./RunTests.sh --runtime-path /datadisks/disk1/work/9F7A0969/p
========================= Begin custom configuration settings ==============================
export __IsXUnitLogCheckerSupported=1
========================== End custom configuration settings ===============================
----- start Mon Mar 18 05:52:50 PM UTC 2024 =============== To repro directly: =====================================================
pushd .
/datadisks/disk1/work/9F7A0969/p/dotnet exec --runtimeconfig System.Private.Uri.Functional.Tests.runtimeconfig.json --depsfile System.Private.Uri.Functional.Tests.deps.json xunit.console.dll System.Private.Uri.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/datadisks/disk1/work/9F7A0969/w/B329092E/e /datadisks/disk1/work/9F7A0969/w/B329092E/e
  Discovering: System.Private.Uri.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Private.Uri.Functional.Tests (found 388 of 390 test cases)
  Starting:    System.Private.Uri.Functional.Tests (parallel test collections = on [2 threads], stop on fail = off)
[createdump] Gathering state for process 42350 dotnet
[createdump] Crashing thread a57f signal 11 (000b)
[createdump] Writing crash report to file /home/helixbot/dotnetbuild/dumps/coredump.42350.dmp.crashreport.json
[createdump] Crash report successfully written
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.42350.dmp
[createdump] Written 263061504 bytes (64224 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 630ms
./RunTests.sh: line 180: 42350 Segmentation fault      (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Private.Uri.Functional.Tests.runtimeconfig.json --depsfile System.Private.Uri.Functional.Tests.deps.json xunit.console.dll System.Private.Uri.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/datadisks/disk1/work/9F7A0969/w/B329092E/e
----- end Mon Mar 18 05:53:12 PM UTC 2024 ----- exit code 139 ----------------------------------------------------------
+ grep DOTNET
DOTNET_TC_QuickJitForLoops=1
DOTNET_EnableCrashReport=1
DOTNET_TieredCompilation=1
DOTNET_JitRandomOnStackReplacement=15
DOTNET_TC_OnStackReplacement=1
DOTNET_OSR_HitLimit=2
DOTNET_DbgMiniDumpName=/home/helixbot/dotnetbuild/dumps/coredump.%d.dmp
DOTNET_DbgEnableMiniDump=1
DOTNET_TC_OnStackReplacement_InitialCounter=1
+ ./RunTests.sh --runtime-path /datadisks/disk1/work/9F7A0969/p
========================= Begin custom configuration settings ==============================
export __IsXUnitLogCheckerSupported=1
========================== End custom configuration settings ===============================
----- start Mon Mar 18 05:53:03 PM UTC 2024 =============== To repro directly: =====================================================
pushd .
/datadisks/disk1/work/9F7A0969/p/dotnet exec --runtimeconfig System.Runtime.Numerics.Tests.runtimeconfig.json --depsfile System.Runtime.Numerics.Tests.deps.json xunit.console.dll System.Runtime.Numerics.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/datadisks/disk1/work/9F7A0969/w/B6570A6A/e /datadisks/disk1/work/9F7A0969/w/B6570A6A/e
  Discovering: System.Runtime.Numerics.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Numerics.Tests (found 558 of 566 test cases)
  Starting:    System.Runtime.Numerics.Tests (parallel test collections = on [2 threads], stop on fail = off)
[createdump] Gathering state for process 30783 dotnet
[createdump] Crashing thread 7850 signal 11 (000b)
[createdump] Writing crash report to file /home/helixbot/dotnetbuild/dumps/coredump.30783.dmp.crashreport.json
[createdump] Crash report successfully written
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.30783.dmp
[createdump] Written 340934656 bytes (83236 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 680ms
./RunTests.sh: line 180: 30783 Segmentation fault      (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Runtime.Numerics.Tests.runtimeconfig.json --depsfile System.Runtime.Numerics.Tests.deps.json xunit.console.dll System.Runtime.Numerics.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/datadisks/disk1/work/9F7A0969/w/B6570A6A/e
----- end Mon Mar 18 05:53:26 PM UTC 2024 ----- exit code 139 ----------------------------------------------------------

https://dev.azure.com/dnceng-public/public/_build/results?buildId=607039&view=ms.vss-test-web.build-test-results-tab

@AndyAyersMS

@BruceForstall BruceForstall added the blocking-clean-ci-optional Blocking optional rolling runs label Mar 18, 2024
@AndyAyersMS
Member

These haven't failed (in this way) since 3/18. Will try to repro.

@AndyAyersMS
Member

Several hundred local runs with no repro. Going to unmark this as blocking for now.
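
For anyone attempting a similar local repro, one possible approach is to loop the test command under the OSR stress settings from the failing job until it segfaults. This is only a sketch, not necessarily how the runs above were done: the `DOTNET_*` values are copied from the logs in this issue, while `OSR_STRESS_ENV`, `is_segfault`, and `run_until_segfault` are hypothetical names, and the command shown is a stand-in for the "To repro directly" command line.

```python
import os
import signal
import subprocess
import sys

# Settings copied from the failing job's console log.
OSR_STRESS_ENV = {
    "DOTNET_TieredCompilation": "1",
    "DOTNET_TC_QuickJitForLoops": "1",
    "DOTNET_TC_OnStackReplacement": "1",
    "DOTNET_TC_OnStackReplacement_InitialCounter": "1",
    "DOTNET_OSR_HitLimit": "2",
    "DOTNET_JitRandomOnStackReplacement": "15",
}


def is_segfault(returncode: int) -> bool:
    """Recognize SIGSEGV from a direct child or from a wrapping shell script."""
    # A direct child killed by a signal is reported by subprocess as -signum;
    # a shell wrapper such as RunTests.sh reports 128 + signum (139 here).
    return returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV)


def run_until_segfault(cmd, max_runs, extra_env=None):
    """Run cmd up to max_runs times; return the crashing attempt number or None."""
    env = {**os.environ, **(extra_env or {})}
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, env=env)
        if is_segfault(result.returncode):
            return attempt
    return None


# Stand-in command that merely exits with 139; substitute the real test command.
fake_crash = [sys.executable, "-c", "import sys; sys.exit(139)"]
print(run_until_segfault(fake_crash, max_runs=3, extra_env=OSR_STRESS_ENV))
```

Because `DOTNET_JitRandomOnStackReplacement` randomizes where OSR transitions are placed, a loop like this may need many iterations and can still come up empty.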

@JulieLeeMSFT
Member

@AndyAyersMS, it happened in libraries-pgo:
https://dev.azure.com/dnceng-public/public/_build/results?buildId=658780&view=logs&j=faaa3af4-06d2-51dd-5d8a-97b08f66fd9a
exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.

@jakobbotsch
Member Author

We expect #101709 to fix this one.
