System.Runtime.Tests Outerloop "killed" on Debian.10 and Debian.11 #58616

Open
karelz opened this issue Sep 3, 2021 · 6 comments
Labels: area-System.Runtime, needs-further-triage, test-bug
Milestone: Future

Comments

@karelz
Member

karelz commented Sep 3, 2021

PR #58129 hit this failure 4 times:

  Discovering: System.Runtime.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Tests (found 28 of 6264 test cases)
  Starting:    System.Runtime.Tests (parallel test collections = on, max threads = 2)
./RunTests.sh: line 162:    24 Killed

Seems related to #56567

Not sure how to query Kusto to see all these failures. I bet they will be fairly common ...

@ghost

ghost commented Sep 3, 2021

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.


@dotnet-issue-labeler bot added the untriaged label Sep 3, 2021
@jeffhandley added the needs-further-triage and test-bug labels and removed the untriaged label Sep 30, 2021
@jeffhandley added this to the Future milestone Sep 30, 2021
@buyaa-n
Member

buyaa-n commented Dec 2, 2021

Failing again in this build. The log:

  Discovering: System.Runtime.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Tests (found 28 of 6279 test cases)
  Starting:    System.Runtime.Tests (parallel test collections = on, max threads = 2)
./RunTests.sh: line 162:    24 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Runtime.Tests.runtimeconfig.json --depsfile System.Runtime.Tests.deps.json xunit.console.dll System.Runtime.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Thu 02 Dec 2021 10:40:04 PM UTC ----- exit code 137 ----------------------------------------------------------
exit code 137 means SIGKILL Killed eg by kill
ulimit -c value: unlimited
[ 2892.025746] 264 total pagecache pages
[ 2892.025747] 0 pages in swap cache
[ 2892.025748] Swap cache stats: add 0, delete 0, find 0/0
[ 2892.025748] Free swap  = 0kB
[ 2892.025749] Total swap = 0kB
[ 2892.025749] 2097038 pages RAM
[ 2892.025750] 0 pages HighMem/MovableOnly
[ 2892.025750] 58679 pages reserved
[ 2892.025751] 0 pages cma reserved
[ 2892.025751] 0 pages hwpoisoned
[ 2892.025752] Tasks state (memory values in pages):
[ 2892.025752] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[ 2892.025756] [    465]     0   465    27774      163   233472        0             0 systemd-journal
[ 2892.025758] [    468]     0   468    24428       45    94208        0             0 lvmetad
[ 2892.025760] [    476]     0   476    11040      397   118784        0         -1000 systemd-udevd
[ 2892.025761] [    575]     0   575     3067      267    69632        0             0 hv_kvp_daemon
[ 2892.025763] [    731] 62583   731    35446      110   176128        0             0 systemd-timesyn
[ 2892.025765] [    860]   100   860    20010      156   176128        0             0 systemd-network
[ 2892.025767] [    890]   101   890    17654      150   180224        0             0 systemd-resolve
[ 2892.025769] [   1096]     0  1096    20059     3262   196608        0             0 python3
[ 2892.025770] [   1104]   102  1104    66818      263   172032        0             0 rsyslogd
[ 2892.025772] [   1106]     0  1106     7938       75   102400        0             0 cron
[ 2892.025773] [   1109]     0  1109    40270       33    86016        0             0 lxcfs
[ 2892.025775] [   1110]     0  1110    42708     1954   229376        0             0 networkd-dispat
[ 2892.025777] [   1111]     0  1111    15502      142   159744        0             0 systemd-logind
[ 2892.025778] [   1114]     0  1114    72000      214   196608        0             0 accounts-daemon
[ 2892.025786] [   1118]     0  1118     7084       52    98304        0             0 atd
[ 2892.025813] [   1119]   103  1119    12514      158   143360        0          -900 dbus-daemon
[ 2892.025815] [   1134]     0  1134    27605       55   114688        0             0 irqbalance
[ 2892.025816] [   1137]     0  1137   226267     7026   303104        0          -999 containerd
[ 2892.025818] [   1145]     0  1145     4104       37    77824        0             0 agetty
[ 2892.025819] [   1146]     0  1146     3723       33    77824        0             0 agetty
[ 2892.025821] [   1150]     0  1150    72221      197   200704        0             0 polkitd
[ 2892.025823] [   1270]     0  1270     1128       17    53248        0             0 none
[ 2892.025824] [   1358]     0  1358    96652     4263   262144        0             0 python3
[ 2892.025826] [   1546]     0  1546    18076      181   192512        0         -1000 sshd
[ 2892.025827] [   2151]  1000  2151     2899       66    65536        0             0 helix.sh
[ 2892.025829] [   2616]     0  2616   284735    24690   577536        0          -500 dockerd
[ 2892.025831] [   2982]  1000  2982    44344     6855   253952        0             0 python3
[ 2892.025832] [   2986]   106  2986     7150       45   106496        0             0 uuidd
[ 2892.025834] [   2999]  1000  2999    63772     7335   274432        0             0 python3
[ 2892.025835] [   3000]  1000  3000   146774    33653   565248        0             0 python3
[ 2892.025837] [  15019]     0 15019    27991     1437    86016        0          -998 containerd-shim
[ 2892.025839] [  15038]  1000 15038      596       17    40960        0             0 helix_docker_wo
[ 2892.025841] [  15116]  1000 15116      596       17    40960        0             0 execute.sh
[ 2892.025842] [  15118]  1000 15118     1714       86    45056        0             0 bash
[ 2892.025844] [  15128]  1000 15128  2953699  1877208 15458304        0             0 dotnet
[ 2892.025845] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=7fa72756f4d363408f0bb47c6ee1a16bb96ebd7d7ca37cb2c2e2f33f8f86c62a,mems_allowed=0,global_oom,task_memcg=/docker/7fa72756f4d363408f0bb47c6ee1a16bb96ebd7d7ca37cb2c2e2f33f8f86c62a,task=dotnet,pid=15128,uid=1000
[ 2892.025890] Out of memory: Killed process 15128 (dotnet) total-vm:11814796kB, anon-rss:7508832kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:15096kB oom_score_adj:0
[ 2892.209529] oom_reaper: reaped process 15128 (dotnet), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Waiting a few seconds for any dump to be written..
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 0
cat: /proc/sys/kernel/coredump_filter: No such file or directory
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dump..
... found no dump in /root/helix/work/workitem/e
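
A quick cross-check of the numbers in the log above. The task table reports memory in pages, so, assuming 4 KiB pages (an assumption, the log does not state the page size, but the figures line up with it), the `dotnet` row can be compared against the kill line and against the machine's total RAM. The helper names here are made up for illustration:

```python
from typing import Optional

# Assumption: 4 KiB pages. Not stated in the log, but consistent with it.
PAGE_KIB = 4


def pages_to_kib(pages: int) -> int:
    """Convert a page count from the kernel's task table to KiB."""
    return pages * PAGE_KIB


def signal_from_exit_code(code: int) -> Optional[int]:
    """Shells report 128 + N when a child is terminated by signal N."""
    return code - 128 if code > 128 else None


# Values copied from the log above (pid 15128, dotnet).
dotnet_rss_kib = pages_to_kib(1877208)       # kill line says anon-rss:7508832kB
dotnet_total_vm_kib = pages_to_kib(2953699)  # kill line says total-vm:11814796kB
machine_ram_kib = pages_to_kib(2097038)      # from "2097038 pages RAM"

print(dotnet_rss_kib, dotnet_total_vm_kib, machine_ram_kib)
print(f"dotnet RSS is {dotnet_rss_kib / machine_ram_kib:.0%} of RAM")
print("signal:", signal_from_exit_code(137))  # signal 9 is SIGKILL on Linux
```

If the page-size assumption holds, the test process alone was resident in most of the roughly 8 GiB the machine has, with no swap configured, which fits the global OOM kill.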

@danmoseley
Member

We are missing a way to get a dump when we are killed for using too much memory. #56165 is similar but for IO tests.

@adamsitnik
Member

Hit in #66387.

Sample log file: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-66387-merge-8f05b743b17c405b9f/System.Runtime.Tests/1/console.d0128aa7.log?sv=2019-07-07&se=2022-03-29T13%3A33%3A14Z&sr=c&sp=rl&sig=mDrKRAWhoS3nfIdvqjfHj9WPL5uoPC5Rw6Kk5r5E0Hg%3D

/root/helix/work/correlation/dotnet exec --runtimeconfig System.Runtime.Tests.runtimeconfig.json --depsfile System.Runtime.Tests.deps.json xunit.console.dll System.Runtime.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Runtime.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Tests (found 28 of 6309 test cases)
  Starting:    System.Runtime.Tests (parallel test collections = on, max threads = 2)
./RunTests.sh: line 168:    25 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Runtime.Tests.runtimeconfig.json --depsfile System.Runtime.Tests.deps.json xunit.console.dll System.Runtime.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Wed 09 Mar 2022 01:53:24 PM UTC ----- exit code 137 ----------------------------------------------------------
exit code 137 means SIGKILL Killed eg by kill
ulimit -c value: unlimited
[  317.229311] 0 pages hwpoisoned
[  317.229311] Tasks state (memory values in pages):
[  317.229312] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[  317.229316] [    451]     0   451    23710      198   184320        0             0 systemd-journal
[  317.229318] [    465]     0   465    24428       44    94208        0             0 lvmetad
[  317.229320] [    474]     0   474    11006      348   118784        0         -1000 systemd-udevd
[  317.229321] [    540]     0   540     3067      267    65536        0             0 hv_kvp_daemon
[  317.229323] [    854] 62583   854    35446      108   180224        0             0 systemd-timesyn
[  317.229325] [    945]   100   945    20009      163   167936        0             0 systemd-network
[  317.229326] [    969]   101   969    17654      150   184320        0             0 systemd-resolve
[  317.229328] [   1173]     0  1173    42708     1957   233472        0             0 networkd-dispat
[  317.229329] [   1176]     0  1176    15502      143   167936        0             0 systemd-logind
[  317.229331] [   1177]     0  1177    20059     3266   196608        0             0 python3
[  317.229333] [   1178]     0  1178    40270       33    81920        0             0 lxcfs
[  317.229334] [   1183]     0  1183     7964       78    98304        0             0 cron
[  317.229336] [   1186]     0  1186     7084       52    98304        0             0 atd
[  317.229337] [   1187]     0  1187    72000      215   200704        0             0 accounts-daemon
[  317.229339] [   1188]   103  1188    12539      179   147456        0          -900 dbus-daemon
[  317.229340] [   1200]     0  1200    27605       55   122880        0             0 irqbalance
[  317.229342] [   1202]     0  1202   337818     5223   294912        0          -999 containerd
[  317.229343] [   1221]     0  1221     4104       37    73728        0             0 agetty
[  317.229345] [   1230]     0  1230     3723       34    69632        0             0 agetty
[  317.229346] [   1232]     0  1232    72221      197   204800        0             0 polkitd
[  317.229348] [   1370]     0  1370     1128       17    53248        0             0 none
[  317.229349] [   1443]     0  1443   115283     4961   270336        0             0 python3
[  317.229351] [   1732]     0  1732    18076      183   180224        0         -1000 sshd
[  317.229353] [   1899]     0  1899   410936    21841   589824        0          -500 dockerd
[  317.229354] [   2242]  1000  2242     2899       64    61440        0             0 helix.sh
[  317.229356] [   2794]     0  2794    37139     2964   192512        0             0 python3
[  317.229357] [   2799]     0  2799   900949     7022   303104        0             0 amacoreagent
[  317.229358] [   2886]     0  2886   229350      299   151552        0             0 agentlauncher
[  317.229360] [   2932]     0  2932    79371       69   118784        0             0 mdsdmgr
[  317.229362] [   3296]  1000  3296    44344     6857   253952        0             0 python3
[  317.229363] [   3364]   106  3364     7150       45   102400        0             0 uuidd
[  317.229365] [   3398]  1000  3398    63562     7108   270336        0             0 python3
[  317.229366] [   3399]  1000  3399   105350    10700   323584        0             0 python3
[  317.229368] [   3473]   102  3473    67333      139   180224        0             0 rsyslogd
[  317.229369] [   3605]   102  3605   150857     2983   446464        0             0 mdsd
[  317.229371] [   3893]     0  3893   297958     3060   249856        0             0 auoms
[  317.229372] [   3921]     0  3921   215737     1698   188416        0             0 auomscollect
[  317.229374] [   4237]     0  4237   201897     4319   241664        0             0 azsecd
[  317.229375] [   4248]     0  4248   177083     3317   188416        0             0 azsecmond
[  317.229377] [   5040]     0  5040   178212     1696   114688        0          -998 containerd-shim
[  317.229378] [   5062]  1000  5062      596       17    40960        0             0 helix_docker_wo
[  317.229380] [   5133]  1000  5133      596       17    40960        0             0 execute.sh
[  317.229381] [   5135]  1000  5135     1714       86    49152        0             0 bash
[  317.229383] [   5145]  1000  5145  2904006  1869990 15400960        0             0 dotnet
[  317.229384] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=d6d83bd2640930892e9431ca04fa850dffadde62971abe155f3701da4bb9a1cc,mems_allowed=0,global_oom,task_memcg=/docker/d6d83bd2640930892e9431ca04fa850dffadde62971abe155f3701da4bb9a1cc,task=dotnet,pid=5145,uid=1000
[  317.229424] Out of memory: Killed process 5145 (dotnet) total-vm:11616024kB, anon-rss:7479960kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:15040kB oom_score_adj:0
[  317.359200] oom_reaper: reaped process 5145 (dotnet), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Waiting a few seconds for any dump to be written..
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 0
cat: /proc/sys/kernel/coredump_filter: No such file or directory
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dump..
... found no dump in /root/helix/work/workitem/e
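
Since the dmesg text is the only artifact these runs leave behind, a small script could pull the figures out of the `Out of memory: Killed process` line to make occurrences easier to compare. This is only a rough sketch: the regex is inferred from the two kill lines quoted in this issue and may not match other kernel versions, and `parse_oom_kills` is a hypothetical helper, not part of any existing tooling.

```python
import re

# Pattern inferred from the kill lines quoted in this issue (assumption:
# other kernels may format this line differently).
KILL_LINE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kills(log_text):
    """Return one dict per OOM kill line found in log_text."""
    kills = []
    for m in KILL_LINE.finditer(log_text):
        kills.append({
            "pid": int(m["pid"]),
            "name": m["name"],
            "total_vm_kb": int(m["total_vm_kb"]),
            "anon_rss_kb": int(m["anon_rss_kb"]),
        })
    return kills


# Kill line copied from the log above.
sample = (
    "[  317.229424] Out of memory: Killed process 5145 (dotnet) "
    "total-vm:11616024kB, anon-rss:7479960kB, file-rss:0kB, shmem-rss:0kB, "
    "UID:1000 pgtables:15040kB oom_score_adj:0"
)

for kill in parse_oom_kills(sample):
    gib = kill["anon_rss_kb"] / (1024 * 1024)
    print(f'{kill["name"]} (pid {kill["pid"]}): {gib:.2f} GiB anon RSS')
```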

@danmoseley
Member

Ideally we'd have a way to get some kind of diagnostics on OOM, as there's never a dump. E.g. xunit could log the test name if memory is above a threshold, as it does for long-running tests.
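
To make the threshold idea concrete, here is a language-neutral sketch in Python. It is not xunit's API and not how the test runner works today. `MemoryWatchdog`, `current_test`, and the 6 GiB threshold are all hypothetical, and `read_rss_kib` assumes a Linux `/proc` filesystem.

```python
import threading


def read_rss_kib(status_path="/proc/self/status"):
    """Read this process's resident set size in KiB (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0


class MemoryWatchdog:
    """Logs the currently running test when RSS crosses a threshold."""

    def __init__(self, threshold_kib, read_rss=read_rss_kib, log=print):
        self.threshold_kib = threshold_kib
        self.read_rss = read_rss
        self.log = log
        self.current_test = None  # a (hypothetical) runner would set this

    def check(self):
        """Sample memory once; return True if the threshold was exceeded."""
        rss = self.read_rss()
        if rss > self.threshold_kib:
            self.log(f"[high memory] {rss} KiB while running {self.current_test}")
            return True
        return False

    def run_in_background(self, interval_s=1.0):
        """Poll on a daemon thread; set the returned event to stop polling."""
        stop = threading.Event()

        def loop():
            while not stop.wait(interval_s):
                self.check()

        threading.Thread(target=loop, daemon=True).start()
        return stop


# Usage with a fake reader so the example does not depend on /proc.
messages = []
dog = MemoryWatchdog(
    threshold_kib=6 * 1024 * 1024,  # assumed threshold, below the ~7.1 GiB kill point
    read_rss=lambda: 7_000_000,
    log=messages.append,
)
dog.current_test = "SomeTests.SomeMethod"  # placeholder name
dog.check()
print(messages[-1])
```

The log line would need to be flushed right away, since the process gets SIGKILL and has no chance to write anything once the OOM killer picks it.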

@adamsitnik
Member

There is one test that allocates more than 8GB:

image

I'll try to find it tomorrow (I've hit a bug in VS, which does not seem to work well with such huge profiles).
