Failures on dartkp-linux-debug-simriscv64 #48354

Closed
sstrickl opened this issue Feb 9, 2022 · 4 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. gardening

@sstrickl
Contributor

sstrickl commented Feb 9, 2022

There are new test failures on [dart2js] Cleanup ImpactCacheDeleter....[release] Add 2.16.1 release notes.

The tests

lib_2/collection/queue_test Crash (expected Pass)
standalone_2/out_of_memory_recovery_test Crash (expected Pass)

are failing on configurations

dartkp-linux-debug-simriscv64

Logs:

/==================================================================\
| lib_2/collection/queue_test broke (Pass -> Crash, expected Pass) |
\==================================================================/

--- Command "vm_compile_to_kernel" (took 03.000299s):
DART_CONFIGURATION=DebugSIMRISCV64 /b/s/w/ir/pkg/vm/tool/gen_kernel --aot --platform=out/DebugSIMRISCV64/vm_platform_strong.dill -o /b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_lib_2_collection_queue_test/out.dill /b/s/w/ir/tests/lib_2/collection/queue_test.dart -Dtest_runner.configuration=dartkp-linux-debug-simriscv64 --packages=/b/s/w/ir/.packages -Ddart.vm.product=false

exit code:
0

--- Command "precompiler" (took 17.000378s):
DART_CONFIGURATION=DebugSIMRISCV64 out/DebugSIMRISCV64/gen_snapshot --snapshot-kind=app-aot-elf --elf=/b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_lib_2_collection_queue_test/out.aotsnapshot --loading-unit-manifest=/b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_lib_2_collection_queue_test/ignored.json -Dtest_runner.configuration=dartkp-linux-debug-simriscv64 --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_lib_2_collection_queue_test/out.dill

exit code:
0

--- Command "remove_kernel_file" (took 5ms):
DART_CONFIGURATION=DebugSIMRISCV64 rm /b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_lib_2_collection_queue_test/out.dill

exit code:
0

--- Command "vm" (took 794ms):
DART_CONFIGURATION=DebugSIMRISCV64 out/DebugSIMRISCV64/dart_precompiled_runtime -Dtest_runner.configuration=dartkp-linux-debug-simriscv64 --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_lib_2_collection_queue_test/out.aotsnapshot

exit code:
-6

stderr:
===== CRASH =====
si_signo=Segmentation fault(11), si_code=128, si_addr=(nil)
version=2.17.0-edge.21a5f734562eb0d2fca0f9b5b646632fbcdb04cb (be) (Wed Feb 9 02:28:41 2022 +0000) on "linux_simriscv64"
pid=8030, thread=8034, isolate_group=main(0x557430909800), isolate=main(0x55743090a000)
isolate_instructions=7ffa7f219fd0, vm_instructions=7ffa7f216000
Stack dump aborted because GetAndValidateThreadStackBounds failed.

--- Re-run this test:
python3 tools/test.py -n dartkp-linux-debug-simriscv64 lib_2/collection/queue_test


/===============================================================================\
| standalone_2/out_of_memory_recovery_test broke (Pass -> Crash, expected Pass) |
\===============================================================================/

--- Command "vm_compile_to_kernel" (took 03.000142s):
DART_CONFIGURATION=DebugSIMRISCV64 /b/s/w/ir/pkg/vm/tool/gen_kernel --aot --platform=out/DebugSIMRISCV64/vm_platform_strong.dill -o /b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_standalone_2_out_of_memory_recovery_test/out.dill /b/s/w/ir/tests/standalone_2/out_of_memory_recovery_test.dart -Dtest_runner.configuration=dartkp-linux-debug-simriscv64 --packages=/b/s/w/ir/.packages -Ddart.vm.product=false

exit code:
0

--- Command "precompiler" (took 13.000329s):
DART_CONFIGURATION=DebugSIMRISCV64 out/DebugSIMRISCV64/gen_snapshot --snapshot-kind=app-aot-elf --elf=/b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_standalone_2_out_of_memory_recovery_test/out.aotsnapshot --loading-unit-manifest=/b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_standalone_2_out_of_memory_recovery_test/ignored.json --old_gen_heap_size=20 -Dtest_runner.configuration=dartkp-linux-debug-simriscv64 --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_standalone_2_out_of_memory_recovery_test/out.dill

exit code:
0

--- Command "remove_kernel_file" (took 4ms):
DART_CONFIGURATION=DebugSIMRISCV64 rm /b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_standalone_2_out_of_memory_recovery_test/out.dill

exit code:
0

--- Command "vm" (took 22.000538s):
DART_CONFIGURATION=DebugSIMRISCV64 out/DebugSIMRISCV64/dart_precompiled_runtime --old_gen_heap_size=20 -Dtest_runner.configuration=dartkp-linux-debug-simriscv64 --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/out/DebugSIMRISCV64/generated_compilations/dartkp-linux-debug-simriscv64/tests_standalone_2_out_of_memory_recovery_test/out.aotsnapshot

exit code:
-6

stdout:
>> [SendPort, 1]
<< [1, Okay]
>> [SendPort, 2]
<< [2, Failed: Out of Memory
#0      handleRequest (file:///b/s/w/ir/tests/standalone_2/out_of_memory_recovery_test.dart)
#1      handleMessage (file:///b/s/w/ir/tests/standalone_2/out_of_memory_recovery_test.dart)
#2      _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart)
]
>> [SendPort, 3]
<< [3, Okay]
>> [SendPort, 4]

stderr:
Exhausted heap space, trying to allocate 32 bytes.
Exhausted heap space, trying to allocate 32 bytes.

===== CRASH =====
si_signo=Segmentation fault(11), si_code=128, si_addr=(nil)
version=2.17.0-edge.21a5f734562eb0d2fca0f9b5b646632fbcdb04cb (be) (Wed Feb 9 02:28:41 2022 +0000) on "linux_simriscv64"
pid=8660, thread=8664, isolate_group=main(0x56260e755800), isolate=main(0x56260e756000)
isolate_instructions=7f48bf406fd0, vm_instructions=7f48bf403000
Stack dump aborted because GetAndValidateThreadStackBounds failed.

--- Re-run this test:
python3 tools/test.py -n dartkp-linux-debug-simriscv64 standalone_2/out_of_memory_recovery_test

I initially tried bisecting from the revisions of the last passing results in the history, just to narrow down where the failures began, but each test also crashes locally on my machine at its last passing revision.

/cc @rmacnak-google

@sstrickl added the area-vm and gardening labels Feb 9, 2022
@sstrickl
Contributor Author

sstrickl commented Feb 9, 2022

Looking into it further:

Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 22831.22835]
dart::Simulator::MemoryRead<long> (this=this@entry=0x55ec825b8500, addr=11400693657607131395, base=dart::T1) at ../../runtime/vm/simulator_riscv.cc:2540
2540	  return *reinterpret_cast<type*>(addr);

The address being odd is suspicious, so I tried adjusting Simulator::MemoryRead as follows:

$ git diff
diff --git a/runtime/vm/simulator_riscv.cc b/runtime/vm/simulator_riscv.cc
index 73cdb726d6f..602ef43549d 100644
--- a/runtime/vm/simulator_riscv.cc
+++ b/runtime/vm/simulator_riscv.cc
@@ -2528,6 +2528,10 @@ type Simulator::MemoryRead(uintx_t addr, Register base) {
       PrintStack();
       FATAL("Out-of-bounds stack access");
     }
+  } else if (!Utils::IsAligned(addr, sizeof(type))) {
+    PrintRegisters();
+    PrintStack();
+    FATAL("Unaligned memory access");
   } else {
     const uintx_t kPageSize = 16 * KB;
     if ((addr < kPageSize) || (addr + sizeof(type) >= ~kPageSize)) {

and got the resulting log:

zero:                0                    0   ft0: -nan
  ra:     7fdba6040b84      140581359848324   ft1: -nan
  sp:     555a44f21c58       93846192135256   ft2: -nan
  gp: 90db1abfd2d64ab8 -8008778101169960264   ft3: -nan
  tp: a4f6052dcf4973c9 -6560050112909184055   ft4: -nan
  t0:     7fdba5808041      140581351227457   ft5: -nan
  t1: 5ce325d4dd7f428e  6693235067382088334   ft6: -nan
  t2:     7fdba5808041      140581351227457   ft7: -nan
  fp:     555a44f21c60       93846192135264   fs0: -nan
 thr:     555a44c7ee00       93846189370880   fs1: -nan
  a0:     7fdba3efffc9      140581324980169   fa0: -nan
  a1:     7fdba4681c31      140581332851761   fa1: -nan
  a2: 6cdb64e70d7c0fdb  7843974119522308059   fa2: -nan
 tmp:                0                    0   fa3: -nan
tmp2:                4                    4   fa4: -nan
  pp: 5ce325d4dd7ef28e  6693235067382067854   fa5: -nan
  a6: 71fbebfba8ebfa1c  8213417811543587356   fa6: -nan
  a7: 6c4577d75f745fc0  7801773696392388544   fa7: -nan
  s2:              562                 1378   fs2: -nan
  s3:     7fdba5808041      140581351227457   fs3: -nan
  s4:     7fdba5808041      140581351227457   fs4: -nan
  s5:     7fdba5808041      140581351227457   fs5: -nan
  s6: 526cf6d869453905  5939393417934354693   fs6: -nan
  s7: ef30c7a0cd1f0d2c -1211248806312604372   fs7: -nan
  s8:  c001a255547590d   864719876101986573   fs8: -nan
  s9:     555a44ce4800       93846189787136   fs9: -nan
null:     7fdba5808041      140581351227457  fs10: -nan
mask:                6                    6  fs11: -nan
  t3:     7fdba5808041      140581351227457   ft8: -nan
  t4: 69c967087e3e5d30  7622737130476690736   ft9: -nan
  t5: fd9586ac61e8c9ce  -174084935648753202  ft10: -nan
  t6: 88d54476928e2821 -8586881838456362975  ft11: -nan
  pc:     7fdba6040bc8
  pc 0x00007fdba6040bc8 fp 0x0000555a44f21c60 sp 0x0000555a44f21c58 [Optimized] _DoubleLinkedQueueEntry@3220832._link@3220832
  pc 0x00007fdba6040b84 fp 0x0000555a44f21c90 sp 0x0000555a44f21c70 [Optimized] _DoubleLinkedQueueEntry@3220832._prepend@3220832
  pc 0x00007fdba60412ae fp 0x0000555a44f21cd8 sp 0x0000555a44f21ca0 [Optimized] new DoubleLinkedQueue.from
  pc 0x00007fdba60ad2b2 fp 0x0000555a44f21d40 sp 0x0000555a44f21ce8 [Optimized] QueueTest.testFromListToList
  pc 0x00007fdba60ad15c fp 0x0000555a44f21d88 sp 0x0000555a44f21d50 [Optimized] QueueTest.testMain
  pc 0x00007fdba60b04c0 fp 0x0000555a44f21da0 sp 0x0000555a44f21d98 [Optimized] DoubleLinkedQueueTest.testMain
  pc 0x00007fdba60b19c6 fp 0x0000555a44f21db8 sp 0x0000555a44f21db0 [Optimized] main
  pc 0x00007fdba60b1d8e fp 0x0000555a44f21dc8 sp 0x0000555a44f21dc8 [Optimized] main
  pc 0x00007fdba60b5c18 fp 0x0000555a44f21e28 sp 0x0000555a44f21dd8 [Optimized] _Closure@0150898.dyn:call
  pc 0x00007fdba603cd2a fp 0x0000555a44f21e50 sp 0x0000555a44f21e38 [Optimized] _delayEntrypointInvocation@1026248.<anonymous closure>
  pc 0x00007fdba60b5992 fp 0x0000555a44f21ec0 sp 0x0000555a44f21e60 [Optimized] _Closure@0150898.dyn:call
  pc 0x00007fdba603c5a8 fp 0x0000555a44f21ee8 sp 0x0000555a44f21ed0 [Optimized] _RawReceivePortImpl@1026248._handleMessage@1026248
  pc 0x00007fdba5fc1e18 fp 0x0000555a44f21fe8 sp 0x0000555a44f21ef8 [Stub] InvokeDartCode
../../runtime/vm/simulator_riscv.cc: 2534: error: Unaligned memory access
version=2.17.0-edge.3de72ec6b0714197002326d728cd37f327c77b06 (be) (Wed Feb 9 06:51:07 2022 +0000) on "linux_simriscv64"
pid=24287, thread=24291, isolate_group=main(0x555a44c4d000), isolate=main(0x555a44c4d800)
isolate_instructions=7fdba5fc3fd0, vm_instructions=7fdba5fc0000
Stack dump aborted because GetAndValidateThreadStackBounds failed.
Aborted (SIGABRT)
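The extra branch in the diff relies on the fact that an N-byte load from a naturally aligned address has its low log2(N) bits clear, so an odd address can never be a valid 8-byte load. A minimal standalone sketch of that test, assuming N is a power of two; `IsAlignedTo` and `IsNaturallyAlignedLoad` are hypothetical stand-ins, not the VM's `Utils::IsAligned`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Utils::IsAligned: true if `addr` is a multiple
// of `alignment`. `alignment` must be a non-zero power of two.
bool IsAlignedTo(uint64_t addr, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr & (alignment - 1)) == 0;
}

// True if a simulated load of type T at `addr` is naturally aligned,
// mirroring the `!Utils::IsAligned(addr, sizeof(type))` check in the diff.
template <typename T>
bool IsNaturallyAlignedLoad(uint64_t addr) {
  return IsAlignedTo(addr, sizeof(T));
}
```

With the faulting address from the GDB session, `IsNaturallyAlignedLoad<int64_t>(11400693657607131395ULL)` is false because the address is odd, which is exactly the case the new `FATAL("Unaligned memory access")` branch reports.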

@sstrickl
Contributor Author

sstrickl commented Feb 9, 2022

Adding the offset from the start of the code payload to the stack traces gives:

  pc 0x00007f98519dabc8 (0x000000000000001c) fp 0x0000560563891c60 sp 0x0000560563891c58 [Optimized] _DoubleLinkedQueueEntry@3220832._link@3220832
  pc 0x00007f98519dab84 (0x00000000000000c4) fp 0x0000560563891c90 sp 0x0000560563891c70 [Optimized] _DoubleLinkedQueueEntry@3220832._prepend@3220832
  pc 0x00007f98519db2ae (0x000000000000016e) fp 0x0000560563891cd8 sp 0x0000560563891ca0 [Optimized] new DoubleLinkedQueue.from
  pc 0x00007f9851a472b2 (0x000000000000005e) fp 0x0000560563891d40 sp 0x0000560563891ce8 [Optimized] QueueTest.testFromListToList
  pc 0x00007f9851a4715c (0x0000000000000adc) fp 0x0000560563891d88 sp 0x0000560563891d50 [Optimized] QueueTest.testMain
  pc 0x00007f9851a4a4c0 (0x000000000000001c) fp 0x0000560563891da0 sp 0x0000560563891d98 [Optimized] DoubleLinkedQueueTest.testMain
  pc 0x00007f9851a4b9c6 (0x0000000000000022) fp 0x0000560563891db8 sp 0x0000560563891db0 [Optimized] main
  pc 0x00007f9851a4bd8e (0x0000000000000016) fp 0x0000560563891dc8 sp 0x0000560563891dc8 [Optimized] main
  pc 0x00007f9851a4fc18 (0x0000000000000248) fp 0x0000560563891e28 sp 0x0000560563891dd8 [Optimized] _Closure@0150898.dyn:call
  pc 0x00007f98519d6d2a (0x00000000000001b6) fp 0x0000560563891e50 sp 0x0000560563891e38 [Optimized] _delayEntrypointInvocation@1026248.<anonymous closure>
  pc 0x00007f9851a4f992 (0x000000000000027a) fp 0x0000560563891ec0 sp 0x0000560563891e60 [Optimized] _Closure@0150898.dyn:call
  pc 0x00007f98519d65a8 (0x00000000000000c4) fp 0x0000560563891ee8 sp 0x0000560563891ed0 [Optimized] _RawReceivePortImpl@1026248._handleMessage@1026248
  pc 0x00007f985195be18 (0x00000000000000b8) fp 0x0000560563891fe8 sp 0x0000560563891ef8 [Stub] InvokeDartCode

and disassembling _DoubleLinkedQueueEntry._link:

Code for optimized function 'dart:_internal_DoubleLinkedQueueEntry__link@10040228' (RegularFunction) {
        ;; B0
        ;; B1
        ;; Enter frame
0x0        1121               addi sp, sp, -24
0x2        e806               sd ra, 16(sp)
0x4        e422               sd fp, 8(sp)
0x6        0020               addi fp, sp, 8
        ;; ParallelMove a1 <- S+4
0x8        700c               ld a1, 32(fp)
        ;; v49 <- LoadField(v2 . :type_arguments {final}) T{TypeArguments}
0xa    0075b283               ld t0, 7(a1)
        ;; ParallelMove a0 <- S+2, t2 <- t0, t3 <- C, S-1 <- t0
0xe        6808               ld a0, 16(fp)
0x10        8396               mv t2, t0
0x12    fe543c23               sd t0, -8(fp)
0x16        8e6a               mv t3, null
        ;; AssertAssignable:4(v4 T{DoubleLinkedQueueEntry?}, v26, 'value', instantiator_type_args(v49), function_type_args(v0)) T{DoubleLinkedQueueEntry<X0>??}
        ;; AssertAssignable for compile-time type
        ;; Inlined [DoubleLinkedQueueEntry.set__nextLink@10040228]
0x18        6315               lui t1, 20480
0x1a        933e               add t1, t1, pp
0x1c    2d833303               ld t1, 728(t1)
        ;; TTSCall
        ;; Inlined [DoubleLinkedQueueEntry.set__nextLink@10040228]
0x20    00733983               ld s3, 7(t1)
0x24        6e95               lui t4, 20480
0x26        9ebe               add t4, t4, pp
0x28    330ebe83               ld t4, 816(t4)
0x2c        9982               jalr s3
...
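In the listing, each object-pool load is a `lui`/`add`/`ld` triple: `lui` materializes the upper part of the pool offset, `add` adds PP to it, and `ld` supplies the remaining signed 12-bit displacement. Assuming the disassembler prints the already-shifted `lui` value (so `lui t1, 20480` leaves 0x5000 in t1), the triple at 0x18 reads from PP + 20480 + 728 = PP + 21208, and the one at 0x24 from PP + 21296. The sketch below shows the standard RISC-V hi/lo split that yields such pairs; `SplitOffset` and `HiLo` are hypothetical names, not the Dart assembler's:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of splitting a byte offset for a lui + add + ld
// sequence: `upper` is a multiple of 4096, `lower` fits ld's signed
// 12-bit immediate range [-2048, 2047].
struct HiLo {
  int64_t upper;  // value lui leaves in the scratch register
  int64_t lower;  // displacement encoded in the ld instruction
};

// Assumes a non-negative offset (pool offsets are non-negative here).
HiLo SplitOffset(int64_t offset) {
  assert(offset >= 0);
  // Adding 0x800 before shifting rounds to the nearest 4 KiB boundary,
  // so the remainder always lands in the signed 12-bit range.
  int64_t upper = ((offset + 0x800) >> 12) << 12;
  int64_t lower = offset - upper;
  assert(lower >= -2048 && lower <= 2047);
  return {upper, lower};
}
```

`SplitOffset(21208)` gives `{20480, 728}` and `SplitOffset(21296)` gives `{20480, 816}`, matching the two triples in the listing. Note that neither offset has the heap-object tag subtracted, unlike the field load `ld t0, 7(a1)`; the pool loads treat PP as an untagged base.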

@sstrickl
Contributor Author

sstrickl commented Feb 9, 2022

So the problem here is that LoadWordFromPoolIndex for RISCV assumes an untagged PP, but from my current local run:

zero:                0                    0   ft0: -nan
  ra:     7ff1a9e05b84      140675913898884   ft1: -nan
  sp:     55db710ebc58       94400982989912   ft2: -nan
  gp: f905d426c7cc9544  -502762520356022972   ft3: -nan
  tp: 4f6226514a4cf699  5720176607294715545   ft4: -nan
  t0:     7ff1a9588041      140675904995393   ft5: -nan
  t1: 591d67602d6d33cf  6421402306476848079   ft6: -nan
  t2:     7ff1a9588041      140675904995393   ft7: -nan
  fp:     55db710ebc60       94400982989920   fs0: -nan
 thr:     55db70f2ce00       94400981159424   fs1: -nan
  a0:     7ff1a737ffc9      140675869310921   fa0: -nan
  a1:     7ff1a8981c31      140675892386865   fa1: -nan
  a2: 1cebecccdf2acf8f  2084019617250594703   fa2: -nan
 tmp:                0                    0   fa3: -nan
tmp2:                4                    4   fa4: -nan
  pp: 591d67602d6ce3cf  6421402306476827599   fa5: -nan
  a6: 658ef1b770b54979  7318052214695872889   fa6: -nan
  a7: 3157c3a31bec27ac  3555525536147842988   fa7: -nan
  s2:              562                 1378   fs2: -nan
  s3:     7ff1a9588041      140675904995393   fs3: -nan
  s4:     7ff1a9588041      140675904995393   fs4: -nan
  s5:     7ff1a9588041      140675904995393   fs5: -nan
  s6: a18f65c7f8775111 -6805108602392063727   fs6: -nan
  s7: 10bc7f7954f78367  1205978959321727847   fs7: -nan
  s8: 8503fd0b23eab615 -8861961417445951979   fs8: -nan
  s9:     55db70f72800       94400981444608   fs9: -nan
null:     7ff1a9588041      140675904995393  fs10: -nan
mask:                6                    6  fs11: -nan
  t3:     7ff1a9588041      140675904995393   ft8: -nan
  t4: e1b7e547853a26b1 -2182023399097096527   ft9: -nan
  t5: bb02272c0f3a62c9 -4971367968476077367  ft10: -nan
  t6: d577a3da2f3d1a26 -3064800863911601626  ft11: -nan
  pc:     7ff1a9e05bc8

Using rr, I see that the current value of xregs_[PP] was set to a random value during Simulator::ClobberVolatileRegisters, which I assume is done in debug mode to ensure there are no invalid assumptions about register liveness. So PP is being clobbered, which means some runtime call happens between the last assignment to the PP register and its use at offset 0x1a above.

This explains why sometimes it's an unaligned memory address, sometimes a bad memory address, and sometimes just a segfault without any error reporting.
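To see how one clobbered register produces such varied symptoms, here is a toy model, not the simulator's real code: after a simulated call into C, every register outside a callee-saved set gets pseudo-random bits. The register indices assume the dumps above list x0..x31 in order, which would put `sp` at x2 and `pp` at x15 (a5, an argument register and therefore caller-saved in the standard RISC-V calling convention); the function names and the use of `std::mt19937_64` are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

constexpr int kNumRegs = 32;
// Assumed indices, taken from the order of the register dump above.
constexpr int kSP = 2;
constexpr int kPP = 15;

using RegFile = std::array<uint64_t, kNumRegs>;
using SavedSet = std::array<bool, kNumRegs>;

// Toy stand-in for Simulator::ClobberVolatileRegisters: every register
// not marked in `callee_saved` receives random bits. x0 is hard-wired
// to zero and is never written.
void ClobberVolatile(RegFile& regs, const SavedSet& callee_saved,
                     std::mt19937_64& rng) {
  assert(regs[0] == 0);
  for (int i = 1; i < kNumRegs; ++i) {
    if (!callee_saved[i]) regs[i] = rng();
  }
}
```

If PP is not in the preserved set, whatever Dart code still expects in it comes back as noise. Whether that noise faults immediately, trips an alignment check, or happens to work depends only on the random value, which fits the mix of failures described above.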

@sstrickl
Contributor Author

sstrickl commented Feb 9, 2022

... and apparently, if we get really unlucky, it runs without issue because it happens to pick a random integer pointing to memory that decodes to instructions which don't cause a problem 😅

I'm adding some code that forces the random PP value to be odd (and thus look tagged). That breaks the assembler's assumption and ensures that any RISC-V code loading an object from the object pool will fail.
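A sketch of why forcing the clobber value odd makes the failure deterministic, assuming heap pointers carry a low tag bit of 1 (the Dart VM's `kHeapObjectTag`) and pool offsets are multiples of 8. Code that treats PP as untagged adds an 8-aligned offset to an odd base, so every 8-byte pool load lands on an odd address and the alignment check in the simulator fires every time. All function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Low-bit tag on heap pointers (assumed to match the VM's kHeapObjectTag).
constexpr uint64_t kHeapObjectTag = 1;

// Hypothetical clobber value for PP: random bits with the low bit forced
// on, so the register always looks like a tagged pointer.
uint64_t OddClobberValue(uint64_t random_bits) {
  return random_bits | kHeapObjectTag;
}

// Pool entry address as computed by code that assumes PP is untagged.
uint64_t PoolEntryAddressAssumingUntagged(uint64_t pp, uint64_t offset) {
  return pp + offset;
}

// Pool entry address as computed by code that assumes PP is tagged.
uint64_t PoolEntryAddressAssumingTagged(uint64_t pp, uint64_t offset) {
  return pp + offset - kHeapObjectTag;
}

// True if an 8-byte load from `addr` is naturally aligned.
bool IsAligned8(uint64_t addr) { return (addr & 7) == 0; }
```

For example, with PP forced to `0x1001` and a pool offset of 21208, the untagged computation yields an odd address, while the tagged computation (or an even, genuinely untagged PP) yields an aligned one.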

copybara-service bot pushed a commit that referenced this issue Feb 14, 2022
…n RV.

In the JIT, this PP is saved and restored in the stub's Dart frame. In AOT, Dart frames do not save PP because it is a global register within Dart code. On other architectures, PP is a preserved register in the C ABI.

TEST=ci
Bug: #48333
Bug: #48354
Change-Id: I1b6702805a6fb556a1695197e40a89c364af3f8f
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232520
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
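The commit message's reasoning can be modeled with a toy register file (not the VM's actual stubs): because PP is not preserved by the C calling convention on RV, a call out to C code has to spill PP and reload it afterwards, or the Dart code that resumes sees whatever the callee left there. `CallIntoC` and the PP register index below are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>

constexpr int kPP = 15;  // hypothetical index of the pool pointer register
using RegFile = std::array<uint64_t, 32>;

// Toy model of a runtime-call stub. `c_function` may overwrite any
// caller-saved register, PP included. The stub keeps a copy of PP
// (standing in for a slot in its own frame) and restores it before
// control returns to Dart code.
void CallIntoC(RegFile& regs, const std::function<void(RegFile&)>& c_function) {
  const uint64_t saved_pp = regs[kPP];  // spill PP before the call
  c_function(regs);                     // callee may clobber PP
  regs[kPP] = saved_pp;                 // reload PP afterwards
}
```

Other registers the callee writes, such as a return value, pass through unchanged; only PP is restored.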