Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFI callbacks aren't profiler-safe #52814

Closed
mkustermann opened this issue Jun 29, 2023 · 2 comments
Closed

FFI callbacks aren't profiler-safe #52814

mkustermann opened this issue Jun 29, 2023 · 2 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. crash Process exits with SIGSEGV, SIGABRT, etc. An unhandled exception is not a crash. library-ffi P2 A bug or feature request we're likely to work on triaged Issue has been triaged by sub team

Comments

@mkustermann
Copy link
Member

Our current (sync or async) ffi callbacks aren't profiler-safe: When enabling the profiler and it samples in the middle of a ffi callback it may crash the program.

A simple example how to provoke this:

import 'dart:ffi';

final lib = DynamicLibrary.open('out/ReleaseX64/libffi_test_functions.so');
final invokeCallback = lib.lookupFunction<IntPtr Function(Pointer<Void>),
    int Function(Pointer<Void>)>('TestSimpleAddition');

int add(int a, int b) => a + b;

main() {
  while (true) {
    invokeCallback(Pointer.fromFunction<Int Function(Int, Int)>(add, 0).cast());
  }
}

and a small change to the C code:

diff --git a/runtime/bin/ffi_test/ffi_test_functions.cc b/runtime/bin/ffi_test/ffi_test_functions.cc
index 3b78b34114a..f6ecfe4314c 100644
--- a/runtime/bin/ffi_test/ffi_test_functions.cc
+++ b/runtime/bin/ffi_test/ffi_test_functions.cc
@@ -823,8 +823,9 @@ DART_EXPORT void CallbackWithStruct(void (*f)(Struct8BytesNestedIntCopy)) {
 
 // Sanity test.
 DART_EXPORT intptr_t TestSimpleAddition(intptr_t (*add)(int, int)) {
+  __asm__ ("xorq %rbp, %rbp\n");
   const intptr_t result = add(10, 20);
-  std::cout << "result " << result << "\n";
+  //std::cout << "result " << result << "\n";
   CHECK_EQ(result, 30);
   return 0;
 }

Running this without the profiler works just fine

% out/ReleaseX64/dart ffi_callback.dart

Running this with the profiler will crash pretty quickly

% out/ReleaseX64/dart --profiler ffi_callback.dart
===== CRASH =====
si_signo=Segmentation fault(11), si_code=1, si_addr=0x8
version=3.1.0-edge.2f85e7a14980703269c3235d388728bd3b1e9f15 (be) (Thu Jun 29 10:24:14 2023 +0200) on "linux_x64"
pid=912563, thread=912573, isolate_group=main(0x56247ccb47c0), isolate=main(0x56247ccb7020)
os=linux, arch=x64, comp=no, sim=no
isolate_instructions=56247bcee260, vm_instructions=56247bcee260
fp=7f64f4e7d3a0, sp=7f64f4e7d380, pc=56247bf1cb26
  pc 0x000056247bf1cb26 fp 0x00007f64f4e7d3a0 out/ReleaseX64/dart+0x237db26
  pc 0x000056247bf1cfe4 fp 0x00007f64f4e7d4e0 dart::Profiler::SampleThread(dart::Thread*, dart::InterruptedThreadState const&)+0x3b4
  pc 0x000056247bf93ff4 fp 0x00007f64f4e7d630 out/ReleaseX64/dart+0x23f4ff4
  pc 0x00007f651b3dcf90 fp 0x00007f64f4e7e380 /lib/x86_64-linux-gnu/libc.so.6+0x3bf90
  pc 0x0000000000000000 fp 0x00007f64f4e7e3a0 Unknown symbol
-- End of DumpStackTrace

The root cause is that there's a window where we have transitioned from native to generated code, we have set the VMTag::kDartTagId (at which point the profiler will attempt to sample the dart stack - assuming it's executing dart code) - but it will not identify any entry frame.

See il_x64.cc:NativeEntryInstr::EmitNativeCode:


void NativeEntryInstr::EmitNativeCode(FlowGraphCompiler* compiler) {
  ...

  // Here we'll set thread->set_vm_tag(VMTag::kDartTagId)` and clear out exit frame
  // After doing so we can get profiler ticks.
  __ TransitionNativeToGenerated(...);

  <... XXX ... >

  // This is a fake return address for the profiler. We'll set up yet-another-frame, at which point
  // the profiler will recognize this one here as an entry frame.
  __ movq(RAX, :Address(THR, compiler::target::Thread::invoke_dart_code_stub_offset()));
  FunctionEntryInstr::EmitNativeCode(compiler);
}

Any profiler tick into <... XXX ...> can lead to a crash of the program - as it will blindly continue unwinding using PC/FP across C++ and Dart (instead of only walking dart frames recognizing entry-frame/exit-frame correctly)

The native code that invokes the callback doesn't have to have frame pointers enabled. This is simulated above by using inline assembly to clear out rbp.

/cc @dcharkes @liamappelbe

@mkustermann mkustermann added area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. crash Process exits with SIGSEGV, SIGABRT, etc. An unhandled exception is not a crash. library-ffi labels Jun 29, 2023
@a-siva a-siva added P2 A bug or feature request we're likely to work on triaged Issue has been triaged by sub team labels Nov 30, 2023
@a-siva
Copy link
Contributor

a-siva commented Nov 30, 2023

//cc @liamappelbe are you looking into this issue?

@liamappelbe
Copy link
Contributor

Not at the moment. This is a very long standing issue with FFI callbacks. It's not specific to async callbacks, it predates them.

copybara-service bot pushed a commit that referenced this issue May 28, 2024
Delay changing Thread::vm_tag on callback entry and restore the tag early on callback return so that the profiler doesn't see the "running Dart" tag unless it can also see the fake return address marking the entry frame.

TEST=ffi/async_void_function_callbacks, ffi/function_callbacks_subtype, ffi/function_callbacks, ffi/isolate_local_function_callbacks
Bug: #52814
Change-Id: I40d80ec7c44063d078db0e211565e2d127c6b81e
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/367460
Reviewed-by: Daco Harkes <dacoharkes@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. crash Process exits with SIGSEGV, SIGABRT, etc. An unhandled exception is not a crash. library-ffi P2 A bug or feature request we're likely to work on triaged Issue has been triaged by sub team
Projects
None yet
Development

No branches or pull requests

4 participants