[vm/ffi] Support FFI callbacks in AOT configurations #37295

Closed
sjindel-google opened this issue Jun 18, 2019 · 8 comments

@sjindel-google
Contributor

No description provided.

@sjindel-google sjindel-google added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi labels Jun 18, 2019
@sjindel-google sjindel-google self-assigned this Jul 8, 2019
@sjindel-google
Contributor Author

sjindel-google commented Jul 22, 2019

API Changes

Currently the function argument to Pointer.fromFunction is required to be a static tearoff (constant), but no restriction is placed on the exceptionalReturn parameter.
We will need to force the exceptional return value to be a constant as well, since Pointer.fromFunction needs to return the same function pointer for each callsite.

Since we don't have a const constructor for Pointers, we will require that the exceptionalReturn parameter be omitted when the native function returns a Pointer, and we will return NULL (0) instead.

Implementation

Precompiling callback trampolines

In the JIT, we allocate "callback ids" and set up associated thread-local state when compiling a callback (during Pointer.fromFunction).
The callback ids enable the generated code to find its Code object inside the Thread upon re-entry from native code.
The callback ids need to be baked into the generated code, so they must be allocated at compile-time.
However, Thread::ffi_callback_code_ needs to be populated at runtime.
This may be done lazily when fromFunction is invoked; therefore, we suggest splitting up fromFunction into calls to two internal methods, one of which is elaborated to IL at compile-time (like _asFunctionInternal) and the other which is dispatched to the runtime:

Object _nativeCallbackFunction<NS extends Function>(Function target, Object exceptionalReturn) native "...";

Pointer<NS> _pointerFromFunction<NS extends NativeFunction>(Object function) native "...";

// Compile-time transformation (after static checks):
fromFunction<NS>(f, e) =>
    _pointerFromFunction<NativeFunction<NS>>(
        _nativeCallbackFunction<NS>(f, e));
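
The split can be pictured with a toy model. This is only a sketch, not VM code: the `Compiler` and `Thread` classes and their method names are hypothetical stand-ins for the two internal methods above, and plain integers stand in for Code objects and entry points. It shows ids being fixed at "compile time" while the per-thread table is populated lazily at "run time".

```python
# Toy model (not VM code): callback ids are fixed at "compile time",
# while the per-thread code table is filled lazily at "run time".

class Compiler:
    """Hands out callback ids that would be baked into generated code."""

    def __init__(self):
        self.callbacks = []  # list index == callback id

    def native_callback_function(self, target, exceptional_return):
        # Hypothetical stand-in for _nativeCallbackFunction: record the
        # target once and return its compile-time id.
        callback_id = len(self.callbacks)
        self.callbacks.append((target, exceptional_return))
        return callback_id


class Thread:
    """Stand-in for the VM Thread and its ffi_callback_code_ table."""

    def __init__(self):
        self.ffi_callback_code = []

    def pointer_from_function(self, compiler, callback_id):
        # Hypothetical stand-in for _pointerFromFunction: grow the table on
        # demand and install the entry the first time the id is requested.
        while len(self.ffi_callback_code) <= callback_id:
            self.ffi_callback_code.append(None)
        if self.ffi_callback_code[callback_id] is None:
            self.ffi_callback_code[callback_id] = compiler.callbacks[callback_id]
        return callback_id  # a real VM would return the entry-point address
```

In this model a thread that never asks for a callback keeps an empty table, which is the point of deferring the population step to the runtime call.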

_nativeCallbackFunction

In AOT, we will identify calls to this method in the flow-graph builder and create the needed Function object immediately.
The call will then be replaced with a constant instruction in IL:

ta <- Constant(<type arg>);
fn <- Constant(#function);
NativeCall(_pointerFromFunction, ta, fn);

In JIT mode, we do not allow storing native callbacks into JIT snapshots, so we cannot use the AOT approach for handling _nativeCallbackFunction, which stores the callback's Function in the object pool of the callsite to fromFunction.
Instead, we will still dispatch to the runtime to compile the callback trampoline, ensuring that Thread::ffi_callback_code_ has the only reference to the compiled code.

_pointerFromFunction

This function will be implemented as a runtime entry in both AOT and JIT modes.
It is responsible for updating Thread::ffi_callback_code_ to hold the incoming Function's Code object and minting the Pointer which holds the Function's entrypoint (which is only known at runtime after the instructions image is loaded).

Finding the Thread

Runtime entries are typically loaded from a Thread object which is threaded through all generated code via a designated register, and Dart_Invoke passes the thread object directly to generated code.
However, native function pointers have no designated "current thread" argument, so we must retrieve the thread object from TLS upon entry to the native callback.

In JIT mode, we embed a pointer to the symbol DLRT_GetThreadForNativeCallback directly in the generated code during compilation.
Since the symbol's location is only determined when the VM is loaded, this relies on the fact that we do not save native callbacks into JIT snapshots.

However, in AOT mode it is essential to save native callbacks in AOT snapshots.
The only information a native callback receives upon entry is its program counter, so we need to save the address of DLRT_GetThreadForNativeCallback at a fixed offset from the callback trampoline's entry-point, similarly to a relocation against the data (or BSS) section in native code.
Unfortunately, our snapshot formats have very different levels of abstraction, and only one involves the native linker.
We therefore need to re-invent cross-segment relocations for callback trampolines.

Relocation

We presently support a kind of relocation for bare instructions.
These relocations only refer to offsets within the instructions image (not between the VM and Isolate images).
Moreover, these relocations have bounds on the size of the inserted offset in order to patch call/branch instructions, and insert trampolines where the offset is too large to patch the instruction in place.
These relocations are patched before the snapshot is serialized in a pass over the code objects to be written.

For BSS-relative relocations the constraints differ: we cannot use trampolines to bridge large offsets (we are not patching calls, and the BSS segment is not executable); we need to insert relocations relative to other segments; and we aren't constrained by the need to patch instructions in place (we can insert the offset as an entire word in the instruction stream and skip over it).
Therefore we will add a new mechanism for inter-segment relocations which is processed during snapshot serialization.

We will add a field RawCode::bss_relocation_offset_ holding an offset into the associated Instructions where a relocation to the start of the BSS segment should be inserted (as a target machine word) during serialization.
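
As a rough illustration of what processing this field could look like, here is a minimal sketch. It assumes a 64-bit little-endian target, and assumes the inserted word holds the signed distance from the relocation word to the start of the BSS segment; the function and parameter names are hypothetical and not taken from the VM.

```python
import struct

def serialize_with_bss_reloc(instructions, code_offset,
                             bss_relocation_offset, bss_start):
    """Return a copy of `instructions` with the BSS relocation filled in.

    code_offset:           where these instructions start in the image
    bss_relocation_offset: offset of the placeholder word in `instructions`
    bss_start:             offset of the BSS segment in the same address space
    """
    out = bytearray(instructions)
    reloc_position = code_offset + bss_relocation_offset
    # Assumption: the word stores the signed distance from itself to the
    # start of the BSS segment, as one 64-bit little-endian machine word.
    struct.pack_into("<q", out, bss_relocation_offset,
                     bss_start - reloc_position)
    return bytes(out)
```

Because the stored value is a distance rather than an absolute address, it stays valid wherever the image is loaded, as long as text and BSS keep their relative placement.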
Since our snapshot formats incorporate different abstractions, we need to process this field differently depending on the format.

ELF

We will insert a BSS segment of a fixed size (one page minimum) during snapshot serialization after the text segment, so that the offset (backward) to the BSS segment is known at every point during the instructions serialization.
For consistency with blobs (see below), we will attach the BSS section after the image, so we will patch the BSS relocations in a second pass after serializing the instructions image.

Assembly

In the Assembly snapshots we cannot fix the size or offset of the BSS segment, and therefore cannot patch the relocations as we serialize the snapshot.
Instead, we have to force the assembler to create the relocation.

All instructions are serialized with the .quad assembler directive because we do not rely on the native assembler.
However, we can express the relocation as an assembly expression:

;; Definition of the BSS section:
.globl _kDartBssSection
.zerofill __DATA,__common,_kDartBssSection,1000,8

;; ...

.Lbssreloc: .quad _kDartBssSection - .Lbssreloc

Blobs

Blobs are the most restrictive format since we do not have access to a native linker or loader.
To ensure that the BSS section is loaded at the appropriate position, we will need to extend the API for the blobs format:

DART_EXPORT DART_WARN_UNUSED_RESULT
intptr_t Dart_BssAlignmentForImage(uint8_t* instructions_buffer, intptr_t instructions_size);

Since we need one BSS segment for each text segment, the size of the BSS segment will be stored in the corresponding text segment's header.
The embedder will be responsible for creating an appropriately sized and zero-filled mapping following the text.
The placement of the BSS segment after the instructions leverages the default behavior of mmap when a size larger than the file size is passed. Unfortunately, creating adjacent mappings with VirtualAlloc on Windows is not as easy.
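
The intended layout (text followed by a zero-filled BSS) can be sketched as follows. Note that this is not the file-backed mapping described above: as a simplification it uses one anonymous mapping and copies the text in, and it does not deal with executable permissions. The helper name `map_text_with_bss` is hypothetical.

```python
import mmap

def map_text_with_bss(text, bss_size):
    """Map `text` followed by a zero-filled BSS of at least `bss_size` bytes.

    Returns the mapping and the offset from the text start to the BSS start.
    """
    page = mmap.PAGESIZE
    # Round both parts up to whole pages so the BSS starts page-aligned.
    text_span = -(-len(text) // page) * page
    bss_span = -(-bss_size // page) * page
    # Anonymous mappings are zero-initialised, so the BSS needs no clearing.
    region = mmap.mmap(-1, text_span + bss_span)
    region[:len(text)] = text
    return region, text_span
```

A single mapping keeps text and BSS adjacent by construction, which is the property that is harder to get from two separate allocations.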

Loading

To simplify loading, we will store the offset from the start of the text segment to the BSS segment in the HeapPage header (which has alignment already used for other hidden fields).

After loading a snapshot we will use the offset to paste the resolved address of DLRT_GetThreadForNativeCallback into the BSS segment.
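
A small simulation of this loading step, under stated assumptions: memory is modelled as a `bytearray`, the resolved address is assumed to live in the first word of the BSS segment, and the relocation word is assumed to hold the signed distance from itself to the BSS start. Both function names are hypothetical.

```python
import struct

def install_thread_getter(image, text_start, bss_offset, getter_address):
    """Loader side: write the resolved address into the first BSS word."""
    struct.pack_into("<Q", image, text_start + bss_offset, getter_address)

def load_thread_getter(image, reloc_position):
    """Trampoline side: follow the relocation word into the BSS."""
    (delta,) = struct.unpack_from("<q", image, reloc_position)
    (address,) = struct.unpack_from("<Q", image, reloc_position + delta)
    return address
```

The trampoline side only needs its own position and the word stored next to it, which matches the constraint that a native callback knows nothing but its program counter on entry.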

@sjindel-google
Contributor Author

/cc @mkustermann @rmacnak-google @alexmarkov

Martin: I think this reflects the design we discussed.

@mkustermann
Member

Since the Dart callbacks which C code invokes have a lifetime until the end of the isolate (we don't know how long C code hangs on to them, so the GC cannot know it either) we want to have a bounded number of those (i.e. we restrict ourselves to static methods). We also enforce the error value to be a compile-time constant.

Precompiling callback trampolines: Since we restrict ourselves to static methods it seems weird if we have to call _pointerFromFunction every time we're interested in having this Pointer (e.g. in a loop). This pointer object is for all practical purposes a constant/singleton. Couldn't we re-write the call-site in the kernel transformer to simply read the value of a static global field (of type Pointer<NFT>), where the field would have an initializer expression which calls into the runtime to return a pointer with the right address?

Finding the Thread & Relocation: This aligns with my original suggestion, i.e. use pc-relative loads to get the address of the c function which, when invoked, returns the current thread. So far there seem to be no blockers on this. It also might allow us to use the data/bss section for other purposes in the future.

Blobs: Preferably we would change the embedder API to give us the filename (and/or buffer) of the blob and the VM can map the text and data/bss section appropriately. We can add a small header to the blobs format which describes the relative offsets to which the data/bss sections have to be mapped. This would simplify our embedders and gives us more flexibility

@sjindel-google
Contributor Author

> Precompiling callback trampolines: Since we restrict ourselves to static methods it seems weird if we have to call _pointerFromFunction every time we're interested in having this Pointer (e.g. in a loop). This pointer object is for all practical purposes a constant/singleton. Couldn't we re-write the call-site in the kernel transformer to simply read the value of a static global field (of type Pointer<NFT>), where the field would have an initializer expression which calls into the runtime to return a pointer with the right address?

Absolutely, I'll do that.
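
The effect of that rewrite can be modelled roughly as below. This is only a sketch of the lazy-initializer idea in Python, not the kernel transformation itself; `LazyStatic` is a hypothetical name, and the initializer stands in for the call into the runtime.

```python
class LazyStatic:
    """A field whose initializer runs once, on first read."""

    _UNSET = object()  # sentinel, so that None is a valid cached value

    def __init__(self, initializer):
        self._initializer = initializer
        self._value = LazyStatic._UNSET

    def get(self):
        # The first read calls the initializer; later reads (for example
        # inside a loop) reuse the cached value.
        if self._value is LazyStatic._UNSET:
            self._value = self._initializer()
        return self._value
```

With this shape, a call site that reads the field in a loop triggers the runtime call at most once.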

> Blobs: Preferably we would change the embedder API to give us the filename (and/or buffer) of the blob and the VM can map the text and data/bss section appropriately. We can add a small header to the blobs format which describes the relative offsets to which the data/bss sections have to be mapped. This would simplify our embedders and gives us more flexibility

I agree with the idea of consolidating the blobs format from 4 files into 1, and adding a header so the VM can create the mappings as appropriate.

dart-bot pushed a commit that referenced this issue Sep 3, 2019
…cl. blobs).

To do this, we add writable data sections (currently uninitialized) to ELF and Asm snapshots
and allow Instructions to have patchable relocations against (the start of) these sections.

Issue #37295 (see also for design & discussion).

Change-Id: If20bfa55776f4044aaa6bb8ea2101d2ada41842c
Cq-Include-Trybots: luci.dart.try:vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-product-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-kernel-precomp-android-release-arm-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/110221
Commit-Queue: Samir Jindel <sjindel@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
dart-bot pushed a commit that referenced this issue Sep 3, 2019
…hots (excl. blobs)."

This reverts commit ecbea5a.

Reason for revert: broken with bare instructions and ABI bot

Original change's description:
> [vm/ffi] Implement FFI callbacks on AOT for ELF and Asm snapshots (excl. blobs).
> 
> To do this, we add writable data sections (currently uninitialized) to ELF and Asm snapshots
> and allow Instructions to have patchable relocations against (the start of) these sections.
> 
> Issue #37295 (see also for design & discussion).
> 
> Change-Id: If20bfa55776f4044aaa6bb8ea2101d2ada41842c
> Cq-Include-Trybots: luci.dart.try:vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-product-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-kernel-precomp-android-release-arm-try
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/110221
> Commit-Queue: Samir Jindel <sjindel@google.com>
> Reviewed-by: Martin Kustermann <kustermann@google.com>

TBR=kustermann@google.com,rmacnak@google.com,alexmarkov@google.com,sjindel@google.com

Change-Id: I9787da6d42575ca4f5ae0a698052a19ac4275afd
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Cq-Include-Trybots: luci.dart.try:vm-kernel-precomp-linux-debug-x64-try, vm-kernel-precomp-linux-product-x64-try, vm-kernel-precomp-linux-release-x64-try, vm-kernel-precomp-android-release-arm-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/115240
Reviewed-by: Samir Jindel <sjindel@google.com>
Commit-Queue: Samir Jindel <sjindel@google.com>
@dcharkes dcharkes added this to the D26 Release milestone Sep 30, 2019
@mkustermann
Member

@sjindel-google We can close this issue, right?

@sjindel-google
Contributor Author

I was leaving it open as a marker for deprecating blobs, but as we discussed that should happen asynchronously.

@OneeMe

OneeMe commented Oct 8, 2019

@sjindel-google It seems the code related to this issue has been reverted. Will it still be usable in the next dev release?

@sjindel-google
Contributor Author

@ForelaxX Yes, code has been re-landed and it's ready to use.
