ffi: Support memory allocations in the current stack frame #42509
One of the C libraries I'm considering using requires that you pass a Pointer when calling certain functions that create C objects. As far as I know, the only way to get such a pointer is to allocate it on the heap, whereas C could just allocate it on the stack. Not having to free it is also a nice bonus.
cc @dcharkes I took a look at whether I could do this in a local branch, but I realized that suspending functions ([…]) Maybe in & out params could work for the most common cases where stack allocation is desired. Like:
typedef QueryInterface_Native = Int32 Function(Pointer<Void> self, Pointer<Guid> riid, Out<Pointer<Void>> object);
typedef QueryInterface_Dart = (int result, Pointer<Void> object) Function(Pointer<Void> self, Pointer<Guid> riid);
Two new FFI marker types […]
Good observation. We can force-optimize compile in JIT, but even then discovering new classes in the class hierarchy can lead to OSR I believe.
We have logic for this for struct-by-value returns that have the ABI where a pointer to memory is passed in. This allocates on the stack inside the […]
I'd probably only want to commit to adding a feature such as this if using the […]
We should already be able to get some benefit by converting the current package:ffi stuff to leaf calls (dart-lang/native#922). Are you currently running into performance issues?
Not hitting any performance issues yet (though I'm using a custom zone-based allocator). It would be great to avoid depending on an allocator for these simple cases since they're quite frequent. Practically every C API uses this pattern to pass back pointers since they reserve the return value for error codes, etc. |
Experimented with adding […] It's pretty hacky. I ended up introducing two IL instructions to launder a value through the stack.
Ideally, support for explicit stack IR instructions would make this easy. Could create a fixed-size stack resource, then use […]
@pragma('vm:testing:print-flow-graph')
@pragma('vm:never-inline')
int bar(int iUnknownPtr) {
Pointer<Pointer<Pointer<NativeFunction<Void Function(IntPtr, Out<IntPtr> pp)>>>> ptr =
Pointer.fromAddress(iUnknownPtr);
return ptr.value.value.asFunction<int Function(int, ())>(isLeaf: true)(0,
());
}
|
Interesting!
Yeah, that's a lot of technical complexity added. But before we commit to added technical complexity, we should assess the performance and API benefits. We should set up some benchmarks that would benefit from having this, and benchmark against alternatives. (And assess the API differences for alternatives.) One performance-oriented alternative I can think of to reduce allocations is to introduce a new […]
import 'dart:convert';
import 'package:ffi/ffi.dart';
void main() {
final function = library.lookupFunction<...>('load_some_string');
final namePtr = bump<Pointer<Uint8>>();
final lengthPtr = bump<Int>();
function(0, namePtr, lengthPtr);
print(utf8.decode(namePtr.value.asTypedList(lengthPtr.value)));
bump.free(namePtr);
bump.free(lengthPtr);
}
(The internal book-keeping of the bump allocator needs to save the size for each pointer, and keep some efficient data structure of which pointers have not been freed yet.) For an API, maybe we also want to have some arena-style API so that you can free multiple pointers at the same time. Or, a completely different approach if performance is not a concern but only API: have a CFE transform that basically uses the arena under the hood (the only issue is that the Arena allocator lives in […]
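To make the book-keeping described above concrete, here is a minimal sketch of a bump allocator, written in C so that it runs without any Dart tooling. It is not the proposed package:ffi API; the names Bump, bump_init, bump_alloc, bump_free, and bump_destroy are made up for illustration. Instead of storing a size per pointer, each block header records the offset of the previous block plus a freed flag, which is enough to pop every freed block off the top even when frees arrive out of order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NO_BLOCK ((size_t)-1)

/* Hypothetical per-block header. The union with max_align_t makes the header
   size a multiple of the strictest alignment, so payloads stay aligned. */
typedef union {
    struct {
        size_t prev; /* offset of the previous block header, or NO_BLOCK */
        int freed;   /* set once the caller has freed this block */
    } info;
    max_align_t align;
} BumpHeader;

typedef struct {
    unsigned char *base; /* backing memory obtained from malloc */
    size_t capacity;     /* size of the backing memory in bytes */
    size_t top;          /* offset of the first unused byte */
    size_t last;         /* offset of the newest block not yet reclaimed */
} Bump;

/* Returns 1 on success, 0 if the backing memory could not be allocated. */
int bump_init(Bump *b, size_t capacity) {
    b->base = malloc(capacity);
    b->capacity = capacity;
    b->top = 0;
    b->last = NO_BLOCK;
    return b->base != NULL;
}

void bump_destroy(Bump *b) {
    free(b->base);
    b->base = NULL;
}

/* Allocates by advancing `top`; returns NULL when the buffer is exhausted. */
void *bump_alloc(Bump *b, size_t size) {
    const size_t unit = sizeof(BumpHeader);
    if (size > b->capacity) return NULL; /* also keeps the rounding from overflowing */
    size_t rounded = (size + unit - 1) / unit * unit;
    size_t need = unit + rounded;
    if (need > b->capacity - b->top) return NULL;
    BumpHeader *h = (BumpHeader *)(b->base + b->top);
    h->info.prev = b->last;
    h->info.freed = 0;
    b->last = b->top;
    b->top += need;
    return h + 1; /* payload starts right after the header */
}

/* Marks the block as freed, then retracts `top` past every freed block
   that sits at the top of the buffer. */
void bump_free(Bump *b, void *p) {
    if (p == NULL) return;
    BumpHeader *h = (BumpHeader *)p - 1;
    assert(!h->info.freed); /* catches a double free of a still-tracked block */
    h->info.freed = 1;
    while (b->last != NO_BLOCK) {
        BumpHeader *tail = (BumpHeader *)(b->base + b->last);
        if (!tail->info.freed) break;
        b->top = b->last;
        b->last = tail->info.prev;
    }
}
```

Freeing in reverse allocation order reclaims space immediately; freeing out of order only defers reclamation until the blocks above have been freed as well. An arena-style "free everything" would amount to resetting top to 0 and last to NO_BLOCK.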
Understandable, I primarily wanted to do some experimentation on my own. Once […]
Looks like there are three approaches here:
Initializing the record inside of […]
Multiple IL outputs would be similar to the record approach, but without the need to allocate one up front. But it's a very, very heavy lift, since everything assumes there's only a single output from each definition instruction.
Stack allocation requires adding a new representation and extending the register allocator to support allocating multiple contiguous stack slots. Stack allocation vs. multiple outputs would most likely generate the same code.
The TypedData that's allocated to back the struct is already bump-pointer allocated in the Dart heap. I don't believe stack allocating is going to be much faster. (The objects should not outlive the new space, so the only effect is marginally more frequent new space collection.) We should definitely include this pattern in benchmarks. Even more so, even if we have APIs with pointers, as long as the FFI calls are marked as leaf calls, we can use TypedData. So that approach would work for non-struct arguments.
P.S. I really appreciate having an extra set of eyes on the FFI internals! ❤️ |
I think there should be a way to allocate memory on the stack with dart:ffi, for instance with a builtin function like alloca.
As a motivation for this, let's say we had a function int load_some_string(int index, char **nameOut, int *lengthOut). When calling that in C, we might do […]
Obviously that doesn't work in Dart, so we have to use […]
I think this is rather annoying because […] free and leak memory […] malloc/free. Obviously we can re-use those pointers, but that also amplifies the other problems.
This would be much simpler if we could allocate memory directly on the stack: […]
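For comparison, the C side of the pattern described in the motivation looks roughly like the sketch below. The body of load_some_string is a made-up stand-in (the issue only gives its signature), and name_length is a hypothetical caller. The point is that both out-parameters are plain locals in the caller's stack frame, so nothing has to be allocated or freed.

```c
#include <stddef.h>
#include <string.h>

/* Made-up data for the stand-in implementation below. */
static char kAlpha[] = "alpha";
static char kBeta[] = "beta";
static char *const kNames[] = {kAlpha, kBeta};

/* Stand-in with the signature from the issue: the return value is an error
   code, and the actual results come back through the two out-parameters. */
int load_some_string(int index, char **nameOut, int *lengthOut) {
    if (index < 0 || index >= 2 || nameOut == NULL || lengthOut == NULL) {
        return -1;
    }
    *nameOut = kNames[index];
    *lengthOut = (int)strlen(kNames[index]);
    return 0;
}

/* Hypothetical caller: `name` and `length` live in this function's stack
   frame and disappear when it returns, with no malloc/free involved. */
int name_length(int index) {
    char *name;
    int length;
    if (load_some_string(index, &name, &length) != 0) {
        return -1;
    }
    return length;
}
```

In Dart today, the equivalent of &name and &length has to come from an allocator, which is the overhead the issue asks to avoid.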