Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.2: Pure dart, single file bindings #773

Closed
mahesh-hegde opened this issue Sep 21, 2022 · 12 comments
Closed

v0.2: Pure dart, single file bindings #773

mahesh-hegde opened this issue Sep 21, 2022 · 12 comments
Assignees

Comments

@mahesh-hegde
Copy link
Contributor

mahesh-hegde commented Sep 21, 2022

The big idea

The current flow is

Dart: lookup C function corresponding to Java fn and call it.
C: lookup class & method, cache it, call it with args, check exception, return result if no exception

However, as we can see in package:jni, it's indeed possible to call a method reflectively without generating extra C code for it.

So how about approaching the problem from other angle, that's auto generating these package:jni calls?

class J extends JniObject {
    JMethodID _myMethodID = getMethodID("myMethod", "(...)I");
    int myMethod(...) => callMethod(_myMethodID, jvalues(args));
}

It's possible, but would not be clean (or efficient for that matter). JniObject's methods cache the dynamic class of the object, which is most specific subclass, per instance. That may mean the method ID is looked up every time it's called on a new object. JniObject is designed with more focus on debugging / one-off use, and boundary conditions where a class is not available as a typed binding.

Moreover package:jni reflective calls like callStaticMethod are quite inefficient & roundabout due to type casting, and allocate using calloc a few times (eg: for the JValue array passed as arguments).

But we can get one level below this, because we know the type of every argument and return type.

Now assume there are some hypothetical functions eg: callMethod which directly call a C implementation. For simplicity, assume the last arguments are variadic, even though dart doesn't support variadics currently.

class J extends JniObject {
    static class _classRef = env.loadClass("");

    static _myMethod = env.getMethodID("myMethod", "<signature>");
    int myMethod(args) => env.callMethod(_classRef, _myMethod, JniType.intType, arg1, arg2);
}

This avoids the array allocation issue. Now callMethod is a variadic function in C which does most bookkeeping like keeping JNIEnv and checking exceptions.

In current dart, we can't directly use a variadic function, so we can add another line something like the following, so we can cast the function pointer to required arity before use.

// env.ref.callMethod is direct pointer to function
static _fn_myMethod = env.ref.callMethod.cast<int Function(argTypes ....)>();

Exceptions

There's an issue regarding JNI exceptions being caught probably by flutter embedding. #772 .

So instead of direct int or float return types, introducing a JniResult wrapper which contains both result and error and checking in dart layer can fix this. It's a minor issue which can be resolved during the implementation of this idea.

Other advantages

  • We can avoid the entire C library building thing for generated code. End user can just use it like ffigen in flutter projects, with a dependency on package:jni
  • Deprecate C+Dart hybrid bindings, to reduce maintenance burden.
  • With a different naming and resolution scheme, by avoiding qualified names and renaming with numbers, we can generate all bindings in single file. (It seems practical to generate bindings classwise than packagewise, since most java libraries are just too heavy).

Dilemas

There are 2 APIs in package:jni already. Low level GlobalJniEnv (A) and high level JniObject (B).

introducing a private API (C) for use by generated code may be more maintenance burden, even if it's few methods.

Maybe we should use (C) in (B) indirectly. That will be smaller refactor in mostly one file, and the interface will remain the same.

CC: @dcharkes @liamappelbe

@mahesh-hegde mahesh-hegde self-assigned this Sep 21, 2022
@dcharkes
Copy link
Collaborator

I believe I see 3 separate subprojects in this big idea.

It might be worth making three separate GitHub issues out of this rather than 1 if they can be implemented independently.

1. No generating C code from jnigen

VarArgs

This avoids the array allocation issue. Now callMethod is a variadic function in C which does most bookkeeping like keeping JNIEnv and checking exceptions.

Note that var-args do not support all types, for example float and int8_t. So we should double check that we support all types.

How would you cover different return types? That is not covered by variadic arguments. E.g. a Java object, a double or an int.

Tracking issue for varargs support in Dart:

API levels

There are 2 APIs in package:jni already. Low level GlobalJniEnv (A) and high level JniObject (B).

introducing a private API (C) for use by generated code may be more maintenance burden, even if it's few methods.

Maybe we should use (C) in (B) indirectly. That will be smaller refactor in mostly one file, and the interface will remain the same.

Maybe we should move the low-level ones that are targeted by generated code but should not be used to package:jni/low_level.dart (which then exports the src/ files).

Threading

That may mean the method ID is looked up every time it's called on a new object.

Can we cache the method ID in Dart or C, as long as we ensure we always attach the thread to the JVM? Or do we get a different method ID back if we do it from a different thread?

Pros & Cons

If I understand your design correctly.

Pros:

  • No more C code generated on jnigen
    • Simpler/shorter native build for a full Flutter app
    • No native build for a Dart standalone app (after running the jni setup once).
    • Enables developers using hot-reload (instead of rebuilding a Flutter app)

Cons:

  • Can we get the same performance?
  • Blocked on varargs (guaranteed lower performance without varargs).

2. Generate a single Dart file instead of many

Pros:

  • Only a single generated file with diffs when updating bindings.

Cons:

  • Generated file might be huge and less easy to understand for the user than a generated folder.

Any other pros and cons? I'm not sure if people would want what we currently have or whether a single generated file would make more sense.

3. Exceptions

Exceptions

Yes, wrapping in a result object/struct is a good approach, or adding an void** or error** output parameter that contains nullptr if no error was emitted.

Should exception support be a separate GitHub issue?

@dcharkes dcharkes changed the title Big Idea(tm): Pure dart, single file bindings v0.2: Pure dart, single file bindings Sep 21, 2022
@mahesh-hegde
Copy link
Contributor Author

performance

With varargs, there's 2-3 allocations for class and method lookups by name, which is initial cost.

Without varargs, one more calloc per call. If you assume java code will be doing some heavy lifting, this is negligible overhead.

On the other hand, it's in dart so can be tree shaken. We gain in app size some of what we lose in speed.

3 separate issues.

Possibly. But these can be better solved in one shot IMO. There's already #772 for exceptions. I just mentioned it here.

@dcharkes
Copy link
Collaborator

Without varargs, one more calloc per call. If you assume java code will be doing some heavy lifting, this is negligible overhead.

Agreed, even with short Java calls, I don't expect JNI to be that cheap. We should get around to do some measuring.

  1. Generate a single Dart file instead of many

I'm not fully bought in to this yet. If the Java package has a reasonable structure, why wouldn't the same structure be useful for the Dart bindings?

@mahesh-hegde
Copy link
Contributor Author

mahesh-hegde commented Sep 21, 2022

I'm not fully bought in to this yet. If the Java package has a reasonable structure, why wouldn't the same structure be useful for the Dart bindings?

I also thought the same.

But I think single file output does match with FFIGEN behavior, and most of the time java classes names are unique (because they are imported unqualified). The argument against it is regarding too big jars with several packages. But I think in practice they are too big anyway, and a user will probably want few classes.

So there are all these approaches:

  • File per package (current)

  • File per class, with an exporter file per package probably

  • All classes in single file

Tradeoffs:

  • as library becomes large, the file can become 20k - 40k lines. How will dart analyzer and other tools perform with it? I am not sure all tools scale linearly with either 1) size of input files 2) number of input files.

  • Organizational overhead: if I am binding n classes, n ≤ 100 with almost all class names unique, why would I want to deal with package structure?

  • Interoperating with types from other jnigen packages. : If 2 jnigen generated files define same type X, is a.X same as b.X? How do we specify that to a? I am thinking ffigen's import files mechanism can be reused here. But not sure how that plays out if 2 jnigen generated pacakge want to depend on a common library.

I think common use-case is binding very few classes.

Java library API surfaces seem gigantic to me in comparison to C libraries which are one header or 2. So I think more common use case will be binding very few classes than all of them, probably even writing some bridge which provides minimal required glue, than wrapping entire JARs.

@mahesh-hegde
Copy link
Contributor Author

mahesh-hegde commented Sep 21, 2022

Micro-performance considerations in detail:

  1. Pure dart approach avoids the dll symbol lookup.

  2. For class lookup and each method lookup, 1-2 strings need to be calloc'd.

  3. If a method can't be called via varargs function, the arguments have to be allocated through a buffer, which is another calloc.

Return type can be struct { jvalue, error_ptr}. I think that doesn't require a heap allocation on dart side, right? (Jvalue is a union, of which we know which member to access).

@dcharkes
Copy link
Collaborator

Organizational overhead: if I am binding n classes, n ≤ 100 with almost all class names unique, why would I want to deal with package structure?

That is a good argument. We could also address that by generating a file that exports all the individual classes.

Java library API surfaces seem gigantic to me in comparison to C libraries which are one header or 2.

Well, those can also be gigantic, leading to users specifying a filter for the symbols they want.

However, this would maybe be different if you're actually wanting to generate bindings for a reusable package that users will use a subset of. For example package:win32 is huge. (It does not use ffigen, but its own generator based on the Microsoft meta-data. But we can take inspiration from it on how to structure a huge package that uses many ffi bindings.)

Return type can be struct { jvalue, error_ptr}.

That does allocate a heap value. The struct info is backed by a typed-data. Only if you return Pointer and do .ref is it allocated in C memory. But then C allocates it and it needs to be freed.

Passing a single error pointer pointer might lead to more idiomatic code:

using((arena) {
  final errorOut = arena<Pointer<Void>>()..value = nullptr;
  final result1 = ffiCall1(errorOut, ...);
  if(errorOut.value != nullptr){
    // throw ...
  }
  final result2 = ffiCall2(errorOut, result1, ...)
  // ...

vs

using((arena) {
  final result1 = ffiCall1(errorOut, ...); // what is the type of result1? and how do you type `result1.value`?
  if(result1.error != nullptr){
    // throw ...
  }
  final result2 = ffiCall2(errorOut, result1.value as int, ...)
  // ...

The first one is more often how one would write C code.

@mahesh-hegde
Copy link
Contributor Author

mahesh-hegde commented Sep 21, 2022 via email

@dcharkes
Copy link
Collaborator

@timsneath How have you structured your bindings in package:win32, do the bindings follow the file structure they have in the windows API? I presume multiple files? How did you make that design decision?
(We are trying to figure out if we prefer package:jnigen to generate the bindings in a file/folder structure in Dart files that mirrors the Java files, or if we'd like to collapse them all into a single file.)

@mahesh-hegde
Copy link
Contributor Author

if you're actually wanting to generate bindings for a reusable package that users will use a subset of.

At the risk of configuration explosion, we can provide a config option

mode: multi_file

which will replicate the java package structure into a folder, then export all classes from each package. let's say package a.b has [X, Y] classes.

a
|__ b
   |__ X.dart // or class_x.dart?
   |__ Y.dart
   |__ b.dart // exports both X.dart and Y.dart

But I think single file is a good default.

It also depends on tooling. What kind of directory structure does current tooling work well with? Lot of small files and imports vs few large files and imports?

I don't expect any tooling to be O(N) in number of files or number of lines in given file. But I wouldn't be surprised if there's quadratic behavior somewhere. On my Linux laptop with 4 GiB RAM, generating full bindings for pdfbox.text would make LSP take 10 seconds to start when I open an importing file.

I would also want to know what limits other codegen tools are hitting.

@liamappelbe
Copy link
Contributor

Before I added class filtering, our ObjC bindings were typically millions of lines. With class filtering, the bindings are about 50k - 100k lines. The tooling seems fine with this. I'm planning to add method filtering, and I expect most use cases to get down to a few thousand lines with this. You should add this sort of filtering to jnigen.

@timsneath
Copy link

timsneath commented Sep 21, 2022

@timsneath How have you structured your bindings in package:win32?

I've seen some evidence that more, smaller files is better, at least in terms of the IDE experience, some of that is outside Dart's control (e.g. VSCode code folding and outlining), some of it is very much down to Dart (e.g. dart format seems to slow significantly as files grow beyond a certain limit, particularly if what the generator produces is not close to idiomatic Dart formatting).

The traditional Win32 APIs have no real hierarchy: they're just a flat set of exports. I have a single file per DLL for each set of typedefs and function wrappers, and then separate flat files for structs, callbacks, and constants. Some of these files are approaching 10,000 lines long, and that seems fine.

For the modern WinRT APIs, we've adopted hierarchical folders. This causes some pain in the generator, due to Dart style preferring relative paths for imports (e.g. IVector).

Like @mahesh-hegde, the main performance hit has been when I've generated a really large mapping (in my case, the full mapping of all Windows APIs). Importing a library with many tens of thousands of APIs is unwieldy from the client IDE.

Hope this is helpful!

@dcharkes
Copy link
Collaborator

Thanks @liamappelbe and @timsneath!

I have similar experiences with the dart formatter starting to be slow above 20.000 LOC in generated files. (We generate test cases and benchmarks for dart:ffi in the SDK.)

I think we have two use cases here:

  1. Wrap a Java (or ObjectiveC, or Swift, or C) general API and expose it through a package. For example OS APIs. In this use case we don't filter classes/methods/fields because it is the users of that package that will only use a small subset. An example of this is packge:win32.
  2. Wrap a Java (or ObjectiveC, or Swift, or C) library for a certain purpose and expose a dedicated Dart API for that. For example a networking/database/... package. For these, the bindings generated will likely be filtered to be the subset required. For example packages sqlite3, objectbox, webcrypto, and cupertinohttp.

Use case 1 will not scale with single file.
Use case 2 is unnecessarily messy with multi file.
So, I think we should support both.

Side note: I think sometimes it would make sense to split out the bindings from a (2) use case to (1). For example maybe we could have a general BoringSSL package, that webcrypto could build on (cc @jonasfj). Similarly, We could have a foundations APIs package that cupertino_http could build on (cc @brianquinlan).

we can provide a config option

mode: multi_file

Maybe output_structure: single_file || package_structure, which communicates what we are selecting a mode for.

or output_file_structure: ... which would be even better, but verbose.

which will replicate the java package structure into a folder, then export all classes from each package. let's say package a.b has [X, Y] classes.

a
|__ b
   |__ X.dart // or class_x.dart?
   |__ Y.dart
   |__ b.dart // exports both X.dart and Y.dart

sgtm

But I think single file is a good default.

sgtm. The people making a reusable API can use the package_structure option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants