Proposal: OCAP bindings #1291

Open
PoignardAzur opened this issue Jul 21, 2019 · 3 comments

PoignardAzur commented Jul 21, 2019

Abstract

This is a proposal for adding inter-module bindings in the form of opaque capabilities that can be exported by arbitrary wasm modules.

Rationale

WebAssembly is currently good at executing a self-contained program managing a monolithic block of memory. It lacks key features when it comes to communicating fine-grained data between modules, or between a module and its host.

Some use cases are made harder by the lack of these features.

The simplest use cases, where wasm code calls host functions which immediately return data (DOM access, WASI), can be covered by simple "static" bindings, which seems to be the direction WebIDL/Snowperson-bindings are headed in.

However, in some cases, you may want to pass persistent data back and forth to a module, without copying it all as a JSON/Protobuf object, eg:

fn myReactComponent()
{
    wasm_react_createElement(
        "ul",
        None,
        [
            wasm_react_createElement("li", None, ...),
            wasm_react_createElement("li", None, ...),
            wasm_react_createElement("li", None, ...),
            wasm_react_createElement("li", None, ...),
            wasm_react_createElement("li", None, ...),
        ]
    )
}

This proposal tries to outline what a scheme capable of supporting the above code would look like.

Description

Each module can export capability types. From the exporting module's perspective, these capabilities are simple wrappers around wasm value types (i32, f64, ref). From the perspective of other modules or instances, they are opaque types that can only be copied, moved, dropped, and passed back to the original module instance; they cannot be forged, mutated, or downcast from anyref or other capability types. They can only be stored in local variables and tables (similar to reference types).

Capabilities are statically-checked, nominal types. Modules can't swap the values of two capabilities with the same type name, unless they come from the same instance of the same module (that last part must be checked at runtime).

Note that, because of the static type-checking requirement, these types implement an "object capability" scheme, not an "object-oriented programming" scheme. Concepts usually associated with OOP, such as vtables and polymorphism, aren't covered by this proposal. In fact, this proposal doesn't require function references at all, though it depends on the typed imports proposal.
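
A rough host-side model of these rules, as a C++ sketch. Every name here (`Capability`, `TypeId`, `InstanceId`, `mint`, `unwrap`) is hypothetical and not part of the proposal, and the payload is stored as raw 64-bit data purely for simplicity. It only illustrates a value whose payload importing code cannot read, together with the runtime check that a capability is handed back to the instance that produced it.

```cpp
#include <cstdint>

// Hypothetical identifiers; the proposal does not define a host API.
using TypeId = std::uint32_t;      // nominal capability type, e.g. "Database"
using InstanceId = std::uint32_t;  // module instance that produced the value

class Capability {
public:
    // Assumed to be called only by the host, on behalf of an exporting instance.
    static Capability mint(TypeId type, InstanceId owner, std::uint64_t payload) {
        return Capability(type, owner, payload);
    }

    // Called by the host when the capability is passed back to `callee`.
    // The nominal type would normally be checked statically; the owning
    // instance is the part that has to be checked at runtime.
    bool unwrap(TypeId expected, InstanceId callee, std::uint64_t& out) const {
        if (type_ != expected || owner_ != callee) return false;
        out = payload_;
        return true;
    }

private:
    Capability(TypeId t, InstanceId o, std::uint64_t p)
        : type_(t), owner_(o), payload_(p) {}

    TypeId type_;
    InstanceId owner_;
    std::uint64_t payload_;  // never exposed to importing modules
};
```

Copying a `Capability` is allowed (the implicit copy constructor), but there is no way to construct one from a raw integer or to change its payload, which mirrors the "cannot be forged or mutated" rule.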

Memory Management

For every capability type, modules also provide lifetime hooks rc_increment and rc_decrement. These hooks are private, accessible only to the host, and allow the host to provide the module with trusted information about its capabilities' lifetime, in the form of a reference count.

Using these hooks, modules are responsible for the memory-safety of capabilities in relation to their linear memory. For instance, a C++ module exporting shared pointers as capabilities would be responsible for making sure that the linear memory slice these shared pointers are stored in isn't overwritten with garbage data.

The wasm host is only responsible for calling the hooks when capabilities are copied or dropped.

As a result, a program may end up with reference cycles that can't be trivially detected at compile-time. As with most reference-counting schemes, the developer is ultimately responsible for making sure these cycles don't happen.

An alternative lifetime scheme based on garbage collection may be added, but isn't part of this proposal. The rationale is that most hosts and languages are capable of implementing an RC scheme, but some languages (eg Rust/C++) and hosts may not be able or willing to implement a GC scheme.
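
To make the division of labour concrete, here is a minimal C++ sketch of how a host might drive the two hooks. `LifetimeHooks` and `HostHandle` are hypothetical names, a global counter stands in for the module's real shared_ptr bookkeeping, and the sketch assumes a freshly returned capability already carries one reference that the host adopts.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical hook table registered by the exporting module for one
// capability type; the field names mirror rc_increment / rc_decrement.
struct LifetimeHooks {
    void (*rc_increment)(std::uint32_t payload);
    void (*rc_decrement)(std::uint32_t payload);
};

// Host-side handle: copies and drops are the only events it reports.
class HostHandle {
public:
    // Adopts the reference the module already holds (assumption of this sketch).
    HostHandle(const LifetimeHooks* hooks, std::uint32_t payload)
        : hooks_(hooks), payload_(payload) {}
    HostHandle(const HostHandle& other)
        : hooks_(other.hooks_), payload_(other.payload_) {
        if (hooks_) hooks_->rc_increment(payload_);
    }
    HostHandle(HostHandle&& other) noexcept
        : hooks_(other.hooks_), payload_(other.payload_) {
        other.hooks_ = nullptr;  // a move transfers the reference, no hook call
    }
    HostHandle& operator=(HostHandle other) noexcept {
        std::swap(hooks_, other.hooks_);
        std::swap(payload_, other.payload_);
        return *this;
    }
    ~HostHandle() {
        if (hooks_) hooks_->rc_decrement(payload_);
    }

private:
    const LifetimeHooks* hooks_;
    std::uint32_t payload_;
};

// Stand-in for the module side: one counter instead of a real shared_ptr.
int g_refcount = 0;
void demo_increment(std::uint32_t) { ++g_refcount; }
void demo_decrement(std::uint32_t) { --g_refcount; }

// Walks through adopt / copy / move / drop and returns the final count.
int run_lifetime_demo() {
    const LifetimeHooks hooks{demo_increment, demo_decrement};
    g_refcount = 1;                   // module created the object: one reference
    {
        HostHandle a(&hooks, 42);     // adopts that reference, count stays 1
        HostHandle b = a;             // copy -> rc_increment, count 2
        HostHandle c = std::move(a);  // move -> no hook call, count 2
        assert(g_refcount == 2);
    }                                 // c and b dropped -> rc_decrement twice
    return g_refcount;
}
```

The module never sees `HostHandle`; it only observes the hook calls, which is what lets it trust the count even when the importing code is hostile.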

Overhead

The overhead of capabilities should be constant-time at worst. A capability might be stored as a tuple of a pointer into a store plus its declared type.

In the worst case, passing a capability to a function should only incur a pointer check at runtime. In the best case, the function call could be inlined and using the capability could be equivalent to a single pointer read/write.

Example

The following example is meant to be indicative of what exporting and importing OCAP bindings would look like; the syntax is still fairly loose, and takes a few shortcuts.

main.cpp:

#include <WasmRefWrapper>
#include "Database/generated_header.h"

using DatabaseRef = WasmRefWrapper<Database>;
using DatabaseRequestRef = WasmRefWrapper<DatabaseRequest>;

int main() {
    int some = 0, parameters = 0;  // placeholder arguments
    DatabaseRef db = { Database_open("foobar.db", some, parameters) };

    DatabaseRequestRef rq = { Database_exec(db, "SELECT ... FROM ...") };
    while (!Database_request_isDone(rq))
    {
        Database_request_next(rq);
        int x = Database_request_readInt(rq, 0);
        float y = Database_request_readFloat(rq, 1);
        doSomething(x, y);
    }
    Database_request_close(rq);
    Database_close(db);
    return 0;
}

Database.wasm exports:

(@ocap type Database base i32
    rc_increment $shared_ptr::increment
    rc_decrement $shared_ptr::decrement
)

(@ocap type DatabaseRequest base i32
    rc_increment $shared_ptr::increment
    rc_decrement $shared_ptr::decrement
)

(func $Database_open
    ;; __str__ would probably be (i32 i32) depending on the snowperson bindings proposal
    (param __str__ i32 i32 )
    (result $Database)
)

(func $Database_exec
    (param $Database __str__)
    (result $DatabaseRequest)
)

(func $Database_request_isDone
    (param $DatabaseRequest)
    (result i32)
)

(func $Database_request_next
    (param $DatabaseRequest)
)

(func $Database_request_readInt
    (param $DatabaseRequest i32)
    (result i32)
)

(func $Database_request_readFloat
    (param $DatabaseRequest i32)
    (result f32)
)

(func $Database_request_close
    (param $DatabaseRequest)
)

(func $Database_close
    (param $Database)
)

In the above example, the Database module exports:

  • An i32 pointing to a shared_ptr holding a database handle,
  • An i32 pointing to a shared_ptr holding a request handle,
  • Functions capable of taking and returning these pointers.

The C++ part of the code isn't aware that the capabilities it imports are i32 values; it can only manipulate them through Database_* functions.

Also, note that shared_ptr::increment and shared_ptr::decrement are different from Database_request_close and Database_close. Unlike the reference-counting hooks, the close functions aren't guarded by the host, which means, for instance, that the following code can compile:

Database_request_close(rq);
Database_request_next(rq);
int x = Database_request_readInt(rq, 0);

and should be accounted for in the Database implementation.
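
One way the Database module could account for this, sketched in C++. The struct layout, the placeholder row data, and the choice to report misuse through a `bool` return value are all assumptions made for illustration (so the signatures differ from the exports listed above); a real module might trap or return an error code instead.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical internals of the Database module; only the
// Database_request_* function names come from the example above.
struct DatabaseRequest {
    bool closed = false;
    int cursor = -1;    // index of the current row, -1 before the first next()
    int rowCount = 3;   // placeholder result set for the sketch
};

using RequestPtr = std::shared_ptr<DatabaseRequest>;

// Releases database resources. The object itself stays allocated until
// rc_decrement drops the last reference, so later calls can still see `closed`.
void Database_request_close(const RequestPtr& rq) {
    rq->closed = true;
}

bool Database_request_isDone(const RequestPtr& rq) {
    return rq->closed || rq->cursor + 1 >= rq->rowCount;
}

// Returns false instead of advancing when the request is closed or exhausted.
bool Database_request_next(const RequestPtr& rq) {
    if (Database_request_isDone(rq)) return false;
    ++rq->cursor;
    return true;
}

// Writes the column value to `out`; fails on a closed or unpositioned request.
bool Database_request_readInt(const RequestPtr& rq, int column, std::int32_t& out) {
    if (rq->closed || rq->cursor < 0) return false;
    out = rq->cursor * 10 + column;  // placeholder for a real column read
    return true;
}
```

The key point is that "closed" is logical state owned by the module, while the memory behind the capability is only released through the host-driven reference count.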

Snowperson bindings dependency

This proposal is meant as an extension of snowperson-bindings.

For those who don't know, snowperson-bindings is the WIP name of an intermediate layer between wasm and WebIDL-bindings that was first presented at the June CG meeting.

This proposal is written under the assumption that snowperson-bindings will cover passing and returning typed data between functions (eg structs, arrays, enums, unions). This assumption may be invalidated, as snowperson-bindings are still evolving.

Note that even if, say, structs aren't a part of snowperson bindings, they may still be emulated with OCAP bindings, eg:

Vec3 getPos(const Entity&);

may become:

float getPos_x(const Entity&);
float getPos_y(const Entity&);
float getPos_z(const Entity&);
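
On the importing side, the original signature could then be restored with a thin wrapper. The sketch below is only illustrative: `Entity` is given visible fields and the three accessors are given local bodies so that the example compiles on its own, whereas in the real setting `Entity` would be an opaque capability and the accessors would be imports.

```cpp
struct Vec3 { float x, y, z; };

// Stand-in for the opaque capability; an importing module would not be
// able to see these fields.
struct Entity { float px, py, pz; };

// Stand-ins for the three imported accessor functions.
float getPos_x(const Entity& e) { return e.px; }
float getPos_y(const Entity& e) { return e.py; }
float getPos_z(const Entity& e) { return e.pz; }

// Importer-side convenience wrapper restoring the struct-returning signature.
// It costs three cross-module calls instead of one.
Vec3 getPos(const Entity& e) {
    return Vec3{getPos_x(e), getPos_y(e), getPos_z(e)};
}
```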

Possible improvements

Interop with JS

Interoperability with JS is outside the scope of this proposal.

Capabilities could be exported as opaque objects (maybe using private fields) with an explicit dispose() method. These objects could only be produced from and passed to methods from the module exporting them.

Generics

Some use cases need generics.

For instance, if a module A wants to create an object graph, and import a function from a module B that reads through or mutates that graph (eg B exports pathfinding functions), there are several possible implementations:

  • The graph is allocated and controlled by module B,
  • The graph is allocated by module A, and the node types are exported by module A and imported by module B,
  • The functions exported by module B are generic and can be called with any capability type which has the right methods.

Using generics may be desirable if B was developed with no knowledge of A (eg B was downloaded from a package manager), and in any situation where a user wants to pass complex data to a module without allocating directly in that module.

However, generics fall outside the scope of this proposal.
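
For readers unfamiliar with the third option, the following C++ template gives a feel for what "generic over any type with the right methods" means at the source level. It is an analogy only: `countReachable`, `AdjacencyGraph`, and the `neighborCount`/`neighbor` method names are hypothetical, and nothing here says how such generics would be expressed or checked in wasm.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical "module B" function. It works with any NodeOps type that
// provides neighborCount(Node) and neighbor(Node, index), without knowing
// how the graph is stored.
template <typename Node, typename NodeOps>
std::size_t countReachable(const NodeOps& ops, Node start) {
    std::set<Node> seen;
    std::vector<Node> stack;
    stack.push_back(start);
    while (!stack.empty()) {
        Node n = stack.back();
        stack.pop_back();
        if (!seen.insert(n).second) continue;  // already visited
        for (std::size_t i = 0; i < ops.neighborCount(n); ++i)
            stack.push_back(ops.neighbor(n, i));
    }
    return seen.size();
}

// Hypothetical "module A" graph: nodes are indices into an adjacency list
// that module A allocates and owns.
struct AdjacencyGraph {
    std::vector<std::vector<int>> edges;
    std::size_t neighborCount(int n) const { return edges[n].size(); }
    int neighbor(int n, std::size_t i) const { return edges[n][i]; }
};
```

Here the graph stays entirely in A's hands and B's traversal only goes through the two accessor methods, which is the property the third bullet is after.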

Const types

Some functions may be declared as taking a read-only view of a capability. While this is mostly impossible to verify for a wasm host, it could still enable optimizations when compiling the module importing these functions.

The host would need to take into account the possibility of a malicious module lying in its exported definition, so that the optimizations would produce logically incorrect, but memory-safe code.

Tuple types

Modules might want to bind capabilities to tuple types. For instance, std::shared_ptr<T> is usually implemented as a tuple of a T* pointer, and a pointer to the reference counter, for performance reasons.

@PoignardAzur
Author

@jgravelle-google @fgmccabe This is the proposal I mentioned in the last webidl video-call.

PoignardAzur changed the title from "Wasm OCAP bindings" to "Proposal: OCAP bindings" on Jul 21, 2019
@lukewagner
Member

Thanks for the thoughtful writeup of the idea! I think the use case you've identified is important. It's possible that what you want is already expressible in terms of existing proposals.

In particular, even without ☃ bindings, the type imports and exports proposal has recently been factored out of the GC proposal (thanks @rossberg!), and I think these give you what you want:

  • on the Web, you could import a DOM node, allowing fast calls into DOM methods (after instantiation-time checks succeed that the imported type matches the signature of imported DOM methods)
  • a module can define and export a type as well as a set of functions that operate on that type, allowing a wasm module to implement a classic abstract data type. (I filed an issue about being able to export an i31, allowing a linear-memory offset to be stuffed directly into the reference value.)

For memory management of a linear-memory-implemented abstract type, I think you could either:

  • export incref/decref functions, requiring the client to do ref-counting correctly
  • export some GC object and use the JS weakrefs proposal to learn when it's no longer reachable and it's time to free the linear memory. (Doing this in a JS-agnostic way is an interesting question, perhaps for ☃ bindings...)

@PoignardAzur
Author

PoignardAzur commented Jul 24, 2019

> In particular, even without ☃ bindings, the type imports and exports proposal has recently been factored out of the GC proposal (thanks @rossberg!), and I think these give you what you want:

Yeah, this proposal depends on typed imports. I've edited the description to make that clearer.

Although typed imports/exports cover a lot of the design space I had in mind when writing the draft for this, I do think the two features you mentioned (arbitrary, non-reference type exports and incref/decref hooks) need to be added to have truly flexible shared-nothing bindings for linear memory languages.

> (I filed an issue about being able to export an i31, allowing a linear-memory offset to be stuffed directly into the reference value.)

I guess you can shim other types with an i31, but ultimately what you want is a type that matches the C++/Rust/whatever representation of your capability with as little friction as possible.

(Actually, I'm a little confused when reading the GC/typed import specs; does typeuse or typeidx cover basic valtypes like i32/i64? If so, never mind the above.)

On the other hand, there's probably an implementation cost for VMs; they have to make sure they can store arbitrary valtypes in tables; and if capabilities can be valtypes, the implementation can't convert them to anyref or rely on a generic reference implementation.

That said, I think being able to export and store arbitrary types makes for a coherent type system; and I suspect the implementation costs are small.

> export incref/decref functions, requiring the client to do ref-counting correctly

By the client, do you mean the importing module?

That works as a polyfill, but if you want safe interop even with malicious code, you need the host to enforce ref counting.

Otherwise you can always rely on GC, but this proposal is mostly meant for languages and hosts that don't have GC.

> export some GC object and use the JS weakrefs proposal to learn when it's no longer reachable and it's time to free the linear memory. (Doing this in a JS-agnostic way is an interesting question, perhaps for ☃ bindings...)

I think that's out of scope for snowperson bindings. It would probably fit better as a post-MVP extension for GC.


2 participants