Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[go] Rework method handling to match latest design #93

Closed
rvolosatovs opened this issue May 29, 2024 · 5 comments
Closed

[go] Rework method handling to match latest design #93

rvolosatovs opened this issue May 29, 2024 · 5 comments
Labels
enhancement New feature or request go wRPC Go support help wanted Extra attention is needed

Comments

@rvolosatovs
Copy link
Member

rvolosatovs commented May 29, 2024

Currently, resources can only be constructed using a constructor in Go bindgen. This effectively makes implementing resources in interfaces not using a constructor (like wasi:keyvalue https://github.com/WebAssembly/wasi-keyvalue/blob/219ea3612a53f1bf5b2d137551b22d0268fd3c58/wit/store.wit#L54) impossible, since the implementations should be able to "conjure a resource out of thin air" to support these kind of use cases.

It seems that the only way to handle this would be exposing the resource construction and method serving functionality in the generated bindings (or perhaps in wrpc module itself?) not tied to the constructor

In current design (implemented, e.g. in Rust), methods, constructors, statics etc. are handled just like regular functions. That should be adopted in Go as well, so instead of assuming there is a unique identifier for each resource and using that in place of "instance", just use the "instance" referring to the method location, just like for any other function

@rvolosatovs rvolosatovs added enhancement New feature or request go wRPC Go support labels May 29, 2024
@rvolosatovs rvolosatovs added this to the feature-completeness milestone May 29, 2024
@rvolosatovs rvolosatovs added the help wanted Extra attention is needed label May 29, 2024
@jacobvm04
Copy link
Collaborator

I’ve been thinking about this issue a lot in #82. My current idea is to expose a table interface like wasmtime’s ResourceTable where you can push the underlying data structure for the resource into the table and get an opaque handle (just a UUID in our case) in return that can easily be encoded or decoded.

This way, wRPC could operate purely on the level of opaque handles, and the access to underlying user defined resources can be handled by the resource table. Thoughts on this approach?

@rvolosatovs
Copy link
Member Author

Generally, that sounds good to me, I did also think we would need such a construct eventually, but I have two questions though:

  • Who provides/implements the table? Would there be a ResourceTable in wrpc_transport crate?
  • How would the lifetime of the resource be managed, i.e. who (and when) sets up and tears down the method handlers?

@jacobvm04
Copy link
Collaborator

jacobvm04 commented May 30, 2024

  • Something that exposes a public interface that is similar to wasmtime's ResourceTable might be a good place to start.

    There's a few complications that would prevent us from being able to just use wasmtime::component::ResourceTable, namely that 1. it's handles are defined by indices into a vector under the hood, but our use case requires the use of ULIDs to define resource handles, and 2. it is not Sync.

    I think defining a ResourceTable in wrpc_transport that utilizes some sort of concurrent hashmap would be an approach worth looking into. However, a simple wrapper around Arc<Mutex<HashMap<String, Box<dyn Any + Send>>>> might be sufficient for an MVP.

  • Ideally, a constructor (or other method that returns a new resource) would return an opaque handle that could be turned into an owned handle, that upon being dropped would invoke the resource's wrpc drop method to tear down the method handlers and remove the resource's data from the table.

If it would be helpful, I could try refactoring #82 to use an approach like this!

@rvolosatovs
Copy link
Member Author

rvolosatovs commented May 30, 2024

A subtle, but very important difference from the wasmtime resource design is that in Wasmtime every function call, be it a freestanding function or a method etc. is received by the (singular) host and any state/data associated with the resource "pointers" is looked up in the table. This works great, because it's the host implementing the resource methods.
In wRPC it's the resource itself that implements the methods, in fact wRPC allows for different resource implementations being used at runtime (e.g. in Wasmtime, there can only ever be one wasi:keyvalue/bucket resource implementation used by the component, in wRPC the server may return a different wasi:keyvalue/bucket resource implementation for each invocation, e.g. depending on the request context)
So in some sense, storing the resource state and assigning the ID is the simple part, much trickier one is storing the listener/handler. I suppose the table would need to store 3 separate things in some way:

  1. Dynamically-typed resource state
  2. Resource ID
  3. Resource method listener/handler

Resources' RPC drop handler would need to tear down all other method handlers and remove itself from the table. In parallel to that the owner of the resource should be able to trigger this tear down drop routine without using RPC. It is important that this only happens when the owned handle is dropped and not moved e.g. by calling another wRPC function and passing that handle as a parameter.
Important note is that the implementations should also be able to determine whether the resource is something that was created locally or exists in some remote location, i.e. if the resource is local or remote.

Another way to look at it is that in Wasmtime there's really just two "peers" - the component and the host/the runtime.
The host always calls the component, the component may call the host and once it's done it returns - all resources associated with that initial host-> component call can be safely teared down.
In wRPC however, we operate in N "peers" scenario, where N is only known at runtime and potentially large. There are potentially M distinct implementations of any particular function available at runtime (e.g. 2 distinct implementations of foo:bar/baz interface may exist. these may also be the same, but e.g. behind a load-balancer of some sorts routing invocations to different hosts).

Let's consider a 3 peer scenario: A, B and C

  1. A calls B.foo and gives it an owned resource R of some type T, which is constructed by A
  2. B calls C.bar and gives it an owned resource R' of some type T, which is constructed by B
  3. At this point C wants to call T::baz on both R and R', note:
    • A. must handle T::baz(R), since it's A that constructed the resource and therefore is the only party, which has the logic associated with it
    • B. must handle T::baz(R'), since it's B that constructed the resource and therefore is the only party, which has the logic associated with it

So how can we make step 3. happen? I generally only see two options here:

  1. C should call T::baz on B itself, since it received both R and R' from it.
    • B will handle T::baz(R') directly, but will "proxy" T::baz(R) to A, since it knows that R originated from A
  2. (what is currently done in wRPC) both R and R' uniquely identify their "location", i.e. C should call T::baz on R' and R directly not knowing whether it's actually A or B peer actually handling the invocation

The main reason why 2. is chosen in wRPC is to eliminate the hops required to make approach 1. work.
Operating on resources passed from one peer to another becomes extremely expensive depending on the amount of "hops" that were performed, in fact the cost of a single method call becomes O(n) where n is the "distance" from the current user of the resource to its origin, what's worse is that in a networked use case the cost here contains very much a non-negligible factor of 2, since every hop must be a roundtrip (to deliver the result of the invocation back to the caller).

With the option 2., the cost of every method call is constant, since it always requires a single roundtrip.

Important thing to note that is that even if the resource R could be created during the original (A->B.foo) call, it needs to "escape the scope" and persist even after B.foo has returned, because it was moved.

Of course there's a slightly different approach 3. possible here, which would sort of combine both approaches, where the resource ID is such that it identifies the pair of host/peer and resource itself. E.g. for our example above R could identify some resource X on host A and C could determine that it should call T::baz(X) on A just by looking at R. I struggle to see significant benefits over 2. with this approach and it seems to introduce additional complexity, but I am happy to change my mind on this if there are some benefits I am not seeing.

(cc @alexcrichton if you're interested, see the above for the reasoning around the resource design we discussed yesterday)

All that said, the design sounds good to me and I'm happy to discuss the details by looking at the implementation.

However, given that the wrpc_transport crate has been rewritten to be much simpler and leaner (#81), I feel like we should probably start with migrating the bindgen (#94) and then work on more significant changes here.
So what I'm saying is I think it's best if we start with a simple/incomplete approach, migrate to the new transport and then improve to avoid churn. Does that make sense?

@jacobvm04
Copy link
Collaborator

That does make sense! I also think that integrating the new transport and then working on adding a simplified version of this onto the bindgen is a good approach for now.

@rvolosatovs rvolosatovs changed the title [go] Provide resource construction functionality [go] Rework method handling to match latest design Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request go wRPC Go support help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants