Add support for multi-call executables #281

sunfishcode · 2020-05-23T14:21:34Z

This expands the definitions of command to allow alternate functions to be exported, in the manner of multi-call executables.

This also includes a description of the __indirect_function_table export, as well as the __heap_base and __data_end exports.

sbc100 · 2020-05-23T15:57:29Z

design/application-abi.md

+   `_start` is the default export which is called when the user doesn't select a
+   specific function to call. Commands may also export additional functions,
+   (similar to "multi-call" executables), which may be explicitly selected by the
+   user to run instead.


What's the use case for making this first class like this? Why not just do that multiplexing at a higher level, as with the traditional argv[0] trick?
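For readers unfamiliar with the argv[0] trick mentioned above, here is a minimal sketch of how a traditional multi-call binary picks its behavior. The applet names and the `dispatch` helper are hypothetical, chosen only for illustration; they are not part of WASI or of this PR.

```python
import os

# Hypothetical applets, standing in for busybox-style tools.
def applet_ls(args):
    return "ls " + " ".join(args)

def applet_cat(args):
    return "cat " + " ".join(args)

# Table mapping the invoked program name to the applet that implements it.
APPLETS = {"ls": applet_ls, "cat": applet_cat}

def dispatch(argv):
    """Select an applet from the basename of argv[0]; pass the rest of argv to it."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        raise ValueError(f"unknown applet: {name}")
    return applet(argv[1:])
```

Calling `dispatch(["/usr/bin/ls", "-l"])` selects `applet_ls` and hands it `["-l"]`, which shows why the remaining arguments still travel through argv in this style.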

Are all the possible entry points required to have the same signature as _start (void -> void)?

Do we need to mark these entry point exports in some way, or are all function exports considered to be entry points? Either way, this seems like quite a big change. I may have missed the meeting where this was discussed.

I think we can do both. When we add toolchain support for this, we can connect things such that when a program is called from an alternate entrypoint, it sets argv[0] accordingly. And this way, we can also support environments that don't have traditional argv strings.

Other entrypoints aren't limited to void->void. That'll also be up to toolchains to use.

I'm proposing all function exports from a command are entry points.

The multi-call part is new; I'm proposing it here.

The rest of the patch here is just explicitly stating assumptions that we're effectively already making. If the restrictions here feel too limiting for some use cases, it's possible those use cases don't actually want commands, in the sense used here, and instead want reactors.

sbc100 · 2020-05-23T16:03:07Z

design/application-abi.md

+   user to run instead.
+
+   Functions exported from a command are available to be called without a
+   pre-existing instance. When they are called, the module is instantiated and used


This doesn't make a lot of sense in terms of the current JS embedding, where an instance is required to even get hold of the export. Even the current node wasi implementation requires an instance before its exports can be used. Perhaps we could re-word to keep the intent but allow for such implementations?

How about something like: "Functions exported from a command are required to run in a fresh instance each time they are called, and when the call returns, the instance is considered terminated and should not be accessed."

That's a good point; I've reworded this section to phrase it in terms of what instances may assume rather than in terms of how instantiation actually works.
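The "fresh instance per call" wording discussed above can be sketched as follows. This is only an illustration under stated assumptions: `Instance` and `run_command` are hypothetical names, and the `counter` field merely stands in for mutable instance state such as linear memory. It is not how any particular WASI runtime is implemented.

```python
class Instance:
    """Hypothetical stand-in for an instantiated command module."""

    def __init__(self):
        self.counter = 0          # models mutable state such as linear memory
        self.terminated = False

    def call(self, name):
        if self.terminated:
            raise RuntimeError("instance is terminated and must not be accessed")
        self.counter += 1
        return f"{name} ran with counter={self.counter}"


def run_command(export_name):
    """Run one exported function in a fresh instance, then mark the instance terminated."""
    instance = Instance()
    try:
        result = instance.call(export_name)
    finally:
        # Once the call returns (or traps), the instance is considered terminated.
        instance.terminated = True
    return result, instance
```

Because every call gets its own instance, state never leaks from one invocation to the next, and any later access to a finished instance is rejected.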

sbc100 · 2020-05-23T16:07:28Z

design/application-abi.md

+For compatibility with existing toolchains, modules may also export globals
+named `__heap_base` and `__data_end`. Environments shall not access them.
+This provision is deprecated and toolchains are encouraged to avoid providing
+these exports.


I'm not sure why this last paragraph is needed. Did we have some toolchain version where those symbols were exported by default? Presumably wasi modules are free to export any number of additional things on top of what wasi specifies? If I'm wrong and additional exports are not permitted, then we should explicitly state that here.

Yes; they're exported by default in the Rust toolchain, due to a series of coincidences. I've now added additional wording clarifying this.

pchickey · 2020-05-26T16:54:00Z

These requirements are just the subtyping rules of module types, right? I agree we should describe them in this friendly way, but it may be good to note that we're talking about the same thing as the module types/linking proposal discusses.

sbc100 · 2020-05-26T17:31:45Z

It seems like there are three parts to this change:

1. Document some existing stuff. Uncontroversial.
2. Introduce multi-call. How should we do this? Are the entry points marked in some way? Are there restrictions on naming or signatures for alternative entry points?
3. Limit the "extra" function exports that a wasi module might have. Until now WASI has specified (IIUC) a minimal set of requirements on exports. If a module happens to export extra things, that has not historically been a problem, and wasi engines have been free to ignore them.

Maybe you can split out (1) and we can focus on (2) and (3) here?

For (2), it's not clear we have a strong use case for this. Existing multi-call binaries are quite happy to go through argv. Any naive refactoring would still involve some kind of argv vector for the remaining arguments (busybox ls still requires the rest of argv even if you jump directly to main_ls). Perhaps you are envisaging a new future of wasi-specific multi-call binaries? More motivation is needed for such a cross-cutting change, I think.

For (3), having the core wasi spec start to interpret all function exports as command entry points seems like quite a large change, and perhaps unnecessarily restrictive? Now my command with a single entry point is not able to export any other function without them being interpreted by the wasi runtime as alternative entry points.

devsnek · 2020-05-26T17:35:05Z

We have a use case in node for wasi binaries which use a _napi_init entry point (and don't have _start)

sunfishcode · 2020-05-26T21:56:25Z

Ok, I've now added #282 which splits out the parts which document existing practice and assumptions.

For (2), it's not clear we have a strong use case for this. Existing multi-call binaries are quite happy to go through argv. Any naive refactoring would still involve some kind of argv vector for the remaining arguments (busybox ls still requires the rest of argv even if you jump directly to main_ls). Perhaps you are envisaging a new future of wasi-specific multi-call binaries? More motivation is needed for such a cross-cutting change, I think.

I have a use case that doesn't need traditional argv-style argument strings. It wouldn't otherwise need the WASI argv APIs, an allocator to allocate a buffer to store the strings in, and a strcmp to compare the strings. Wasm already has a natural representation of what this use case wants: an entrypoint similar to "_start" but with a different name.

I believe it's important to let wasm's unique advantages shine through for applications which wish to take advantage of them. Compatibility is also important, and I expect we can achieve it in a reasonable way here.

Would it help to discuss this in a meeting? I'd be happy to discuss it further.

For (3), having the core wasi spec start to interpret all function exports as command entry points seems like quite a large change, and perhaps unnecessarily restrictive? Now my command with a single entry point is not able to export any other function without them being interpreted by the wasi runtime as alternative entry points.

If a command is calling out to an instance it imports from, and the instance is importing from the command to call one of its exports, that would create a cyclic dependency. We do do a little of this with "memory" and "__indirect_function_table", but it makes APIs non-polyfillable and depends on host-specific magic, and the goal is to move away from using those once interface types give us better alternatives.

So it's desirable in general to find other ways to do this. For example if you want the environment to call back in to a module to call its malloc, it's better to split out libc into a separate module, similar to the shared-everything example here. The libc module itself wouldn't be a command, so it wouldn't be subject to the restrictions. And, it'd avoid the cyclic dependency problem.

sbc100 · 2020-05-26T22:56:58Z

We have a use case in node for wasi binaries which use a _napi_init entry point (and don't have _start)

If your function is called init, aren't you defining a reactor? This discussion is more about commands (things which can only be used once and then require a new instance).

sbc100 · 2020-05-26T23:05:39Z

Ok, I've now added #282 which splits out the parts which document existing practice and assumptions.

For (2), it's not clear we have a strong use case for this. Existing multi-call binaries are quite happy to go through argv. Any naive refactoring would still involve some kind of argv vector for the remaining arguments (busybox ls still requires the rest of argv even if you jump directly to main_ls). Perhaps you are envisaging a new future of wasi-specific multi-call binaries? More motivation is needed for such a cross-cutting change, I think.

I have a use case that doesn't need traditional argv-style argument strings. It wouldn't otherwise need the WASI argv APIs, an allocator to allocate a buffer to store the strings in, and a strcmp to compare the strings. Wasm already has a natural representation of what this use case wants: an entrypoint similar to "_start" but with a different name.

I believe it's important to let wasm's unique advantages shine through for applications which wish to take advantage of them. Compatibility is also important, and I expect we can achieve it in a reasonable way here.

Would it help to discuss this in a meeting? I'd be happy to discuss it further.

Yeah, I think that would be good. I think this multi-command thing is kind of a new entry in the space between reactors and commands. I'm sure your use case is reasonable, but it would be good to hear a little more about it (assuming you can share) in the next meeting.

For (3), having the core wasi spec start to interpret all function exports as command entry points seems like quite a large change, and perhaps unnecessarily restrictive? Now my command with a single entry point is not able to export any other function without them being interpreted by the wasi runtime as alternative entry points.

If a command is calling out to an instance it imports from, and the instance is importing from the command to call one of its exports, that would create a cyclic dependency. We do do a little of this with "memory" and "__indirect_function_table", but it makes APIs non-polyfillable and depends on host-specific magic, and the goal is to move away from using those once interface types give us better alternatives.

So it's desirable in general to find other ways to do this. For example if you want the environment to call back in to a module to call its malloc, it's better to split out libc into a separate module, similar to the shared-everything example here. The libc module itself wouldn't be a command, so it wouldn't be subject to the restrictions. And, it'd avoid the cyclic dependency problem.

I agree, in general. I was mostly worried about over-specifying needlessly. Why not let modules with extra function exports validate as valid wasi modules (as they do today)? This PR seems to suggest that extra function exports are now meaningful as additional entry points when they were not before. I wonder if we can avoid this? How about an alternative: any function export that begins with _start is considered an entry point. We all agree the _start name mangling is not great and we plan to ditch it when we have something better, but for now let's double down on it to keep things explicit and partitioned off.
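To make the `_start`-prefix alternative concrete, here is a rough sketch of how an engine might partition a module's function exports under that rule. The helper name `classify_exports` and the export name `_start_ls` are hypothetical; nothing in this thread fixes an actual naming scheme beyond the prefix idea.

```python
def classify_exports(function_exports):
    """Split function export names into entry points (prefixed "_start") and the rest."""
    entry_points = [n for n in function_exports if n.startswith("_start")]
    others = [n for n in function_exports if not n.startswith("_start")]
    return entry_points, others


# Example: only the "_start"-prefixed names are treated as entry points;
# "malloc" stays an ordinary export that the engine is free to ignore.
entry_points, others = classify_exports(["_start", "_start_ls", "malloc"])
```

Under this rule a command with a single entry point could still export other functions without them being interpreted as alternate entry points.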

sunfishcode · 2020-05-27T19:15:04Z

@devsnek If you want a _napi_init function where the instance lives on after the call and you can call other functions on it, that may be a reactor use case.

@sbc100

Why not let modules with extra function exports validate as valid wasi modules (as they do today)

Modules that don't have _start functions would continue to validate as valid wasi modules, as they do today. Those are reactors :-). If the terminology is confusing, it may help to think of "reactor" as meaning "regular wasm module" and "command" as "special wasm module that you can run like a command".

Modules that we run as commands today already don't really support extra non-entrypoint exports, because the tools already assume that you're not calling into an instance before or after the call to _start, and cyclic imports are already problematic. What I'm looking to do here is codify assumptions that various tools are already making.

Is the problem that there are people with modules containing a _start function but which really should be run like reactors? Would it help if we made it clear that there can be a way for users to explicitly request modules be run as reactors, even if they have _start functions?

devsnek · 2020-05-27T19:16:36Z

Ah sorry I misunderstood the intention of multi-call.

sbc100 · 2020-05-27T22:41:23Z

Is the problem that there are people with modules containing a _start function but which really should be run like reactors? Would it help if we made it clear that there can be a way for users to explicitly request modules be run as reactors, even if they have _start functions?

I'm not sure I understand the problem this presents. Wouldn't such a user know right away they had made a mistake, because they would get runtime errors like:

"You tried to use the module after _start returned"
"You tried to use the module before _start was called".

Isn't the solution to force such users to build as a reactor and avoid _start completely? How does adding multi-call modules help users that have made this mistake?

Or are you saying that such users could be transitioned to multi-call rather than reactor?

This makes me realize that multi-call has another issue in terms of program startup, because each of the entry points would need to call the libc init and static constructor functions. I'm not sure we want to be exposing those details to user entry points. Having a single entry point means that startup code exists in a single location.
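The startup concern can be illustrated with a small sketch in which every entry point is wrapped so that initialization runs before the user's code. All names here (`libc_init`, `run_static_constructors`, `entry_point`, `main_ls`, `main_cat`) are hypothetical placeholders; this is one conceivable toolchain-side approach, not something the PR specifies.

```python
import functools

# Records which startup steps ran, so the effect of the wrapper is observable.
init_log = []

def libc_init():
    init_log.append("libc_init")

def run_static_constructors():
    init_log.append("ctors")

def entry_point(fn):
    """Wrap fn so that startup code runs before it, as a _start shim would."""
    @functools.wraps(fn)
    def wrapper(*args):
        libc_init()
        run_static_constructors()
        return fn(*args)
    return wrapper

@entry_point
def main_ls():
    return "ls"

@entry_point
def main_cat():
    return "cat"
```

The wrapper keeps the startup sequence defined in one place, but it still has to be applied to every exported entry point, which is the duplication being pointed out above.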

sunfishcode · 2020-05-28T02:51:50Z

I made a mistake here in introducing multi-call along with doc changes that I expected were just making existing assumptions explicit, but the assumptions turned out to be more interesting than I thought, so I've now split them out into #282. Let's discuss that first.

* Elaborate on the definitions of commands and reactors. This pulls out the parts of #281 which document existing practice.
* Simplify the text about __heap_base and __data_end. We no longer need to say "applications may export these", but it's still useful to say that environments shouldn't access them.

sunfishcode added 2 commits May 23, 2020 07:16

Describe multi-call executables.

546e6c3

Allow the function pointer table to be exported, and add compatibility exports.

cfd8bc1

sbc100 reviewed May 23, 2020

sunfishcode added 3 commits May 23, 2020 10:32

Clarify that commands shouldn't have extraneous exports.

a603bbb

Relax the requirements on command instantiation.

c634082

Modules default to being reactors.

d0c06ea

sunfishcode added 4 commits May 26, 2020 12:19

Instances may assume their exports aren't accessed before being called too.

b0a3b6a

Clarify that a reactor function may assume its exports aren't accessed before _initialize.

2f72007

Fix a typo.

05f7f03

Tighten up wording.

da7381a

sunfishcode added a commit to sunfishcode/WASI that referenced this pull request May 26, 2020

Elaborate on the definitions of commands and reactors.

45dc211

This pulls out the parts of WebAssembly#281 which document existing practice.

sunfishcode mentioned this pull request May 26, 2020

Elaborate on the definitions of commands and reactors. #282

Merged

sunfishcode closed this May 28, 2020
