Add support for multi-call executables #281

Closed
wants to merge 9 commits into from

sunfishcode
Member

This expands the definitions of command to allow alternate functions to be exported, in the manner of multi-call executables.

This also includes a description of the __indirect_function_table export, as well as the __heap_base and __data_end exports.

`_start` is the default export which is called when the user doesn't select a
specific function to call. Commands may also export additional functions,
(similar to "multi-call" executables), which may be explicitly selected by the
user to run instead.
Member

What is the use case for making this first class like this? Why not just do that multiplexing at a higher level, like with the traditional argv0 trick?

Are all the possible entry points required to have the same signature as _start (void -> void)?

Do we need to mark these entry point exports in some way, or are all function exports considered to be entry points? Either way this seems like quite a big change. I may have missed the meeting where this was discussed.

Member Author

I think we can do both. When we add toolchain support for this, we can connect things such that when a program is called from an alternate entrypoint, it sets argv[0] accordingly. And this way, we can also support environments that don't have traditional argv strings.

Other entrypoints aren't limited to void->void. That'll also be up to toolchains to use.

I'm proposing all function exports from a command are entry points.

The multi-call part is new; I'm proposing it here.

The rest of the patch here is just explicitly stating assumptions that we're effectively already making. If the restrictions here feel too limiting for some use cases, it's possible those use cases don't actually want commands, in the sense used here, and instead want reactors.
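
To make the proposed selection rule concrete, here is a minimal sketch in Python. It is only a model: the `exports` dictionary, the `resize` export, and the `select_entry_point` helper are hypothetical names used for illustration, not part of WASI or of any runtime. It assumes that function exports are modelled as callables and that non-function exports such as "memory" are modelled as non-callable values.

```python
def select_entry_point(exports, requested=None):
    """Return the name of the export to call.

    `exports` maps export names to values; only callables count as
    function exports. With no explicit request, `_start` is the default.
    """
    name = "_start" if requested is None else requested
    if name not in exports or not callable(exports[name]):
        raise LookupError(f"no function export named {name!r}")
    return name


# Hypothetical command module: two function exports and a memory export.
exports = {
    "_start": lambda: 0,
    "resize": lambda width, height: (width, height),  # not limited to void -> void
    "memory": bytearray(0),
}
```

Under this model, `select_entry_point(exports)` picks `_start`, `select_entry_point(exports, "resize")` picks the alternate entry point, and asking for `"memory"` is rejected because it is not a function export.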

user to run instead.

Functions exported from a command are available to be called without a
pre-existing instance. When they are called, the module is instantiated and used
Member

This doesn't make a lot of sense in terms of the current JS embedding, where an instance is required to even get hold of the export. Even the current node wasi implementation requires an instance before its exports can be used. Perhaps we could re-word to keep the intent but allow for such implementations?

How about something like: "Functions exported from a command are required to run in a fresh instance each time they are called, and when the call returns, the instance is considered terminated and should not be accessed."

Member Author

That's a good point; I've reworded this section to phrase it in terms of what instances may assume rather than in terms of how instantiation actually works.
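
The "fresh instance per call" wording suggested above can be modelled roughly as follows. This is a plain Python sketch, not a real wasm embedding: the `Instance` class, its `bump` export, and `call_command` are hypothetical stand-ins chosen to show the lifecycle only.

```python
class Instance:
    """Stand-in for a wasm instance with some mutable state."""

    def __init__(self):
        self.counter = 0
        self.terminated = False

    def bump(self):
        # Hypothetical exported function that mutates instance state.
        if self.terminated:
            raise RuntimeError("instance used after its command call returned")
        self.counter += 1
        return self.counter


def call_command(export_name):
    """Run one exported function of a command in a fresh instance."""
    instance = Instance()  # a new instance for every call
    try:
        return getattr(instance, export_name)()
    finally:
        # Once the call returns, the instance is considered terminated.
        instance.terminated = True
```

Because no state survives between calls, two successive `call_command("bump")` calls each observe a counter starting from zero.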

For compatibility with existing toolchains, modules may also export globals
named `__heap_base` and `__data_end`. Environments shall not access them.
This provision is deprecated and toolchains are encouraged to avoid providing
these exports.
Member

I'm not sure why this last paragraph is needed. Did we have some toolchain version where those symbols were exported by default? Presumably wasi modules are free to export any number of additional things on top of what wasi specifies? If I'm wrong and additional exports are not permitted, then we should explicitly state that here.

Member Author

Yes; they're exported by default in the Rust toolchain, due to a series of coincidences. I've now added additional wording clarifying this.

@pchickey
Contributor

These requirements are just the subtyping rules of module types, right? I agree we should describe them in this friendly way, but it may be good to note that we're talking about the same thing as the module types/linking proposal discusses.

@sbc100
Member

sbc100 commented May 26, 2020

It seems like there are three parts to this change:

  1. Document some existing stuff. Uncontroversial.
  2. Introduce multi-call. How should we do this? Are the entry points marked in some way? Are there restrictions on naming or signatures for alternative entry points?
  3. Limit the "extra" function exports that a wasi module might have. Until now WASI has specified (IIUC) a minimal set of requirements on exports. If a module happens to export extra things, that has not historically been a problem, and wasi engines have been free to ignore them.

Maybe you can split out (1) and we can focus on (2) and (3) here?

For (2), it's not clear we have a strong use case for this. Existing multi-call binaries are quite happy to go through argv. Any naive refactoring would still involve some kind of argv vector for the remaining arguments (busybox ls still requires the rest of argv even if you jump directly to main_ls). Perhaps you are envisaging a new future of wasi-specific multi-call binaries? More motivation is needed for such a cross-cutting change, I think.

For (3), having the core wasi spec start to interpret all function exports as command entry points seems like quite a large change, and perhaps unnecessarily restrictive? Now my command with a single entry point is not able to export any other function without them being interpreted by the wasi runtime as alternative entry points.
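
For readers unfamiliar with the argv-based multiplexing referred to under (2), here is a rough Python sketch of how a busybox-style binary picks an applet. The applet table and the `main_ls` / `main_cat` bodies are hypothetical placeholders; the point is only that the applet is chosen from argv[0] (or argv[1]) and that the remaining arguments are still passed along as strings.

```python
import os


def main_ls(args):
    # Placeholder applet body; a real applet would list files.
    return "ls " + " ".join(args)


def main_cat(args):
    # Placeholder applet body; a real applet would print file contents.
    return "cat " + " ".join(args)


# Hypothetical applet table keyed by the name the binary was invoked as.
APPLETS = {"ls": main_ls, "cat": main_cat}


def dispatch(argv):
    """Pick an applet from argv[0], or from argv[1] for 'busybox ls ...'."""
    name = os.path.basename(argv[0])
    rest = argv[1:]
    if name not in APPLETS and rest:
        name, rest = rest[0], rest[1:]
    if name not in APPLETS:
        raise SystemExit(f"unknown applet: {name}")
    return APPLETS[name](rest)
```

Both `dispatch(["/bin/ls", "-l"])` and `dispatch(["busybox", "ls", "-l"])` end up in `main_ls` with `["-l"]` as the remaining arguments, which is the string handling that an export-based entry point would avoid.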

@devsnek
Member

devsnek commented May 26, 2020

We have a use case in node for wasi binaries which use a _napi_init entry point (and don't have _start)

sunfishcode added a commit to sunfishcode/WASI that referenced this pull request May 26, 2020
This pulls out the parts of WebAssembly#281 which document existing practice.
@sunfishcode
Member Author

Ok, I've now added #282 which splits out the parts which document existing practice and assumptions.

For (2), it's not clear we have a strong use case for this. Existing multi-call binaries are quite happy to go through argv. Any naive refactoring would still involve some kind of argv vector for the remaining arguments (busybox ls still requires the rest of argv even if you jump directly to main_ls). Perhaps you are envisaging a new future of wasi-specific multi-call binaries? More motivation is needed for such a cross-cutting change, I think.

I have a use case that doesn't need traditional argv-style argument strings. It wouldn't otherwise need the WASI argv APIs, an allocator to allocate a buffer to store the strings in, or a strcmp to compare the strings. Wasm already has a natural representation of what this use case wants: an entrypoint similar to "_start" but with a different name.

I believe it's important to let wasm's unique advantages shine through for applications which wish to take advantage of them. Compatibility is also important, and I expect we can achieve it in a reasonable way here.

Would it help to discuss this in a meeting? I'd be happy to discuss it further.

For (3), having the core wasi spec start to interpret all function exports as command entry points seems like quite a large change, and perhaps unnecessarily restrictive? Now my command with a single entry point is not able to export any other function without them being interpreted by the wasi runtime as alternative entry points.

If a command is calling out to an instance it imports from, and the instance is importing from the command to call one of its exports, that would create a cyclic dependency. We do do a little of this with "memory" and "__indirect_function_table", but it makes APIs non-polyfillable and depends on host-specific magic, and the goal is to move away from using those once interface types give us better alternatives.

So it's desirable in general to find other ways to do this. For example if you want the environment to call back in to a module to call its malloc, it's better to split out libc into a separate module, similar to the shared-everything example here. The libc module itself wouldn't be a command, so it wouldn't be subject to the restrictions. And, it'd avoid the cyclic dependency problem.

@sbc100
Member

sbc100 commented May 26, 2020

We have a use case in node for wasi binaries which use a _napi_init entry point (and don't have _start)

If your function is called init, aren't you defining a reactor? This discussion is more about commands (things which can only be used once and then require a new instance).

@sbc100
Member

sbc100 commented May 26, 2020

Ok, I've now added #282 which splits out the parts which document existing practice and assumptions.

For (2), it's not clear we have a strong use case for this. Existing multi-call binaries are quite happy to go through argv. Any naive refactoring would still involve some kind of argv vector for the remaining arguments (busybox ls still requires the rest of argv even if you jump directly to main_ls). Perhaps you are envisaging a new future of wasi-specific multi-call binaries? More motivation is needed for such a cross-cutting change, I think.

I have a use case that doesn't need traditional argv-style argument strings. It wouldn't otherwise need the WASI argv APIs, an allocator to allocate a buffer to store the strings in, or a strcmp to compare the strings. Wasm already has a natural representation of what this use case wants: an entrypoint similar to "_start" but with a different name.

I believe it's important to let wasm's unique advantages shine through for applications which wish to take advantage of them. Compatibility is also important, and I expect we can achieve it in a reasonable way here.

Would it help to discuss this in a meeting? I'd be happy to discuss it further.

Yeah, I think that would be good. I think this multi-command thing is kind of a new entry in the space between reactors and commands. I'm sure your use case is reasonable, but it would be good to hear a little more about it (assuming you can share) in the next meeting.

For (3), having the core wasi spec start to interpret all function exports as command entry points seems like quite a large change, and perhaps unnecessarily restrictive? Now my command with a single entry point is not able to export any other function without them being interpreted by the wasi runtime as alternative entry points.

If a command is calling out to an instance it imports from, and the instance is importing from the command to call one of its exports, that would create a cyclic dependency. We do do a little of this with "memory" and "__indirect_function_table", but it makes APIs non-polyfillable and depends on host-specific magic, and the goal is to move away from using those once interface types give us better alternatives.

So it's desirable in general to find other ways to do this. For example if you want the environment to call back in to a module to call its malloc, it's better to split out libc into a separate module, similar to the shared-everything example here. The libc module itself wouldn't be a command, so it wouldn't be subject to the restrictions. And, it'd avoid the cyclic dependency problem.

I agree, in general. I was mostly worried about over-specifying needlessly. Why not let modules with extra function exports validate as valid wasi modules (as they do today)? This PR seems to suggest that extra function exports are now meaningful as additional entry points when they were not before. I wonder if we can avoid this? How about an alternative: any function export that begins with _start is considered an entry point. We all agree the _start name mangling is not great and we plan to ditch it when we have something better, but for now let's double down on it to keep things explicit and partitioned off.
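
As a rough illustration of the `_start`-prefix alternative, the rule amounts to a simple filter over the names of a module's function exports. The export names below (`_start_ls`, `malloc`, `helper`) are made up for the example, and `entry_points` is a hypothetical helper rather than any existing API.

```python
def entry_points(function_exports):
    """Return the function exports treated as entry points under the
    '_start' prefix rule; every other export is left uninterpreted."""
    return sorted(name for name in function_exports if name.startswith("_start"))


# Hypothetical list of function exports from a command module.
function_exports = ["_start", "_start_ls", "malloc", "helper"]
```

With this rule, `entry_points(function_exports)` yields `["_start", "_start_ls"]`, while `malloc` and `helper` remain ordinary exports that the runtime is free to ignore.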

@sunfishcode
Member Author

sunfishcode commented May 27, 2020

@devsnek If you want a _napi_init function where the instance lives on after the call and you can call other functions on it, that may be a reactor use case.

@sbc100

Why not let modules with extra function exports validate as valid wasi modules (as they do today)

Modules that don't have _start functions would continue to validate as valid wasi modules, as they do today. Those are reactors :-). If the terminology is confusing, it may help to think of "reactor" as meaning "regular wasm module" and "command" as "special wasm module that you can run like a command".

Modules that we run as commands today already don't really support extra non-entrypoint exports, because the tools already assume that you're not calling into an instance before or after the call to _start, and cyclic imports are already problematic. What I'm looking to do here is codify assumptions that various tools are already making.

Is the problem that there are people with modules containing a _start function but which really should be run like reactors? Would it help if we made it clear that there can be a way for users to explicitly request modules be run as reactors, even if they have _start functions?

@devsnek
Member

devsnek commented May 27, 2020

Ah sorry I misunderstood the intention of multi-call.

@sbc100
Member

sbc100 commented May 27, 2020

Is the problem that there are people with modules containing a _start function but which really should be run like reactors? Would it help if we made it clear that there can be a way for users to explicitly request modules be run as reactors, even if they have _start functions?

I'm not sure I understand the problem this presents. Wouldn't such a user know right away they had made a mistake, because they would get runtime errors like:

  • "You tried to use the module after _start returned"
  • "You tried to use the module before _start was called".

Isn't the solution to force such users to build as a reactor and avoid _start completely? How does adding multi-call modules help users who have made this mistake?

Or are you saying that such users could be transitioned to multi-call rather than reactor?

This makes me realize that multi-call has another issue in terms of program startup, because each of the entry points would need to call the libc init and static constructor functions. I'm not sure we want to be exposing those details to user entry points. Having a single entry point means that startup code exists in a single location.
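
One way to picture the startup concern is a wrapper that a toolchain could generate around every entry point, so that user code never calls the startup routine directly. The Python sketch below is only an analogy: `startup` stands in for libc init and static constructors, and `with_startup`, `main_ls`, and `main_cat` are hypothetical names.

```python
import functools

events = []


def startup():
    # Stand-in for libc initialization and static constructors.
    events.append("startup")


def with_startup(init):
    """Wrap an entry point so that `init` runs before its body."""
    def decorate(func):
        @functools.wraps(func)
        def entry(*args):
            init()
            return func(*args)
        return entry
    return decorate


@with_startup(startup)
def main_ls():
    events.append("ls")


@with_startup(startup)
def main_cat():
    events.append("cat")
```

Calling `main_ls()` records the startup step followed by the entry point body. The startup logic is still written once, but every exported entry point now carries a call to it, which is the duplication being questioned above.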

@sunfishcode
Member Author

I made a mistake here in introducing multi-call along with doc changes that I expected were just making existing assumptions explicit, but the assumptions turned out to be more interesting than I thought, so I've now split them out into #282. Let's discuss that first.

sunfishcode added a commit that referenced this pull request May 29, 2020
* Elaborate on the definitions of commands and reactors.

This pulls out the parts of #281 which document existing practice.

* Simplify the text about __heap_base and __data_end.

We no longer need to say "applications may export these", but it's still
useful to say that environments shouldn't access them.
4 participants