consider new "coercion" field of export/import declarations #657

Closed
lukewagner opened this issue Apr 15, 2016 · 7 comments
@lukewagner
Member

Today in asm.js, if you compile
uint32_t f() { return UINT32_MAX; }
in the obvious way, you'll get
function asmModule() { 'use asm'; function f() { return -1 } return f }
which, when called, returns -1 to JS (which is != UINT32_MAX), because asm.js hard-codes the conversion from i32 to JS number to interpret the i32 as signed.

This trips up asm.js users in practice (who would naturally expect, if they wrote UINT32_MAX in their C, that they should test against UINT32_MAX in their JS) and it will surely trip up wasm users as well. The same problem applies to arguments passed to FFI/import calls from wasm.

In asm.js, this could have been fixed by allowing functions to return either signed or unsigned integers (although there are tradeoffs), but in wasm, there is only i32. Another fix is to require the toolchain, when compiling the above C code, to provide a light JS shim which explicitly coerces to unsigned (>>>0) when the declared C return type is unsigned.
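As a rough illustration of the shim approach, the sketch below uses a plain JS function as a stand-in for the compiled export. The names `rawF` and `f` are hypothetical, and a real toolchain would generate its wrapper differently.

```javascript
const UINT32_MAX = 0xFFFFFFFF;

// Stand-in for the compiled export (hypothetical): the i32 bit pattern
// 0xFFFFFFFF reaches JS interpreted as a signed integer.
function rawF() {
  return UINT32_MAX | 0;
}

// Hypothetical toolchain-generated shim: the declared C return type is
// uint32_t, so reinterpret the i32 as unsigned before handing it to JS.
function f() {
  return rawF() >>> 0;
}

console.log(rawF() === UINT32_MAX); // false
console.log(f() === UINT32_MAX);    // true
```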

I'd like to consider in this issue a better and more general fix: what if export and import statements provided a new "coercion" field, a sequence of bytes whose interpretation was, like the other byte-sequence fields of imports/exports, left up to the host environment? When binding to JS, this is where we'd explain how to interpret integer arguments/returns. However, I think there are many other potential uses, so I think we'd want to keep the contents of this field extensible, perhaps starting with a JSON blob like {ret:'u'}. Of course, each wasm type would have a "default" coercion (what we're doing today) so this field could always be left empty.
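To make the idea concrete, here is a minimal sketch of how a JS host might interpret such a field. Everything in it is an assumption for illustration: the helper name `bindExport`, the choice of UTF-8 encoded strict JSON (so the blob is written `{"ret":"u"}`), and the single `ret` key. No format was specified in this issue.

```javascript
// Hypothetical host-side binding: wrap a raw export according to its
// coercion bytes. An empty field selects the default (signed) coercion.
function bindExport(rawFn, coercionBytes) {
  const coercion = coercionBytes.length === 0
    ? {}
    : JSON.parse(new TextDecoder().decode(coercionBytes));
  return function (...args) {
    const result = rawFn(...args);
    // 'u' means "interpret the i32 result as unsigned" (assumed meaning).
    return coercion.ret === 'u' ? result >>> 0 : result | 0;
  };
}

// Stand-in for an export whose i32 result has all bits set.
const rawExport = () => -1;

const signedF = bindExport(rawExport, new Uint8Array(0));
const unsignedF = bindExport(rawExport, new TextEncoder().encode('{"ret":"u"}'));
```

With these bindings, `signedF()` keeps today's behavior while `unsignedF()` yields the value a C programmer would expect for `UINT32_MAX`.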

Some other potential uses of the coercion field:

  • We could allow i64 to be passed/returned from JS today using {low, high} objects and later using int64 value types, by using the coercion field to specify which one.
  • By default, I think the highly-coercive ToInt32/ToNumber coercions would be applied when converting JS arguments in calls to wasm exports, but it might be nice to have a non-coercive option that threw if, e.g., the given JS value wasn't already a number. This could make more sense when we start talking about GC/reference types.
  • It's really common to want to pass/return strings from wasm linear memory to JS (that produce real JS strings). There's actually not a super-efficient way to do this in JS atm (I see Emscripten's current UTF8ArrayToString does one fromCharCode and concat per character!). A string coercion would thus be both a major usability and performance improvement.
  • If Typed Objects get standardized, it'd be useful to have an option that allows a wasm pointer (i32) return value to be coerced into a transparent Typed Object view that aliases linear memory at that offset. This could provide a much more pleasant way to poke at linear memory from JS w/o requiring a JS shim layer.
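Two of the uses above can be sketched in a few lines. The helper names are hypothetical, and BigInt is used only as a convenient stand-in for a 64-bit integer value. It is not something this issue proposes.

```javascript
// Split a 64-bit value into the {low, high} pair of unsigned 32-bit halves.
function i64ToPair(value) {
  const v = BigInt.asUintN(64, value);
  return { low: Number(v & 0xFFFFFFFFn), high: Number(v >> 32n) };
}

// Rebuild the 64-bit value from a {low, high} pair.
function pairToI64(pair) {
  return (BigInt(pair.high >>> 0) << 32n) | BigInt(pair.low >>> 0);
}

// Default, highly coercive conversion: ToInt32 accepts almost anything.
function toInt32Coercive(x) {
  return x | 0;
}

// Hypothetical non-coercive option: throw unless the value is already a number.
function toInt32Strict(x) {
  if (typeof x !== 'number') {
    throw new TypeError('expected a number, got ' + typeof x);
  }
  return x | 0;
}

const pair = i64ToPair(0x100000002n); // { low: 2, high: 1 }
```

Here `toInt32Coercive('42')` quietly returns 42, whereas `toInt32Strict('42')` throws a `TypeError`.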

Taken together, this set of features could make it a lot more pleasant to use wasm from JS without requiring a JS shim.

@lukewagner lukewagner changed the title consider new opaque "coercion" field of export/import declarations consider new "coercion" field of export/import declarations Apr 15, 2016
@lukewagner lukewagner added this to the MVP milestone Apr 18, 2016
@littledan

Seems like a good idea to me. For the initial example, seems like a toolchain fix would make sense. However, separating out an embedder-supplied casting function makes a lot of sense in general; what wins me over is strings, where there are actually many different ways that wasm memory could meaningfully represent a JS string. Just seems like a question of whether it's worth the complexity for now.

@lukewagner
Member Author

lukewagner commented Apr 21, 2016

Yes, and strings are an example that could make a lot of sense outside of a browser embedding: when passing a string to any import call, you might want to specify utf8 vs. latin1 vs. ...
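A small sketch of why the encoding has to be specified: the same bytes in linear memory produce different strings depending on how they are read. The helper `decodeString` and the stand-in `memory` array are hypothetical. The latin1 path is decoded by hand so that each byte maps directly to one code point.

```javascript
// Hypothetical helper: decode `length` bytes at offset `ptr` of linear memory.
function decodeString(memory, ptr, length, encoding) {
  const bytes = memory.subarray(ptr, ptr + length);
  if (encoding === 'utf8') {
    return new TextDecoder('utf-8').decode(bytes);
  }
  if (encoding === 'latin1') {
    // Each byte is exactly one code point in the range U+0000..U+00FF.
    let s = '';
    for (const b of bytes) s += String.fromCharCode(b);
    return s;
  }
  throw new TypeError('unknown encoding: ' + encoding);
}

// Stand-in for wasm linear memory holding the UTF-8 bytes of "café" at offset 4.
const memory = new Uint8Array(16);
memory.set([0x63, 0x61, 0x66, 0xC3, 0xA9], 4);

const asUtf8 = decodeString(memory, 4, 5, 'utf8');     // "café"
const asLatin1 = decodeString(memory, 4, 5, 'latin1'); // "cafÃ©"
```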

For the MVP, the lowest-effort addition would be a coercion field that the browser embedding required to be length 0. Then we can take our time to come up with a good system of expressing these coercions. But that assumes we decide that this is a Good Thing to add some time in the future.

@titzer

titzer commented Apr 21, 2016

On Thu, Apr 21, 2016 at 4:07 PM, Luke Wagner wrote:

Yes, and strings are an example that could make sense outside of a browser
embedding: when passing a string to any native call, you might want to
specify utf8 vs. latin1 vs. ...

For the MVP, the lowest-effort addition would be a coercion field that the
browser embedding required to be length 0. Then we can take our time to
come up with a good system of expressing these coercions. But that assumes
we decide that this is a Good Thing to add some time in the future.

Would it make sense to split the coercions out into a separate (optional)
section as well?



@lukewagner
Member Author

Adding new sections is always the fallback and what we'll need to resort to post-v.1, but I think it'll be less work overall if the coercion field were embeddable "inline".

If we want to avoid mandatory-but-mostly-unused fields (like coercion would be), we could apply the pattern I proposed for making the maximum field of the memory section optional: after the mandatory fields (initial, in the case of memory), you have a varint32 optional_fields field which has 1 bit for each optional field (LEB128 means we have an infinitely-expandable bitfield too!). Then after that, you have only the fields that have a bit set (in least-significant bit order). I was thinking it might be a good idea to have one of these in every section where we might conceivably add fields in the future. At the cost of 1 byte, we buy ourselves a large degree of future extensibility that avoids creating a bunch of one-off sections.
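The pattern can be sketched as a decoder for a memory entry. This is only an illustration under stated assumptions: bit 0 is assigned to `maximum`, every field is encoded as an unsigned LEB128 value, and the names `readVarUint32`, `decodeMemory`, and `MEMORY_OPTIONAL_FIELDS` are hypothetical.

```javascript
// Read an unsigned LEB128 value of at most 32 bits starting at `offset`.
function readVarUint32(bytes, offset) {
  let result = 0;
  let shift = 0;
  let pos = offset;
  for (;;) {
    if (pos >= bytes.length) throw new RangeError('unexpected end of input');
    if (shift > 28) throw new RangeError('varuint32 too long');
    const byte = bytes[pos++];
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) break; // high bit clear: last byte
    shift += 7;
  }
  return { value: result >>> 0, next: pos };
}

// Assumed bit assignment: bit i of optional_fields selects entry i.
const MEMORY_OPTIONAL_FIELDS = ['maximum'];

// Layout: initial (mandatory), optional_fields bitfield, then only the
// fields whose bit is set, in least-significant-bit order.
function decodeMemory(bytes) {
  let cur = readVarUint32(bytes, 0);
  const memory = { initial: cur.value };
  cur = readVarUint32(bytes, cur.next);
  const optionalFields = cur.value;
  if (optionalFields >>> MEMORY_OPTIONAL_FIELDS.length) {
    throw new RangeError('unknown optional field bit set');
  }
  for (let bit = 0; bit < MEMORY_OPTIONAL_FIELDS.length; bit++) {
    if (optionalFields & (1 << bit)) {
      cur = readVarUint32(bytes, cur.next);
      memory[MEMORY_OPTIONAL_FIELDS[bit]] = cur.value;
    }
  }
  return memory;
}

const withMax = decodeMemory(Uint8Array.of(0x01, 0x01, 0x10)); // initial 1, maximum 16
const withoutMax = decodeMemory(Uint8Array.of(0x02, 0x00));    // initial 2, no maximum
```

An entry with no optional fields costs the single `0x00` byte of the bitfield, which is the 1-byte overhead mentioned above. This sketch rejects bits it does not know about, since it has no way to tell how long an unknown field is.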

If we agree on that, then we don't even need to "reserve" coercions in the MVP; we can just add a trailing optional_fields field to "exports" and "imports" and add "coercions" to FutureFeatures.md.

@lukewagner
Member Author

Another way to express coercions is as a separate section that, for each exported function, gave any extra details that describe how the exported function would be presented to another language. The section could even be specific to a target language, like JS. With this understanding, there's no need to provision for this in MVP given that the feature itself isn't in the MVP. A good time to revisit might be when we spec ES6 module integration.

@jfbastien
Member

@lukewagner can you PR this?

@lukewagner
Member Author

Having had time away from the issue and reconsidering, I think this should probably wait until there is a strong use case. At the moment, I think this would be handled better and more generally by the toolchain. So I'll close for now.
