Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require import/export names to be UTF-8. #1016

Merged
merged 4 commits into from
Mar 30, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 10 additions & 10 deletions BinaryEncoding.md
Original file line number Diff line number Diff line change
Expand Up @@ -195,8 +195,8 @@ part of the payload.
| ----- | ----- | ----- |
| id | `varuint7` | section code |
| payload_len | `varuint32` | size of this section in bytes |
| name_len | `varuint32` ? | length of the section name in bytes, present if `id == 0` |
| name | `bytes` ? | section name string, present if `id == 0` |
| name_len | `varuint32` ? | length of `name` in bytes, present if `id == 0` |
| name | `bytes` ? | section name: valid UTF-8 byte sequence, present if `id == 0` |
| payload_data | `bytes` | content of this section, of length `payload_len - sizeof(name) - sizeof(name_len)` |

Each known section is optional and may appear at most once. Custom sections all have the same `id` (0), and can be named non-uniquely (all bytes composing their names may be identical).
Expand Down Expand Up @@ -252,10 +252,10 @@ The import section declares all imports that will be used in the module.

| Field | Type | Description |
| ----- | ---- | ----------- |
| module_len | `varuint32` | module string length |
| module_str | `bytes` | module string of `module_len` bytes |
| field_len | `varuint32` | field name length |
| field_str | `bytes` | field name string of `field_len` bytes |
| module_len | `varuint32` | length of `module_str` in bytes |
| module_str | `bytes` | module name: valid UTF-8 byte sequence |
| field_len | `varuint32` | length of `field_str` in bytes |
| field_str | `bytes` | field name: valid UTF-8 byte sequence |
| kind | `external_kind` | the kind of definition being imported |

Followed by, if the `kind` is `Function`:
Expand Down Expand Up @@ -355,8 +355,8 @@ The encoding of the [Export section](Modules.md#exports):

| Field | Type | Description |
| ----- | ---- | ----------- |
| field_len | `varuint32` | field name string length |
| field_str | `bytes` | field name string of `field_len` bytes |
| field_len | `varuint32` | length of `field_str` in bytes |
| field_str | `bytes` | field name: valid UTF-8 byte sequence |
| kind | `external_kind` | the kind of definition being exported |
| index | `varuint32` | the index into the corresponding [index space](Modules.md) |

Expand Down Expand Up @@ -470,8 +470,8 @@ where a `naming` is encoded as:
| Field | Type | Description |
| ----- | ---- | ----------- |
| index | `varuint32` | the index which is being named |
| name_len | `varuint32` | number of bytes in name_str |
| name_str | `bytes` | binary encoding of the name |
| name_len | `varuint32` | length of `name_str` in bytes |
| name_str | `bytes` | UTF-8 encoding of the name |

#### Function names

Expand Down
7 changes: 5 additions & 2 deletions Modules.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,8 @@ In the future, other kinds of imports may be added. Imports are designed to
allow modules to share code and data while still allowing separate compilation
and caching.

All imports include two opaque names: a *module name* and an *export name*. The
All imports include two opaque names: a *module name* and an *export name*,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In JS.md, which type of exception should occur if import or export are invalid UTF-8 strings?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS.md already seems to specify WebAssembly.CompileError in this case.

Copy link
Member Author

@sunfishcode sunfishcode Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be a validation requirement, so WebAssembly.validate would return false, and APIs that throw would throw WebAssembly.CompileError. The design docs don't specify the details of validation, so there doesn't seem to be a clear place to specify this; perhaps we could just handle this in eventual spec PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

which are required to be [valid UTF-8]. The
interpretation of these names is up to the host environment but designed to
allow a host environments, like the [Web](Web.md), to support a two-level
namespace.
Expand Down Expand Up @@ -108,7 +109,8 @@ native `syscall`. For example, a shell environment could define a builtin

A module can declare a sequence of **exports** which are returned at
instantiation time to the host environment. Each export has three fields:
a *name*, whose meaning is defined by the host environment, a *type*,
a *name*, which is required to be [valid UTF-8],
whose meaning is defined by the host environment, a *type*,
indicating whether the export is a function, global, memory or table, and
an *index* into the type's corresponding [index space](Modules.md).

Expand Down Expand Up @@ -380,3 +382,4 @@ In the future, operators like `i32.add` could be added to allow more expressive
[future types]: FutureFeatures.md#more-table-operators-and-types
[future dom]: FutureFeatures.md#gc/dom-integration
[future multiple tables]: FutureFeatures.md#multiple-tables-and-memories
[valid UTF-8]: https://encoding.spec.whatwg.org/#utf-8-decode-without-bom-or-fail