From 9e08aad63fee5f49e5a7d24fb8e89c286d30a39d Mon Sep 17 00:00:00 2001
From: Dan Gohman
Date: Tue, 14 Mar 2017 09:55:10 -0700
Subject: [PATCH 1/4] Require import/export names to be UTF-8.

This implements the UTF-8 proposal described in
https://github.com/WebAssembly/design/issues/989#issuecomment-284757788.

This does not currently rename "name" to "utf8-name", because if UTF-8 is
required for import/export names, there's a greater appeal to just saying
that all strings are UTF-8, though this is debatable.
---
 BinaryEncoding.md | 8 ++++----
 Modules.md        | 7 +++++--
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/BinaryEncoding.md b/BinaryEncoding.md
index 5c59b351..75c2e49a 100644
--- a/BinaryEncoding.md
+++ b/BinaryEncoding.md
@@ -253,9 +253,9 @@ The import section declares all imports that will be used in the module.
 | Field | Type | Description |
 | ----- | ---- | ----------- |
 | module_len | `varuint32` | module string length |
-| module_str | `bytes` | module string of `module_len` bytes |
+| module_str | `bytes` | module name: `module_len` bytes holding valid utf8 string |
 | field_len | `varuint32` | field name length |
-| field_str | `bytes` | field name string of `field_len` bytes |
+| field_str | `bytes` | field name: `field_len` bytes holding valid utf8 string |
 | kind | `external_kind` | the kind of definition being imported |
 
 Followed by, if the `kind` is `Function`:
@@ -356,7 +356,7 @@ The encoding of the [Export section](Modules.md#exports):
 | Field | Type | Description |
 | ----- | ---- | ----------- |
 | field_len | `varuint32` | field name string length |
-| field_str | `bytes` | field name string of `field_len` bytes |
+| field_str | `bytes` | field name: `field_len` bytes holding valid utf8 string |
 | kind | `external_kind` | the kind of definition being exported |
 | index | `varuint32` | the index into the corresponding [index space](Modules.md) |
 
@@ -471,7 +471,7 @@ where a `naming` is encoded as:
 | ----- | ---- | ----------- |
 | index | `varuint32` | the index which is being named |
 | name_len | `varuint32` | number of bytes in name_str |
-| name_str | `bytes` | binary encoding of the name |
+| name_str | `bytes` | utf8 encoding of the name |
 
 #### Function names
 
diff --git a/Modules.md b/Modules.md
index 708b79ef..37a9dfd9 100644
--- a/Modules.md
+++ b/Modules.md
@@ -48,7 +48,8 @@ In the future, other kinds of imports may be added.
 Imports are designed to allow modules to share code and data while still
 allowing separate compilation and caching.
 
-All imports include two opaque names: a *module name* and an *export name*. The
+All imports include two opaque names: a *module name* and an *export name*,
+which are required to be [valid UTF-8]. The
 interpretation of these names is up to the host environment but designed to
 allow a host environments, like the [Web](Web.md), to support a two-level
 namespace.
@@ -108,7 +109,8 @@ native `syscall`. For example, a shell environment could define a builtin
 
 A module can declare a sequence of **exports** which are returned at
 instantiation time to the host environment. Each export has three fields:
-a *name*, whose meaning is defined by the host environment, a *type*,
+a *name*, which is required to be [valid UTF-8],
+whose meaning is defined by the host environment, a *type*,
 indicating whether the export is a function, global, memory or table, and an
 *index* into the type's corresponding [index space](Modules.md).
 
@@ -380,3 +382,4 @@ In the future, operators like `i32.add` could be added to allow more expressive
 [future types]: FutureFeatures.md#more-table-operators-and-types
 [future dom]: FutureFeatures.md#gc/dom-integration
 [future multiple tables]: FutureFeatures.md#multiple-tables-and-memories
+[valid UTF-8]: https://encoding.spec.whatwg.org/#utf-8-decode-without-bom-or-fail

From 9d260e7dc2481380bbbb861972c7e7bd177708b6 Mon Sep 17 00:00:00 2001
From: Dan Gohman
Date: Tue, 14 Mar 2017 10:15:23 -0700
Subject: [PATCH 2/4] s/utf8/UTF-8/g

---
 BinaryEncoding.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/BinaryEncoding.md b/BinaryEncoding.md
index 75c2e49a..928682d0 100644
--- a/BinaryEncoding.md
+++ b/BinaryEncoding.md
@@ -253,9 +253,9 @@ The import section declares all imports that will be used in the module.
 | Field | Type | Description |
 | ----- | ---- | ----------- |
 | module_len | `varuint32` | module string length |
-| module_str | `bytes` | module name: `module_len` bytes holding valid utf8 string |
+| module_str | `bytes` | module name: `module_len` bytes holding valid UTF-8 string |
 | field_len | `varuint32` | field name length |
-| field_str | `bytes` | field name: `field_len` bytes holding valid utf8 string |
+| field_str | `bytes` | field name: `field_len` bytes holding valid UTF-8 string |
 | kind | `external_kind` | the kind of definition being imported |
 
 Followed by, if the `kind` is `Function`:
@@ -356,7 +356,7 @@ The encoding of the [Export section](Modules.md#exports):
 | Field | Type | Description |
 | ----- | ---- | ----------- |
 | field_len | `varuint32` | field name string length |
-| field_str | `bytes` | field name: `field_len` bytes holding valid utf8 string |
+| field_str | `bytes` | field name: `field_len` bytes holding valid UTF-8 string |
 | kind | `external_kind` | the kind of definition being exported |
 | index | `varuint32` | the index into the corresponding [index space](Modules.md) |
 
@@ -471,7 +471,7 @@ where a `naming` is encoded as:
 | ----- | ---- | ----------- |
 | index | `varuint32` | the index which is being named |
 | name_len | `varuint32` | number of bytes in name_str |
-| name_str | `bytes` | utf8 encoding of the name |
+| name_str | `bytes` | UTF-8 encoding of the name |
 
 #### Function names
 

From 2f30ddeada13936094d308219a27e71a558bdfca Mon Sep 17 00:00:00 2001
From: Dan Gohman
Date: Tue, 14 Mar 2017 12:38:08 -0700
Subject: [PATCH 3/4] Say "UTF-8 byte sequence" rather than "UTF-8 string".

This document is describing the encoded bytes, rather than the string
which one gets from decoding them.

Also, make the descriptions of the byte sequence length fields more
precise.
---
 BinaryEncoding.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/BinaryEncoding.md b/BinaryEncoding.md
index 928682d0..a7e860bb 100644
--- a/BinaryEncoding.md
+++ b/BinaryEncoding.md
@@ -195,8 +195,8 @@ part of the payload.
 | ----- | ----- | ----- |
 | id | `varuint7` | section code |
 | payload_len | `varuint32` | size of this section in bytes |
-| name_len | `varuint32` ? | length of the section name in bytes, present if `id == 0` |
-| name | `bytes` ? | section name string, present if `id == 0` |
+| name_len | `varuint32` ? | length of `name` in bytes, present if `id == 0` |
+| name | `bytes` ? | section name: valid UTF-8 byte sequence, present if `id == 0` |
 | payload_data | `bytes` | content of this section, of length `payload_len - sizeof(name) - sizeof(name_len)` |
 
 Each known section is optional and may appear at most once. Custom sections all have the same `id` (0), and can be named non-uniquely (all bytes composing their names may be identical).
@@ -252,10 +252,10 @@ The import section declares all imports that will be used in the module.
 
 | Field | Type | Description |
 | ----- | ---- | ----------- |
-| module_len | `varuint32` | module string length |
-| module_str | `bytes` | module name: `module_len` bytes holding valid UTF-8 string |
-| field_len | `varuint32` | field name length |
-| field_str | `bytes` | field name: `field_len` bytes holding valid UTF-8 string |
+| module_len | `varuint32` | length of `module_str` in bytes |
+| module_str | `bytes` | module name: valid UTF-8 byte sequnce |
+| field_len | `varuint32` | length of `field_str` in bytes |
+| field_str | `bytes` | field name: valid UTF-8 byte sequence |
 | kind | `external_kind` | the kind of definition being imported |
 
 Followed by, if the `kind` is `Function`:
@@ -355,8 +355,8 @@ The encoding of the [Export section](Modules.md#exports):
 
 | Field | Type | Description |
 | ----- | ---- | ----------- |
-| field_len | `varuint32` | field name string length |
-| field_str | `bytes` | field name: `field_len` bytes holding valid UTF-8 string |
+| field_len | `varuint32` | length of `field_str` in bytes |
+| field_str | `bytes` | field name: valid UTF-8 byte sequence |
 | kind | `external_kind` | the kind of definition being exported |
 | index | `varuint32` | the index into the corresponding [index space](Modules.md) |
 
@@ -470,7 +470,7 @@ where a `naming` is encoded as:
 | Field | Type | Description |
 | ----- | ---- | ----------- |
 | index | `varuint32` | the index which is being named |
-| name_len | `varuint32` | number of bytes in name_str |
+| name_len | `varuint32` | length of `name_str` in bytes |
 | name_str | `bytes` | UTF-8 encoding of the name |
 
 #### Function names

From 6f13ecfb965511ef1db848c8f36a142928875e7b Mon Sep 17 00:00:00 2001
From: Dan Gohman
Date: Tue, 14 Mar 2017 13:54:58 -0700
Subject: [PATCH 4/4] Fix typo.

---
 BinaryEncoding.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/BinaryEncoding.md b/BinaryEncoding.md
index a7e860bb..8035b27c 100644
--- a/BinaryEncoding.md
+++ b/BinaryEncoding.md
@@ -253,7 +253,7 @@ The import section declares all imports that will be used in the module.
 | Field | Type | Description |
 | ----- | ---- | ----------- |
 | module_len | `varuint32` | length of `module_str` in bytes |
-| module_str | `bytes` | module name: valid UTF-8 byte sequnce |
+| module_str | `bytes` | module name: valid UTF-8 byte sequence |
 | field_len | `varuint32` | length of `field_str` in bytes |
 | field_str | `bytes` | field name: valid UTF-8 byte sequence |
 | kind | `external_kind` | the kind of definition being imported |
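
Reviewer note: the series requires each `*_len` / `*_str` pair to hold a valid UTF-8 byte sequence, so a decoder would need to reject a module whose name bytes fail strict decoding. The following is a minimal Python sketch of that check, not code taken from any WebAssembly implementation. The helper names `read_varuint32` and `read_name` are hypothetical, and it assumes `varuint32` is an unsigned LEB128 value of at most five bytes. Python's strict `utf-8` codec is used as a stand-in for the "decode without BOM or fail" rule that the `[valid UTF-8]` link points to; it rejects overlong forms and surrogates and does not strip a leading BOM.

```python
from typing import Tuple


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value of at most 32 bits.

    Returns (value, position of the first byte after the value).
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varuint32")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            break
        shift += 7
        if shift >= 35:  # a sixth byte would be needed: too long for 32 bits
            raise ValueError("varuint32 is too long")
    if result > 0xFFFFFFFF:
        raise ValueError("varuint32 out of range")
    return result, pos


def read_name(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a length-prefixed name and require it to be valid UTF-8.

    Hypothetical helper: the layout (length in bytes, then the bytes)
    mirrors the module_len/module_str and field_len/field_str pairs.
    """
    length, pos = read_varuint32(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("name extends past end of buffer")
    try:
        # Strict decoding fails on any malformed byte sequence.
        name = buf[pos:end].decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"name is not valid UTF-8: {exc}") from exc
    return name, end
```

For instance, `read_name(b"\x03env", 0)` yields the name `env` and the offset of the next field, while a name whose bytes are `C0 80` (an overlong encoding) raises `ValueError`. Note that the length prefix counts bytes, not characters, which is the distinction patch 3/4 makes explicit.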