
Tweaks to binary section format? #623

@rossberg

Having finished a first iteration of the encoder & decoder for the spec, I'd like to make a couple of small suggestions regarding the structure of sections in the binary format. Lumping them together here.

Section Headers

Instead of
(payload_size; name_string; payload)
as the overall section structure, can we swap that to
(name_string; payload_size; payload)
? Two minor advantages:

  • It makes the offset computation slightly less confusing.
  • It allows simply viewing a section as a pair
    (name_string; payload_string),
    which is particularly natural wrt handling and skipping unknown sections.

With that change, skipping over sections requires skipping over their name first. But I have a hard time imagining a reason for skipping a section without even knowing what it is.
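To make the pair view concrete, here is a minimal sketch in Python of reading sections laid out as (name_string; payload_size; payload). It assumes that sizes are unsigned LEB128 integers and that a name_string is a length-prefixed UTF-8 byte string; the helper names (`read_varuint`, `read_string`, `read_section`, `known_sections`) are hypothetical and not taken from the spec or any implementation.

```python
import io


def read_varuint(stream):
    """Decode an unsigned LEB128 integer (assumed size encoding)."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of stream inside a varuint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result
        shift += 7


def write_varuint(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_string(stream):
    """Read a length-prefixed byte string."""
    size = read_varuint(stream)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("string is shorter than its declared size")
    return data


def read_section(stream):
    """With the name first, a section is just two strings in a row."""
    name = read_string(stream)
    payload = read_string(stream)
    return name.decode("utf-8"), payload


def encode_section(name, payload):
    """Inverse of read_section, used here only to build sample input."""
    raw_name = name.encode("utf-8")
    return (write_varuint(len(raw_name)) + raw_name
            + write_varuint(len(payload)) + payload)


def known_sections(data, known):
    """Return the (name, payload) pairs whose name is in `known`.

    Unknown sections are skipped without looking inside their payload.
    """
    stream = io.BytesIO(data)
    found = []
    while stream.tell() < len(data):
        name, payload = read_section(stream)
        if name in known:
            found.append((name, payload))
    return found
```

In this sketch, skipping an unknown section costs nothing extra: the decoder has already read the name in order to decide that the section is unknown, and the payload is consumed as one opaque string.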

Section Names

Honestly, the current names are super verbose. Would anybody be opposed to shortening them a bit, to something nicer? I'd suggest:

"signatures"          -> "types"
"import_table"        -> "imports"
"function_signatures" -> "functions"
"export_table"        -> "exports"
"start_function"      -> "start"
"function_bodies"     -> "code"
"data_segments"       -> "data"

Note that the signatures section may be generalised to contain other kinds of type definitions in the future, so the current name is not a good fit.

Function Bodies

Function bodies are implicitly delimited by the byte size of their encoding. This is unfortunate for a couple of reasons:

  • The expression decoder cannot just operate on an abstract stream, it has to take this secondary end-of-stream condition into account. The spec currently handles this by a somewhat ad-hoc notion of "substream", but really, this pierces the stream abstraction in an ugly manner.
  • This is the only piece of the binary format that stands in the way of formulating the entire format unambiguously as a grammar, because it depends on non-local (and lower-level) context information. I find it rather sad to get 99% there but lose it on the last meter.

I'd hence like to suggest adding an explicit end opcode to functions. Pros:

  • The binary format is fully structured and can be entirely parsed linearly from an abstract byte stream, without paying attention to any size information.
  • Sizes would only be needed to (a) seek through the stream if desired and (b) validate that they are consistent. That decouples concerns nicely.

I'm aware that there are concerns that the extra byte is "redundant", but is one byte per function a big deal? We recently saved much more by improving the representation of locals (or would by shortening section names :) ).
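As a rough illustration of the proposal, the sketch below decodes function bodies from a plain byte stream, using an explicit end opcode as the only delimiter. The instruction set is a toy one: the opcode values (`OP_NOP`, `OP_CONST`, `OP_END`) and the function names are hypothetical, chosen for this example only, and do not reflect the actual opcode table. The size check is kept in a separate wrapper to show how validation can be decoupled from parsing.

```python
import io

# Hypothetical toy opcodes, for illustration only.
OP_NOP = 0x00    # no immediates
OP_CONST = 0x01  # followed by one immediate byte
OP_END = 0xFF    # explicit end of a function body


def decode_body(stream):
    """Decode one function body; stops at OP_END, needs no size info."""
    instrs = []
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("function body is missing its end opcode")
        op = chunk[0]
        if op == OP_END:
            return instrs
        if op == OP_NOP:
            instrs.append(("nop",))
        elif op == OP_CONST:
            imm = stream.read(1)
            if not imm:
                raise EOFError("const is missing its immediate")
            instrs.append(("const", imm[0]))
        else:
            raise ValueError(f"unknown opcode {op:#x}")


def decode_bodies(stream, count):
    """Bodies follow each other directly; each one delimits itself."""
    return [decode_body(stream) for _ in range(count)]


def decode_body_checked(stream, declared_size):
    """Optional validation: compare bytes consumed with a declared size."""
    start = stream.tell()
    body = decode_body(stream)
    consumed = stream.tell() - start
    if consumed != declared_size:
        raise ValueError(
            f"declared size {declared_size} but body used {consumed} bytes")
    return body
```

Here `decode_body` never consults a byte count, so it works on any abstract stream; only `decode_body_checked` needs to know the position, and a decoder that does not care about consistency can leave it out.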
