Wit text format comments encoded in the binary Wasm binary #213

Open
calvinrp opened this issue Jul 10, 2023 · 12 comments

Comments

@calvinrp

This has been discussed before; I'm formally creating an issue for it.

Currently, we lose comments / docs from Wit text formats when compiling to the binary Wasm component file format. This is especially an issue when publishing to the Warg registry, which is designed to accept only Wasm binary files.

A likely implementation would involve some custom section convention.

@lukewagner
Member

Thanks for filing this; agreed this is something we should fix and agreed that this is probably a custom section thing. Some initial thoughts on what we might want from a custom section format:

  • Since it's mostly text anyways, perhaps the custom section contents should be human-readable/editable text (unlike, e.g., the name section) so that, e.g., the "exploded" representation of a component has the documentation in a simple text file that can be easily edited before reimploding.
  • It would be useful to have a well-defined validation predicate that we can regularly check (e.g., on registry publication) to avoid the drift I expect we'd get otherwise.
  • As part of the general theme of unifying Wit packages with general component packages (so there are just "packages" represented as components), it would be nice if the same documentation format could be applied to both encoded Wit interfaces/worlds and regular component imports/exports.

@lann
Contributor

lann commented Aug 16, 2023

I can take a shot at this. Would we expect this specification to be part of this repo?

@peterhuene
Collaborator

peterhuene commented Aug 16, 2023

I don't believe we currently document the binary encoding of WIT, but my hope, at least, is that it will be soon (and look more like any other component that simply exports the relevant types); I would assume we would then want to document the comment custom section here too.

@lann
Contributor

lann commented Aug 16, 2023

> Since it's mostly text anyways, perhaps the custom section contents should be human-readable/editable text (unlike, e.g., the name section) so that, e.g., the "exploded" representation of a component has the documentation in a simple text file that can be easily edited before reimploding.

Two thoughts on how this could be accomplished, both of which require defining some unique encoding of the "path" to each item.

  • Each documented item could get its own custom section, named e.g. component-item-doc:<path> with docs as section contents.
  • The most reasonable way I can think of doing this in a single section would be to encode as JSON, e.g. {"<path>": "<docs>", ...}

I'm not sure either of these is all that great, but I'd be happy to hear feedback / other ideas.
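To make the two options concrete, here is a minimal Python sketch of both layouts. The path syntax (`imports/logger/log`), the sample doc strings, and the single-section name `component-docs` are assumptions made for illustration; only the `component-item-doc:<path>` prefix and the `{"<path>": "<docs>"}` shape come from the comment above.

```python
import json

# Hypothetical item paths and doc strings; the path syntax is an assumption.
docs = {
    "imports/logger/log": "Write a message to the host log.",
    "exports/run": "Entry point of the component.",
}


def per_item_sections(docs):
    """Option 1: one custom section per documented item.

    Returns (section_name, section_contents) pairs.
    """
    return [
        (f"component-item-doc:{path}", text.encode("utf-8"))
        for path, text in sorted(docs.items())
    ]


def single_json_section(docs, name="component-docs"):
    """Option 2: one custom section whose contents are a flat JSON object.

    The section name is a placeholder, not an agreed convention.
    """
    payload = json.dumps(docs, sort_keys=True).encode("utf-8")
    return (name, payload)
```

Option 1 repeats the shared path prefix in every section name, while option 2 keeps everything in one place at the cost of requiring a JSON parser to read it.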

@lann
Contributor

lann commented Aug 16, 2023

> It would be useful to have a well-defined validation predicate that we can regularly check (e.g., on registry publication) to avoid the drift I expect we'd get otherwise.

You mean checking that a package is "sufficiently documented" a la Rust's #![deny(missing_docs)]? (whatever that might precisely mean here)

@lukewagner
Member

To the "does this belong in this repo/spec" question: I think so, probably as a new .md in design/mvp (covering both the documentation and encoding of Wit into C-M types) in the short-term, and in an appendix (like in the core spec) in the official document. Thanks for offering to help, Lann!

As for your first question: I expect we want just one custom section, so yeah, the JSON approach probably makes sense. Half-baked idea: instead of putting the <path> in the key, what if the nesting structure was mirrored in the JSON object nesting structure, with imports and exports at the top-level, and then the importname/exportname as the next-level key, and then further nesting is determined by the externdesc (exports of an instance type, parameter/result-names of a func type, etc)? This would have the nice effect of "factoring out" the common prefix and maybe also be somewhat readable.
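As a rough illustration of the "factoring out the common prefix" idea, the sketch below turns a flat path-keyed map into nested objects. The `/` path separator and the reserved `"docs"` key that holds an item's own text are assumptions of this sketch, not part of any proposed schema.

```python
def nest(flat_docs):
    """Turn {"imports/logger/log": "..."} into objects nested by path segment.

    Each node stores its own documentation under the key "docs". That key is
    a choice made for this sketch, and it would collide with an item that is
    itself named "docs".
    """
    root = {}
    for path, text in flat_docs.items():
        node = root
        for segment in path.split("/"):
            # Create the child object on first use, then descend into it.
            node = node.setdefault(segment, {})
        node["docs"] = text
    return root


nested = nest({
    "imports/logger": "Host logging interface.",
    "imports/logger/log": "Write a message to the host log.",
    "exports/run": "Entry point of the component.",
})
```

With this shape, `imports` and `exports` appear once at the top level, and each import or export name appears once beneath them, however many of its members are documented.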

> You mean checking that a package is "sufficiently documented" a la Rust's #![deny(missing_docs)]? (whatever that might precisely mean here)

Oh, no, more like: given a component .wasm, validate that, if it contains a documentation section, it's well-formed (e.g., valid JSON and each referenced name exists).
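A well-formedness check of this kind could look roughly like the following Python sketch. It assumes a docs payload shaped as `{"imports": {...}, "exports": {...}}` and a caller-supplied `known` mapping of the names that actually exist in the component; both are hypothetical stand-ins, since the real schema and the way names are extracted from the binary are not defined here.

```python
import json


def validate_docs(payload: bytes, known: dict) -> list:
    """Return a list of problems; an empty list means the section is well-formed.

    `known` maps a top-level kind ("imports" / "exports") to the set of names
    present in the component. How that set is obtained is out of scope here.
    """
    try:
        doc = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as err:
        return [f"not valid UTF-8 JSON: {err}"]
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]

    errors = []
    for kind, entries in doc.items():
        if kind not in known:
            errors.append(f"unknown top-level key: {kind}")
            continue
        if not isinstance(entries, dict):
            errors.append(f"'{kind}' must map names to documentation")
            continue
        for name in entries:
            # Every documented name must refer to something that exists.
            if name not in known[kind]:
                errors.append(f"{kind} '{name}' does not exist in the component")
    return errors
```

A registry could run a check like this at publication time and reject a component whose docs section refers to names it does not contain.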

@lann
Contributor

lann commented Aug 17, 2023

I have some prototype code for this now: bytecodealliance/wasm-tools#1169

The JSON schema ended up being directed a bit by the wit_parser::Resolve internals. I think the result of that is that it more closely matches the WIT structure than the equivalent binary encoding.

@lukewagner
Member

Nice! On first glance, it looks really good. Initially I wasn't sure about documenting Wit-level concepts in the schema, but on second thought it does seem like the right level of abstraction. What's interesting is that the well-formedness predicate of the docs section will end up depending on the encoding-scheme of Wit into component types, which is something we need to specify precisely in any case.

Incidentally, we were just talking with @peterhuene about where the package statement in Wit goes so that it can be roundtripped, and it seemed like maybe it would belong in a docs section, so maybe that's an additional top-level key in the schema.

@alexcrichton
Collaborator

In the near-term JSON I think is ok but in the long-term I'm not sure if it makes sense. Luke above said:

> perhaps the custom section contents should be human-readable/editable text

but I'd call that into question, in the sense that I'm not sure what this would be used for. You can't, for example, open up a *.wasm binary in a text editor and change the contents of the section because at the very least there's a header at the beginning of the proposed custom section indicating how large the section is. Otherwise I'm not sure what the workflow would look like for editing the comments in a component, but the closest I'd imagine is that you'd explode the wasm binary back into a WIT package, edit some bits, and then re-implode back into a wasm binary. In this case the encoding format doesn't matter since only the text contents are being updated.
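The size-header point can be seen in a short Python sketch of how a WebAssembly custom section is framed: section id `0`, then the byte length of the section body as an unsigned LEB128 integer, then the length-prefixed section name, then the payload. The section name `docs` and the JSON payloads below are placeholders chosen for illustration.

```python
def leb128_u32(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def custom_section(name: str, payload: bytes) -> bytes:
    """Frame a payload as a custom section (id 0, size, name, contents)."""
    name_bytes = name.encode("utf-8")
    body = leb128_u32(len(name_bytes)) + name_bytes + payload
    return b"\x00" + leb128_u32(len(body)) + body


# "docs" is a placeholder section name, not an agreed convention.
before = custom_section("docs", b'{"a":"b"}')
after = custom_section("docs", b'{"a":"bc"}')
```

Adding a single character to the payload changes the size field that precedes it, so editing the text in place without rewriting the header would leave the binary malformed.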

One point against JSON, I think, is that in the long run it doesn't really provide much benefit over a section defined in the manner of the name section. Even with JSON we'd still have to document a schema, which feels similar to the work necessary to define a binary format. I also feel that using a custom format would avoid the need to shoehorn everything into JSON whether it fits there or not.

To clarify again though I think JSON is fine for now, but I do think we'll want to keep the door open to updates in the future.

@oovm

oovm commented Feb 27, 2024

I'd like to know what documentation comments look like in the wat format, and where they should be written.

My tool generates wat with $id in debug mode, and then generates wasm in release mode.

@oovm

oovm commented Feb 27, 2024

Another question is whether to save markdown or compiled html format.

Benefits of using HTML:

  • Some languages do not use Markdown but org-mode, AsciiDoc, and other tools, and then derive the WIT format.
  • The Markdown used in many language documents is not standard Markdown but an extended dialect that adds cross-reference links, parameter markers, and other features.
  • Markdown is not extensible enough. If someone wants to use KaTeX to render formulas or Mermaid to render diagrams, it will be difficult.
  • HTML supports reading fields directly and splicing them together to generate static web pages.

@lann
Contributor

lann commented Feb 27, 2024

> Another question is whether to save markdown or compiled html format.

One of the goals of this feature is to be able to transform the binary encoding back into something equivalent to the input WIT text, which strongly suggests that comments should be preserved ~verbatim.
