[feature] Roundtrip names with strange symbols in text format #617

Closed
binji opened this issue Dec 6, 2017 · 8 comments
Labels
future feature issues that may be handled in future versions

Comments

@binji
Member

binji commented Dec 6, 2017

See WebAssembly/wabt#685 (comment). We currently generate a name section using the name provided like `$foo`. This doesn't work for all names that are allowed by the binary format. Should we have a way to represent these names in the text format?

@AndrewScheidecker
Contributor

How about allowing quoted names: `$"arbitrary\00string"`?

@binji
Member Author

binji commented Dec 7, 2017

Good idea. Seems like a simple enough change, and matches what the text format already does for quoted strings. What do you think, @rossberg?

@rossberg
Member

rossberg commented Dec 8, 2017 via email

@AndrewScheidecker
Contributor

> In particular, it would pull in Unicode into a central piece of the text semantics and get us into the business of defining the right equivalence on arbitrary Unicode strings or their (possibly malformed?) encodings. I'd rather not go there, IME it's a rabbit hole.

It does make sense to apply the same well-formed UTF-8 constraint as the import/export strings, but why would it be necessary to define equivalence as anything other than byte-wise comparison? If we allow imports/exports to be distinguished by equivalent UTF-8 strings, why not these names?
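
To make the byte-wise comparison point concrete, here is a minimal Python sketch. It is only an illustration, not anything from the spec or a toolchain: the helper names `same_name_bytewise` and `same_name_normalized` are hypothetical. It shows two strings that render the same but differ as UTF-8 bytes, so a byte-wise rule treats them as different names without needing any Unicode equivalence.

```python
import unicodedata

# Two spellings of "é" that render identically but use different code points.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def same_name_bytewise(a: str, b: str) -> bool:
    """Compare two names by their UTF-8 bytes, with no normalization."""
    return a.encode("utf-8") == b.encode("utf-8")


def same_name_normalized(a: str, b: str) -> bool:
    """Compare after NFC normalization, the kind of rule byte-wise comparison avoids."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name_bytewise(precomposed, decomposed))    # False
print(same_name_normalized(precomposed, decomposed))  # True
```

Under the byte-wise rule the two spellings are simply distinct names, which is the behavior being suggested for identifiers here.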

@binji
Member Author

binji commented Dec 8, 2017

I agree w/ @AndrewScheidecker that this seems to be a similar situation to import/export names. That said, I also think that if we have the general mechanism for custom section annotations, that would work fine too. That seems like it requires more design work than extending the syntax for identifiers though.

@rossberg
Member

@AndrewScheidecker, fair enough, but we would still introduce the situation where there are many different ways to spell the same identifier, e.g., using unicode escapes, raw UTF-8 hex escapes, quotes vs no quotes, etc., which is undesirable IMO.

Unlike import/export names, which are string labels for external interaction so that they have to be language-agnostic and universal (and don't have any meaning inside Wasm itself), free form quoting is not something typically found for internal identifiers. I can see the temptation to view symbolic identifiers as a reflection of the name section, but that wasn't the intended purpose.

@AndrewScheidecker
Contributor

> @AndrewScheidecker, fair enough, but we would still introduce the situation where there are many different ways to spell the same identifier, e.g., using unicode escapes, raw UTF-8 hex escapes, quotes vs no quotes, etc., which is undesirable IMO.

I think it's acceptable if the same identifier can be written multiple ways: e.g. `$f` as `$"\66"`. It would make it possible to write confusing or misleading WAT code, but the purpose of these names is to make disassembly/callstacks useful, and in those cases the names will be printed in a consistent way.
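
As a rough sketch of what "the same identifier written multiple ways" means, the Python below decodes an identifier token to the bytes of its name and compares the results. It assumes the proposed `$"..."` form reuses the text format's string escapes (`\hh`, `\u{...}`, `\t`, `\n`, `\r`, `\"`, `\'`, `\\`). The function names are hypothetical, and the code does not validate identifier characters or UTF-8 well-formedness.

```python
import string

# Single-character escapes of the text format's string syntax.
_SIMPLE = {"t": b"\t", "n": b"\n", "r": b"\r", '"': b'"', "'": b"'", "\\": b"\\"}


def decode_string_body(body: str) -> bytes:
    """Decode the characters between the quotes of a text-format string."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out += body[i].encode("utf-8")
            i += 1
            continue
        nxt = body[i + 1 : i + 2]
        if nxt in _SIMPLE:
            out += _SIMPLE[nxt]
            i += 2
        elif nxt == "u":
            # \u{hex}: one Unicode scalar value, emitted as UTF-8.
            end = body.index("}", i)
            out += chr(int(body[i + 3 : end], 16)).encode("utf-8")
            i = end + 1
        else:
            # \hh: one raw byte written as two hex digits.
            hh = body[i + 1 : i + 3]
            if len(hh) != 2 or any(c not in string.hexdigits for c in hh):
                raise ValueError(f"bad escape at offset {i}")
            out.append(int(hh, 16))
            i += 3
    return bytes(out)


def identifier_bytes(token: str) -> bytes:
    """Return the name bytes of a plain ($foo) or quoted ($"...") identifier."""
    if not token.startswith("$"):
        raise ValueError("identifiers start with '$'")
    rest = token[1:]
    if len(rest) >= 2 and rest[0] == '"' and rest[-1] == '"':
        return decode_string_body(rest[1:-1])
    return rest.encode("ascii")


# Three spellings, one name.
print(identifier_bytes("$f") == identifier_bytes(r'$"\66"'))      # True
print(identifier_bytes("$f") == identifier_bytes(r'$"\u{66}"'))   # True
```

A printer only needs to pick one of these spellings consistently, which is the point made above about disassembly and callstacks.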

> Unlike import/export names, which are string labels for external interaction so that they have to be language-agnostic and universal (and don't have any meaning inside Wasm itself), free form quoting is not something typically found for internal identifiers.

We want to disassemble names from languages with arbitrary syntax, and produce valid WAT syntax. The simplest way to do that is to allow arbitrary strings in WAT identifier syntax.
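
For illustration, a disassembler could choose between the two forms along these lines. This is a sketch under assumptions, not how any existing tool does it: `format_identifier` is a hypothetical helper, the plain form is used only when every byte is one of the text format's identifier characters, and everything else falls back to the quoted `$"..."` form with `\hh` escapes.

```python
# Characters the text format allows in a plain identifier after the '$'.
IDCHARS = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "!#$%&'*+-./:<=>?@\\^_`|~"
)
IDCHAR_BYTES = frozenset(IDCHARS.encode("ascii"))


def format_identifier(name: bytes) -> str:
    """Print a name-section entry as a plain or quoted text-format identifier."""
    if name and all(b in IDCHAR_BYTES for b in name):
        return "$" + name.decode("ascii")
    parts = []
    for b in name:
        if b in (0x22, 0x5C):
            # Double quote and backslash must be escaped inside a string.
            parts.append("\\" + chr(b))
        elif 0x20 <= b <= 0x7E:
            # Other printable ASCII is written literally.
            parts.append(chr(b))
        else:
            # Control bytes and non-ASCII bytes become \hh escapes.
            parts.append(f"\\{b:02x}")
    return '$"' + "".join(parts) + '"'


print(format_identifier(b"foo"))                      # $foo
print(format_identifier(b"arbitrary\x00string"))      # $"arbitrary\00string"
print(format_identifier(b"std::vector<int>::size()"))  # $"std::vector<int>::size()"
```

Because the output is an ordinary identifier token, it can still be used as an operand of instructions such as call, unlike a name carried only in an annotation.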

The annotation proposal tries to avoid the issue by adding a name annotation that takes an arbitrary string, but as I mentioned here, that doesn't replace a good WAT identifier that can be used as an argument of `call`, `get_local`, etc.

@binji binji added the future feature issues that may be handled in future versions label May 22, 2019
@rossberg rossberg changed the title Roundtrip names with strange symbols in text format [feature] Roundtrip names with strange symbols in text format Aug 4, 2022
tlively added a commit to WebAssembly/binaryen that referenced this issue Feb 6, 2024
In addition to normal identifiers, support parsing identifiers of the format
`$"..."`. This format is not yet allowed by the standard, but it is a popular
proposed extension (see WebAssembly/spec#617 and
WebAssembly/annotations#21).

Binaryen has historically allowed a similar format and has supported arbitrary
non-standard identifier characters, so it's much easier to support this extended
syntax than to fix everything to use the restricted standard syntax.
radekdoulik pushed a commit to dotnet/binaryen that referenced this issue Jul 12, 2024
@rossberg
Member

This is now supported with string-style identifiers, closing.
