Skip to content

gen-host-js: utf16 support#392

Merged
alexcrichton merged 14 commits intobytecodealliance:mainfrom
guybedford:js-utf16
Oct 25, 2022
Merged

gen-host-js: utf16 support#392
alexcrichton merged 14 commits intobytecodealliance:mainfrom
guybedford:js-utf16

Conversation

@guybedford
Copy link
Copy Markdown
Contributor

@guybedford guybedford commented Oct 22, 2022

Adds UTF16 support to the js-host generator.

In the process, this also refactors to use all camelcase for variables - removing the uppercase module scope variables, and underscore naming, as usually only window-level globals sometimes have capitalization in JS, as well as implementing a shorter data_view function. Feel free to tell me not to meddle though!

@guybedford
Copy link
Copy Markdown
Contributor Author

I've updated this PR to go all-in on camel casing, I'm happy to remove those parts though if it seems weird.

Pointers re a good way to get the tests integrated would be very useful, as I'm not quite sure how utf16 should fit into the current test pipeline.

Copy link
Copy Markdown
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me! Happy to take all the naming bits, but I think the bigger piece to handle will be testing this. I do think we'll want to land this with end-to-end tests from a guest into the JS host to ensure everything works. To do that I think the way would be:

  • Pick a guest generator and add a --string-encoding utf16 option.
  • Implement the option in the generator, adjusting lift/lower as necessary and the types printed for strings
  • Get the strings-are-utf16 option plumbed into the final component with either:
    • Use a file name like wasm.utf16.rs for the test case and use that to decide whether to pass the utf-16 option to wit-component
    • Invent the ability to ferry data through a custom section to get fed into wit-component to describe the string encoding. For example right now the custom section is component-type:name and has a binary-encoded component type, and there could possibly be something like component-string-encoding:name with a one-byte payload that is the string encoding which matches the encodings of string encodings in the component model spec.

With all that it should be possible to have one language as a guest and at least JS as a host. Ideally this would support Wasmtime as well since it should already have utf16 support but I can work on adding a test for that later.

Comment thread crates/gen-host-js/src/lib.rs Outdated
Comment thread crates/gen-host-js/src/lib.rs Outdated
Comment thread crates/gen-host-js/src/lib.rs Outdated
@guybedford
Copy link
Copy Markdown
Contributor Author

I've integrated the tests as discussed here for a C -> JS workflow with some string encoding checks etc.

With regards to the C helper functions, I had to implement a UTF16 strlen function - it might be better to entirely avoid these helpers using the strlen function and rather explicitly always take a length requiring the user to pass the length to the helpers. Shall we add that as well while we're on this?

Would be great to move on to error handling and output formatting work if we can get this landed soon.

Comment thread crates/test-helpers/macros/build.rs
Comment thread crates/test-helpers/macros/build.rs Outdated
Comment thread crates/wit-component/src/encoding.rs Outdated
Comment thread tests/runtime/strings/wasm_utf16.c Outdated
Comment thread tests/runtime/strings/wasm_utf16.c
Comment thread crates/gen-host-js/src/lib.rs Outdated
Comment thread crates/gen-host-js/src/lib.rs
Comment thread tests/runtime/strings/wasm_utf16.c Outdated
// 🚀 = 0xD83D 0xDE80
// 𠈄 = 0xD840 0xDE04
// 𓀀 = 0xD80C 0xDC00
char16_t UNICODE_STRING[] = { 0xD83D, 0xDE80, 0xD83D, 0xDE80, 0xD83D, 0xDE80, ' ', 0xD840, 0xDE04, 0xD80C, 0xDC00 };
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be u"🚀🚀🚀 𠈄𓀀"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try that but it didn't seem to work for some reason.

Comment thread tests/runtime/strings/wasm_utf16.c Outdated
@alexcrichton alexcrichton merged commit b6df951 into bytecodealliance:main Oct 25, 2022
@guybedford guybedford deleted the js-utf16 branch October 25, 2022 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants