gen-host-js: utf16 support #392
Conversation
|
I've updated this PR to go all-in on camel casing; I'm happy to remove those parts though if it seems weird. Pointers on a good way to get the tests integrated would be very useful, as I'm not quite sure how utf16 should fit into the current test pipeline.
alexcrichton
left a comment
Looks reasonable to me! Happy to take all the naming bits, but I think the bigger piece to handle will be testing this. I do think we'll want to land this with end-to-end tests from a guest into the JS host to ensure everything works. To do that I think the way would be:
- Pick a guest generator and add a `--string-encoding utf16` option.
- Implement the option in the generator, adjusting lift/lower as necessary and the types printed for strings.
- Get the strings-are-utf16 option plumbed into the final component with either:
  - Use a file name like `wasm.utf16.rs` for the test case and use that to decide whether to pass the utf-16 option to `wit-component`.
  - Invent the ability to ferry data through a custom section to get fed into `wit-component` to describe the string encoding. For example right now the custom section is `component-type:name` and has a binary-encoded component type, and there could possibly be something like `component-string-encoding:name` with a one-byte payload that is the string encoding which matches the encodings of string encodings in the component model spec.
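The custom-section idea above can be sketched in C as follows. This is only an illustration of the proposal, not anything `wit-component` implements: the function names are hypothetical, the section name `component-string-encoding:name` is the one floated in the comment, and the byte values (0x00 utf8, 0x01 utf16, 0x02 latin1+utf16) are assumed to follow the canonical-option string encodings in the component model binary format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed one-byte payload values, mirroring the component model's
   string-encoding canonical options. */
enum string_encoding {
    STRING_ENCODING_UTF8 = 0x00,
    STRING_ENCODING_UTF16 = 0x01,
    STRING_ENCODING_LATIN1_UTF16 = 0x02
};

/* Writes `value` as unsigned LEB128 and returns the number of bytes used. */
static size_t write_uleb128(uint8_t *out, uint32_t value) {
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0) {
            byte |= 0x80; /* more bytes follow */
        }
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Hypothetical helper: emits a wasm custom section (section id 0) named
   "component-string-encoding:<name>" whose payload is a single byte.
   Returns the number of bytes written, or 0 if `cap` is too small. */
size_t write_string_encoding_section(uint8_t *out, size_t cap,
                                     const char *name, uint8_t encoding) {
    static const char prefix[] = "component-string-encoding:";
    assert(out != NULL && name != NULL);

    size_t prefix_len = strlen(prefix);
    size_t suffix_len = strlen(name);
    size_t name_len = prefix_len + suffix_len;

    uint8_t name_len_leb[5];
    size_t name_len_size = write_uleb128(name_len_leb, (uint32_t)name_len);

    /* Section contents: length-prefixed name followed by the payload byte. */
    size_t content_size = name_len_size + name_len + 1;
    uint8_t size_leb[5];
    size_t size_size = write_uleb128(size_leb, (uint32_t)content_size);

    size_t total = 1 + size_size + content_size;
    if (total > cap) {
        return 0;
    }

    size_t pos = 0;
    out[pos++] = 0x00; /* custom section id */
    memcpy(out + pos, size_leb, size_size);
    pos += size_size;
    memcpy(out + pos, name_len_leb, name_len_size);
    pos += name_len_size;
    memcpy(out + pos, prefix, prefix_len);
    pos += prefix_len;
    memcpy(out + pos, name, suffix_len);
    pos += suffix_len;
    out[pos++] = encoding;
    return pos;
}
```

A consumer would then look for a section with that name prefix and read the final byte to choose the string encoding, much as the `component-type:name` section is looked up today.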
With all that it should be possible to have one language as a guest and at least JS as a host. Ideally this would support Wasmtime as well since it should already have utf16 support but I can work on adding a test for that later.
|
I've integrated the tests as discussed here for a C -> JS workflow with some string encoding checks etc. With regards to the C helper functions, I had to implement a UTF16 strlen function. It might be better to avoid having these helpers call strlen at all and instead always take an explicit length, requiring the user to pass the length to the helpers. Shall we add that as well while we're on this? It would be great to move on to error handling and output formatting work if we can get this landed soon.
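To make the two helper styles concrete, here is a minimal sketch in C. All names (`utf16_strlen`, `utf16_string_t`, and the two constructors) are hypothetical and are not the helpers generated in this PR. The sketch only contrasts a terminator-scanning helper with one that takes an explicit length.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical strlen analogue: counts UTF-16 code units (not code points)
   before the terminating 0x0000 unit. */
size_t utf16_strlen(const char16_t *s) {
    assert(s != NULL);
    size_t len = 0;
    while (s[len] != 0) {
        len++;
    }
    return len;
}

/* Hypothetical string view: pointer plus length in code units. */
typedef struct {
    const char16_t *ptr;
    size_t len;
} utf16_string_t;

/* Style A: scans for a terminator, so the input must be 0-terminated. */
utf16_string_t utf16_string_from_terminated(const char16_t *s) {
    utf16_string_t out = { s, utf16_strlen(s) };
    return out;
}

/* Style B: the caller passes the length, so no terminator is required. */
utf16_string_t utf16_string_from_parts(const char16_t *s, size_t len) {
    assert(s != NULL || len == 0);
    utf16_string_t out = { s, len };
    return out;
}
```

Style B also works for an array initialized from a brace list of code units, which has no terminator unless one is added explicitly; the length can then be computed as `sizeof arr / sizeof arr[0]`.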
// 🚀 = 0xD83D 0xDE80
// 𠈄 = 0xD840 0xDE04
// 𓀀 = 0xD80C 0xDC00
char16_t UNICODE_STRING[] = { 0xD83D, 0xDE80, 0xD83D, 0xDE80, 0xD83D, 0xDE80, ' ', 0xD840, 0xDE04, 0xD80C, 0xDC00 };
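The surrogate pairs in the comments above can be checked with the standard UTF-16 encoding rule: subtract 0x10000 from the code point, then split the remaining 20 bits into a high and a low half. The sketch below is one way to do that in C; `utf16_encode` is a hypothetical name and is not part of the test file.

```c
#include <assert.h>
#include <stdint.h>
#include <uchar.h>

/* Encodes one Unicode code point as UTF-16.
   Returns the number of code units written to `out` (1 or 2). */
int utf16_encode(uint32_t cp, char16_t out[2]) {
    assert(cp <= 0x10FFFF);             /* must be within the Unicode range */
    assert(cp < 0xD800 || cp > 0xDFFF); /* surrogates are not code points */
    if (cp < 0x10000) {
        out[0] = (char16_t)cp;          /* BMP: a single code unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (char16_t)(0xD800 + (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (char16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate: low 10 bits */
    return 2;
}
```

For example, U+1F680 (🚀) minus 0x10000 is 0xF680, whose top ten bits are 0x3D and low ten bits are 0x280, giving 0xD83D 0xDE80 as in the first comment.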
I did try that but it didn't seem to work for some reason.
Adds UTF16 support to the js-host generator.
In the process, this also refactors the generated code to use camelCase for all variables, removing the uppercase module-scope variables and the underscore naming, since in JS capitalization is usually reserved for window-level globals. It also implements a shorter data_view function. Feel free to tell me not to meddle though!