UTF-8 validation #322

@sunfishcode

UTF-8 is a very popular string encoding; for example, it's the encoding used by over 95% of all Web content. It's not uncommon for applications to need to validate their own strings as UTF-8, and since every WebAssembly VM already has UTF-8 validation logic built in, as required by the spec, we should define a WASI API that lets applications call into the VM's validation logic rather than bundling their own.

I'm picturing an API which takes a byte slice as input and returns a boolean value indicating whether it's valid UTF-8 or not. This is the minimum that WebAssembly engines themselves are required to have, and would be enough for, e.g., the use case of implementing a UTF-8 validity check for a WASI API implemented in wasm.
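To make the shape concrete, here is a minimal sketch of what the boolean-only call could look like from the guest side in Rust. The name `is_valid_utf8` is hypothetical and not part of any WASI proposal; the body uses the standard library's `std::str::from_utf8` purely as a stand-in for the logic the VM would provide.

```rust
/// Hypothetical guest-side stand-in for the proposed call: takes a byte
/// slice and reports only whether it is well-formed UTF-8.
/// In a real WASI API this would be an import serviced by the VM; here the
/// Rust standard library does the checking so the sketch is self-contained.
pub fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

A caller would use it as a simple gate, e.g. `if !is_valid_utf8(buf) { /* reject the input */ }`, before treating `buf` as a string.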

More elaborate APIs are possible, such as validation which returns the position where an error occurred, and possibly information about the error, but I think it makes sense to start with something simple. I won't have time to make an official proposal myself for a while, but I wanted to file this issue to see what others think!
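For comparison, a richer variant that reports where validation failed might look roughly like the sketch below. The `Utf8Failure` type and `validate_utf8` function are hypothetical names chosen for illustration; the fields simply mirror what Rust's `std::str::Utf8Error` already exposes (`valid_up_to` and `error_len`), which is one possible design among several.

```rust
/// Hypothetical error record for a position-reporting validation call.
#[derive(Debug, PartialEq, Eq)]
pub struct Utf8Failure {
    /// Number of leading bytes that were valid UTF-8.
    pub valid_up_to: usize,
    /// Length of the invalid sequence, or `None` if the input ended in the
    /// middle of an otherwise plausible multi-byte sequence.
    pub error_len: Option<usize>,
}

/// Hypothetical richer API: `Ok(())` for valid input, otherwise the
/// position and kind of the first error. The standard library is used
/// as a stand-in for the VM's validator.
pub fn validate_utf8(bytes: &[u8]) -> Result<(), Utf8Failure> {
    match std::str::from_utf8(bytes) {
        Ok(_) => Ok(()),
        Err(e) => Err(Utf8Failure {
            valid_up_to: e.valid_up_to(),
            error_len: e.error_len(),
        }),
    }
}
```

The boolean-only form can be recovered from this one with `validate_utf8(bytes).is_ok()`, so starting with the simple API would not rule out adding something like this later.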
