System collation and system timezone data #25

Open
kg opened this issue Apr 29, 2019 · 9 comments
Labels
discussion A discussion that doesn't yet have a specific conclusion or actionable proposal. feature-request Requests for new WASI APIs

Comments

@kg

kg commented Apr 29, 2019

(Maybe this is two separate issues?)

It's common for runtimes (or apps) to have to bundle entire copies of collation tables, timezone data, etc. with every app they distribute. This adds considerable weight, and for web deployment it could make things a lot worse: do we really want people to drag along 2-5 MB of data with every little worker they run on edge nodes just because they need to sort strings?

It would make sense for host environments to be able to expose the collation, timezone, and similar data they have available to the application. Most environments that can host a wasm app already have this data; every browser has to ship ICU, for example.

AFAIK Mono currently ships its own timezone database on some platforms for this reason, and I see people ship ICU everywhere in all sorts of scenarios. It'd be cool if WASI could at least optionally eliminate the need to ship your own copies of ICU and tzdb.
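
For comparison, here is a minimal sketch of the "host data first, bundled copy as fallback" pattern as it already exists in one runtime: Python's standard `zoneinfo` module searches the host directories listed in `TZPATH` and only then falls back to the optional `tzdata` package. The helper name `load_zone` is made up for this illustration, and none of this is a proposed WASI interface.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import TZPATH, ZoneInfo, ZoneInfoNotFoundError


def load_zone(key: str) -> Optional[ZoneInfo]:
    """Return the zone for `key`, or None if no tz data is available at all.

    `load_zone` is a hypothetical helper for this example only.
    """
    try:
        # ZoneInfo looks in the host directories from TZPATH first and only
        # then falls back to the `tzdata` package, if one is installed.
        return ZoneInfo(key)
    except ZoneInfoNotFoundError:
        return None


if __name__ == "__main__":
    print("host tz search path:", TZPATH)
    zone = load_zone("Europe/Berlin")
    if zone is None:
        print("no time zone data from the host or from a bundled package")
    else:
        noon_utc = datetime(2023, 1, 24, 12, 0, tzinfo=timezone.utc)
        print(noon_utc.astimezone(zone))
```

An app written this way picks up tzdb updates whenever the host updates its own copy, without the app being rebuilt.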

@programmerjake
Contributor

I think it would also be useful to include all the other commonly used Unicode tables as well.

@kg
Author

kg commented Apr 29, 2019

IIRC some of this information is presently exposed by some JavaScript DOM APIs, but the ones I've seen for things like collation require you to perform method calls and pass strings, which isn't going to cut it. People need tables, especially if they're porting existing software. (And maybe tables with well-specified formats are also much easier for a host to implement than an actual detailed API?)
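
A rough sketch of the difference, under loud assumptions: `TOY_WEIGHTS` is an invented six-entry table (real collation data such as the Unicode DUCET is far larger and multi-level), and `CallBasedHost` is a hypothetical stand-in for a host that only offers a `compare(a, b)` call. With a table, the app builds sort keys locally; with a call-based API, every comparison crosses the host boundary and passes two strings.

```python
from functools import cmp_to_key
from typing import Dict, List, Tuple

# Toy primary weights, invented for this example only.
TOY_WEIGHTS: Dict[str, int] = {"a": 1, "ä": 1, "b": 2, "o": 3, "ö": 3, "z": 4}


def sort_key(text: str, table: Dict[str, int]) -> Tuple[int, ...]:
    """Build a sort key locally from a weight table."""
    # Characters missing from the table sort after everything in it.
    unknown = len(table) + 1
    return tuple(table.get(ch, unknown) for ch in text.lower())


class CallBasedHost:
    """Hypothetical host that exposes only compare(a, b), not its table."""

    def __init__(self, table: Dict[str, int]) -> None:
        self._table = table
        self.calls = 0  # number of host-boundary crossings

    def compare(self, a: str, b: str) -> int:
        self.calls += 1
        ka, kb = sort_key(a, self._table), sort_key(b, self._table)
        return (ka > kb) - (ka < kb)


def sort_with_table(words: List[str], table: Dict[str, int]) -> List[str]:
    # One local key computation per word, no host calls.
    return sorted(words, key=lambda w: sort_key(w, table))


def sort_with_host_calls(words: List[str], host: CallBasedHost) -> List[str]:
    # One host call per comparison made by the sort.
    return sorted(words, key=cmp_to_key(host.compare))


if __name__ == "__main__":
    words = ["zoo", "öz", "ba", "äb", "oa"]
    host = CallBasedHost(TOY_WEIGHTS)
    print(sort_with_table(words, TOY_WEIGHTS))
    print(sort_with_host_calls(words, host), "host calls:", host.calls)
    print(sorted(words))  # plain code point order, for contrast
```

Existing software being ported usually already has the table-driven half of this, which is why handing it the table is less invasive than rerouting every comparison through a host call.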

@devsnek
Member

devsnek commented Apr 29, 2019

There has also been some light discussion of exposing things like CLDR tables and timezone charts in JS itself; this could be another push in that direction.

@programmerjake
Contributor

Maybe the Unicode tables could be designed to be compatible with Rust's version: https://github.com/rust-lang/rust/tree/master/src/libcore/unicode

@rylev rylev added the discussion A discussion that doesn't yet have a specific conclusion or actionable proposal. label May 8, 2019
@sunfishcode sunfishcode added the feature-request Requests for new WASI APIs label Feb 19, 2020
alexcrichton added a commit to alexcrichton/WASI that referenced this issue Jan 19, 2022
@ricochet

I expect this to be largely solved at the component-model level and not in WASI. I expect many apps to depend on the same ICU component (https://github.com/unicode-org/icu4x).

@kg
Author

kg commented Jan 24, 2023

That still requires every app to deploy all the tables and data, and requires every app developer to vendor it and update their app when the tables change. The host OS and browser are already both provisioned to handle that.

If I build a simple calendar management app (for example) using WASI, should I have to manually build and update it every time the tzdb changes (a dozen or more times a year, if I'm not mistaken)?

I find the reluctance to specify this stuff understandable, of course. But expecting every single application or library developer to vendor tzdb and collation tables feels like it would scale really poorly, unless the assumption is that every WASI host will have a central component registry where ICU4X and the tzdb/collation/etc. tables can live. (Is there a spec for that registry? I didn't see one in the proposals list.)

Just to give an arbitrary example for reference here: I have at least 30 different copies of libavcodec spread across my hard disk, each vendored by a different application. They're all different sizes, presumably because some application vendors configure it to omit file formats they don't care about. In total, that's over 300MB of duplicated code, some of it very old because not every application vendor is going to stay on top of updates.

Now, instead of obscure video file formats or video decoder CVEs, we're talking about collations for less commonly used languages and updates to time zones for smaller countries. Many application vendors won't regularly hand-push updates to keep up with changes to tzdb, collations, etc. That would mean software targeting WASI will potentially provide an inferior experience for those audiences compared to what they'd get out of native Mac or Windows software, which is unfortunate.

Worse still, ICU and its data files are very large (though I'm sure ICU4X will improve on this). If you have to vendor and ship all those files yourself, that creates a strong incentive to strip out 'unimportant' languages or internationalization data that your userbase doesn't appear to care about. If you're paying bandwidth costs to ship updated files, that creates a further incentive not to update. I've personally witnessed this kind of decision get made.

Some sort of forward-looking mechanism that lets the host provide some or all of this data would prevent at least some of the long-term damage that might result from leaving this entirely up to Someone Else.

EDIT: One other example of host data that seems relevant is root certificate stores. In the Bad Old Days everyone had their own approach to it and as a result even things like the Mono .NET runtime have their own certificate store. Is WASI going to also say 'if you need to do SSL, you need to maintain your own root certificate store'? I see SSL (wisely) scoped out of the sockets proposal on purpose.
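
As a point of comparison, a small sketch of what deferring to the host's certificate store looks like in an existing runtime. It uses Python's standard `ssl` module; the helper name `describe_host_trust_store` is invented for this example, and this says nothing about how WASI would or should expose certificates.

```python
import ssl
from typing import Any, Dict


def describe_host_trust_store() -> Dict[str, Any]:
    """Report where the default TLS context gets its root certificates.

    Hypothetical helper for illustration only.
    """
    # create_default_context() loads the platform's default CA certificates
    # instead of requiring the application to ship its own bundle.
    ctx = ssl.create_default_context()
    paths = ssl.get_default_verify_paths()
    stats = ctx.cert_store_stats()
    return {
        "cafile": paths.cafile,  # host CA bundle file, or None
        "capath": paths.capath,  # host CA directory, or None
        # Certificates in a capath directory are loaded lazily, so this
        # count can be 0 even when the host has a populated store.
        "loaded_ca_certs": stats["x509_ca"],
    }


if __name__ == "__main__":
    print(describe_host_trust_store())
```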

@ricochet

I'm interested in your thoughts on surfacing the Internationalization API Specification through WASI. I pitched this in a related but simpler case in 239.

This doesn't completely solve the ask. The idea of relegating all dependency management to the host is an interesting one, but it feels like something that doesn't belong in a systems-level API. I expected it to be solved at the platform layer. An example of that is how wasmCloud operates (caveat: I'm a maintainer).

@kg
Author

kg commented Jan 24, 2023

A compromise might be a 'system blob' API of some sort, where communities can agree on specializations for well known system blobs like tzdb, and the WASI spec provides a way for a host (or system installed component) to offer that blob.
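
To make the idea concrete, here is a purely hypothetical sketch of what a 'system blob' lookup could look like from the application's side. Nothing here exists in WASI: `SystemBlob`, `Host`, `get_system_blob`, and `load_tzdb` are names invented for this illustration, and the version strings and payloads are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SystemBlob:
    """A named, versioned chunk of data in a community-agreed format."""
    name: str      # well-known blob name, e.g. "tzdb" (assumed convention)
    version: str   # placeholder version label
    data: bytes    # raw payload in the agreed format


class Host:
    """Hypothetical host (or system-installed component) offering blobs."""

    def __init__(self, blobs: Dict[str, SystemBlob]) -> None:
        self._blobs = dict(blobs)

    def get_system_blob(self, name: str) -> Optional[SystemBlob]:
        # A host that does not offer the blob simply returns None.
        return self._blobs.get(name)


def load_tzdb(host: Host, bundled: Optional[SystemBlob]) -> Optional[SystemBlob]:
    """Prefer the host's copy of tzdb; fall back to the app's bundled copy."""
    blob = host.get_system_blob("tzdb")
    return blob if blob is not None else bundled


if __name__ == "__main__":
    bundled = SystemBlob("tzdb", "bundled-placeholder", b"bundled-copy")
    with_data = Host({"tzdb": SystemBlob("tzdb", "host-placeholder", b"host-copy")})
    without_data = Host({})
    print(load_tzdb(with_data, bundled).version)
    print(load_tzdb(without_data, bundled).version)
```

Keeping the lookup optional means an app can still ship its own copy for hosts that offer nothing, while hosts that do offer the blob take over the update burden.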

@ricochet

Yes, I totally expect different communities to build shared APIs, and with the planned resource handles in WIT definitions, they can pass blobs. proxy-wasm is one example of a community coming together to build a shared interface. Perhaps we should scope this to building a common interface for ICU? wasi-icu?


6 participants