syscall ABI #27

Open
jfbastien opened this issue Dec 6, 2017 · 7 comments

Comments

@jfbastien
Member

IIUC we currently don't have a stable syscall ABI. I think we should try to standardize something.

Having a stable ABI we agree on means that embedders don't have to roll their own. There's plenty of experience to gain from what Emscripten did, and I would love to have its JavaScript syscall layer as a free-standing thing.

Here's a quick sketch:

  • Each syscall is its own function (unlike e.g. Linux, where there is one entry point per syscall signature that takes the syscall number as its first parameter).
  • Go through the existing syscalls and adopt ones we deem useful.
  • The embedder can do X when a syscall cannot work on that platform (X TBD: should we allow trapping if e.g. sockets aren't available?).
  • All syscalls share the same module, say syscall.
  • I think we might want to version the module name (i.e. syscall_v0): adding new syscalls doesn't need a new version, but changing any tool-convention ABI behavior would require bumping the version. Unless we think behavior will never change, in which case no versioning.
  • The field for each syscall is just the syscall number macro's name (e.g. exit, fork, read, write, open, ...)
  • IIRC we've talked about adding a custom clang attribute to denote module / field of an export / import.
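A minimal sketch of the embedder side of this scheme, written in Python purely for illustration and assuming a host that resolves each wasm import by its (module, field) pair, the way a JS import object does. The module name syscall_v0, the toy syscalls, and returning -ENOSYS for anything unsupported (one candidate for the "X TBD" above) are assumptions, not an agreed ABI. A Linux-style numbered entry point is included only for contrast.

```python
import errno
from typing import Callable, Dict, List

# Hypothetical versioned module name: adding a syscall keeps "v0"; changing
# the behaviour of an existing one would mean a new "syscall_v1" module.
MODULE = "syscall_v0"

console: List[bytes] = []  # stands in for the host's stdout/stderr


def sys_write(fd: int, data: bytes) -> int:
    """Field "write": a toy host write that only backs fds 1 and 2."""
    if fd not in (1, 2):
        return -errno.EBADF
    console.append(data)
    return len(data)


def sys_getpid() -> int:
    """Field "getpid": this toy host always reports the same fake PID."""
    return 42


def unsupported(*_args: object) -> int:
    """Assumed policy for syscalls the host cannot provide: -ENOSYS, not a trap."""
    return -errno.ENOSYS


# One host function per syscall, keyed by the syscall's name, not its number.
IMPORTS: Dict[str, Dict[str, Callable[..., int]]] = {
    MODULE: {
        "write": sys_write,
        "getpid": sys_getpid,
        "fork": unsupported,  # e.g. a host with no notion of processes
    }
}


def resolve(module: str, field: str) -> Callable[..., int]:
    """Look up one (module, field) import the way an instantiator would.

    An unknown module raises KeyError (a link error); an unknown field
    falls back to the -ENOSYS stub.
    """
    return IMPORTS[module].get(field, unsupported)


# For contrast: a Linux-style single entry point whose first argument is the
# syscall number (1 and 39 are write and getpid on x86-64 Linux).
LINUX_STYLE: Dict[int, Callable[..., int]] = {1: sys_write, 39: sys_getpid}


def linux_style_syscall(nr: int, *args: object) -> int:
    return LINUX_STYLE.get(nr, unsupported)(*args)
```

With named imports a host can provide exactly the subset it supports, and a missing syscall shows up by name when the module is linked; the numbered form funnels everything through one import, so an unsupported call is only discovered at run time.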

One open question I have: say a JS embedding wants to let the user choose how to implement filesystem access (maybe WebSQL versus in-memory are two options). How would we offer a stable ABI and let users choose which JS glue to use? They can't just swap out the "filesystem" import if all syscalls are in the "syscall" import. Should we group syscalls by theme, and are all of these orthogonal enough that you wouldn't sometimes want two in the same group?
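One possible answer to the grouping question, sketched in Python under the assumption that syscalls are split into per-theme modules. The module names syscall_fs_v0 and syscall_proc_v0, both backends, and the write_file/read_file calls (simplified stand-ins for open/read/write, with no file descriptors) are all hypothetical. Each backend supplies the whole fs group, so the embedder can swap it without touching the other groups.

```python
import errno
from typing import Callable, Dict

ImportObject = Dict[str, Dict[str, Callable[..., object]]]


class MemoryFS:
    """Toy backend: whole files kept in a dict."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def write_file(self, path: str, data: bytes) -> int:
        self.files[path] = data
        return len(data)

    def read_file(self, path: str) -> bytes:
        return self.files.get(path, b"")


class ReadOnlyFS(MemoryFS):
    """Second toy backend with the same interface that refuses writes."""

    def write_file(self, path: str, data: bytes) -> int:
        return -errno.EROFS


def make_imports(fs: MemoryFS) -> ImportObject:
    # Hypothetical per-theme modules: only the fs group depends on the backend.
    return {
        "syscall_proc_v0": {"getpid": lambda: 42},
        "syscall_fs_v0": {"write_file": fs.write_file, "read_file": fs.read_file},
    }
```

The .wasm imports the same names either way; only the object passed at instantiation changes. The orthogonality worry shows up with calls like poll or select, which take descriptors that may belong to several groups and so have no obvious single home.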

@binji
Member

binji commented Dec 6, 2017

Thanks for kicking this off, JF!

How would we offer a stable ABI, and let users choose which JS glue to use?

For filesystems I think we would follow the way it was done in emscripten + NaCl and have the ability to mount different filesystems on the same hierarchy. We'd have to agree on what a particular filesystem type means, though. So in your example "websql" could be the filesystem type, but what does that mean in a non-web embedding?
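The mount approach can be sketched as a host-side table from mount points to filesystem types, resolved by longest matching path prefix. This Python illustration rests on assumptions: the class and the backend names ("memory", "websql") are placeholders, and it does not reproduce Emscripten's or NaCl's actual implementation.

```python
from typing import Dict, Optional, Tuple


class MountTable:
    def __init__(self) -> None:
        self.mounts: Dict[str, str] = {}  # mount point -> filesystem type

    def mount(self, point: str, fs_type: str) -> None:
        # Normalise "/persist/" to "/persist", but keep the root as "/".
        self.mounts[point.rstrip("/") or "/"] = fs_type

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (filesystem type, path relative to its mount point)."""
        best: Optional[str] = None
        for point in self.mounts:
            # "/persist" must match "/persist" and "/persist/a", not "/persistent".
            if point == "/" or path == point or path.startswith(point + "/"):
                if best is None or len(point) > len(best):
                    best = point
        if best is None:
            raise FileNotFoundError(path)
        rest = path if best == "/" else (path[len(best):] or "/")
        return self.mounts[best], rest
```

Which filesystem types exist, and what something like "websql" means outside a browser, is exactly the part that would still need agreement.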

Are there other examples where we'd want to swap out the host's implementation?

@sunfishcode
Member

I am concerned that this proposal, as currently phrased, seems to be aimed at encouraging the use of non-standard Web APIs.

@jfbastien
Member Author

@binji

For filesystems I think we would follow the way it was done in emscripten + NaCl and have the ability to mount different filesystems on the same hierarchy. We'd have to agree on what a particular filesystem type means, though. So in your example "websql" could be the filesystem type, but what does that mean in a non-web embedding?

I'm thinking that the .wasm would just use the syscalls, and a developer would choose (in this example) which filesystem backs those syscalls outside of the .wasm. In JS, that could be configured in your Emscripten build or your Webpack build (hi @TheLarkInn). In a non-JS embedding, it could be done through the command line.

Are there other examples where we'd want to swap out the host's implementation?

A few random ideas:

  • System settings such as PID, UID, groups, etc.
  • Process tracing / debugging.
  • TTYs / console control.
  • Forking and communication to other "processes" (pipes, etc).
  • Event loop / select / poll / epoll (which brings in some file descriptor things!).
  • Networking (on the web that can be just same-origin, or some WebRTC-based magic).

@sunfishcode

I am concerned that this proposal, as currently phrased, seems to be aimed at encouraging the use of non-standard Web APIs.

That's absolutely not my goal, and I think we can agree that such things are off the table as a design concern. If e.g. Chrome wants to implement the filesystem syscalls using HTML5 filesystem then go for it, but that's in no way tied to the present discussion.

@lukewagner
Member

In the Web embedding, we don't have any constraints on the precise syscall ABI, so our choices will be arbitrary and hard to validate. For the ABI to be meaningful, it seems like we need some non-web embeddings (ones that will implement the syscalls directly in the host) to participate in this discussion. I have actually heard of several non-web embeddings that are considering doing exactly that, so perhaps we could reach out to them and get them together.

Another point is that while it seems like a syscall ABI would certainly relate to the toolchain, it seems like a much broader discussion than just "toolchain conventions". It's like a new wasm-ified POSIX standard.

@kripken
Member

kripken commented Dec 6, 2017

These might be good goals here:

  • No or at least minimal regressions in emscripten in code size and perf. Example issue: should more date/time handling be done in compiled libc code (larger?), or call out to JS (slower?)
  • Also make sure we can support other existing browser filesystem libraries like BrowserFS.

@RyanLamansky
Copy link

My non-web implementation targets .NET Standard, which is fairly restrictive, though I could make a .NET Classic build that would have full access to the Windows API. If the goal here is to make POSIX for WASM, I could probably find a way to make most of it work...

@NWilson
Contributor

NWilson commented Feb 23, 2018

I can see an argument in favour of standardising the syscall ABI... but I'm not entirely convinced!

Pro:

  • The rest of the ABI (calling conventions, typedefs, C99 ABI, etc) is standardised
  • A stable ABI would assist in integrating libc implementations into different backends

Con:

  • The contract between libc and the "kernel" is not something applications should be relying on.
  • In particular, if the same project ships both a Musl port and the embedder side of the syscalls, that's a private contract for the implementer. That is, if you maintain a Wasm port of Musl and also provide the JavaScript syscalls, why should anyone else need to know about that? Toolchains ought to handle symbols "opaquely".
  • I expect that projects like Emscripten will continue to provide the JavaScript-side of syscalls, as well as their own Musl port. And, any other projects (like a .NET port) may inevitably end up maintaining their own Musl port that speaks to their .NET embedder side. Catering for all needs with one syscall ABI is a bit ambitious.

For what it's worth - I have my own Musl port and JavaScript syscalls implementation. I've called it Minscripten.

For my own experience, I'd suggest the following:

  • Please don't standardise the syscall ABI that Emscripten is currently using. It's the Linux x86 ABI, and it's just horrible - full of legacy grot, including bogus "narrow" versions of syscalls, duplicates of old/new syscalls, and 32-bit time_t. Please if you're going to standardise something don't standardise on 32-bit time_t!
  • I'd suggest using the x32 syscall ABI. It's clean and modern.
  • Minscripten's Musl port is in fact fairly suitable for Emscripten to adopt; I'd be happy to assist in any changes that might be needed, to make it a standard/shared port of Musl for Wasm.
  • Please don't standardise on a number-based ABI. Wasm modules should export functions with names like __syscall_open not __syscall123. My Musl port does this, I think it's quite a bit nicer. There's no reason to use numbered syscalls at all for Wasm.
  • Syscalls I've had to remove:
    • brk - handled internally in Wasm, not JavaScript
    • futex - will be handled internally in Wasm when the futex opcodes for Wasm arrive
    • madvise, mremap, mmap, munmap - handled internally in Wasm, not JavaScript. No need to jump to the kernel when Wasm can call grow_memory!
    • set_tid_address - ditto, Wasm can (and will) set this on its own side, no need for a syscall that I can think of yet
    • Obsolete syscalls, which are part of the x32 ABI but shouldn't be speced for Wasm: afs_syscall, getpmsg, putpmsg, security, tuxcall
    • Syscalls in the x32 ABI that are x86-specific, shouldn't be speced for Wasm: ioperm, iopl, vm86.
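To put a date on the 32-bit time_t complaint: a signed 32-bit count of seconds since 1970 overflows in January 2038 and, if it wraps, lands in December 1901. This Python sketch only does that arithmetic; it is not tied to any particular ABI.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_int32(seconds: int) -> int:
    """Wrap an integer the way a two's-complement 32-bit time_t would."""
    return (seconds + 2**31) % 2**32 - 2**31


def as_utc(seconds: int) -> datetime:
    # timedelta arithmetic avoids fromtimestamp() limits on negative values
    return EPOCH + timedelta(seconds=seconds)


last_ok = as_utc(INT32_MAX)                # 2038-01-19 03:14:07+00:00
wrapped = as_utc(to_int32(INT32_MAX + 1))  # 1901-12-13 20:45:52+00:00
with_64_bit = as_utc(INT32_MAX + 1)        # 2038-01-19 03:14:08+00:00
```

With a 64-bit time_t the next second is simply the next second; with a 32-bit one it jumps back about 136 years.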

Finally, Wasm needs some new syscalls!

  • __syscall_localtime - needed to do timezone conversions in the "kernel". There's simply no way in a browser to read out the timezone database into the /usr/share/zoneinfo format; the JS APIs basically require forwarding the libc localtime call directly to the browser. Thus this needs a new syscall.
  • I'm sure some more things will emerge...
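A sketch of what the host side of a __syscall_localtime call might return, assuming it takes a 64-bit seconds-since-epoch value and hands back the fields of C's struct tm. The Python function name host_localtime and the dict return shape are hypothetical; a real ABI would also have to fix how those fields are laid out in linear memory. Here the timezone knowledge comes from the host's Python datetime; in a browser it would presumably come from the JS date APIs instead.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Dict, Optional


def host_localtime(t: int, tz: Optional[tzinfo] = None) -> Dict[str, int]:
    """Convert seconds since the epoch to struct tm fields.

    tz=None means "whatever zone the host is configured for".
    """
    utc = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=t)
    local = utc.astimezone(tz)  # astimezone(None) uses the host's local zone
    offset = local.utcoffset() or timedelta(0)
    dst = local.dst()  # None for fixed-offset zones
    return {
        "tm_sec": local.second,
        "tm_min": local.minute,
        "tm_hour": local.hour,
        "tm_mday": local.day,
        "tm_mon": local.month - 1,                 # C months are 0-11
        "tm_year": local.year - 1900,              # years since 1900
        "tm_wday": (local.weekday() + 1) % 7,      # C: Sunday = 0; Python: Monday = 0
        "tm_yday": local.timetuple().tm_yday - 1,  # C: 0-365; Python: 1-366
        "tm_isdst": 1 if dst else 0,
        "tm_gmtoff": int(offset.total_seconds()),  # common extension, not ISO C
    }
```

The libc side would then only copy these fields into its struct tm, with no timezone database of its own.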
