
A Libc for Wasm? #519

Closed
jcbeyler opened this issue Jan 19, 2016 · 20 comments

@jcbeyler

Dear all,

As we move forward with WebAssembly, there are cases where we need to call libc functions and understand how Wasm would interact with them (and start figuring out the dynamic loading questions and issues). I've talked to a few people who have expressed interest in a Wasm libc, so I thought I would ask here for thoughts and comments. I imagine we could use some of the existing tools, Binaryen for example, to create it. But, in my mind, that might not be exactly what we want: we might want a smaller, wasm-tuned libc for standard calls.

Is there any work planned on a Wasm port of libc?
Are there reasons we would not do it?

In my case, I'm interested because it would allow me to create a standalone system using Wasm outside the web, but I wondered whether there is interest beyond that.

Thanks,
Jc

@sunfishcode
Member

We are currently using Emscripten/Binaryen's libc, which is based on musl. The code is here: https://github.com/kripken/emscripten/tree/master/system/lib/libc/musl

@jcbeyler
Author

Seems to be here now as well: https://github.com/WebAssembly/musl

@jfbastien
Member

@jcbeyler I only added upstream musl to this now. I talked to @kripken about how to move Emscripten's libc there, and about allowing experimentation as well. Work is still ongoing :)

@kripken
Member

kripken commented Jan 26, 2016

A few more notes, stuff we talked about yesterday:

Emscripten's libc is essentially pure upstream musl, except:

  • We have an arch/ dir, alongside x86, arm, etc. It's basically the same as arm, but with tweaks to reduce code size and no arm-isms.
  • We don't build all parts, e.g. we don't build time/, since we don't want musl to handle e.g. timezones, we want to use the builtin support in the browser (less code to ship, and also always up to date).
  • We have a few minor fixes, e.g. for places where musl weak-aliases a function of one type to a symbol of a different type; that can't work in asm.js, nor will it work in wasm. @sunfishcode suggested that clang might be able to automatically generate thunks that fix the type, so we might eventually not need this.
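To make the weak-alias point concrete: wasm requires a call's function type to match the callee's exactly (an indirect call through a mismatched type traps), so aliasing a `void (void)` dummy to a name that callers declare as `int (FILE *)` cannot work there, even though it often happens to work on native targets. A minimal sketch of the kind of type-fixing thunk mentioned above, written by hand; `dummy_noop` and `hypothetical_hook` are made-up names, not musl's:

```c
#include <stdio.h>

/* A no-op with one signature, like the dummies musl aliases to other names.
   (Hypothetical name, for illustration only.) */
static void dummy_noop(void) {}

/* musl-style code might write weak_alias(dummy_noop, hypothetical_hook) even
   though callers declare hypothetical_hook as int (FILE *). In wasm that call
   cannot succeed because the function types differ.

   The fix: a thunk with exactly the signature callers expect, which forwards
   to the dummy and supplies a well-defined return value. */
int hypothetical_hook(FILE *f) {
    (void)f;        /* the argument is ignored, as the dummy would ignore it */
    dummy_noop();
    return 0;
}
```

If clang generated such thunks automatically, the hand-written fixes would no longer be needed.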

In other words, there isn't much to "port" for a libc, at this level - it just works. No reason not to use pure musl as it is, as emscripten does now, so I think it makes sense to use that. Of course, at the same time I'd be very curious to hear about other ways to do this, maybe we've overlooked some important optimization opportunities. As @jfbastien said, let's experiment in the repo he opened.

That's for libc, but libc does syscalls that need to go somewhere. musl isn't relevant here, it just issues syscalls that normally go into the linux kernel. In emscripten, those enter our syscall code in JS (library_syscall.js). In theory, once wasm can use web APIs, we might not need JS, but for quite some time we will. I think it makes sense to keep using emscripten's code here (and the same for emscripten's WebGL code, Web Audio code, etc. etc. - all that just works already, representing many years of work).
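A rough sketch of the split described above, assuming a single generic syscall import: the libc side only issues numbered syscalls, and whoever embeds the module decides where they go (JS in Emscripten, something else elsewhere). The function-pointer "import", the syscall number and the toy handler are all hypothetical stand-ins so the example runs natively:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shape of the syscall import a wasm build of musl calls.
   In Emscripten the real handlers live in JS (library_syscall.js); here a C
   function pointer stands in for the import. */
typedef long (*syscall_import_fn)(long nr, long a, long b, long c);
static syscall_import_fn host_syscall;   /* installed by the embedder */

#define HYP_SYS_write 4   /* hypothetical syscall number for this sketch */

/* libc side: write() knows nothing about the host, it only issues a syscall. */
long libc_write(int fd, const void *buf, size_t len) {
    return host_syscall(HYP_SYS_write, fd, (long)(uintptr_t)buf, (long)len);
}

/* Embedder side: a toy handler that "writes" into an in-memory buffer. */
static char captured[64];
static size_t captured_len;

static long demo_host(long nr, long fd, long buf, long len) {
    if (nr != HYP_SYS_write) return -ENOSYS;   /* unknown syscall */
    (void)fd;
    size_t n = (size_t)len;
    if (n > sizeof captured - captured_len) n = sizeof captured - captured_len;
    memcpy(captured + captured_len, (const char *)(uintptr_t)buf, n);
    captured_len += n;
    return (long)n;
}

void install_demo_host(void) { host_syscall = demo_host; captured_len = 0; }
```

Swapping `demo_host` for a different handler changes where output goes without touching the libc side, which is why musl itself doesn't need to know about JS.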

If we do want to upstream an arch/ dir to musl, it's an open question what to call it. wasm/ isn't quite right, since wasm isn't an arch the way x86 and arm are. For example, emscripten would want to use the upstreamed thing for both wasm and asm.js. And what about non-web embeddings of wasm, that might want a different syscall interface for some reason? Anyhow, the name for potential upstreamed code is an open question.

In any case, that can be left for later. We agreed that if/when we approach upstream with something official-sounding (like "wasm") then we'd need wide consensus across the wasm community before doing so.

@jcbeyler
Author

I think the question I still have is (I know, I know) about the non-web case. I'm not interested in having musl go through JS to do the syscalls like you described. In the non-web case, you would want musl to go directly to the Linux kernel (if we are on a Linux system). In that case, can we get the wasm version of musl to call the Linux syscalls directly?

@jfbastien
Member

That's what we're discussing, but a secure sandbox would want to intercept syscalls anyway. We'll need to try things out: do you do it just through seccomp-bpf, or do you shim more? We also want it to work on non-Linux hosts!

FWIW NaCl does this to some degree, so it's not totally uncharted.
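As an illustration of the "shim more" option, here is a minimal user-space sketch, assuming a Linux host and glibc's `syscall()`: forward only an allowlisted set of syscall numbers to the kernel and refuse the rest. The allowlist is hypothetical, and filtering by number alone is not a complete sandbox (a real one would also check arguments, or push the policy into the kernel with seccomp-bpf):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Syscalls this hypothetical sandbox lets through unchanged. */
static const long allowed[] = { SYS_read, SYS_write, SYS_getpid, SYS_exit_group };

static int is_allowed(long nr) {
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        if (allowed[i] == nr) return 1;
    return 0;
}

/* Shim: forward allowed calls to the kernel, refuse everything else.
   Returns the result or a negative errno, like a raw syscall. */
long sandboxed_syscall(long nr, long a, long b, long c) {
    if (!is_allowed(nr)) return -ENOSYS;
    long r = syscall(nr, a, b, c);
    return r == -1 ? -errno : r;   /* glibc reports errors via errno */
}
```

Note this only works where the guest's syscall numbers already equal the host's; otherwise a translation step is needed first.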

@kripken
Member

kripken commented Jan 27, 2016

@jcbeyler: Sure, if you want, you can make a non-web embedding in which calls to the syscall import get sent directly as syscalls into the system kernel.

The one tricky thing (aside from security, but sounds like that's not a concern for you?) is portability, you'd need to fix up the syscalls and their arguments and so forth for your native syscall interface. In other words, the arch/ dir of the compiled musl will have definitions for various things (like the sizes and order of fields in structs), and your native linux kernel will have whatever x86 or x86_64 or arm or whatever conventions instead. You'd need a translation layer.

@kripken
Member

kripken commented Jan 27, 2016

Or, you could have various builds of musl for wasm, one for x86, one for x86_64, and so forth, and if you made sure you're running an app built with the right one - you can send those syscalls directly to the kernel (aside from security).
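With per-architecture builds, the embedder still has to check that a module was built against its host's conventions before passing syscalls straight through. A minimal sketch, assuming the module carries some ABI tag the embedder can read (how it's stored is an assumption here, and the tag strings are made up):

```c
#include <string.h>

/* Hypothetical ABI tag for the host, derived from compiler-predefined macros. */
static const char *host_abi(void) {
#if defined(__linux__) && defined(__x86_64__)
    return "linux-x86_64";
#elif defined(__linux__) && defined(__aarch64__)
    return "linux-aarch64";
#elif defined(__linux__) && defined(__i386__)
    return "linux-i386";
#else
    return "unsupported";
#endif
}

/* Pass syscalls straight through only when the module's tag matches the host;
   otherwise the embedder needs a translation layer instead. */
int can_pass_through(const char *module_abi) {
    const char *host = host_abi();
    return strcmp(host, "unsupported") != 0 && strcmp(module_abi, host) == 0;
}
```

The trade-off is one compiled module per host architecture, in exchange for no per-call translation.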

@binji
Member

binji commented Jan 27, 2016

@kripken Ah, I hadn't looked at emscripten recently I guess. I didn't realize you switched over to using the syscall interface. Last I remember many libc functions were implemented in JS, and musl wasn't used fully.

@kripken
Member

kripken commented Jan 27, 2016

Yeah, we used to do a lot more in JS. This was refactored to the current model around a year ago.

sunfishcode added this to the Meta milestone Feb 1, 2016
@jfbastien
Member

I did a musl experiment, and @kripken has ported the asm.js port of musl to wasm. Other libcs could be done. Closing this.

@wallabra

Wouldn't it be better to merge those libc implementations into a single repository, and then include it in Emscripten and WebAssembly as a Git submodule? :)

@kripken
Member

kripken commented Nov 24, 2019

I doubt we'll ever have a single libc for all wasm compilers. Different ones care about different APIs (WASI, POSIX, etc.), have different priorities (code size, portability, etc.), and have different porting targets (Web, server, plugins, blockchain, etc.).

@sunfishcode
Member

I disagree, and hope the WASI libc will be a single libc for most wasm compilers. It isn't as optimized as it could be for all use cases today, but the overheads that have been discussed recently are modest and have been getting smaller as we've been optimizing them. And while there's more work to do, it's on a path to participate in the broader "secure by default" vision of wasm.

Full POSIX is something no wasm libc has; it isn't practical without core language changes. Even fork, which is technically possible to implement in a POSIX-conforming way in wasm, doesn't make sense to have as a core building block for a wasm ecosystem, as it is in POSIX. So, which parts of POSIX are important to support?

There are some differences of opinion on specific features, such as the well-known but also limited and quirky user/group/other permissions system, which WASI doesn't expose right now. The current discussion would benefit from having more examples of where these features would be useful.

@kripken
Member

kripken commented Nov 24, 2019

@sunfishcode To be clear, all I'm saying is the noncontroversial claim that I doubt a one-size-fits-all approach will be optimal for all use cases. That's usually true everywhere 😃 , but also true here in my experience of implementing libc in emscripten.

About code size, most codebases I see using Emscripten are written against libc and POSIX. Whenever a WASI API differs just in some constant values that's fine of course, but whenever the API is different in some way, we may need to add a little translation code. That is generally not a lot per call, but it adds up. Also, in Emscripten we care a lot about the JS side's code size too, and we design additional C APIs on top of libc with that in mind (say, event handling APIs).

This issue will increase over time, as when wasm gets Interface Types, wasm on the Web will have a compact and efficient way to access a huge existing API surface. For example, emscripten's libc already uses JS + Web APIs for date/time instead of musl + syscalls, because it's smaller. With Interface Types, that will get even smaller (and faster!).

About the "quirky" POSIX permissions issue, consider this real-world use case: The Emscripten compiler itself depends on python and node when running on the developer's machine. It would be nice if we could replace those two deps with a wasm runtime + wasm ports of python and node. But then we do want all the "quirky" POSIX flags for opening files and low-level access. (Note: I'm pretty sure we don't need fork or anything else that may be a serious problem to add.)

More specifically, we literally want (a subset of) POSIX in that case - or maybe even Linux/Mac/Windows more specifically - because we want to use wasm's CPU portability, but not any type of OS-level portability. WASI does both at once.

Also, WASI adds a specific form of sandboxing. If I just want to be able to run a wasm'd python or node, I want the same commandline interface. I don't want to need to specify a bunch of files to preopen - consider that python will likely read/write files in the user's home dir, temp dir, etc., and others, and not just the actual files mentioned on the commandline.
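To show why preopens matter here: under WASI, an absolute path like `~/.pythonrc` is only reachable if some preopened directory covers it. A minimal sketch of the lookup a WASI libc performs, roughly in the spirit of wasi-libc's longest-prefix resolution; the preopen table, paths and fd numbers are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical directories the runtime preopened, with the fd each one got. */
struct preopen { int fd; const char *prefix; };

static const struct preopen preopens[] = {
    { 3, "/tmp" },
    { 4, "/home/user" },
    { 5, "/home/user/project" },
};

/* Find the longest preopened prefix that covers `path`, matching whole path
   components. On success, store the directory fd and the path relative to it
   and return 0. Return -1 if nothing covers the path: the file is simply
   unreachable, whatever its POSIX permissions say. */
int resolve_preopen(const char *path, int *dirfd, const char **rel) {
    size_t best_len = 0;
    int found = 0;
    for (size_t i = 0; i < sizeof preopens / sizeof preopens[0]; i++) {
        size_t n = strlen(preopens[i].prefix);
        if (strncmp(path, preopens[i].prefix, n) != 0) continue;
        if (path[n] != '/' && path[n] != '\0') continue; /* "/tmpfoo" is not in "/tmp" */
        if (!found || n > best_len) {
            best_len = n;
            *dirfd = preopens[i].fd;
            found = 1;
        }
    }
    if (!found) return -1;
    const char *r = path + best_len;
    while (*r == '/') r++;
    *rel = (*r == '\0') ? "." : r;
    return 0;
}
```

A wasm'd python launched with only its script's directory preopened would get -1 for its home-directory and temp-directory files, which is the friction described above.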

Of course WASI's sandboxing is great! None of this comment is a criticism of WASI in any way. But WASI and the wasm-libc satisfy one family of use cases. Again, I don't believe there is a one-size-fits-all solution in this space, because it's a big space 😃

@sunfishcode
Member

For users that just want to abstract over CPU architectures and are fine using Linux/Mac/Windows APIs, to use your example, wasm's linear memory construct would also often be unnecessary overhead and complexity. Yet, while wasm's inability to specialize for individual use cases at this level is a weakness, it may be counterbalanced by the strength of having a greater ecosystem of tools, libraries, and implementations which work well together.

We see WASI as an opportunity to extend wasm's strengths. It's on a path to use interface types and fit well within shared-nothing linking and nanoprocesses. It'll be modular, so not entirely one-size-fits-all. And there are still many opportunities to optimize it, make it more capable, and make it easier to work with.

@kripken
Member

kripken commented Nov 27, 2019

@sunfishcode

100% agreed! My point is that there are also other ways to extend wasm's strengths, for different types of use cases.

You have a very specific picture there, definitely an interesting and compelling one (with the tradeoffs you mention), but at the same time, the ability to directly access POSIX for example may be very useful for some things, and there are lots of other things (plugins, blockchain, etc.).

@sunfishcode
Member

Blockchain use cases, and many plugin use cases too, still need to be sandboxed. It's always possible to wrap custom sandboxing around any kind of API, and for many use cases today, that's the most expedient solution. However, it isn't the only solution, and it's not the best solution over time.

@sbc100
Member

sbc100 commented Nov 27, 2019

> However, it isn't the only solution

From what I read, @kripken never suggested it was the only solution. It seems like he is saying quite the opposite in almost every comment.

It sounds to me like you are more or less in agreement now that there isn't only one solution here. Different users can demand different types of sandboxing, and that's OK.

@sunfishcode
Member

"Every use case does sandboxing in its own way" is a kind of meta-solution, from an ecosystem perspective.

7 participants