A Libc for Wasm? #519
Comments
We are currently using Emscripten/Binaryen's libc, which is based on musl. The code is here: https://github.com/kripken/emscripten/tree/master/system/lib/libc/musl |
Seems to be here now as well: https://github.com/WebAssembly/musl |
A few more notes, stuff we talked about yesterday: Emscripten's libc is essentially pure upstream musl, except:
In other words, there isn't much to "port" for a libc, at this level - it just works. No reason not to use pure musl as it is, as emscripten does now, so I think it makes sense to use that. Of course, at the same time I'd be very curious to hear about other ways to do this, maybe we've overlooked some important optimization opportunities. As @jfbastien said, let's experiment in the repo he opened. That's for libc, but libc does syscalls that need to go somewhere. musl isn't relevant here, it just issues syscalls that normally go into the linux kernel. In emscripten, those enter our syscall code in JS ( If we do want to upstream an In any case, that can be left for later. We agreed that if/when we approach upstream with something official-sounding (like "wasm") then we'd need wide consensus across the wasm community before doing so. |
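As a rough illustration of the split described above (libc issues numbered syscalls, and the embedding decides where they go), here is a minimal sketch in Python. The names `Embedder`, `register`, `syscall`, and `host_write` are hypothetical and are not Emscripten's actual JS syscall code; the number 1 is the x86_64 Linux number for `write`, used only as an example.

```python
import errno


class Embedder:
    """Hypothetical host-side syscall table; not Emscripten's real implementation."""

    def __init__(self):
        self.handlers = {}  # syscall number -> host function
        self.stdout = []    # captured output, standing in for a real console

    def register(self, number, handler):
        self.handlers[number] = handler

    def syscall(self, number, *args):
        # libc only knows the number and the arguments; the embedder picks the behavior.
        handler = self.handlers.get(number)
        if handler is None:
            return -errno.ENOSYS  # unknown syscalls fail with a negative errno value
        return handler(*args)


SYS_WRITE = 1  # x86_64 Linux number, used here purely as an example


def host_write(embedder, fd, data):
    # Only fd 1 (stdout) exists in this toy embedding.
    if fd != 1:
        return -errno.EBADF
    embedder.stdout.append(data)
    return len(data)


embedder = Embedder()
embedder.register(SYS_WRITE, lambda fd, data: host_write(embedder, fd, data))

written = embedder.syscall(SYS_WRITE, 1, b"hello\n")
missing = embedder.syscall(9999, 0)
```

A web embedding and a non-web embedding could register different handlers for the same numbers without libc itself changing.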
I think the question that I still have is always (I know I know) for the non-web case. I'm not entirely interested in having musl go to JS to do the syscalls like you said. In the non-web case, you would want musl to go directly to the Linux kernel (if we are on a Linux system). In that case, can we get the wasm version of musl to call the Linux syscalls directly? |
That's what we're discussing, but a secure sandbox would want to intercept syscalls anyways. We'll need to try things out: do you do it just through seccomp-bpf, or do you shim more? We also want it to work on non-Linux hosts! FWIW NaCl does this to some degree, so it's not totally uncharted. |
@jcbeyler: Sure, if you want, you can make a non-web embedding in which calls to the The one tricky thing (aside from security, but sounds like that's not a concern for you?) is portability, you'd need to fix up the syscalls and their arguments and so forth for your native syscall interface. In other words, the |
Or, you could have various builds of musl for wasm, one for x86, one for x86_64, and so forth, and if you made sure you're running an app built with the right one - you can send those syscalls directly to the kernel (aside from security). |
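To make the portability point concrete: Linux syscall numbers differ per architecture, so a wasm build of libc with one fixed numbering cannot hand its syscalls unchanged to every host ABI. The sketch below assumes a hypothetical wasm-side numbering (`WASM_SYSCALLS`) and remaps it per host. The i386 and x86_64 numbers are the real Linux ones for these five calls; the function names and everything else are illustrative.

```python
# Hypothetical numbering used by the wasm build of libc (an assumption for this sketch).
WASM_SYSCALLS = {"read": 0, "write": 1, "open": 2, "close": 3, "exit": 4}

# Real Linux syscall numbers for two host ABIs.
HOST_SYSCALLS = {
    "x86_64": {"read": 0, "write": 1, "open": 2, "close": 3, "exit": 60},
    "i386": {"exit": 1, "read": 3, "write": 4, "open": 5, "close": 6},
}


def build_remap(host_arch):
    """Return a dict mapping wasm-side syscall numbers to host numbers."""
    host = HOST_SYSCALLS[host_arch]
    return {wasm_nr: host[name] for name, wasm_nr in WASM_SYSCALLS.items()}


def translate(wasm_nr, host_arch):
    """Translate one syscall number, or raise KeyError if it is unknown."""
    remap = build_remap(host_arch)
    if wasm_nr not in remap:
        raise KeyError(f"unknown wasm syscall number {wasm_nr}")
    return remap[wasm_nr]
```

Arguments (struct layouts, flag values) would need the same kind of fix-up, which this sketch leaves out.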
@kripken Ah, I hadn't looked at emscripten recently I guess. I didn't realize you switched over to using the syscall interface. Last I remember many libc functions were implemented in JS, and musl wasn't used fully. |
Yeah, we used to do a lot more in JS. This was refactored to the current model around a year ago. |
Isn't it maybe better to merge those libc library implementations into a single repository, and then include it in Emscripten and WebAssembly as a Git submodule? :) |
I doubt we'll ever have a single libc for all wasm compilers. Different ones care about different APIs (WASI, POSIX, etc.), have different priorities (code size, portability, etc.), and have different porting targets (Web, server, plugins, blockchain, etc.). |
I disagree, and hope the WASI libc will be a single libc for most wasm compilers. It isn't as optimized as it could be for all use cases today, but the overheads that have been discussed recently are modest and have been getting smaller as we've been optimizing them. And while there's more work to do, it's on a path to participate in the broader "secure by default" vision of wasm. Full POSIX is something no wasm libc has; it isn't practical without core language changes. Even There are some differences of opinion on specific features, such as the well-known but also limited and quirky user/group/other permissions system, which WASI doesn't expose right now. The current discussion would benefit from having more examples of where these features would be useful. |
@sunfishcode To be clear, all I'm saying is the noncontroversial claim that I doubt a one-size-fits-all approach will be optimal for all use cases. That's usually true everywhere 😃 , but also true here in my experience of implementing libc in emscripten. About code size, most codebases I see using Emscripten are written against libc and POSIX. Whenever a WASI API differs just in some constant values that's fine of course, but whenever the API is different in some way, we may need to add a little translation code. That is generally not a lot per call, but it adds up. Also, in Emscripten we care a lot about the JS side's code size too, and we design additional C APIs on top of libc with that in mind (say, event handling APIs). This issue will increase over time, as when wasm gets Interface Types, wasm on the Web will have a compact and efficient way to access a huge existing API surface. For example, emscripten's libc already uses JS + Web APIs for date/time instead of musl + syscalls, because it's smaller. With Interface Types, that will get even smaller (and faster!). About the "quirky" POSIX permissions issue, consider this real-world use case: The Emscripten compiler itself depends on python and node when running on the developer's machine. It would be nice if we could replace those two deps with a wasm runtime + wasm ports of python and node. But then we do want all the "quirky" POSIX flags for opening files and low-level access. (Note: I'm pretty sure we don't need More specifically, we literally want (a subset of) POSIX in that case - or maybe even Linux/Mac/Windows more specifically - because we want to use wasm's CPU portability, but not any type of OS-level portability. WASI does both at once. Also, WASI adds a specific form of sandboxing. If I just want to be able to run a wasm'd python or node, I want the same commandline interface. 
I don't want to need to specify a bunch of files to preopen - consider that python will likely read/write files in the user's home dir, temp dir, and others, not just the actual files mentioned on the commandline. Of course WASI's sandboxing is great! None of this comment is a criticism of WASI in any way. But WASI and the wasm-libc satisfy one family of use cases. Again, I don't believe there is a one-size-fits-all solution in this space, because it's a big space 😃 |
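For readers unfamiliar with preopens, the idea can be sketched roughly as follows: the program may only open paths that stay inside directories the host handed over at startup. This is a simplified Python model with hypothetical names (`is_allowed`, `preopens`), not WASI's actual path resolution; it is purely lexical, ignores symlinks, and rejects relative paths.

```python
import posixpath


def is_allowed(path, preopens):
    """Return True if `path` stays inside one of the preopened directories.

    Simplified model: lexical only, absolute paths only, symlinks ignored.
    """
    if not posixpath.isabs(path):
        return False
    normalized = posixpath.normpath(path)  # collapses "." and ".." components
    for root in preopens:
        root = posixpath.normpath(root)
        # commonpath equals root only when normalized is root itself or below it.
        if posixpath.commonpath([normalized, root]) == root:
            return True
    return False


preopens = ["/project/src"]
```

Under this model, a Python port that reads a file in the user's home directory is refused unless that directory was also listed in `preopens`, which is the friction described in the comment above.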
For users that just want to abstract over CPU architectures and are fine using Linux/Mac/Windows APIs, to use your example, wasm's linear memory construct would also often be unnecessary overhead and complexity. Yet, while wasm's inability to specialize for individual use cases at this level is a weakness, it may be counterbalanced by the strength of having a greater ecosystem of tools, libraries, and implementations which work well together. We see WASI as an opportunity to extend wasm's strengths. It's on a path to use interface types and fit well within shared-nothing linking and nanoprocesses. It'll be modular, so not entirely one-size-fits-all. And there are still many opportunities to optimize it, make it more capable, and make it easier to work with. |
100% agreed! My point is that there are also other ways to extend wasm's strengths, for different types of use cases. You have a very specific picture there, definitely an interesting and compelling one (with the tradeoffs you mention), but at the same time, the ability to directly access POSIX for example may be very useful for some things, and there are lots of other things (plugins, blockchain, etc.). |
Blockchain use cases, and many plugin use cases too, still need to be sandboxed. It's always possible to wrap custom sandboxing around any kind of API, and for many use cases today, that's the most expedient solution. However, it isn't the only solution, and it's not the best solution over time. |
From what I read, @kripken never suggested it was the only solution. It seems like he is saying quite the opposite in almost every comment. It sounds to me like you are more or less in agreement now that there isn't only one solution here. Different users can demand different types of sandboxing and that's OK. |
"Every use case does sandboxing in its own way" is a kind of meta-solution, from an ecosystem perspective. |
Dear all,
As we move forward in handling WebAssembly, there are cases where we will need to call libc methods and understand how Wasm would interact with those methods (and start figuring out the dynamic loading questions and issues). I’ve talked to a few people who have expressed interest in a Wasm-libc, so I thought I would ask here for thoughts and comments. I imagine we could use some of the existing tools to create it, via binaryen for example. But, in my mind, that might not be exactly what we would want. We might want a smaller wasm-tuned libc for standard calls.
Is there any work planned to have a Wasm port of the libc?
Are there reasons we would not do it?
In my case, I’m interested in it because it would allow me to create a stand-alone system using Wasm in a non-web world but wondered if there is interest outside of that.
Thanks,
Jc