WIP: [Swift+WASM] initial support for compiling Swift to WebAssembly #24684
What's in this pull request?
This pull request adds initial support for compiling Swift code to WebAssembly.
"Hello world" works, and a large subset of the stdlib already works on WebAssembly.
You can try this yourself with our cloud-hosted toolchain incorporating this port:
Links to issues
See SR-9307 for some background and the existing Swift+WASM changes in Swift that this patch depends on.
Also see emscripten#2427 for discussion of previous attempts at porting Swift to WebAssembly, which this port draws on heavily.
Thank you to everyone who helped make this possible:
If you would like to help, you can join us at https://github.com/swiftwasm.
Status of the port
This port is not ready for merging. The biggest blocking issues currently are:
We're opening this pull request now to get advance feedback and advice, so we can fix these remaining issues and start cleaning up our patches.
Per Swift's contributing guidelines, we're planning to split each change into a separate pull request.
We've already created pull requests for some minor changes:
We welcome advice on how best to submit these changes for review.
Note that this port also requires changes to Clang and LLVM: the corresponding pull requests are:
Here's a more detailed explanation of the changes included in this pull request, and of what still needs to be done.
How WebAssembly differs from other platforms
WebAssembly is a new platform with unique attributes that pose challenges for Swift's runtime.
So to access the metadata entry 0x4 bytes past the table pointer, a symbol is exported using LLVM's alias directives with an offset.
Other metadata entries use similar aliases, again to export a symbol at an offset inside another symbol.
Clang doesn't emit alias directives with offsets, so they weren't implemented in LLVM's Wasm backend; this change adds support for them.
What still needs to be done
Most important: fix the Swift calling convention and extra arguments
WebAssembly has strict function signature checking, so this crashes:
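A minimal reproducer looks something like this (hypothetical example; the exact snippet may differ from the one in the PR):

```swift
// On most platforms this prints "Optional(10)"; on WebAssembly it traps,
// because the indirect call's signature doesn't match the callee's.
let x: Int? = 5

// Optional.map declares its parameter as a throwing closure
// (`(Wrapped) throws -> U`), but the closure below is non-throwing,
// so it is compiled without the trailing error slot.
let y = x.map { $0 * 2 }
print(y as Any)
```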
Why? Optional.map takes a throwing closure, but it is passed a non-throwing closure here.
A throwing closure is compiled down to a signature similar to this:
A non-throwing closure has a similar signature, but without the trailing error pointer.
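Concretely, the two lowerings might look roughly like this (illustrative LLVM-IR sketch; the exact types and attributes depend on the target and are not taken from the PR):

```llvm
; throwing closure: context pointer plus a trailing swifterror slot
declare swiftcc i32 @throwing_closure(i32, i8* swiftself, %swift.error** swifterror)

; non-throwing closure: same shape, minus the error slot
declare swiftcc i32 @nonthrowing_closure(i32, i8* swiftself)
```

On native targets the extra trailing argument is harmless because it lives in a register the callee never reads; on Wasm the mismatched function types cause a trap at the indirect call.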
Swift assumes that extra parameters passed to a function pointer are simply ignored, so it doesn't generate a thunk when a non-throwing closure is called as a throwing closure, or when a thin function is called as a thick function.
lib/IRGen/GenFunc.cpp has a comment that explains this further.
This assumption holds on every platform Swift currently supports, but breaks on WebAssembly because of its strict signature checking.
Unfortunately, I have no idea how to fix this.
Modifying the Swift compiler to generate the thunks might be difficult. Currently, thunks are only generated when calling a function with a different calling convention, not between functions with the same calling convention but a different number of arguments. SIL's SILFunctionType doesn't even track whether a function throws.
I've never worked with Swift compiler internals, so I don't even know where to start modifying IRGen.
I asked @jrose-apple, who suggested that one short-term alternative is to standardize all swiftcall functions to take only one extra parameter:
extraArgs would point to an area on the stack, containing swiftself, swifterror, and any other extra parameters.
This way, thin, thick, and throwable thick functions would have the same number of arguments.
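Under that scheme, every swiftcall function would lower to the same shape, e.g. (hypothetical sketch; the name extraArgs and its layout belong to the proposal above, not to any existing ABI):

```llvm
; one uniform trailing pointer to a caller-allocated stack area that
; holds swiftself, swifterror, and any other extra parameters
declare swiftcc i32 @any_closure(i32, i8* %extraArgs)

; callees that need no extra parameters simply ignore %extraArgs,
; so every swiftcall function type matches at the call site
```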
I'm guessing this would require either Clang+Swift changes or an LLVM pass.
I found an example in Chrome PNaCL that transforms function arguments https://chromium.googlesource.com/native_client/pnacl-llvm/+/mseaborn/merge-34-squashed/lib/Transforms/NaCl/ExpandVarArgs.cpp#170, so the LLVM pass might not be too complicated.
We would really appreciate help and advice on how best to approach this.
Reenable and run tests
Upstream the LLVM patches
Support building Swift stdlib for Wasm using a macOS host
Split this PR into small, reviewable chunks
Get Swift's other libraries working
Support link-time optimization
Rather than spilling to the stack, it should be sufficient to always provide the two special arguments, swiftself and swifterror, and leave them undef when they aren't needed. Those are the only two extra arguments you should need to worry about. Since there's already a distinct swiftcc convention at the LLVM level, it seems natural to me to introduce these arguments, if they don't exist in the LLVM signature, into the wasm-level signature in the backend.
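Concretely, that canonicalization might look like this (illustrative sketch, not code from the PR):

```llvm
; canonical wasm-level swiftcc shape: both special arguments always present
declare swiftcc i32 @f(i32, i8* swiftself, %swift.error** swifterror)

; a caller of a function that uses neither argument passes undef for both:
;   %r = call swiftcc i32 @f(i32 7, i8* undef, %swift.error** undef)
```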
Note that swifterror isn't a real pointer-to-pointer argument for the x86 or arm backends, but really represents an in-out register. If it's possible to model it that way in wasm too, it'd probably lead to better native code size and quality when lowered to native code.
jckarter left a comment
If you're never emitting relative references to begin with, you ought to be able to stop emitting GOT-relative pointers altogether. They're only necessary in native code image formats for referencing symbols from other binaries that don't have fixed relative offsets. You should be able to chase down all the IRGen helpers that return
@jckarter That was one of the alternatives @jrose-apple suggested. However, it may not work for all methods: according to GenFunc.cpp, there can be extra "witness_method generic parameters" after the error parameter, so adding just two parameters might not be enough.
Also, regarding the swiftcc calling convention: Wasm doesn't have registers; it is a stack machine, similar to the Java Virtual Machine.
@zhouwei For witness table entry points, there is still a consistent calling convention that all witnesses use. That logic duct tapes over some representation issues in SIL; they should end up ultimately lowering to compatible LLVM level signatures.
If wasm is a stack machine, then it may still be worth considering the intent of the swiftcc special arguments in designing how it lowers to the stack machine representation. The self argument, for instance, is intended to be mapped to a fixed register.
The error argument is similarly intended to be mapped to a fixed, normally callee-preserved register, which the caller sets to zero, and the callee sets to nonzero on error. This is so that nonthrowing functions are ABI compatible when used as throwing functions, and so that propagating errors through multiple stack frames can be done with minimal code size cost for the test and early return. For a stack machine, it seems like we don't really gain anything from passing a value in to the callee, since it's always zero or undef, but you could push the error value after the primary return value when the callee returns, so that the caller can easily test it and either pop or return to propagate the error upward.
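In LLVM-IR terms, that idea might be sketched as returning the error alongside the primary result, so it lands on the wasm value stack where the caller can test it cheaply (illustrative sketch; assumes multi-value returns are available at the wasm level):

```llvm
; conceptual: no swifterror parameter going in; the error comes back as a
; second return value that the caller pops and tests (null means success)
declare swiftcc { i32, %swift.error* } @may_fail(i32)
```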
@ddunbar Swift's intended use of relative references is to reduce load time for memory-mapped native-code binaries. Since wasm, AIUI, generally goes through another compilation stage on the client, it seems to me like the cost of relocating wouldn't be that big a part of the load-time cost. Hopefully the wasm binary format already has a reasonably efficient way of representing references to local symbols...
@jckarter Wasm call instructions seem to consume the arguments from the stack: https://godbolt.org/z/NjLc0f so it might not matter whether self is pushed first or last, from a code density or performance point of view.
I have no experience working on compilers, though, so that's just my guess. For what it's worth, Java's JVM, which has a similar stack machine and call semantics, pushes the self pointer first, but I'm not sure why it does that.
Some of them do reduce in-memory static data size too, but maybe that's not worth pushing a whole feature through WASM.