Lack of atomic.wait on the main thread seems limiting to a fault #106

Closed
alexcrichton opened this issue Oct 11, 2018 · 28 comments

Comments

@alexcrichton
Contributor

I've started to experiment with wasm threads, Rust, and wasm-bindgen recently to see how well our story shapes up there. The good news is that it's all working pretty well! On basically every demo I've written so far, though, I've very quickly run up against the wall that atomic.wait instructions are not allowed on the main thread (they throw an error). I'm currently testing in Firefox, but I think this behavior is mirrored in other implementations?

On the surface and in the abstract, the lack of atomic.wait definitely makes sense. Reducing jank is always good! In practice, though, I've found this severely limiting when trying to write applications. The use case I'm exploring currently is to instantiate a WebAssembly.Instance on the main thread, and then postMessage that instance's module and its shared memory to a set of worker threads. That way the main wasm thread (the main application) can enjoy features like DOM access while the worker threads can do the bulk of the work. In this model, some gotchas arise pretty quickly.

Most of the gotchas can be categorized as "it's really hard for libraries to avoid blocking synchronization". All code executed on the main thread, and all libraries it links to, can't use any form of blocking synchronization (like mutexes). Some cases where this comes up quickly are:

  • Memory allocators - the Rust standard library provides a global memory allocator, for example, which is currently a translation of dlmalloc. To make this safe to use in a multithreaded scenario, access to the global allocator is synchronized with a mutex. (can't really imagine a world where memory allocation is asynchronous...). It's really hard for the main thread to entirely avoid allocating memory, or for sub-workers to all avoid allocating memory.

  • Synchronizing messages - one of the first problems I ran into was accidentally attempting to lock memory to read it on the main thread. Without atomic.wait the only way (I think?) for a worker to synchronize with the main thread (aka wake it up to an event) is via postMessage. A worker (in abstract) doesn't even know if that'll wake up the main thread as well! (sub-workers and such).

    While it's not the worst thing in the world to provide custom synchronization at the app level, this makes me very wary to use any library that has synchronization at all on the main thread. If any library anywhere uses a mutex, even if just for a short period of time, it's not usable on the main thread as it may occasionally throw an exception.

    Put another way, it seems like all existing threading-related libraries almost cannot be used by default. Even libraries that provide the ability to specify a custom method to send notifications are at risk of using a mutex for short periods of time to protect some data.

Putting this all together seems like it basically means that the entire main thread for an application has to be entirely user-written and use very few libraries (only those audited to be used on the main thread or saying they don't have synchronization). Even then, I'm not sure how the memory allocation issue would be solved. Additionally it seems like synchronization primitives will almost always have to be hand-rolled for each application, always using postMessage to communicate from the main thread to workers and back.

Coming out of this are a few questions:

  • Is there any way this restriction can be lifted?
  • Failing that, can it be partially lifted somehow?
  • Failing that, is it expected that this is simply a pattern that's not used in the wild? Would shared memory wasm modules basically entirely live in workers and main thread wasm modules would never use shared memory?

Ideally these problems could be solved by simply saying "atomic.wait is ok on the main thread", but that of course brings back the jank problem. Some of the possible solutions (like for the memory allocator problem) could be "just use a spin lock if it's short", but I'm not sure how that's better than just allowing atomic.wait on the main thread? Maybe there's recourse for something like "you can use atomic.wait only on the main thread if you specify a small timeout". For example it takes Firefox N seconds to say "your script is slowing the page down", could that be the maximum timeout for atomic.wait?

In general I'm also curious to hear others' thoughts on this as well. Is sharing a wasm module on the main thread with worker threads just a pipe dream? Are there other ways to work around this issue?

@lars-t-hansen

lars-t-hansen commented Oct 11, 2018

The short answer is that browsers are "never" going to allow the main thread to wait. This really has little to do with jank (even if that was a concern early on) but is mainly about implementation reality; browsers use the main thread for all sorts of housekeeping tasks on behalf of other threads and allowing the main thread to be blocked by user content is a recipe for locking up the browser and/or breaking user programs.

Specifically, the main thread may be required to do work on behalf of worker threads that are not themselves blocking from user content, but which are in fact blocking while waiting for the main thread to perform the work, and if the main thread is blocked in user content the worker threads will not be able to advance to the point where they can do the work that unblocks the main thread.

We went over this a (large) number of times for JS and that's how it's going to be. We went over all the workarounds you propose, and found them wanting for that reason; it doesn't matter how you wait; it's that you wait that's the problem, since the main thread may not be able to do work that you think is concurrent while you're making the main thread wait.

As a technical matter, JS and Wasm allow any thread to wait, but there is a flag on the agent, ie on the thread, called [[CanBlock]], available only to the embedder, that determines whether wait throws or not on that thread. Browsers set that to false for the main thread. See here.

Put another way, it seems like all existing threading-related libraries almost cannot be used by default.

By and large, they can't work on the browser's main thread, they have to be moved into a worker.

Is there any way this restriction can be lifted?

Not truly, though see below for a workaround that has some traction.

Failing that, can it be partially lifted somehow?

For main-thread code written in JS, there is a pattern ("asynchronous wait") that probably works, where you call Atomics.waitAsync on a waitable location; this does not block, but returns a promise, and when the wake arrives the promise is resolved. This is in the process of being standardized though we've not done much work on it lately since Spectre pretty much put a block on shared memory in the browser; I expect work to continue eventually though.

Failing that, is it expected that this is simply a pattern that's not used in the wild? Would shared memory wasm modules basically entirely live in workers and main thread wasm modules would never use shared memory?

We sort of expect that main thread modules will use shared memory, because the main thread will be important for interacting with the DOM and the browser in general and it is therefore the main conduit for I/O with the browser, but the mechanisms by which these main thread modules synchronize will have to be something other than classical locks - as in the case of JS.

Again, Spectre halted us in our tracks wrt exploring this territory. Up until then, we were moving to a state where entire apps were sequestered to clusters of workers, communicating on some kind of channel with the main thread. This channel can be in shared memory, provided the main thread can find a workable solution for synchronization; or it can be partly in shared memory (for the bulk of data) and partly with postMessage (for synchronization / wakeup); or we can try to come up with something better.

With object support - just anyref is sufficient - in Wasm there's really no reason why Wasm can't call JS's Atomics.waitAsync directly and just return the object to JS, though of course that requires unwinding the wasm stack at this point - not exactly desirable. If we get something like JS's async into wasm, or more likely a basic notion of coroutine, then this changes fundamentally. Main thread code can waitAsync, then do a directed yield to a coroutine that will return to JS, passing the Promise object along with it; when the promise is resolved it can call back into Wasm, which does a directed yield back to the coroutine that blocked. It's not fast, but it may be adequate, and it works for the web (and current browser architectures).

Another thing that we could envision is that the "main thread" in a wasm application is always some kind of coroutine that can block in the normal way (ie, it's actually a thread in the implementation), with directed yields to this coroutine on entry to wasm and directed yields back to the coroutine that represents the JS main thread on callouts to JS. This is not without peril but perhaps worth talking about.

And finally, will Wasm threads be full Web Workers? Probably not. So this story is pretty open still.

EDIT: Clarified some of the blue-sky discussion re coroutines.

@alexcrichton
Contributor Author

Thanks so much for taking the time to read and respond @lars-t-hansen! (and so quickly and thoroughly!). This definitely clears things up for me wrt the current state of affairs, I had no idea the main thread was so important for other assorted tasks! Also sorry about this but I definitely should have led with "I don't want to reopen any old wounds", I can only imagine the amount of debate about SharedArrayBuffer that's already happened!

I did want to try to clarify one point though:

We sort of expect that main thread modules will use shared memory, because the main thread will be important for interacting with the DOM and the browser in general and it is therefore the main conduit for I/O with the browser, but the mechanisms by which these main thread modules synchronize will have to be something other than classical locks - as in the case of JS.

This makes sense to me! It's definitely the end state we'd like to land in for Rust (and I imagine other languages would like this as well) where wasm can drive DOM operations (and even quickly with host bindings!) as well as doing compute-heavy tasks externally in workers.

I was almost totally sold on the "it's ok to have custom synchronization" point when I was working on a small raytracing demo with threads. Then I ran into an exception where the main thread executed atomic.wait while acquiring the (Rust-specific) global mutex for the memory allocator. That pushed me over the hump to open this issue and see others' thoughts on this.

This may actually be better titled "how are memory allocators supposed to work?" rather than atomic.wait in general for the main thread. Memory allocation seems so fundamentally core to language runtimes (Rust, C, C++, etc) that I don't think something like Atomics.waitAsync in wasm would help much. I'm not sure we could really write a standard library where memory allocation was asynchronous!

This does mean, though, that the primary motivation for opening this issue, memory allocator synchronization, may have a more focused solution. I was talking with @tschneidereit this morning and it sounds like there are discussions about a "standard library libc-like thing" for wasm which might be able to come with a memory allocator, and if implemented by the browser it could presumably implement synchronization safely (as it's known it wouldn't block the thread for too too long).

Do you (or others?) have thoughts though on how to solve this in the near (or long?) term? Is there perhaps a convention we could shoehorn into most language runtimes to work well with the vision of "main thread I/O workers compute" while both can allocate memory? One idea we had was to just have the main thread spin loop waiting for the lock to be released (but all workers would atomic.wait), but that seems "basically equivalent" to atomic.wait, only a worse implementation :(


(also FWIW I don't fully understand the coroutine idea, but it sounds quite promising!)

@binji
Member

binji commented Oct 11, 2018

One idea we had was to just have the main thread spin loop waiting for the lock to be released...

Right, that has the same issue in that none of the browser's main thread work can execute.

This may actually be better titled "how are memory allocators supposed to work?" rather than atomic.wait in general for the main thread.

One solution would be to give the main thread its own pool to allocate from. If this pool is ever exhausted, it can synchronously call memory.grow, which synchronizes with other threads. So you know that if the grow succeeds the pages you've allocated are not used by any other threads.

@tlively
Member

tlively commented Oct 11, 2018

Another way to dodge this issue is to run all user code in workers and use asynchronous or proxying interfaces to interact with Web APIs through a (non-allocating) shim that runs on the main thread.

@alexcrichton
Contributor Author

@binji it's true yeah we're still locking things up! I think my broader point is that given the fact that the main thread can't execute atomic.wait at all I don't (personally at least) see a future where it's possible for an app to use a shared array buffer with the main thread and workers, and the main thread has code that wasn't meticulously crafted to work "just right". Put another way, it seems infeasible for there to be a smooth and easy story for using wasm on both the main thread and workers.

It's definitely possible to have thread-local allocators and not too hard to set up! That has the restriction, though, that by default you can't actually send the memory to other threads to get deallocated. It... may be possible though to architect an allocator like this? If each thread had its own allocator and could "deallocate" memory from any thread, we could semantically allocate memory without acquiring a lock and also free memory without acquiring a lock. Such an allocator would just mean that if you constantly allocate memory on the main thread and then free it on the worker threads you'd quickly run out of memory...

@tlively yeah definitely! I was under the impression, though, that one of the goals of the threads+wasm proposal was to have a shared module on the main thread and worker threads. If that use case isn't desired then there's certainly no issue at all :). Right now it seems like that's the only feasible way to architect an app (all workers or only on the main thread). That model, however, is much more difficult to program against, I think, when you're an arbitrary library and you're trying to work within most applications that might use you (both threaded and not).

@Pauan

Pauan commented Oct 11, 2018

If each thread had its own allocator and could "deallocate" memory from any thread, we could semantically allocate memory without acquiring a lock and also free memory without acquiring a lock. Such an allocator would just mean that if you constantly allocate memory on the main thread and then free it on the worker threads you'd quickly run out of memory...

Just some musings: could the allocators use postMessage (or similar) to tell the other allocator "please deallocate this region of memory"?

Deallocation would now be asynchronous (though this fact is hidden from the programmer), but that's better than leaking memory, right?

@lars-t-hansen

@alexcrichton

I was under the impression, though, that one of the goals of the threads+wasm proposal was to have a shared module on the main thread and worker threads. If that use case isn't desired then there's certainly no issue at all :).

It's one thing for the module to be shared; that doesn't mean the main thread and the worker threads have to run the same code, they "just" have to be compatible. (Once wasm threads are actual threads and not web workers the sharing will be a fact of life in any case.) Of course this is awkward but the reality of the web is that it is asynchronous and these concessions have to be made.

I like the observation that this is in some sense more fundamentally about memory allocation than anything else. (Really, about managing any resource from a shared pool.) But it probably follows from the asynchronicity of the web that this management must at least in some ways be asynchronous.

But you can sometimes choose where to put your asynchronous operations. Suppose, for example, you create an infallible allocator (used by all threads) that has a lock-free data structure over a set of size-segregated free lists so a lock won't normally be needed for allocation or deallocation; where you fall back to trying to take the heap lock to grow the heap when you can't allocate, and this will usually succeed for any thread; and where the main thread's fallback for failing to take the heap lock is to just execute memory.grow and use memory thus obtained. Provided the main thread is always the one responsible for growing the heap anyway - via some synchronization with the workers which does not have to be postMessage-based, see below - when more memory is needed anywhere, then this will work fine, except when the heap is completely exhausted and you could have done better by stopping all the other threads and coalescing the memory etc... You can safely make the main thread the heap-grower for everyone because the main thread is guaranteed not to block... It's possibly not the most efficient thing you can think of, but that's the cost of insisting that the main thread be using synchronous operations.
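A minimal Rust sketch of the "try the heap lock, otherwise grow" fallback described above. Everything here is an assumption made for illustration: linear memory is simulated by a page counter (standing in for memory.grow, which returns the previous size in pages), the free list is a plain Vec of page offsets behind a std Mutex rather than a lock-free size-segregated structure, and the names Heap, alloc_page_main_thread and free_page are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Size of one wasm page in bytes.
pub const PAGE_SIZE: usize = 65536;

/// Hypothetical stand-in for a shared heap living in linear memory.
pub struct Heap {
    /// Free page offsets, guarded by the "heap lock".
    pub free: Mutex<Vec<usize>>,
    /// Simulated memory size in pages; `grow` stands in for memory.grow.
    pages: AtomicUsize,
}

impl Heap {
    pub fn new(initial_pages: usize) -> Self {
        Heap {
            free: Mutex::new(Vec::new()),
            pages: AtomicUsize::new(initial_pages),
        }
    }

    /// Like memory.grow: atomically adds pages and returns the old page count.
    fn grow(&self, delta: usize) -> usize {
        self.pages.fetch_add(delta, Ordering::SeqCst)
    }

    pub fn pages(&self) -> usize {
        self.pages.load(Ordering::SeqCst)
    }

    /// Main-thread path: never waits. If the lock is contended (or the free
    /// list is empty), fall back to growing memory and using the fresh page.
    pub fn alloc_page_main_thread(&self) -> usize {
        if let Ok(mut free) = self.free.try_lock() {
            if let Some(offset) = free.pop() {
                return offset;
            }
        }
        self.grow(1) * PAGE_SIZE
    }

    /// Worker path: workers are allowed to block on the heap lock.
    pub fn free_page(&self, offset: usize) {
        self.free.lock().unwrap().push(offset);
    }
}
```

The cost of this shape is the one noted above: under contention the main thread consumes fresh pages even though free ones exist, so memory use can be higher than with a blocking allocator.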

Speaking of synchronization, a couple of points worth making.

The performance of postMessage is usually fairly awful as it involves a lot of browser machinery. A mechanism like Atomics.waitAsync executed at the outer level (ie, the JS level) of the application that is still mainly wasm code will likely perform better than waiting for an event and decoding that. So a worker needing to grow the heap would simply set a flag, signal a wakeup, and go to sleep waiting for a response; the main thread would either see the flag because it's polling now and again, or it would receive the wakeup and process the request and provide a result and wake the sleeping worker.
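The flag-and-wakeup handshake can be sketched with ordinary Rust threads. This is a native simulation under stated assumptions, not browser code: thread::park stands in for the worker's atomic.wait, unpark for the notify, and thread::yield_now for the main thread returning to its event loop. The state constants and the grow_handshake function are hypothetical names.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

const IDLE: u32 = 0;
const REQUESTED: u32 = 1;
const GRANTED: u32 = 2;

/// Runs one worker that asks for a heap grow and a polling "main thread"
/// that services the request. Returns the page count the worker observes.
pub fn grow_handshake() -> u32 {
    let state = Arc::new(AtomicU32::new(IDLE));
    let pages = Arc::new(AtomicU32::new(1));

    let worker = {
        let state = Arc::clone(&state);
        let pages = Arc::clone(&pages);
        thread::spawn(move || {
            // Set the flag, then sleep until the request has been granted.
            state.store(REQUESTED, Ordering::Release);
            while state.load(Ordering::Acquire) != GRANTED {
                thread::park(); // workers are allowed to block
            }
            pages.load(Ordering::Acquire)
        })
    };

    // "Main thread": never blocks, only polls the flag between other work.
    loop {
        if state.load(Ordering::Acquire) == REQUESTED {
            pages.fetch_add(1, Ordering::AcqRel); // stand-in for memory.grow
            state.store(GRANTED, Ordering::Release);
            worker.thread().unpark(); // wake the sleeping worker
            break;
        }
        thread::yield_now(); // stand-in for yielding to the event loop
    }

    worker.join().unwrap()
}
```

The worker re-checks the state around park, so a wakeup that arrives before it goes to sleep is not lost.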

One weird aspect of an asynchronous event-driven design with promises is that a thread - the main thread for sure, but also all the others - represents an unbounded number of concurrently existing coroutines. By means of Atomics.waitAsync a thread can in fact be sleeping on many locations simultaneously, in particular, the main thread can be sleeping once per worker thread. It can indeed be sleeping many times per worker thread, perhaps once for each service the worker thread expects the main thread to perform on its behalf. (At the moment, this only works at the outermost JS level, but it's a start.) This can remove event decoding costs and further improve on performance relative to postMessage events.

@alexcrichton
Contributor Author

Oh fascinating, the usefulness of Atomics.waitAsync didn't really click with me until your explanation @lars-t-hansen, and I definitely agree with what you're saying! Using Atomics.waitAsync (through JS) solves other unrelated issues I've had with "how to abstractly send a signal to a specific thread", and that works great as it means that the main thread has another way to get woken up if it's asleep (other than postMessage).

I also like your generalization of management of shared resources. While a memory allocator is probably one of the first issues to come up it's surely not the last! I'd definitely want to prove this out though to see how feasible it is to implement this strategy (particularly of an allocator). It's not totally clear to me yet if we can write an allocator which doesn't require intrusive support from the top-level application, but it seems more plausible than when I first opened this issue :)

I'd be ok closing this issue for the topic of memory allocators in that it seems like there may be a viable path forward, and making progress seems like it requires at least some experimentation. I'm still somewhat worried about the implications of no atomic.wait on libraries in general, though. It seems like it's quite common to use a mutex, for example, for temporary synchronization around global data structures, shared operations, etc. The naive implementation of a mutex means that you don't actually see an exception unless contention happens (which is probably rare), and a non-naive version of a mutex which executes atomic.wait with a timeout of 0 just to see if it works may still be unusable on the main thread, where lots of low-level resources may be protected.
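To make the "only fails under contention" point concrete, here is a small Rust sketch of such a naive lock. It is a model, not wasm code: the can_block parameter is a hypothetical stand-in for the per-thread flag the embedder sets, the WouldBlock error stands in for the exception thrown by atomic.wait on the main thread, and thread::yield_now stands in for the actual wait.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical error standing in for the exception a browser throws when
/// atomic.wait runs on a thread that is not allowed to block.
#[derive(Debug, PartialEq)]
pub struct WouldBlock;

/// A deliberately naive lock: the fast path is one compare-and-swap, and
/// only the contended path ever needs to wait.
pub struct NaiveLock {
    locked: AtomicBool,
}

impl NaiveLock {
    pub const fn new() -> Self {
        NaiveLock { locked: AtomicBool::new(false) }
    }

    /// Non-blocking attempt; plain atomics like this are legal on any thread.
    pub fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Acquire the lock. With `can_block == false` (the main thread in this
    /// model) the uncontended case succeeds and the contended case fails.
    pub fn lock(&self, can_block: bool) -> Result<(), WouldBlock> {
        loop {
            if self.try_lock() {
                return Ok(());
            }
            if !can_block {
                return Err(WouldBlock);
            }
            thread::yield_now(); // stand-in for atomic.wait on a worker
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

A library built on a lock like this passes every uncontended test on the main thread and only fails once a worker happens to hold the lock at the wrong moment.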

It seems ok though to wait-and-see what happens here, whether it's actually a big problem in practice or whether a solution for memory allocators lends itself to a solution for other usages as well. One aspect that may not generalize is that the main thread is empowered in memory allocation to fall back to memory.grow on contention, but for an arbitrary resource the main thread can't do this and has no fallback recourse.

@sbc100
Member

sbc100 commented Oct 23, 2018

I feel like this issue, and issues that stem from it, are the biggest blocker for bringing existing portable multi-threaded applications to wasm.

Emscripten tries to paper over this issue by busy waiting on the main thread: https://github.com/kripken/emscripten/blob/incoming/src/library_pthread.js#L995
This can be enough to get some simple programs running, and indeed might solve some of your issues in the short term @alexcrichton.

However, for real world applications, as @lars-t-hansen points out, it can lead to deadlocks. One solution that emscripten has for this is PROXY_TO_PTHREAD which moves all the user code into a worker and only runs main-thread-specific code on the main thread. Of course this means that user code loses synchronous access to a lot of APIs (including the DOM) but for running portable multi-threaded code it seems like the only general purpose option.

@Macil

Macil commented Oct 23, 2018

One solution that emscripten has for this is PROXY_TO_PTHREAD which moves all the user code into a worker and only runs main-thread-specific code on the main thread. Of course this means that user code loses synchronous access to a lot of APIs (including the DOM) but for running portable multi-threaded code it seems like the only general purpose option.

Firefox allows web workers to directly control canvases; maybe more APIs could be added along these lines to keep the entirely-in-worker setup simple and low-latency. Pre-existing multi-threaded codebases that are being ported to WASM probably don't expect to use the DOM much besides for writing to a canvas and reading inputs.

Multi-threaded codebases that are written from scratch for heavy direct DOM usage (like rewrites of javascript front-end UI frameworks) can be architected from the start to deal with the lack of locks on the main thread. If the codebase is made with the restriction in mind from the start, it's probably easier to accomplish things like each thread having its own memory allocator or similar alternatives.

@rajsite

rajsite commented Oct 24, 2018

Multi-threaded codebases that are written from scratch for heavy direct DOM usage (like rewrites of javascript front-end UI frameworks) can be architected from the start to deal with the lack of locks on the main thread.

This may be too off topic but one concern I'm having in this area is handling DOM events from a worker wasm context. In order to have cancellable events you need main thread synchronization and the only approach I can think of so far is to spin-lock in the JS event handlers. It gets tricky making even simple form handlers that disable buttons reliably during processing without that synchronization. I think this also applies to apis that require user gestures to perform.

Edit: Actually I remember @developit commenting on a presentation which mentioned an experimental API called transferable events which may be the right direction for DOM APIs. Maybe purpose built apis like transferable events and some allocation api could be the approach over generic primitives like atomic.wait on the main thread.

@vincentriemer

@rajsite Re user gesture requirement: I've been working on a framework that runs user code in a webworker, so I've experienced this first hand, and through some research I found this which I've confirmed solves the user-gesture requirement issue (tested in chrome with the experimental flag enabled).

My biggest concern with disallowing wait on the main thread (as you mentioned) is preventDefault-ing events from a different thread. Potentially this "transferrable events" proposal may solve this, but all we really know about it is its name, so no way to know for sure 🤔

@lygstate

@alexcrichton Maybe a better way is to provide an instruction, memory.alloc?

@alexcrichton
Contributor Author

Ok after thinking this over, I personally think that we're in a good enough position that I'm going to close this issue. While this is still a general problem, I at least personally understand more of the story and it brings us back to a point of "there's a reasonably good story". Notably, the following points are what have changed my mind:

  • Spinning on the main thread can deadlock because workers rely on the main thread sometimes doing work for them.
  • Memory allocation in specific can be handled with a fallback to memory.grow on contention and otherwise writing an allocator that's aware of this strategy.
  • An upcoming Atomics.waitAsync proposal provides a suitable strategy for the main thread to wait on locations.
  • There's a long and likely storied discussion about this in JS, and wasm appears to simply be following JS!

I think that the above points definitely don't put us into an "absolutely amazing" world where everything can just work, but it seems like a good balance between the constraints of the web with a reasonable enough story for synchronization of languages between the main thread and workers.

Thanks all for the discussion! I suspect follow-up issues can always be opened if others are interested.

Also @lygstate fwiw I think such an instruction is pretty difficult to provide, and we should be able to effectively do it with memory.grow

@mirkootter

mirkootter commented Oct 8, 2021

This issue is quite old, but I haven't found any new information yet...

I am wondering why lock free allocators are not really discussed here. Those only rely on atomic operations like compare and swap, and as far as I understand, those operations are completely legal on the main thread.

See mimalloc for example: It uses a thread local heap, i.e. each thread has its own heap. Allocation works on the thread local heap. Deallocation works from any thread:

  • If you deallocate on the thread local heap, there is no need for sync
  • If you deallocate on another thread, then the thread local heap is not touched and the freed block is added to a lock free queue. This queue can be used by any thread to atomically pop new blocks which it can use.
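A rough Rust sketch of the lock-free cross-thread free queue described above, using only compare-and-swap and swap. This is not mimalloc's code, only an illustration of the idea under simplifying assumptions: blocks are represented by usize addresses, nodes are boxed (a real allocator would store the link inside the freed block itself), and the owner drains the whole list at once, which sidesteps the ABA problem of popping single nodes. RemoteFreeList, push and take_all are hypothetical names.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node {
    block: usize,
    next: *mut Node,
}

/// Blocks freed by other threads, waiting to be reclaimed by the owner.
pub struct RemoteFreeList {
    head: AtomicPtr<Node>,
}

impl RemoteFreeList {
    pub fn new() -> Self {
        RemoteFreeList { head: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Called from any thread: push a freed block with a CAS loop, no lock.
    pub fn push(&self, block: usize) {
        let node = Box::into_raw(Box::new(Node { block, next: ptr::null_mut() }));
        let mut head = self.head.load(Ordering::Relaxed);
        loop {
            // The node is not yet shared, so writing its link is safe.
            unsafe { (*node).next = head };
            match self.head.compare_exchange_weak(
                head,
                node,
                Ordering::Release,
                Ordering::Relaxed,
            ) {
                Ok(_) => return,
                Err(current) => head = current,
            }
        }
    }

    /// Detach the whole list with one atomic swap and return the blocks.
    pub fn take_all(&self) -> Vec<usize> {
        let mut cur = self.head.swap(ptr::null_mut(), Ordering::Acquire);
        let mut blocks = Vec::new();
        while !cur.is_null() {
            // After the swap this thread exclusively owns the detached chain.
            let node = unsafe { Box::from_raw(cur) };
            blocks.push(node.block);
            cur = node.next;
        }
        blocks
    }
}

impl Drop for RemoteFreeList {
    fn drop(&mut self) {
        // Free any nodes that were never reclaimed.
        self.take_all();
    }
}
```

Neither operation ever waits, so both would be legal on the main thread; the trade-off is that remotely freed memory is only reusable once the list is drained.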

@kettle11

5+ years later this issue is still pertinent and has had a negative impact on the Wasm ecosystem.

Rust and emscripten, two of the most prominent ways to deploy Wasm on web, still both busy-loop instead of wait, which is strictly worse than if atomic.wait were allowed on the main thread.

The effect this limitation has had on the Rust ecosystem is stark: partially due to the clunkiness of working around this constraint, Wasm / Rust multithreading uptake has been slow, and now nearly all Rust libraries bake in the assumption that WebAssembly is single-threaded.

An example of how this hampers Rust is the popular library rayon. If some data is being iterated over rayon allows automatically distributing the iterated work across cores. The way this works is rayon internally chunks the iteration into 'tasks' that are handed off to a multi-thread scheduler and the calling thread blocks / yields until all tasks are done. So while the calling thread does block it actually returns more quickly with this pattern.
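The fork-join shape described above can be sketched with the standard library alone. This is not rayon's implementation (rayon uses a work-stealing pool); it is a std-only illustration in which parallel_sum is a hypothetical name and one scoped thread is spawned per chunk. The join calls are where the calling thread blocks, which is the step that is not permitted on the browser's main thread.

```rust
use std::thread;

/// Splits `data` into roughly `workers` chunks, sums each chunk on its own
/// thread, and blocks the caller until every chunk is done.
pub fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division, kept at least 1 because `chunks(0)` panics.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Fork: hand each chunk to a scoped thread.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Join: the calling thread waits here for all tasks to finish.
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<u64>()
    })
}
```

For example, `parallel_sum(&values, 4)` returns the same total as `values.iter().sum::<u64>()`, but the caller spends the interval between fork and join blocked.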

There is a library that adapts rayon to work on web, but it comes with the caveat that you can't run your program on the main thread. This requires additional rearchitecting.

Popular projects like the Bevy game engine have not implemented multithreading on web, in part because underlying libraries (like rayon and others) aren't sure how to work around this constraint.

This constraint has resulted in a Wasm ecosystem on web that's leaving significant performance on the table.

@rossberg
Member

I agree this is silly and harmful, in the same boat as the Web's restrictions on synchronous compilation in the JS API. Unfortunately, these restrictions are not necessarily by choice of the Wasm CG but primarily imposed by the groups in control of the Web platform. Hence it would require powerful lobbying to lift them. I wouldn't hold my breath. I'm afraid Wasm cannot fix the Web.

@jayphelps
Contributor

@rossberg as an aside, some browsers are removing that synchronous compilation restriction: https://groups.google.com/a/chromium.org/g/blink-dev/c/nJw2zwaiJ2s/m/EYPgC5D3LwAJ

So maybe if enough evidence can be shown, there's hope for a change some day. Wouldn't hold our breath though.

@rossberg
Member

@jayphelps, yeah well, lazy compilation adds a whole new bag of problems with regards to predictable performance and optimisation, especially for the use cases where synchronous compilation would matter most. So that's almost jumping from the frying pan into the fire. But we're off-topic now.

@Pauan

Pauan commented Dec 15, 2023

As @rossberg said, it's unlikely that the situation will improve anytime soon, so for the foreseeable future if you want to do multi-threading then you need to run your Rust app off of the main thread.

As long as all your Rust code is running in a Worker, then things like rayon work great. It does require some annoying extra setup, and it does make it harder to do main thread things (like the DOM), but it does work.

And even with those hurdles, Rust is still the best option for doing multi-threaded Wasm on the web.

We can try and do some things to improve the Rust experience (better tooling, better libraries, better docs), but the browser restriction around Workers is something we just can't fix.

@kettle11

Unfortunately, these restrictions are not necessarily by choice of the Wasm CG but primarily imposed by the groups in control of the Web platform.

Is there a public record, beyond this issue, of this being discussed? Which parties would need to be persuaded?

It's hard to imagine why someone would prefer to encourage an ecosystem of busy-loops and hacks over allowing very brief waits.

@sbc100
Member

sbc100 commented Dec 15, 2023

> Popular projects like the Bevy game engine have not implemented multithreading on web, in part because underlying libraries (like rayon and others) aren't sure how to work around this constraint.

While I totally understand the frustration here, I'm curious how higher-level libraries such as the ones you mention (Bevy and rayon) suffer from the fact that Rust (and Emscripten) have to perform this busy-wait workaround on the main thread. Isn't that workaround buried deep in the standard library? How does it become observable to higher-level libraries? Are you talking about the performance overhead of not being able to yield?

@kettle11

kettle11 commented Dec 15, 2023

> Isn't that workaround buried deep in the standard library? How does it become observable to higher level libraries?

The Rust standard library on Wasm does not currently busy-wait in its low-level primitives. That logic would go here: https://github.com/rust-lang/rust/blob/e6707df0de337976dce7577e68fc57adcd5e4842/library/std/src/sys/wasm/atomics/futex.rs#L13

Just the other day I proposed changing that: rust-lang/rust#77839 (comment)

I suspect the reason a workaround using busy-wait has not been introduced yet is that the relevant Rust maintainers thought a better solution might come along and they didn't want to introduce any performance foot guns or 'hacky' code.

Separately, the Rust global allocator on Wasm does work around this by busy-waiting instead of using regular locking primitives.


You can see this impact 'bubbling up' through higher level libraries, like this task scheduling library Bevy uses:

https://github.com/smol-rs/async-executor/blob/d747bcd8277f7928a825129139a9290632f4d90d/src/lib.rs#L276-L290

Because the library authors were seemingly unfamiliar with web and unsure how to handle the main thread's inability to wait, they simply crash (even off the main thread!) if the lock can't be acquired. A solution, for just that library, would be to add its own busy-wait workaround.
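As a rough illustration of what such a library-local workaround could look like, here is a minimal sketch that spins on `Mutex::try_lock` instead of blocking. The helper name `lock_spinning` is made up for this example and is not part of std or async-executor, and the sketch assumes the lock is only ever held briefly.

```rust
use std::hint;
use std::sync::{Mutex, MutexGuard, TryLockError};

/// Acquire `mutex` without ever issuing a blocking wait.
/// Hypothetical helper for illustration only.
pub fn lock_spinning<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    loop {
        match mutex.try_lock() {
            // The lock was free: hand the guard to the caller.
            Ok(guard) => return guard,
            // A previous holder panicked; recover the guard anyway.
            Err(TryLockError::Poisoned(poisoned)) => return poisoned.into_inner(),
            // Another thread holds the lock: hint the CPU and retry.
            Err(TryLockError::WouldBlock) => hint::spin_loop(),
        }
    }
}
```

Note that this loop never gives up, so it only makes sense where the holder is guaranteed to release the lock quickly.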

There are other cases of confused library authors throughout the Rust ecosystem who aren't sure what 'hacks' they should use to accommodate web.

@daxpedda

> I suspect the reason a workaround using busy-wait has not been introduced yet is that the relevant Rust maintainers thought a better solution might come along and they didn't want to introduce any performance foot guns or 'hacky' code.

I don't believe the busy-wait workaround can be introduced at the wasm32-unknown-unknown target, because it would affect platforms other than the Web as well, where it is most likely very undesirable.

So for this workaround to be applied in Rust Std, we would need a dedicated Web target.


Just my two cents here:
AFAIU chasing the idea of somehow convincing browsers to allow blocking on the main thread is a futile effort.

Additionally, the busy-loop workaround seems like a terrible idea to me, one we should avoid if at all possible unless we can come up with a good way to put a time limit on it. My guess is that the busy-wait workaround for Wasm is born out of desperation.

The problem with the busy-loop workaround is that it can't work entirely reliably on the Web, where e.g. we can't guarantee that things are properly dropped because of Worker.terminate() and other very funny stuff like that. This would cause a loop like that to spin infinitely.

On the other hand I have no clue how these busy-loop workarounds are really implemented and if there are already good solutions to these problems I'm describing or not.
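To make the concern concrete, a minimal sketch of a bounded spin follows. It gives up after a fixed number of attempts, so a lock orphaned by `Worker.terminate()` cannot hang the caller forever. The name `lock_with_spin_budget` and the iteration budget (standing in for a real time limit) are assumptions for illustration only.

```rust
use std::hint;
use std::sync::{Mutex, MutexGuard, TryLockError};

/// Try to acquire `mutex`, retrying at most `max_spins` times after the
/// first attempt. Returns `None` if the lock never became free.
/// Hypothetical helper; the budget is a stand-in for a time limit.
pub fn lock_with_spin_budget<T>(
    mutex: &Mutex<T>,
    max_spins: u32,
) -> Option<MutexGuard<'_, T>> {
    for _ in 0..=max_spins {
        match mutex.try_lock() {
            Ok(guard) => return Some(guard),
            // Recover the guard from a poisoned lock instead of failing.
            Err(TryLockError::Poisoned(poisoned)) => return Some(poisoned.into_inner()),
            // Still held elsewhere: spend one unit of the budget.
            Err(TryLockError::WouldBlock) => hint::spin_loop(),
        }
    }
    // Budget exhausted: the caller decides whether to retry later or fail.
    None
}
```

What "a moment" should be in terms of `max_spins` is left open here, which is exactly the hard part.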

AFAICS the problem boils down to not having a block_on implementation in Wasm, because then we could solve this with Atomics.waitAsync(). This could be solved by the JS-promise integration proposal or, in the future, the stack-switching proposal. AFAIK both would still pose a re-entrancy problem.

But I do believe that this is a problem we should and can solve in Wasm (instead of trying to allow blocking on the main thread).

@fgmccabe

Can you elaborate on the reentrancy problem introduced by stack switching, other than that it is necessary to be able to support reentrancy (which JSPI does)?

@daxpedda

daxpedda commented Dec 15, 2023

For a proper block_on implementation, I'm assuming that we want to prevent re-entrancy (in the same thread at least), which AFAIK neither proposal can do.

EDIT: it was discussed before to try and address re-entrancy problems in Rust itself, which might be an option here as well ...

@kettle11

> I don't believe the busy-wait workaround can be introduced at the wasm32-unknown-unknown target, because it would affect platforms other than the Web as well, where it is most likely very undesirable.
>
> So for this workaround to be applied in Rust Std, we would need a dedicated Web target.

It would need to be configurable. The Rust Std could expose a way to set a flag that enables the busy-loop behavior on a thread. This could be behind yet another feature-flag, or always enabled when the atomics flag is set. It's a hack, but it's a very high value, pragmatic, and low-code hack.

Creating another target seems like it'd be a larger maintenance burden, but it is an option.
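A minimal sketch of the per-thread flag idea, assuming a thread-local switch that a web embedder would set on the main thread. Nothing like this exists in std today; `BUSY_WAIT`, `set_busy_wait`, `busy_wait_enabled` and `lock` are hypothetical names.

```rust
use std::cell::Cell;
use std::hint;
use std::sync::{Mutex, MutexGuard, TryLockError};

thread_local! {
    /// Hypothetical per-thread switch; every thread starts in blocking mode.
    static BUSY_WAIT: Cell<bool> = Cell::new(false);
}

/// Opt the current thread in or out of busy-waiting.
pub fn set_busy_wait(enabled: bool) {
    BUSY_WAIT.with(|flag| flag.set(enabled));
}

/// Report whether the current thread is in busy-wait mode.
pub fn busy_wait_enabled() -> bool {
    BUSY_WAIT.with(|flag| flag.get())
}

/// Lock `mutex`, spinning only on threads that opted in.
pub fn lock<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    if !busy_wait_enabled() {
        // Default path: an ordinary blocking lock.
        return mutex.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    }
    loop {
        match mutex.try_lock() {
            Ok(guard) => return guard,
            Err(TryLockError::Poisoned(poisoned)) => return poisoned.into_inner(),
            // This thread may not block, so spin instead.
            Err(TryLockError::WouldBlock) => hint::spin_loop(),
        }
    }
}
```

Under this assumption, glue code would call `set_busy_wait(true)` once on the main thread, while Workers keep the blocking default.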


My concern with an Atomics.waitAsync() solution is that I suspect it's notably higher overhead than a typical native wait call (although I don't know that), and that it requires waiting for more complex proposals that are still a ways off.


> The problem with the busy-loop workaround is that it can't work entirely reliably on the Web, where e.g. we can't guarantee that things are properly dropped because of Worker.terminate() and other very funny stuff like that. This would cause a loop like that to spin infinitely.

That would be bad! But acquiring a lock and then calling Worker.terminate() could be noted as a thing to look out for. The status quo is that the thing to look out for is a panic if you accidentally wait on the main thread, which is more difficult to track down / correct, and requires rearchitecting around.


> AFAIU chasing the idea of somehow convincing browsers to allow blocking on the main thread is a futile effort.

I still would like to know more about why this is so futile.

Long-running code has a similar effect to waiting on the main-thread, but that's permissible because it's necessary. At least in the Rust ecosystem most main-thread waits fall into two categories:

  • Very rare and meant to be nearly instantaneous
  • Longer, but only because the main thread is waiting for other threads that are helping the main thread with work it needs done.

Almost always code using wait never intends to block the main thread for excessively long, and it's a bug if it does so. Just as it's a bug if code falls into an infinite loop or hits a pathological case that takes too long to compute.

Main-thread waits can lead to poor behavior, just as loops can. Both are useful if used properly!

@daxpedda

> The Rust Std could expose a way to set a flag that enables the busy-loop behavior on a thread. This could be behind yet another feature-flag, or always enabled when the atomics flag is set. It's a hack, but it's a very high value, pragmatic, and low-code hack.
>
> Creating another target seems like it'd be a larger maintenance burden, but it is an option.

Unfortunately, there is currently no way to get around creating a new target. Even the atomics target feature will need its own target to be stabilized, similar to the dedicated WASI threads target. This is a limitation of Rust which won't be fixed anytime soon.

On that note, the component model should hopefully allow Rust to create a dedicated target for the Web platform. I don't know what the plans are for this or if there are any, but if it is viable, it would solve many problems (e.g. std::thread, std::time, std::fs and so on) and allow for much better platform support.

The fact is that unless Wasm converges on all kinds of topics to be equal on all platforms, we will need dedicated targets.

> My concern with an Atomics.waitAsync() solution is that I suspect it's notably higher overhead than a typical native wait call (although I don't know that), and that it requires waiting for more complex proposals that are still a ways off.

The JS-Promise integration proposal is already at phase 3, unlike a proposal to allow blocking on the main thread, which doesn't even exist.

But my guess, too, is that it would have significant overhead. It would be really nice if we could busy-loop for just a moment until we give up and switch context, but I honestly don't know if there is a proper way to define "a moment". Maybe this would need a dedicated Wasm proposal after all, but yeah, sounds tough.

> That would be bad! But acquiring a lock and then calling Worker.terminate() could be noted as a thing to look out for. The status quo is that the thing to look out for is a panic if you accidentally wait on the main thread, which is more difficult to track down / correct, and requires rearchitecting around.

A library can't prevent a user from using Worker.terminate(), which potentially ends up being a similar problem as before: having to document what's possible and what isn't somewhere deep in the dependency tree.

But agreed, the status quo is pretty bad and we would have to make significant trade-offs with what we have right now to fix this, even if temporarily.

> I still would like to know more about why this is so futile.

This is just my impression! I'm literally a nobody in this scene, it's not like I have any say here.
But it was already stated multiple times in this thread that this is extremely unlikely to happen.


On that note, it should be perfectly possible to make a Wasm post-processor that just replaces every call to memory.atomic.wait with a busy loop when on the main thread.
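For illustration, the loop such a post-processor might substitute for `memory.atomic.wait32` could behave roughly like the sketch below. It keeps the instruction's result codes (0 for ok, 1 for not-equal, 2 for timed-out) but is only an approximation: a spin loop cannot observe a notify, so it treats a change of the value as a wake-up, and a spin budget stands in for the nanosecond timeout. `spin_wait32` and its parameters are hypothetical.

```rust
use std::hint;
use std::sync::atomic::{AtomicU32, Ordering};

// Result codes mirroring those of `memory.atomic.wait32`.
pub const WAIT_OK: u32 = 0;
pub const WAIT_NOT_EQUAL: u32 = 1;
pub const WAIT_TIMED_OUT: u32 = 2;

/// Busy-loop approximation of a 32-bit atomic wait.
/// `max_spins = None` means "no timeout" (assumption: a spin count
/// replaces the real instruction's nanosecond timeout).
pub fn spin_wait32(cell: &AtomicU32, expected: u32, max_spins: Option<u64>) -> u32 {
    // Like the real instruction, refuse to wait if the value already differs.
    if cell.load(Ordering::SeqCst) != expected {
        return WAIT_NOT_EQUAL;
    }
    let mut spins: u64 = 0;
    loop {
        // A changed value is treated as if a notify had arrived.
        if cell.load(Ordering::SeqCst) != expected {
            return WAIT_OK;
        }
        if let Some(limit) = max_spins {
            if spins >= limit {
                return WAIT_TIMED_OUT;
            }
            spins += 1;
        }
        hint::spin_loop();
    }
}
```

Code that notifies without changing the watched value would never wake this loop, so a real tool would need to account for that case.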
