
Atomic and thread operators #1019

Closed
wants to merge 23 commits into from

Conversation

binji
Member

@binji binji commented Mar 24, 2017

This is an initial pass at adding some atomic and thread operators, based on some work that @flagxor did earlier.

There's still a lot that needs to be fleshed out, but I wanted to get something going.

Member

@jfbastien jfbastien left a comment

We need a way to identify post-MVP features. ⚛️ is an easy one here! We then need a page that identifies all these symbols (MVP, each post-MVP, and 🦄 as "future").

This is also missing the specification of how non-atomic accesses behave. We could just wait for the JS work to finish, though?

How do we intend to move such proposals forward?

Semantics.md Outdated
@@ -42,7 +42,7 @@ environment such as a browser, a trap results in throwing a JavaScript exception
If developer tools are active, attaching a debugger before the
termination would be sensible.

## Stack Overflow
## Stack overflow
Member

?

Member Author

Capitalization was inconsistent in the file, but seemed to be "Upper lower lower." Probably should be a different change though.

Semantics.md Outdated
@@ -81,7 +81,7 @@ Note that the value types `i32` and `i64` are not inherently signed or
unsigned. The interpretation of these types is determined by individual
operators.

## Linear Memory
## Linear memory
Member

?

Semantics.md Outdated
memory if the module's memory import doesn't specify that it allows shared
memory.

### Linear memory accesses
Member

Capitalization.

Semantics.md Outdated
| `i64.atomic_xchg32_u` | 4 bytes, zero-extended i32 to i64 | nop | 4 bytes, wrapped from i64 to i32 |
| `i64.atomic_xchg` | 8 bytes | nop | 8 bytes |

The atomic compare exchange RMW operators take three operands: a linear memory
Member

Kinda weird to classify cmpxchg as RMW. It's a 3rd category IMO.

Member Author

Done.

Semantics.md Outdated
@@ -211,7 +356,7 @@ Thus, it is recommended that WebAssembly producers align frequently-used data to
permit the use of natural alignment access, and use loads and stores with the
greatest alignment values practical, while always avoiding misaligned accesses.

### Out of Bounds
### Out of bounds
Member

?

Semantics.md Outdated
@@ -461,7 +606,7 @@ supported (including NaN values of all possible bit patterns).
* `f32.const`: produce the value of an f32 immediate
* `f64.const`: produce the value of an f64 immediate

## 32-bit Integer operators
## 32-bit integer operators
Member

?

Semantics.md Outdated
## Thread operators

* `is_lock_free`: return 1 if an atomic memory access of a given size can be
completed without the execution engine acquiring a lock. If not, return `0`.
Member

Can you import the wording from https://tc39.github.io/ecma262/ ?

Member Author

Done.
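As a rough illustration of what an engine might answer for such a query, here is a hypothetical sketch (the helper name and size set are invented; ECMAScript's `Atomics.isLockFree` is only *guaranteed* to return true for 4-byte accesses, and the actual set varies by platform):

```python
# Hypothetical sketch: many engines report lock-freedom exactly for
# power-of-two access sizes up to the widest native atomic width.
NATIVE_LOCK_FREE_SIZES = {1, 2, 4, 8}  # assumption; platform-dependent

def is_lock_free(n: int) -> int:
    """Return 1 if an atomic access of n bytes needs no hidden lock, else 0."""
    return 1 if n in NATIVE_LOCK_FREE_SIZES else 0

print(is_lock_free(4))  # 1
print(is_lock_free(3))  # 0
```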

Semantics.md Outdated
@@ -678,6 +823,48 @@ outside the range which rounds to an integer in range) traps.
This trap is intended to be impossible for user code to catch or handle, even in the future when it may be possible to
handle some other kinds of traps or exceptions.

## Thread operators

* `is_lock_free`: return 1 if an atomic memory access of a given size can be
Member

Code-quote 1.

Member Author

Done.

Semantics.md Outdated
memory.

The wait operators take three operands: a linear memory index, an
expected value, and a relative timeout in milliseconds. The wait operation
Member

Can it be negative?

Member Author

Described below, but I moved it up.
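The observable behavior of the wait operator can be emulated roughly in user code; a sketch under stated assumptions (all names here are invented, and a condition variable stands in for shared linear memory):

```python
import threading

class SharedCell:
    """Stand-in for one location in shared linear memory."""
    def __init__(self, value=0):
        self.value = value
        self.cond = threading.Condition()

def wasm_wait(cell, expected, timeout_ms):
    with cell.cond:
        if cell.value != expected:
            return "not-equal"       # value already differs: return immediately
        # Condition.wait returns True if notified, False on timeout.
        if cell.cond.wait(timeout_ms / 1000.0):
            return "ok"              # woken by a wake operator
        return "timed-out"

def wasm_wake(cell, count):
    with cell.cond:
        cell.cond.notify(count)

cell = SharedCell(value=1)
print(wasm_wait(cell, expected=0, timeout_ms=10))  # not-equal
print(wasm_wait(cell, expected=1, timeout_ms=10))  # timed-out
```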

@binji
Member Author

binji commented Mar 25, 2017

We need a way to identify post-MVP features. ⚛️ is an easy one here! We then need a page that identifies all these symbols (MVP, each post-MVP, and 🦄 as "future").

Good point, perhaps we should open a separate issue to discuss?

This is also missing the specification of how non-atomic accesses behave. We could just wait for the JS work to finish, though?

Yeah, I was considering just referencing the ecma262 memory model, but it's not immediately obvious to a reader how it would map.

How do we intend to move such proposals forward?

I seem to remember there being a plan about this. Little help, anyone? :-)

@lukewagner
Member

I think we're still figuring out what, post-MVP, is the process for proposing extensions. Also, with the concurrent emergence of a proper spec in the spec repo, the shape of a PR/proposal might be quite different.

@jfbastien
Member

I'm not sure we want to block everything on the spec repo at this point in time.

@lukewagner
Member

I guess it depends on how long each of these things (process for evolving post-MVP, spec repo, threads) take.

@flagxor
Member

flagxor commented Mar 27, 2017

Agree we don't want to block if possible.
@lukewagner would pulling out the sections this adds into a separate Threads.md doc be easier in terms of weathering the shifting repos?


## Atomic compare exchange operators ([described here](Semantics.md#atomic-memory-accesses))

| `i32.atomic_cmpxchg` | `0xe61c` | `memory immediate` | atomic compare exchange with memory |


The formatting on this table is broken, possibly because of the lack of a heading.

Member Author

Oops, thanks!

| `i32.atomic_xor8_s` | `0xe612` | `4` `memory immediate` | atomic xor with memory |
| `i32.atomic_xor8_u` | `0xe613` | `4` `memory immediate` | atomic xor with memory |
| `i32.atomic_xor16_s` | `0xe614` | `4` `memory immediate` | atomic xor with memory |
| `i32.atomic_xor16_u` | `0xe615` | `4` `memory immediate` | atomic xor with memory |
Member

Names like i32.atomic_xor16_s are a little awkward with _s/_u apparently associated with otherwise sign-agnostic operators. What would you think of any of the following conventions?

  • i32.atomic16_s.xor
  • i32.atomic_rmw16_s.xor
  • i32.atomic.rmw16_s.xor (in this case, following atomic with . could be a broader convention)

These would make it clear that the xor isn't signed itself; it's just the loaded value from the rmw that's being sign-extended. This isn't super important with the current opcodes, but could be in the future if wasm ever wants atomic rotate, popcnt, or other operators which care about the number of bits in the actual arithmetic part.

Such a clarification would also align with an idea anticipated here concerning the possibility of adding a real add_s and similar operators (which could trap or do other things on overflow) in the future.

Member Author

I like i32.atomic.rmw16_s.xor, it nicely matches the proposed opcode layout too. Though I wonder about xchg and cmpxchg:
i32.atomic.rmw16_s.xchg? And as @jfbastien mentioned, cmpxchg isn't really RMW, so i32.atomic.cmpxchg16_s? And I suppose for the non-extending forms:
i32.atomic.rmw.xor?

Member

i32.atomic.cmpxchg16_s and i32.atomic.rmw.xor seem to fit the pattern nicely.

Member Author

Ok, done

@lukewagner
Member

@flagxor I'm not sure what would be better without knowing what the intended process is for proposing and making changes. Since threads are a high priority, we could just cheat and carry on like we did pre-MVP.

| Name | Opcode | Immediate | Description |
| ---- | ---- | ---- | ---- |
| `i32.atomic_load` | `0xe628` | `memory immediate` | atomic load from memory |
| `i64.atomic_load` | `0xe629` | `memory immediate` | atomic load from memory |
Member

For all these load/store ops, could we reuse the existing load/store ops, adding a "consistency" annotation to the memory_immediate.flags?

Member Author

Yes, and I believe that's what was discussed before. But I thought it might be nice to have all atomic/threading operators use the same prefix, which led me down this route. One additional nicety is that the opcodes for atomic load/store operators match the non-atomic ones, e.g. i32.atomic_load is 0xe628 and i32.load is 0x28, etc.

Since the opcode specifies that it is atomic, this also means that we don't need a special "non-atomic" value for the memory immediate for RMW ops too.

But I'm not married to the idea, just thought it was kinda neat. :-)
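The opcode pairing described above can be sketched as follows (helper names are invented; only the 0xe6-prefix relation comes from the discussion):

```python
# Under the proposed layout, an atomic opcode is the 0xe6 prefix byte
# followed by the same sub-opcode as the plain (non-atomic) operator.
PREFIX = 0xE6

def prefixed(base_opcode):
    return (PREFIX << 8) | base_opcode

I32_LOAD = 0x28  # existing i32.load opcode
print(hex(prefixed(I32_LOAD)))  # 0xe628, i.e. i32.atomic_load
```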

Member

Hm, those are fair points. I guess we are already adding a bunch of operators, so it's not a huge difference, and there would be a natural extension for using the atomic prefix on global ops (when we eventually allow shared globals).

Member

I'm partial to calling the "atomic" prefix the "lock" prefix, and numbering it 0xF0.

Member Author

@jfbastien can't tell if serious...

Semantics.md Outdated

| Name | Read | Modify | Write |
| ---- | ---- | ---- | ---- |
| `i32.atomic_cmpxchg8_s` | 1 byte, sign-extended i8 to i32 | nop | (conditional) 1 byte, wrapped from i32 to i8 |
Member

One detail that's not clear from this presentation is whether the comparison is the sign-extended value with the full 32 bits of the expected value, or the expected value wrapped to 8 bits with the 8-bit loaded value.

Member Author

I tried to clarify, ptal
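A small sketch of the two readings being discussed (the helper name is invented), showing that they can disagree for the same inputs:

```python
def sext8(b):
    """Sign-extend an 8-bit value to 32 bits (unsigned representation)."""
    return (b | 0xFFFFFF00) & 0xFFFFFFFF if b & 0x80 else b

loaded = 0xFF      # byte read from memory
expected = 0xFF    # i32 `expected` operand, value 255

# Reading A: sign-extend the loaded byte, compare full 32 bits.
cmp_a = sext8(loaded) == expected       # 0xFFFFFFFF == 0x000000FF

# Reading B: wrap `expected` to 8 bits, compare 8-bit values.
cmp_b = loaded == (expected & 0xFF)     # 0xFF == 0xFF

print(cmp_a, cmp_b)  # False True
```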

@AndrewScheidecker

What about atomic operations on globals?

@lukewagner
Member

@AndrewScheidecker I think first we need to define a new WebAssembly.Global object for mutable globals that can be imported, exported and marked shared (symmetric to Memory); but after that it makes sense to have atomic global operations.

Semantics.md Outdated

| Name | Read | Modify | Write |
| ---- | ---- | ---- | ---- |
| `i32.atomic.rmw8_s.add` | 1 byte, sign-extended i8 to i32 | sign-agnostic addition | 1 byte, wrapped from i32 to i8 |
Member

The cmpxchg clarification looks good. So good, that I'm now wondering about a Return column for the rmw table too :-). This is admittedly a really minor nit at this point because none of the current rmw operators care whether their inputs are extended or wrapped, but if we add other rmw operators in the future, they could potentially care.

Semantics.md Outdated
## Thread operators

* `is_lock_free`: If the atomic step of an atomic primitive (see
[Atomic Memory Accesses](#atomic-memory-accesses)) on a datum of size N
Member

Where is N coming from, is that an i32 operand?

Member Author

Done.

Semantics.md Outdated
* `f32.atomic.store`: (no conversion) atomically store 4 bytes
* `f64.atomic.store`: (no conversion) atomically store 8 bytes

Atomic read-modify-write (RMW) operators atomically read a value from a memory
Member

Can you specify in which order the operands are expected on the stack for these instructions?

Member Author

@binji binji Mar 28, 2017

I couldn't find another place in the doc to crib from, so I just specified it by describing the operand in order. Should I be more explicit?

Semantics.md Outdated
| `i64.atomic.rmw32_u.xchg` | 4 bytes | nop | 4 bytes | zero-extended i32 to i64 |
| `i64.atomic.rmw.xchg` | 8 bytes | nop | 8 bytes | as i64 |

The atomic compare exchange operators take three operands: a linear memory
Member

Nit: the linear memory is an immediate, not an operand.

Member Author

Whoops, I meant address. :-)

Semantics.md Outdated
change. It is a validation error to use these operators on non-shared linear
memory.

The wait operators take three operands: a linear memory index, an expected
Member

Nit: as above, I suppose the memory is an immediate.


| Name | Opcode | Immediate | Description |
| ---- | ---- | ---- | ---- |
| `is_lock_free` | `0xe600` | | |
Member

Can we allocate prefix opcodes at the end of the opcode space instead of the middle? So 0xff for threads being the first such proposal.

Member Author

Done.

Semantics.md Outdated

### Wake

The wake operators take two operands: a linear memory index, and a wake count.
Member

Nit: same here.

| `i32.wait` | `0xe601` | `offset` | |
| `i64.wait` | `0xe602` | `offset` | |
| `i32.wake` | `0xe603` | `offset` | |
| `i64.wake` | `0xe604` | `offset` | |
Member

Don't these all have memory immediates?

Member Author

Done.

| `i64.atomic.rmw16_u.xor` | `0xe619` | `4` `memory immediate` | atomic xor with memory |
| `i64.atomic.rmw32_s.xor` | `0xe61a` | `4` `memory immediate` | atomic xor with memory |
| `i64.atomic.rmw32_u.xor` | `0xe61b` | `4` `memory immediate` | atomic xor with memory |
| `i32.atomic.rmw.xchg` | `0xe610` | `5` `memory immediate` | atomic exchange with memory |
Member

Is xchg actually a RMW operation? Seems like it could be its own group, with names like i32.atomic.xchg16_s, in line with cmpxchg.

Semantics.md Outdated
| `i64.atomic.rmw32_s.xor` | 4 bytes | 32-bit sign-agnostic bitwise exclusive or | 4 bytes | sign-extended i32 to i64 |
| `i64.atomic.rmw32_u.xor` | 4 bytes | 32-bit sign-agnostic bitwise exclusive or | 4 bytes | zero-extended i32 to i64 |
| `i64.atomic.rmw.xor` | 8 bytes | 64-bit sign-agnostic bitwise exclusive or | 8 bytes | as i64 |
| `i32.atomic.rmw8_s.xchg` | 1 byte | nop | 1 byte | sign-extended i8 to i32 |
Member

Is xchg actually a RMW operation? It seems like it would fit better with the cmpxchg group -- it's an unconditional version of it. With analogous names like i32.atomic.xchg16_s. Or as its own group.

Member

xchg is usually considered an RMW operation.

Member Author

I'm fine with it either way, but it probably shouldn't be grouped with cmpxchg since those have 3 operands.

Member

@rossberg rossberg Mar 29, 2017

I guess I'm primarily wondering by what criterion xchg is RMW but cmpxchg is not. Some consistency would be nice.

Bikeshedding more generally, why the complicated names in the first place? Wouldn't i32.atomic.add8_s etc work just fine? Sure it's RMW, but that's kind of implied.

Member

cmpxchg (compare-and-swap) is fundamentally more powerful than other RMW operations. Between that, and having a distinct type signature, it commonly gets its own category, even if it is also an RMW operation.

cmpxchg isn't just a conditional version of xchg. For example, an abstract formal semantics wouldn't be able to reduce cmpxchg to xchg when the compare succeeds; that would imply the comparison is a separate step from the swap, breaking atomicity.

add8_s under wasm's existing naming scheme would seem to imply that the add, or, and, xor, etc., are signed/unsigned, which is inconsistent with the existing operators.
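One way to see the expressiveness gap: xchg can be built from a cmpxchg retry loop, but not the other way around. A single-threaded sketch (all names invented; memory is modeled as a plain list, and real concurrent code relies on the retry loop to handle lost races):

```python
def atomic_cmpxchg(memory, index, expected, replacement):
    """Store replacement only if the loaded value equals expected;
    return the loaded value either way."""
    loaded = memory[index]
    if loaded == expected:
        memory[index] = replacement
    return loaded

def atomic_xchg(memory, index, value):
    """Unconditional swap, expressed as a cmpxchg retry loop."""
    while True:
        old = memory[index]
        if atomic_cmpxchg(memory, index, old, value) == old:
            return old  # our cmpxchg won the race

mem = [1]
print(atomic_xchg(mem, 0, 2), mem[0])  # 1 2
```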

Member

@rossberg rossberg Mar 29, 2017

@sunfishcode, interesting, thanks for the pointer!

Regarding the naming, I see your argument. My only remaining quibble is that, even being in different expressive categories, it would still be nice to have a somewhat more symmetric naming scheme between cmpxchg and the other RMW operators.

Member

i32.atomic.rmw8_s.cmpxchg would be the maximally symmetric convention, it seems. I don't think it's critical, but I wouldn't be opposed to it.

Member Author

Done.

Semantics.md Outdated
The atomic compare exchange operators take three operands: a linear memory
index, an `expected` value, and a `replacement` value. The operators
conditionally store the `replacement` value, but only if the `loaded` value is
equal to the `expected` value.
Member

The logic of this sentence seems a little strange to me. "X conditionally does Y, but only when Z"... I think what you are trying to say is that it's always conditional, and then describe said condition.

Member Author

Done.
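The semantics described in the quoted text can be sketched as follows (memory is modeled as a plain list, and the function name is invented):

```python
def atomic_cmpxchg(memory, index, expected, replacement):
    """Store `replacement` only when the loaded value equals `expected`;
    the loaded value is returned either way."""
    loaded = memory[index]
    if loaded == expected:
        memory[index] = replacement
    return loaded

mem = [5]
print(atomic_cmpxchg(mem, 0, expected=5, replacement=9), mem[0])  # 5 9
print(atomic_cmpxchg(mem, 0, expected=5, replacement=7), mem[0])  # 9 9
```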

@jfbastien
Member

I think we've got a great opportunity to call the "atomic" prefix the "lock" prefix, and number it 0xF0.

| `i64.wake` | `0xc3` | `offset` | SAB like wake, takes an index + `offset` |
| `i32.thread_create` | `0xc4` | `table` | Creates a thread, stores thread in `table` at an i32 index. Returns 0 on success or non-zero otherwise. |
| `i32.thread_join` | `0xc5` | `table` | Waits to join to thread stored at an i32 index in `table`. Returns 0 on success or non-zero otherwise. |
| `thread_exit` | `0xc6` | | Halts current thread. |


These three thread ops were dropped in 4b94ac6#diff-7c076c52f23f4a770974fb209259f34bL962 seemingly without comment (that I can find). Is there somewhere else they went?

Member

It would be good to document this! Is the intent to start off with workers, mimic JS' API, and then add better capabilities later?

Threads.md Outdated
threads of execution. The shared memory can be imported or defined in the
module. It is a validation error to attempt to import shared linear
memory if the module's memory import doesn't specify that it allows shared
memory.
Member

It'd be good to flesh this out a bit more, including these sorts of details.

Member Author

I've added some of the details, PTAL (still no JS.md stuff yet)

@jfbastien
Member

We need to specify:

  • What happens with grow_memory and WebAssembly.Memory.prototype.grow.
  • What happens to instance.export.memory.buffer that are in JS.
  • Whether current_memory is sequentially-consistent.

@binji
Member Author

binji commented Apr 5, 2017

OK, I've added a little bit here about agents/agent clusters, mostly copied from https://tc39.github.io/ecma262/#sec-agents. Not sure how much detail we want to go into here. Gonna write up some JS.md stuff next.

...

If linear memory is marked as shared, the
[maximum memory size](Modules.md#linear-memory-section) must be specified.
Member

Same restriction for JS.md WebAssembly.Memory.

`grow_memory` of a shared linear memory is allowed, but may fail if the new
size is greater than the specified maximum size, or if reserving the additional
memory fails. All agents in the executing agent's cluster will have access to
the additional linear memory.
Member

Do we need to discuss neutering?

@hemobo

hemobo commented Apr 6, 2017

@jfbastien, can you motivate why you insist on 0xf0? It seems strictly worse than starting from the end.

Yay, bikeshedding! Count me in!

Reserving the last row of 16 bytes for extension prefixes sounds like a good idea to me. In particular because the potential cost of some future version running out of possibilities for extension seems far higher than the cost of putting some of the more rare prospective one byte opcodes behind a prefix due to lack of space in the lower bytes.

Putting the first extension byte at 0xf0 would be a nice way to mark off that reserved range. Opcode space fragmentation would then be a feature here, not a bug, keeping room for (possibly unanticipated) future extensions.

The only freedom gained by going backwards from 0xff (i.e. defragmentation) seems to be to allow for the possibility of one byte opcodes 'bleeding' into that reserved range, which I think is exactly what one would like to prevent.

Also, extension opcodes could still grow from 0xf0 backwards if at some point in the future it becomes clear that more than 16 prefixes are needed and one byte opcodes haven't filled the lower bytes yet. So one doesn't give up any growth potential for extensions here compared to starting from 0xff and going backwards, the only difference is the explicit reservation of the last row of bytes.

It also feels kind of awkward to spec all the extension opcodes backwards, in a list of operations one would either end up with features running from most obscure to most common, or otherwise with opcodes being listed backwards for extension prefixes and the other way around for the rest... just feels icky :)

@rossberg
Member

rossberg commented Apr 7, 2017

One question that also needs answering is how shared memories interact with instantiation. In particular, what is the amount of atomicity, if any, that is provided for writes performed for data sections?

@jfbastien
Member

jfbastien commented Apr 7, 2017

One question that also needs answering is how shared memories interact with instantiation. In particular, what is the amount of atomicity, if any, that is provided for writes performed for data sections?

Good point. Ordering of these as well (i.e. are they nonatomic writes, from start to end, or are they sequentially consistent?).

I assume we don't intend to allow shared Tables for now?

@lukewagner
Member

w.r.t. misaligned atomic accesses trapping:

  • Does this match the behavior of any of the major ISAs? Or do they "succeed" but fail to provide their atomicity guarantees?
  • Will trapping atomics end up leading to some of the same double-branching that we've seen with the trapping float operations (where user code branches on alignment to avoid the trap, and then the engine branches again to trap)?
  • Each trapping instruction has some small amount of overhead (metadata/code) to record the (semi-)precise trapping bytecode offset (for error reporting). For codes with millions of heap accesses, this adds up to several MB for the current out-of-bounds heap access checks. Since separate instructions would check for misalignment vs. out-of-bounds, I expect trapping semantics will incur some amount of size overhead.

@jfbastien
Member

w.r.t. misaligned atomic accesses trapping:

  • Does this match the behavior of any of the major ISAs? Or do they "succeed" but fail to provide their atomicity guarantees?

Some ISAs "just work", others tear (sometimes just when crossing cachelines), others silently align, and others trap. It's unclear which of these ISAs we care about.

  • Will trapping atomics end up leading to some of the same double-branching that we've seen with the trapping float operations (where user code branches on alignment to avoid the trap, and then the engine branches again to trap)?

No: there's no "polyfill" possible. Having a side-lock for misaligned accesses won't do the right thing.

  • Each trapping instruction has some small amount of overhead (metadata/code) to record the (semi-)precise trapping bytecode offset (for error reporting). For codes with millions of heap accesses, this adds up to several MB for the current out-of-bounds heap access checks. Since separate instructions would check for misalignment vs. out-of-bounds, I expect trapping semantics will incur some amount of size overhead.

We'd need to quantify how many atomic instructions actually occur. I expect very few, so low cost. Would be good to confirm.

@lukewagner
Member

Some ISAs "just work", others tear (sometimes just when crossing cachelines), others silently align, and others trap. It's unclear which of these ISAs we care about.

What about just x86/x64/ARM32/ARM64/MIPS?

No: there's no "polyfill" possible. having a side-lock for misaligned accesses won't do the right thing.

Well, not a polyfill that provides atomicity, just a path that does the load non-atomically w/o trapping. Auto-aligning via mask (like asm.js+SAB would do) would be another option.

@jfbastien
Member

jfbastien commented Apr 7, 2017

Some ISAs "just work", others tear (sometimes just when crossing cachelines), others silently align, and others trap. It's unclear which of these ISAs we care about.

What about just x86/x64/ARM32/ARM64/MIPS?

The behavior I described is covered by just x86 and ARM. Which ARM is another question. The auto-align and trapping ones aren't the mainstream ones AFAIK (or rather, not the ones which run browsers; that's definitely true for "embedded"-type ARMs).

No: there's no "polyfill" possible. having a side-lock for misaligned accesses won't do the right thing.

Well, not a polyfill that provides atomicity, just a path that does the load non-atomically w/o trapping. Auto-aligning via mask (like asm.js+SAB would do) would be another option.

If the intent is to have atomicity, except the code is OK without atomicity, then... Why even bother with atomicity? The polyfill you describe makes no sense.

We could auto-align, but I don't see how that's useful compared to trapping. Trapping is much more consistent with what WebAssembly otherwise does, and as I mentioned above I think the cost is minimal given how few of these instructions I expect to see.
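The "auto-align via mask" alternative under discussion can be sketched as follows (the helper name is invented; this is the kind of address masking asm.js+SAB-style code would apply, not anything specified here):

```python
def auto_align(addr, size):
    """Force addr down to a multiple of the access size.
    Assumes size is a power of two."""
    return addr & ~(size - 1)

print(hex(auto_align(0x1003, 4)))  # 0x1000: misaligned address silently aligned
print(hex(auto_align(0x1004, 4)))  # 0x1004: already aligned, unchanged
```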

@lukewagner
Member

Again, I'm not talking about a "polyfill"; I'm wondering whether we'd get into a situation (as we've seen with the trapping conversion ops) where a compiler is forced to keep code "working" (i.e, not trapping) by adding its own branches/masks so that unaligned atomic accesses don't trap.

We could auto-align, but I don't see how that's useful compared to trapping.

I'm totally not suggesting that; I'm anti-suggesting that, even. I'm trying to determine whether it's better for us to not trap since, it sounds like, on all our tier 1 platforms, that's what the ISA gives us and trapping just adds several minor sources of overhead.

@jfbastien

> Again, I'm not talking about a "polyfill"; I'm wondering whether we'd get into a situation (as we've seen with the trapping conversion ops) where a compiler is forced to keep code "working" (i.e., not trapping) by adding its own branches/masks so that unaligned atomic accesses don't trap.

That's what I was calling a polyfill. I don't think it can be made to work.

>> We could auto-align, but I don't see how that's useful compared to trapping.
>
> I'm totally not suggesting that; I'm anti-suggesting that, even.

Sure, I'm listing possibilities.

> I'm trying to determine whether it's better for us to not trap since, it sounds like, on all our tier 1 platforms, that's what the ISA gives us and trapping just adds several minor sources of overhead.

I understand what you're asking. I just don't understand which alternative you prefer out of the ones I listed. Our tier 1 platforms don't all have the same behavior.

@lukewagner

> That's what I was calling a polyfill. I don't think it can be made to work.

It depends what you mean by "work". If "work" means "loads the bytes as if by a non-atomic load", then sure it can.

> I just don't understand which alternative you prefer out of the ones I listed. All our tier 1 platforms don't have the same behavior.

Ok, maybe I've misunderstood the cases here. I assumed there are three: trap, auto-align (boo!), do the unaligned load (without atomicity guarantees). Is that right? If so, then when you said above "The auto-align and trapping ones aren't the mainstream ones AFAIK", I assumed that meant all the tier 1 platforms are in the last case.

@jfbastien

jfbastien commented Apr 10, 2017

>> That's what I was calling a polyfill. I don't think it can be made to work.
>
> It depends what you mean by "work". If "work" means "loads the bytes as if by a non-atomic load", then sure it can.

But then it's non-atomic. Why would a user bother with atomics if their code is OK not being atomic? I'm totally happy making that type of code slower.

>> I just don't understand which alternative you prefer out of the ones I listed. All our tier 1 platforms don't have the same behavior.
>
> Ok, maybe I've misunderstood the cases here. I assumed there are three: trap, auto-align (boo!), do the unaligned load (without atomicity guarantees). Is that right? If so, then when you said above "The auto-align and trapping ones aren't the mainstream ones AFAIK", I assumed that meant all the tier 1 platforms are in the last case.

AFAIK x86 "just works" and will be atomic even across cache lines, whereas "mainstream" mobile ARMs will tear. These are "tier 1". Other ARMs (e.g. for embedded) will trap or auto-align.

I agree we should tune for tier 1. We shouldn't ignore tier 2 if it's easy. In this case tier 1 differs, and trapping is cheap enough that I'd go with that because it prevents "I tested on Chrome on x86 and it worked, ship it!".

Silently tearing is really bad! It's hard to reliably observe.

@lukewagner

> But then it's non-atomic. Why would a user bother with atomics if their code is OK not being atomic? I'm totally happy making that type of code slower.

If the compiler has to blindly work around it (like it had to do with float conversion traps), it would penalize the whole app, not just the dumb code. Then again, maybe explicit atomics + unaligned bugs are a rare enough combination that this situation wouldn't arise in practice.

> and trapping is cheap enough that I'd go with that because it prevents "I tested on Chrome on x86 and it worked, ship it!".

Yeah, I think that is the central tradeoff: making all tier-1 fast paths slower for some amount of bug-catching. It'd certainly help to have some measurements here.

If the loaded value is not equal to the expected value, the operator returns 1
("not-equal"). If the values are equal, the agent is suspended. If the agent
is woken, the wait operator returns 0 ("ok"). If the timeout expires before
another agent wakes this one, this operator returns 2 ("timed-out").
Contributor

As you know, in the HTML embedding of ES2017 we do not allow Atomics.wait to wait on the main thread of a window (or, in Firefox at present, on the main thread of a SharedWorker). In ES, a call to Atomics.wait on such a thread results in a TypeError being thrown. Wasm can be used on the main thread, so is there going to be the same restriction there?

Member

I never really liked this AgentCanSuspend compromise since it seems like it would just lead to spin-locking instead. That being said, it seems like wasm should be symmetric with JS here, so either we make the restriction too or we try to remove the restriction from JS. If we make the restriction, it seems like we'd throw a RuntimeError.

@lars-t-hansen Apr 18, 2017

It will lead to some spin-locking, but those that dabble in that will quickly discover that the browser locks up, as the main thread is not able to service requests made on behalf of other parts of the system. (Though I sort of doubt that will be apparent to many of the people bitten by this problem.)

(Edited to add:)

IMO we should expect to live with this problem and to ease the creation of content that lives with it, too, by providing structures for easily running Wasm off-main-thread.

Member Author

Good point, I think we need to be symmetric with JS here.


The embedder is also permitted to suspend or wake an agent. A suspended agent
can be woken by the embedder or the wake operator, regardless of how the agent
was suspended (e.g. via the embedder or a wait operator).
Contributor

Is this attempting to formalize the push-into-history / restore-from-history mechanism of browsers, or to allow spurious wakeups (which are not in the ES version of Atomics.wait)?

Member Author

Oh, it was allowing spurious wakeups, but of course you're right, that's not allowed.

Member Author

Oh, sorry -- this is meant to allow the thread to be suspended/woken by JavaScript as well as WebAssembly. But it's a good point that it is similar to a spurious wakeup. I wonder if there's a good way to make a distinction.

Can you reference the abstract WakeWaiter operation? If the wasm wake operator uses that, that'll unify wasm with how the embedder can explicitly wake up agents.

@jfbastien

I created a new repo for this: https://github.com/WebAssembly/threads
@binji can you move things there? It'll be easier to iterate.

@binji

binji commented Apr 18, 2017

Moved here: WebAssembly/threads#1, I'll close this for now. Please move relevant comments there!
