Skip to content
This repository was archived by the owner on Dec 22, 2021. It is now read-only.
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions proposals/simd/SIMD.md
Original file line number Diff line number Diff line change
Expand Up @@ -808,6 +808,24 @@ will use unsigned saturation to handle overflow, 0x00 or 0xff for i8x16.
Regardless of the whether the operation is signed or unsigned, the input lanes
are interpreted as signed integers.

```python
def S.narrow_T_s(a, b):
result = S.New()
for i in range(T.Lanes):
result[i] = S.SignedSaturate(a[i])
for i in range(T.Lanes):
result[T.Lanes + i] = S.SignedSaturate(b[i])
return result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #103 I mentioned that narrow [a1,a0] [b1,b0] = [b1,b0,a1,a0] this is "register view".
In pseudocode, I think we deal with arrays/lists, hence "memory view" (little endian). So I think it looks a bit different.

a = [a0, a1, a2, a3]
b = [b0, b1, b2, b3]
narrow(a,b) will try to put a on the low lanes, i.e. result[0], result[1], result[2], result[3]. Note, I think low lanes should be the smaller lane numbers, so lane 0 is the lowest lane.

Thus in pseudo code, the arguments to SignedSaturate are swapped:

    for i in range(T.Lanes):
        result[i] = S.SignedSaturate(a[i]) # changed from b to a
    for i in range(T.Lanes):
        result[T.Lanes + i] = S.SignedSaturate(b[i]) # changed from a to b

Giving result = [a0, a1, a2, a3, b0, b1, b2, b3] (little endian view).

Does this make sense? Or am I still confused?

Copy link
Member Author

@tlively tlively Sep 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense. I didn't realize your comment in #103 was the register view, so I will have to update the pseudocode.

Apart from that, it looks like we disagree about which lanes are "low" and "high." This is reasonable because we did not use those terms to refer to lanes before adding these instructions. Your interpretation is that the "high" lanes have the greater indices, mine is that the "high" lanes should be on the left (by analogy to high-order bits).

Do people think "high" lanes should be on the right with greater indices (:tada:) or on the left with lesser indices (:heart:)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think high lanes should be on the left, analogy to high-order bits, when we look at it in terms of a 128-bit register. But then in memory, those end up being on the right part of an array.
So given a 128-bit register representing a i32x4:
[ 0x1 | 0x2 | 0x3 | 0x4 ]
The highest lane contain 0x1.

When represented as an array (little-endian):
arr = [0x4, 0x3, 0x2, 0x1], to keep the meaning of "high" consistent, it would be arr[3].

Another way I am thinking of this is: if we have lanes 0, 1, 2, and 3, which is the high lane? I would say that it is lane 3. And lane 3 happens to be 0x1 in the above example, which in pseudocode is arr[3].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But consider 16-bit lanes 0, 1, 2, 3, 4, 5, 6, 7. Inside the x86 register lane 2 will be higher than lane 3 but lane 4 will be higher than both of them. That's not easy to remember or reason about. WebAssembly itself only ever uses the in-memory ordering of the vector bytes, so I don't think we should be considering the implementation architecture when deciding whether to describe lanes as high or low.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ngzhian reminded me that not only are the lanes "reversed" when going from little-endian memory to big-endian registers, so are their contents. Considering that everything is little-endian in memory, it's actually more consistent to say the the higher order lanes are on the right, just like the higher order bytes. I will update the PR accordingly.

Copy link
Contributor

@penzn penzn Sep 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For v128.load or v128.store lane indices are "big endian", zero lane is read from the location pointed to by the index argument, and highest lane is in the highest memory position (if index is x lane zero would be at x, lane one at x + lane_size, three - x + 2*lane_size, and so on). For multi-byte lanes, their contents are handled in little-endian way, storing 0x01020304 to a 32-bit lane results in byte sequence of 0x04, 0x03, 0x02, 0x1.

Shuffle also assigns lower lane indices (0-15) to the first argument and higher (16-32) - to the second. I think this semantics are correct.

Maybe we should add Python pseudocode to loads and stores :)


def S.narrow_T_u(a, b):
result = S.New()
for i in range(T.Lanes):
result[i] = S.UnsignedSaturate(a[i])
for i in range(T.Lanes):
result[T.Lanes + i] = S.UnsignedSaturate(b[i])
return result
```

### Integer to integer widening
* `i16x8.widen_low_i8x16_s(a: v128) -> v128`
* `i16x8.widen_high_i8x16_s(a: v128) -> v128`
Expand All @@ -820,3 +838,27 @@ are interpreted as signed integers.

Converts low or high half of the smaller lane vector to a larger lane vector,
sign extended or zero (unsigned) extended.

```python
def S.widen_low_T(ext, a):
result = S.New()
for i in range(S.Lanes):
result[i] = ext(a[i])

def S.widen_high_T(ext, a):
result = S.New()
for i in range(S.Lanes):
result[i] = ext(a[S.Lanes + i])

def S.widen_low_T_s(a):
return S.widen_low_T(Sext, a)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing S.widen_low_T_u and S.widen_high_T_s?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah good point. Will update once we resolve the high/low terminology above.

def S.widen_high_T_s(a):
return S.widen_high_T(Sext, a)

def S.widen_low_T_u(a):
return S.widen_low_T(Zext, a)

def S.widen_high_T_u(a):
return S.widen_high_T(Zext, a)
```