This repository was archived by the owner on Dec 22, 2021. It is now read-only.
Add pseudocode for widening and narrowing operations #105
Merged
Commits (4):
5841bf9 Add pseudocode for widening and narrowing operations (tlively)
e68d214 Switch to "low" meaning [0,n/2) and "high" meaning [n/2,n) (tlively)
f1570ed Merge branch 'master' of github.com:WebAssembly/simd into clarify-low… (tlively)
5bb3e3a Formatting (tlively)
@@ -808,6 +808,24 @@ will use unsigned saturation to handle overflow, 0x00 or 0xff for i8x16.
Regardless of whether the operation is signed or unsigned, the input lanes
are interpreted as signed integers.

```python
def S.narrow_T_s(a, b):
    result = S.New()
    for i in range(T.Lanes):
        result[i] = S.SignedSaturate(a[i])
    for i in range(T.Lanes):
        result[T.Lanes + i] = S.SignedSaturate(b[i])
    return result

def S.narrow_T_u(a, b):
    result = S.New()
    for i in range(T.Lanes):
        result[i] = S.UnsignedSaturate(a[i])
    for i in range(T.Lanes):
        result[T.Lanes + i] = S.UnsignedSaturate(b[i])
    return result
```
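
As a rough, runnable illustration of the narrowing pseudocode above, the sketch below narrows two i16x8 vectors into one i8x16 vector. Plain Python lists stand in for `v128` values, and the helper names (`signed_saturate`, `unsigned_saturate`, `narrow`) as well as the input values are assumptions made for this example, not part of the proposal.

```python
def signed_saturate(x, bits):
    """Clamp a signed integer to the range of a signed `bits`-wide lane."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))


def unsigned_saturate(x, bits):
    """Clamp a signed integer to the range of an unsigned `bits`-wide lane."""
    return max(0, min((1 << bits) - 1, x))


def narrow(saturate, a, b, out_bits):
    """Lanes of `a` fill the low half of the result, lanes of `b` the high half."""
    return [saturate(x, out_bits) for x in a] + [saturate(x, out_bits) for x in b]


# Two i16x8 inputs, interpreted as signed integers (hypothetical values).
a = [0, 1, -1, 127, 128, -128, -129, 32767]
b = [255, 256, -32768, 5, 6, 7, 8, 9]

narrow_s = narrow(signed_saturate, a, b, 8)    # in the spirit of i8x16.narrow_i16x8_s
narrow_u = narrow(unsigned_saturate, a, b, 8)  # in the spirit of i8x16.narrow_i16x8_u
print(narrow_s)
print(narrow_u)
```

Negative inputs clamp to 0 in the unsigned variant because the input lanes are always read as signed integers.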

### Integer to integer widening
* `i16x8.widen_low_i8x16_s(a: v128) -> v128`
* `i16x8.widen_high_i8x16_s(a: v128) -> v128`

@@ -820,3 +838,27 @@ are interpreted as signed integers.

Converts low or high half of the smaller lane vector to a larger lane vector,
sign extended or zero (unsigned) extended.

```python
def S.widen_low_T(ext, a):
    result = S.New()
    for i in range(S.Lanes):
        result[i] = ext(a[i])
    return result

def S.widen_high_T(ext, a):
    result = S.New()
    for i in range(S.Lanes):
        result[i] = ext(a[S.Lanes + i])
    return result

def S.widen_low_T_s(a):
    return S.widen_low_T(Sext, a)

def S.widen_high_T_s(a):
    return S.widen_high_T(Sext, a)

def S.widen_low_T_u(a):
    return S.widen_low_T(Zext, a)

def S.widen_high_T_u(a):
    return S.widen_high_T(Zext, a)
```

Member: Are we missing `S.widen_low_T_u` and `S.widen_high_T_s`?
Member, Author: Oh yeah, good point. Will update once we resolve the high/low terminology above.
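
To make the widening pseudocode concrete, here is a minimal runnable sketch that widens an i8x16 vector into i16x8 halves, with "low" meaning lanes [0, n/2) and "high" meaning lanes [n/2, n). Lists stand in for `v128` values; `sext`, `zext`, `widen_low`, `widen_high` and the input bytes are hypothetical names and values chosen for illustration.

```python
def sext(x, bits):
    """Interpret the low `bits` bits of x as a signed integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def zext(x, bits):
    """Interpret the low `bits` bits of x as an unsigned integer."""
    return x & ((1 << bits) - 1)


def widen_low(ext, a, in_bits):
    """Extend lanes [0, n/2) of `a`, where n = len(a)."""
    half = len(a) // 2
    return [ext(a[i], in_bits) for i in range(half)]


def widen_high(ext, a, in_bits):
    """Extend lanes [n/2, n) of `a`."""
    half = len(a) // 2
    return [ext(a[half + i], in_bits) for i in range(half)]


# An i8x16 input given as raw lane bytes (hypothetical values).
a = [0x00, 0x7F, 0x80, 0xFF, 1, 2, 3, 4, 0xFE, 0x81, 5, 6, 7, 8, 9, 0x10]

low_s = widen_low(sext, a, 8)    # in the spirit of i16x8.widen_low_i8x16_s
high_s = widen_high(sext, a, 8)  # in the spirit of i16x8.widen_high_i8x16_s
low_u = widen_low(zext, a, 8)
high_u = widen_high(zext, a, 8)
print(low_s, high_s, low_u, high_u)
```

The only difference between the signed and unsigned variants is the extension function passed in, which mirrors how the pseudocode threads `Sext` or `Zext` through a shared helper.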
In #103 I mentioned that `narrow [a1,a0] [b1,b0] = [b1,b0,a1,a0]`; this is the "register view". In pseudocode, I think we deal with arrays/lists, hence the "memory view" (little endian). So I think it looks a bit different.
`a = [a0, a1, a2, a3]`
`b = [b0, b1, b2, b3]`
`narrow(a,b)` will try to put `a` on the low lanes, i.e. `result[0]`, `result[1]`, `result[2]`, `result[3]`. Note, I think low lanes should be the smaller lane numbers, so lane 0 is the lowest lane.
Thus in pseudocode, the arguments to `SignedSaturate` are swapped:
Giving `result = [a0, a1, a2, a3, b0, b1, b2, b3]` (little endian view).
Does this make sense? Or am I still confused?
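
The two views in the comment above can be compared with a small sketch. It only tracks lane placement (saturation is left out), and `register_view` and `narrow_lanes` are hypothetical helpers introduced for this illustration.

```python
def register_view(lanes):
    """Write the lanes with the highest-numbered lane on the left."""
    return list(reversed(lanes))


def narrow_lanes(a, b):
    """Lane placement only (saturation omitted): a goes low, b goes high."""
    return list(a) + list(b)


# Memory view: index 0 is lane 0.
a = ["a0", "a1"]
b = ["b0", "b1"]

memory = narrow_lanes(a, b)
register = register_view(memory)
print(memory)
print(register)
```

Read as a list, the result starts with the lanes of `a`; written register-style, the same result reads `[b1, b0, a1, a0]`, which is the form quoted from #103.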
Ok, makes sense. I didn't realize your comment in #103 was the register view, so I will have to update the pseudocode.
Apart from that, it looks like we disagree about which lanes are "low" and "high." This is reasonable because we did not use those terms to refer to lanes before adding these instructions. Your interpretation is that the "high" lanes have the greater indices, mine is that the "high" lanes should be on the left (by analogy to high-order bits).
Do people think "high" lanes should be on the right with greater indices (:tada:) or on the left with lesser indices (:heart:)?
I think high lanes should be on the left, by analogy to high-order bits, when we look at it in terms of a 128-bit register. But then in memory, those end up being on the right part of an array.
So given a 128-bit register representing an i32x4:
`[ 0x1 | 0x2 | 0x3 | 0x4 ]`
The highest lane contains `0x1`.
When represented as an array (little-endian): `arr = [0x4, 0x3, 0x2, 0x1]`, to keep the meaning of "high" consistent, it would be `arr[3]`.
Another way I am thinking of this is: if we have lanes 0, 1, 2, and 3, which is the high lane? I would say that it is lane 3. And lane 3 happens to be `0x1` in the above example, which in pseudocode is `arr[3]`.
But consider 16-bit lanes 0, 1, 2, 3, 4, 5, 6, 7. Inside the x86 register lane 2 will be higher than lane 3 but lane 4 will be higher than both of them. That's not easy to remember or reason about. WebAssembly itself only ever uses the in-memory ordering of the vector bytes, so I don't think we should be considering the implementation architecture when deciding whether to describe lanes as high or low.
@ngzhian reminded me that not only are the lanes "reversed" when going from little-endian memory to big-endian registers, so are their contents. Considering that everything is little-endian in memory, it's actually more consistent to say that the higher-order lanes are on the right, just like the higher-order bytes. I will update the PR accordingly.
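
The point that both the lanes and their contents are reversed can be checked with the i32x4 example from earlier in the thread. This is only a sketch using Python's `struct` module; the variable names are made up for the example.

```python
import struct

# Memory (array) view of the i32x4 from the thread: arr[3] is the high lane.
arr = [0x4, 0x3, 0x2, 0x1]

# 16 bytes as they would sit in little-endian memory, lane 0 first.
in_memory = struct.pack("<4I", *arr)

# Reading those 16 bytes as one little-endian 128-bit integer gives the
# "register" picture, with the highest lane in the most significant bits.
as_integer = int.from_bytes(in_memory, "little")
register_image = format(as_integer, "032x")

# Reversing every byte is the same as reversing the lane order AND
# switching each lane's contents to big-endian.
byte_reversed = in_memory[::-1]
big_endian_lanes_reversed = struct.pack(">4I", *reversed(arr))

print(register_image)
print(byte_reversed == big_endian_lanes_reversed)
```

Under these assumptions, lane 3 (`arr[3]`) ends up leftmost in the register picture and rightmost in the array, which is the convention the PR settles on.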
For `v128.load` or `v128.store`, lane indices are "big endian": lane zero is read from the location pointed to by the index argument, and the highest lane is in the highest memory position (if the index is `x`, lane zero would be at `x`, lane one at `x + lane_size`, lane two at `x + 2*lane_size`, and so on). For multi-byte lanes, their contents are handled in a little-endian way: storing `0x01020304` to a 32-bit lane results in the byte sequence `0x04, 0x03, 0x02, 0x01`.
Shuffle also assigns lower lane indices (0-15) to the first argument and higher ones (16-31) to the second. I think these semantics are correct.
Maybe we should add Python pseudocode to loads and stores :)
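
A possible shape for such load, store, and shuffle pseudocode is sketched below. It is not taken from the proposal: a `bytearray` stands in for linear memory, lists stand in for `v128` values, and `store_lanes`, `load_lanes`, and `shuffle` are hypothetical helper names.

```python
def store_lanes(memory, x, lanes, lane_size):
    """Write lane i at address x + i * lane_size, little-endian within the lane."""
    for i, value in enumerate(lanes):
        start = x + i * lane_size
        memory[start:start + lane_size] = value.to_bytes(lane_size, "little")


def load_lanes(memory, x, lane_count, lane_size):
    """Read lane i from address x + i * lane_size."""
    return [
        int.from_bytes(memory[x + i * lane_size:x + (i + 1) * lane_size], "little")
        for i in range(lane_count)
    ]


def shuffle(a, b, indices):
    """Indices 0-15 select lanes of a, indices 16-31 select lanes of b."""
    combined = list(a) + list(b)
    return [combined[i] for i in indices]


memory = bytearray(32)
lanes = [0x01020304, 0xAABBCCDD, 0, 0xFFFFFFFF]
store_lanes(memory, 8, lanes, 4)

first_lane_bytes = bytes(memory[8:12])   # bytes of lane 0, lowest address first
roundtrip = load_lanes(memory, 8, 4, 4)

a = list(range(16))           # byte lanes 0..15
b = list(range(100, 116))     # byte lanes 100..115
picked = shuffle(a, b, [0, 16, 15, 31])
print(first_lane_bytes, roundtrip, picked)
```

Lane zero lands at the address itself and each later lane sits `lane_size` bytes higher, while the bytes inside a lane run from least to most significant.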