This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Accelerated shuffle masks #199

Open
penzn opened this issue Feb 20, 2020 · 3 comments

Comments

@penzn
Contributor

penzn commented Feb 20, 2020

In #196 there was a discussion on what shuffle patterns get accelerated by hardware on various platforms. Right now those are handled inconsistently by the toolchain, and regardless of what remedy we pick, it is good to know what gets accelerated and what does not.

Tentative list from #196 (comment) and #196 (comment):

  • Shuffle with wider lanes
  • Pack/Unpack (ABCDABCD -> AABBCCDD, reverse and the like)
  • Byte shift (shift 128-bit value by a number of bytes with or without wraparound)
  • Blends between two vectors with 32/16/8 masks; equivalent to bitselect with a constant mask but bitselect is slower (constant load + 3 instructions)
  • Restricted shuffle with first two components coming from first vector and second two from the second vector (SSE2)

@zeux, thank you for your list.
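
To make the categories above concrete, here is a rough sketch of how a constant 16-byte shuffle mask could be sorted into some of them. It assumes the two-input convention where indices 0..15 select bytes from the first vector and 16..31 from the second; the helper names (`lane_width`, `is_blend`, `byte_shift_amount`) are made up for illustration and are not taken from any toolchain. It only models the patterns and says nothing about which instruction a given backend would actually emit.

```python
def shuffle(a, b, mask):
    """Model of a two-input byte shuffle: index i < 16 picks a[i], otherwise b[i - 16]."""
    src = list(a) + list(b)
    return [src[i] for i in mask]


def lane_width(mask):
    """Widest lane size in bytes (8, 4 or 2) at which the mask moves whole aligned lanes, else 1."""
    for w in (8, 4, 2):
        if all(
            mask[i] % w == 0 and mask[i:i + w] == list(range(mask[i], mask[i] + w))
            for i in range(0, 16, w)
        ):
            return w
    return 1


def is_blend(mask):
    """True if every output byte comes from the same position in one of the two inputs."""
    return all(m in (i, i + 16) for i, m in enumerate(mask))


def byte_shift_amount(mask):
    """Return n if the mask reads 16 consecutive bytes starting at byte n of a:b, else None."""
    n = mask[0]
    if n <= 16 and mask == list(range(n, n + 16)):
        return n
    return None


# Hypothetical example masks, one per category.
blend32 = [0, 1, 2, 3, 20, 21, 22, 23, 8, 9, 10, 11, 28, 29, 30, 31]
unpack_low_bytes = [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23]
shift_by_4 = list(range(4, 20))
```

With these definitions, `blend32` is reported as a blend with 4-byte lanes, `shift_by_4` as a byte shift by 4, and `unpack_low_bytes` as neither (it is an interleave that only works at byte granularity).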

@jan-wassenberg

Some other suggestions for potentially useful patterns, described as lane indices of the resulting 32-bit lanes:
2301 (swapping 32-bit pairs), 1032 (swapping 64-bit pairs), 0321/2103 (rotate right/left), 0123 (reverse). All but the last are currently used within JPEG XL.
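
As an illustration of this notation, the sketch below expands a four-digit lane pattern into the equivalent 16-byte shuffle mask. It assumes the digits are written from the highest result lane down to the lowest, so that the identity would be 3210; that reading is an assumption, and the function names are hypothetical.

```python
def source_lanes(pattern):
    """Return lanes[j] = source lane for result lane j.

    Assumption: the pattern string lists result lanes from highest to lowest,
    so the identity shuffle is "3210".
    """
    return [int(d) for d in reversed(pattern)]


def apply_lanes(pattern, v):
    """Apply a 32-bit lane pattern to a 4-element list (index 0 is the lowest lane)."""
    return [v[src] for src in source_lanes(pattern)]


def pattern_to_byte_mask(pattern):
    """Expand a 32-bit lane pattern into a single-input 16-byte shuffle mask."""
    return [4 * src + byte for src in source_lanes(pattern) for byte in range(4)]


v = ["a0", "a1", "a2", "a3"]
swapped_pairs = apply_lanes("2301", v)   # adjacent 32-bit lanes swapped
swapped_halves = apply_lanes("1032", v)  # 64-bit halves swapped
rotated = apply_lanes("0321", v)         # every lane moves down one position
reverse_mask = pattern_to_byte_mask("0123")
```

Under that reading, 2301 gives `[a1, a0, a3, a2]`, 1032 gives `[a2, a3, a0, a1]`, and 0321 gives `[a1, a2, a3, a0]`.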

@dtig
Member

dtig commented Feb 26, 2020

Thanks @penzn for filing - I'm guessing the purpose of this is currently for documenting known fast shuffles? I'm marking this with a documentation label till #196 is resolved to keep the bulk of the discussion regarding shuffles there.

@penzn
Contributor Author

penzn commented Feb 26, 2020

> Thanks @penzn for filing - I'm guessing the purpose of this is currently for documenting known fast shuffles?

Yes, more or less. Sorry I've missed the sync call today. This can be part of the resolution for #196 as well.
