feat: add POWER ISA 3.0 (VSX/Altivec) SIMD support for base64-simd#60

Open
runlevel5 wants to merge 10 commits into Nugine:main from runlevel5:feat/powerpc64-vsx-base64

@runlevel5

Add VSX backend to vsimd and wire it up in base64-simd for SIMD-accelerated base64 encode/decode/check on ppc64le (POWER7+).

vsimd changes:

  • VSX ISA type with runtime feature detection
  • V64/V128/V256 vector types using vector_unsigned_char
  • All unified ops (splat, add, sub, eq, lt, sat, max, min, bitwise) with scalar fallbacks for u64 (Altivec limitation)
  • SIMD128 trait: load/store (vec_xl/vec_xst), swizzle (vec_perm), shifts, multiply (vec_mladd/vec_mul), zip/unzip (vec_mergeh/mergel + vec_perm), avgr (vec_avg), bsl, bswap, all_zero/any_zero
  • SIMD256: split-to-2x128 path for all 256-bit ops
  • Mask/highbit operations following NEON arm32 pattern
  • Table lookup with correct high-bit-only (0x80) OOB masking
  • dispatch! macro arms for compile/resolve static+dynamic
  • Native runtime detection and dispatch
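The split-to-2x128 approach for SIMD256 can be illustrated with a portable scalar model. This is only a sketch: `V128`, `V256`, `split_apply`, `u8x16_add_sat`, and `u8x32_add_sat` are hypothetical stand-ins defined here, not the types or functions in vsimd, and plain arrays replace the VSX registers.

```rust
/// Portable stand-in for a 128-bit vector of sixteen u8 lanes (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct V128(pub [u8; 16]);

/// Stand-in for a 256-bit vector stored as two 128-bit halves (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct V256(pub V128, pub V128);

/// Lane-wise saturating add on one 128-bit vector, written as a scalar loop.
pub fn u8x16_add_sat(a: V128, b: V128) -> V128 {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a.0[i].saturating_add(b.0[i]);
    }
    V128(out)
}

/// Lift a 128-bit binary op to 256 bits by applying it to each half in turn.
pub fn split_apply(a: V256, b: V256, f: impl Fn(V128, V128) -> V128) -> V256 {
    V256(f(a.0, b.0), f(a.1, b.1))
}

/// 256-bit saturating add built purely from the 128-bit op.
pub fn u8x32_add_sat(a: V256, b: V256) -> V256 {
    split_apply(a, b, u8x16_add_sat)
}
```

The point of the pattern is that only the 128-bit op needs a hardware-specific implementation; every 256-bit op falls out of it at the cost of two 128-bit calls.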

base64-simd changes:

  • VSX added to encode (split_bits) and decode (merge_bits) using the NEON/WASM128 shift-and-mask path
  • All 4 dispatch entries (encode/decode/check/find_non_ascii_whitespace)
  • powerpc_target_feature nightly feature gate
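For readers unfamiliar with what a shift-and-mask path computes, here is a scalar sketch of the bit rearrangement behind base64 encode and decode. It is not the crate's implementation: the functions below reuse the names `split_bits` and `merge_bits` for illustration only and work on one 3-byte group at a time, whereas the SIMD path handles many groups per vector.

```rust
/// Split three input bytes into four 6-bit values using shifts and masks.
/// Scalar illustration only; the SIMD version does this for many groups at once.
pub fn split_bits(b: [u8; 3]) -> [u8; 4] {
    // Pack the three bytes into one 24-bit big-endian integer.
    let x = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
    [
        ((x >> 18) & 0x3f) as u8,
        ((x >> 12) & 0x3f) as u8,
        ((x >> 6) & 0x3f) as u8,
        (x & 0x3f) as u8,
    ]
}

/// Inverse of `split_bits`: merge four 6-bit values back into three bytes.
/// Each input value is assumed to be below 64.
pub fn merge_bits(s: [u8; 4]) -> [u8; 3] {
    let x = (u32::from(s[0]) << 18)
        | (u32::from(s[1]) << 12)
        | (u32::from(s[2]) << 6)
        | u32::from(s[3]);
    // Truncating casts keep the low 8 bits of each shifted value.
    [(x >> 16) as u8, (x >> 8) as u8, x as u8]
}
```

For example, the bytes of `"Man"` split into the 6-bit values 19, 22, 5, 46, which index the standard alphabet as `TWFu`.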

Gated behind feature="unstable" + target_arch="powerpc64". Tested on POWER9 (ppc64le) with nightly-2026-02-07.
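The gating condition can be expressed with `cfg` attributes roughly as follows. This is a minimal sketch: `selected_backend` is a hypothetical function used only to show which branch is compiled in, and the real crates gate modules and dispatch arms rather than a single function.

```rust
/// Compiled only when the `unstable` Cargo feature is on and the target is powerpc64.
/// (Hypothetical helper; it only reports which branch was selected.)
#[cfg(all(feature = "unstable", target_arch = "powerpc64"))]
pub fn selected_backend() -> &'static str {
    "vsx"
}

/// Every other configuration compiles this version instead.
#[cfg(not(all(feature = "unstable", target_arch = "powerpc64")))]
pub fn selected_backend() -> &'static str {
    "fallback"
}
```

Because the two conditions are exact complements, exactly one definition exists in any build, so stable toolchains and other architectures never see the VSX code.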

What's next?

Get feedback, then do the VSX work for the other crates.

@runlevel5 runlevel5 marked this pull request as draft February 9, 2026 02:08
@runlevel5
Author

I am setting this to draft for the time being. One thing I want to work on first is setting up benchmarking, to be 100% sure we do not cause any regressions.

Restructure the VSX u8x16xn_lookup to reduce instruction count:
- Before: extract high bit, extract low nibble, compare, create OOB
  mask, combine index, vec_perm (6 ops per V128 lookup)
- After: extract low nibble, vec_perm, signed compare, andnot
  (4 ops per V128 lookup)

This eliminates 16 vector ops per decode iteration (4 lookups x 2
halves), fixing the decode regression and improving all ALSW-based
operations:
- decode: 1662 -> 2111 MB/s (+27%, now 1.15x over scalar)
- check:  3087 -> 4453 MB/s (+44%, now 1.69x over scalar)
- encode: unchanged (does not use table lookup)
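
The four-step "after" sequence can be modelled per lane in scalar Rust. This is a sketch of the semantics only, assuming the intended contract is "bit 7 set means zero, otherwise select by low nibble"; `lookup16` is a hypothetical name, and the real code operates on whole VSX vectors rather than a loop.

```rust
/// Scalar model of a 16-entry table lookup with high-bit out-of-bounds masking.
/// Each step is annotated with the vector operation it stands in for.
pub fn lookup16(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        // 1. Extract the low nibble so the index always lands inside the table.
        let low = idx[i] & 0x0f;
        // 2. Permute-style select from the table.
        let picked = table[low as usize];
        // 3. Signed compare: a byte with bit 7 set is negative as i8.
        let oob_mask: u8 = if (idx[i] as i8) < 0 { 0xff } else { 0x00 };
        // 4. andnot: clear the lanes flagged as out of bounds.
        out[i] = picked & !oob_mask;
    }
    out
}
```

Under this model an index such as `0x13` selects `table[3]`, while `0x80` and `0xff` both yield zero.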
@runlevel5 runlevel5 marked this pull request as ready for review February 9, 2026 05:58
@Nugine Nugine requested review from Nugine and Copilot February 9, 2026 08:26

Copilot AI left a comment

Pull request overview

Adds a new POWER/ppc64le VSX (Altivec) SIMD backend to vsimd and wires it into base64-simd’s multiversion dispatch so base64 encode/decode/check/non-ascii scanning can use VSX when available (behind feature="unstable" on powerpc64).

Changes:

  • Introduce VSX as a new instruction set with runtime detection and dispatch integration (isa, native, macros).
  • Implement VSX vector types and a broad set of unified/SIMD128/SIMD256 operations using core::arch::powerpc64::*.
  • Enable base64-simd encode/decode paths to use VSX via existing NEON/WASM-style shift+mask logic and add vsx to dispatch target lists.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| crates/vsimd/src/vector.rs | Adds ppc64 VSX-backed V64/V128/V256/V512 vector representations. |
| crates/vsimd/src/unified.rs | Implements VSX branches for unified ops (splat/add/sub/eq/lt/sat/max/min/bitwise). |
| crates/vsimd/src/table.rs | Adds VSX-aware lookup semantics to match the crate’s table-lookup contract. |
| crates/vsimd/src/simd256.rs | Includes VSX in relevant SIMD256 paths and provides VSX-specific unzip behavior. |
| crates/vsimd/src/simd128.rs | Implements many VSX SIMD128 operations (load/store, swizzle, bswap helpers, zip/unzip, avgr, etc.). |
| crates/vsimd/src/native.rs | Adds runtime feature detection + dispatch for VSX. |
| crates/vsimd/src/mask.rs | Extends mask/highbit helpers to work with VSX. |
| crates/vsimd/src/macros.rs | Adds dispatch! compile/resolve support for the "vsx" target. |
| crates/vsimd/src/lib.rs | Adds nightly feature gates required for ppc64 stdarch intrinsics; adjusts portable_simd gating. |
| crates/vsimd/src/isa.rs | Defines VSX ISA type id, enables detection plumbing, and implements InstructionSet for VSX. |
| crates/base64-simd/src/multiversion.rs | Adds "vsx" to multiversion dispatch target lists for encode/decode/check/scan. |
| crates/base64-simd/src/lib.rs | Enables powerpc_target_feature for ppc64 under unstable. |
| crates/base64-simd/src/encode.rs | Treats VSX like NEON/WASM128 for the shift+mask encode implementation. |
| crates/base64-simd/src/decode.rs | Treats VSX like NEON/WASM128 for the shift+mask decode implementation. |

…dices

vec_perm masks indices with & 0x1f, so sentinel values like 0x80 wrap
to 0-15 and select bytes from the data vector instead of producing zero.
Preprocess the index vector so any lane >= 16 maps to index 16, which
selects from the zero vector passed as vec_perm's second argument.

Add test covering identity, reverse, boundary (16), sentinel (0x80,
0xFF), mixed valid/OOB, and varied OOB values.
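
The wrap-around and the clamping fix can be sketched with a scalar model of a two-source permute. This is an illustration under assumptions: `perm_model` and `swizzle_zeroing` are hypothetical names, the model uses abstract element order and ignores the lane-numbering details of real `vec_perm` on little-endian targets, and it is not the code in the commit.

```rust
/// Scalar model of a two-source byte permute: only index bits 0..=4 are used,
/// selecting one of the 32 bytes of `a` followed by `b`.
pub fn perm_model(a: [u8; 16], b: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let j = (idx[i] & 0x1f) as usize;
        out[i] = if j < 16 { a[j] } else { b[j - 16] };
    }
    out
}

/// Swizzle that yields zero for any index >= 16: clamp such lanes to 16,
/// which selects the first byte of the all-zero second source.
pub fn swizzle_zeroing(data: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut clamped = idx;
    for lane in clamped.iter_mut() {
        *lane = (*lane).min(16);
    }
    perm_model(data, [0u8; 16], clamped)
}
```

Without the clamp, an index of `0x80` masks down to 0 and returns the first data byte; with it, the same lane reads from the zero vector.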
SSSE3 _mm_shuffle_epi8 uses idx & 0x0f for indices 16-127 and only
zeroes on bit 7, while NEON/WASM zero for any index >= 16. Remove the
index-16 boundary test and restrict out-of-range tests to values with
bit 7 set (>= 128), which is the common contract all backends agree on
and the only range used by callers in this codebase.
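
The disagreement between backends for indices 16-127 can be checked with two scalar models. This is a sketch of the documented instruction semantics, not code from the crate; `shuffle_ssse3_model`, `swizzle_neon_wasm_model`, and `in_common_contract` are hypothetical names.

```rust
/// Model of SSSE3 `_mm_shuffle_epi8`: zero when bit 7 is set,
/// otherwise select using only the low four index bits.
pub fn shuffle_ssse3_model(t: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if idx[i] & 0x80 != 0 { 0 } else { t[(idx[i] & 0x0f) as usize] };
    }
    out
}

/// Model of the NEON/WASM-style swizzle: zero for any index >= 16.
pub fn swizzle_neon_wasm_model(t: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if idx[i] < 16 { t[idx[i] as usize] } else { 0 };
    }
    out
}

/// Index values on which both models give the same result:
/// in-range (0..=15) or bit 7 set (128..=255).
pub fn in_common_contract(idx: u8) -> bool {
    idx < 16 || idx >= 128
}
```

With a table of nonzero bytes, the two models agree on every index accepted by `in_common_contract` and differ on every index from 16 to 127, which is why a test at index 16 cannot pass on all backends.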
Owner

@Nugine Nugine left a comment

The CI test failed. I'm not familiar with the PowerPC architecture.
Could you take a look?

@runlevel5
Author

> The CI test failed. I'm not familiar with the PowerPC architecture.
> Could you take a look?

Sure

@runlevel5
Author

@Nugine could you please also apply for a native ppc64le CI runner by lodging a request at https://github.com/IBM/actionspz?

runlevel5 and others added 2 commits February 26, 2026 17:55
- Split stdarch_powerpc_feature_detection feature gate to require
  feature="detect" (E0635: unknown feature in no_std builds)
- Replace vec_cmpge with vec_cmpgt for vector_unsigned_char comparison
  (vec_cmpge only supports vector_float)
- Add powerpc64le-unknown-linux-gnu to CI test matrix and QEMU setup
- Add powerpc mode to testgen.py with VSX rustflags
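
One generic way to get a greater-or-equal mask from a greater-than primitive is the identity `a >= b` if and only if not `b > a`. The commit message does not say which rewrite was used, so the following scalar sketch is only an assumption about the shape of such a fix; `u8x16_gt` and `u8x16_ge` are hypothetical names.

```rust
/// Lane-wise unsigned greater-than, returning 0xff for true and 0x00 for false.
pub fn u8x16_gt(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if a[i] > b[i] { 0xff } else { 0x00 };
    }
    out
}

/// Lane-wise unsigned greater-or-equal built only from greater-than:
/// a >= b holds exactly when b > a does not, so invert the swapped compare.
pub fn u8x16_ge(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    u8x16_gt(b, a).map(|m| !m)
}
```

The cost is one extra bitwise NOT per comparison, which can often be folded into a later and-not or select.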
@Nugine Nugine self-requested a review February 26, 2026 16:03