feat: add POWER ISA 3.0 (VSX/Altivec) SIMD support for base64-simd by runlevel5 · Pull Request #60 · Nugine/simd

runlevel5 · 2026-02-09T02:03:59Z

Add VSX backend to vsimd and wire it up in base64-simd for SIMD-accelerated base64 encode/decode/check on ppc64le (POWER7+).

vsimd changes:

VSX ISA type with runtime feature detection
V64/V128/V256 vector types using vector_unsigned_char
All unified ops (splat, add, sub, eq, lt, sat, max, min, bitwise) with scalar fallbacks for u64 (Altivec limitation)
SIMD128 trait: load/store (vec_xl/vec_xst), swizzle (vec_perm), shifts, multiply (vec_mladd/vec_mul), zip/unzip (vec_mergeh/mergel + vec_perm), avgr (vec_avg), bsl, bswap, all_zero/any_zero
SIMD256: split-to-2x128 path for all 256-bit ops
Mask/highbit operations following NEON arm32 pattern
Table lookup with correct high-bit-only (0x80) OOB masking
dispatch! macro arms for compile/resolve static+dynamic
Native runtime detection and dispatch
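For reviewers who have not seen the split-to-2x128 approach before, here is a minimal portable sketch of the idea, not the code in this PR. The `V128`/`V256` structs and the `u8x16_add`/`u8x32_add` functions below are hypothetical stand-ins built on plain arrays; the real backend would operate on `vector_unsigned_char` values instead.

```rust
/// Hypothetical stand-in for a 128-bit vector of sixteen u8 lanes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct V128(pub [u8; 16]);

/// Hypothetical stand-in for a 256-bit vector stored as two 128-bit halves.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct V256(pub V128, pub V128);

/// Lane-wise wrapping add on one 128-bit vector (scalar model).
pub fn u8x16_add(a: V128, b: V128) -> V128 {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a.0[i].wrapping_add(b.0[i]);
    }
    V128(out)
}

/// 256-bit add expressed as two independent 128-bit adds,
/// one per half, which is the "split-to-2x128" pattern.
pub fn u8x32_add(a: V256, b: V256) -> V256 {
    V256(u8x16_add(a.0, b.0), u8x16_add(a.1, b.1))
}
```

Because each half is processed independently, any operation that is purely lane-wise can reuse its 128-bit implementation this way; operations that move data across the 128-bit boundary need extra handling.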

base64-simd changes:

VSX added to encode (split_bits) and decode (merge_bits) using the NEON/WASM128 shift-and-mask path
All 4 dispatch entries (encode/decode/check/find_non_ascii_whitespace)
powerpc_target_feature nightly feature gate
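As background for the shift-and-mask path, the following scalar sketch shows what splitting and merging bits means for a single 3-byte group. It is an illustration only: `split_bits_scalar` and `merge_bits_scalar` are hypothetical names, and the vector versions in base64-simd work on many groups at once with different data layouts.

```rust
/// Split 3 input bytes (24 bits) into four 6-bit values using shifts and masks.
pub fn split_bits_scalar(src: [u8; 3]) -> [u8; 4] {
    // Pack the three bytes into one 24-bit integer, first byte most significant.
    let x = (u32::from(src[0]) << 16) | (u32::from(src[1]) << 8) | u32::from(src[2]);
    [
        ((x >> 18) & 0x3f) as u8,
        ((x >> 12) & 0x3f) as u8,
        ((x >> 6) & 0x3f) as u8,
        (x & 0x3f) as u8,
    ]
}

/// Merge four 6-bit values back into 3 bytes (the inverse of the split).
pub fn merge_bits_scalar(idx: [u8; 4]) -> [u8; 3] {
    let x = (u32::from(idx[0] & 0x3f) << 18)
        | (u32::from(idx[1] & 0x3f) << 12)
        | (u32::from(idx[2] & 0x3f) << 6)
        | u32::from(idx[3] & 0x3f);
    // Truncating casts keep the low 8 bits of each shifted value.
    [(x >> 16) as u8, (x >> 8) as u8, x as u8]
}
```

For example, the bytes of `"Man"` split into the indices 19, 22, 5, 46, which the standard base64 alphabet maps to `TWFu`.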

Gated behind feature="unstable" + target_arch="powerpc64". Tested on POWER9 (ppc64le) with nightly-2026-02-07.

What's next?

Gather feedback, then do the VSX work for the other crates.


runlevel5 · 2026-02-09T02:09:01Z

I am setting this to draft for the time being. The first thing I want to do is set up benchmarking so we can be 100% sure we do not cause any regressions.

Restructure the VSX u8x16xn_lookup to reduce instruction count:

- Before: extract high bit, extract low nibble, compare, create OOB mask, combine index, vec_perm (6 ops per V128 lookup)
- After: extract low nibble, vec_perm, signed compare, andnot (4 ops per V128 lookup)

This eliminates 16 vector ops per decode iteration (4 lookups x 2 halves), fixing the decode regression and improving all ALSW-based operations:

- decode: 1662 -> 2111 MB/s (+27%, now 1.15x over scalar)
- check: 3087 -> 4453 MB/s (+44%, now 1.69x over scalar)
- encode: unchanged (does not use table lookup)
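To make the restructuring easier to follow, here is a scalar model of the two sequences as I read them from the commit message. The function names `lookup_before` and `lookup_after` are hypothetical, and the exact way the old path combined its mask with the index is an assumption. Each loop body mirrors one vector op per commented step; saving 2 ops per V128 lookup across 4 lookups x 2 halves gives the 16 ops quoted above.

```rust
/// Old sequence (assumed shape): build a 5-bit selector so that
/// out-of-bounds lanes pick from an all-zero second vector.
pub fn lookup_before(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let zero = [0u8; 16];
    let mut out = [0u8; 16];
    for i in 0..16 {
        let high = idx[i] & 0x80; // extract high bit
        let low = idx[i] & 0x0f; // extract low nibble
        let is_oob = high != 0; // compare
        let oob = if is_oob { 0x10u8 } else { 0x00 }; // create OOB mask
        let sel = low | oob; // combine index
        // vec_perm-style select from the 32-byte concatenation table ++ zero
        out[i] = if sel < 16 { table[sel as usize] } else { zero[(sel - 16) as usize] };
    }
    out
}

/// New sequence: permute first, then clear lanes whose index is negative
/// when viewed as a signed byte (that is, has bit 7 set).
pub fn lookup_after(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let low = idx[i] & 0x0f; // extract low nibble
        let picked = table[low as usize]; // vec_perm
        let neg: u8 = if (idx[i] as i8) < 0 { 0xff } else { 0x00 }; // signed compare
        out[i] = picked & !neg; // andnot
    }
    out
}
```

Both functions return the table byte for indices with bit 7 clear and zero for indices with bit 7 set, so the shorter sequence preserves the lookup contract in this model.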

Copilot

Pull request overview

Adds a new POWER/ppc64le VSX (Altivec) SIMD backend to vsimd and wires it into base64-simd’s multiversion dispatch so base64 encode/decode/check/non-ascii scanning can use VSX when available (behind feature="unstable" on powerpc64).

Changes:

Introduce VSX as a new instruction set with runtime detection and dispatch integration (isa, native, macros).
Implement VSX vector types and a broad set of unified/SIMD128/SIMD256 operations using core::arch::powerpc64::*.
Enable base64-simd encode/decode paths to use VSX via existing NEON/WASM-style shift+mask logic and add vsx to dispatch target lists.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.


File	Description
crates/vsimd/src/vector.rs	Adds ppc64 VSX-backed V64/V128/V256/V512 vector representations.
crates/vsimd/src/unified.rs	Implements VSX branches for unified ops (splat/add/sub/eq/lt/sat/max/min/bitwise).
crates/vsimd/src/table.rs	Adds VSX-aware lookup semantics to match the crate’s table-lookup contract.
crates/vsimd/src/simd256.rs	Includes VSX in relevant SIMD256 paths and provides VSX-specific unzip behavior.
crates/vsimd/src/simd128.rs	Implements many VSX SIMD128 operations (load/store, swizzle, bswap helpers, zip/unzip, avgr, etc.).
crates/vsimd/src/native.rs	Adds runtime feature detection + dispatch for VSX.
crates/vsimd/src/mask.rs	Extends mask/highbit helpers to work with VSX.
crates/vsimd/src/macros.rs	Adds `dispatch!` compile/resolve support for the `"vsx"` target.
crates/vsimd/src/lib.rs	Adds nightly feature gates required for ppc64 stdarch intrinsics; adjusts portable_simd gating.
crates/vsimd/src/isa.rs	Defines `VSX` ISA type id, enables detection plumbing, and implements InstructionSet for VSX.
crates/base64-simd/src/multiversion.rs	Adds `"vsx"` to multiversion dispatch target lists for encode/decode/check/scan.
crates/base64-simd/src/lib.rs	Enables `powerpc_target_feature` for ppc64 under `unstable`.
crates/base64-simd/src/encode.rs	Treats VSX like NEON/WASM128 for the shift+mask encode implementation.
crates/base64-simd/src/decode.rs	Treats VSX like NEON/WASM128 for the shift+mask decode implementation.


crates/vsimd/src/simd128.rs

…dices vec_perm masks indices with & 0x1f, so sentinel values like 0x80 wrap to 0-15 and select bytes from the data vector instead of producing zero. Preprocess the index vector so any lane >= 16 maps to index 16, which selects from the zero vector passed as vec_perm's second argument. Add test covering identity, reverse, boundary (16), sentinel (0x80, 0xFF), mixed valid/OOB, and varied OOB values.
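The wrap-around described in this commit can be reproduced with a small scalar model. This is a sketch under simplifying assumptions: `perm_model` and `swizzle_clamped` are hypothetical names, the model ignores endianness details, and it only captures the "select from 32 bytes using the low 5 bits" behaviour stated above.

```rust
/// Scalar model of a two-input byte permute: each selector uses only its
/// low 5 bits to pick from the 32-byte concatenation a ++ b.
pub fn perm_model(a: [u8; 16], b: [u8; 16], c: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let sel = (c[i] & 0x1f) as usize;
        out[i] = if sel < 16 { a[sel] } else { b[sel - 16] };
    }
    out
}

/// Swizzle with the preprocessing described in the commit: any lane >= 16
/// is rewritten to 16, which selects byte 0 of the all-zero second input.
pub fn swizzle_clamped(data: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut clamped = idx;
    for lane in clamped.iter_mut() {
        if *lane >= 16 {
            *lane = 16;
        }
    }
    perm_model(data, [0u8; 16], clamped)
}
```

Without the clamp, a sentinel of 0x80 masks down to selector 0 and returns the first data byte; with the clamp, it returns zero.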

SSSE3 _mm_shuffle_epi8 uses idx & 0x0f for indices 16-127 and only zeroes on bit 7, while NEON/WASM zero for any index >= 16. Remove the index-16 boundary test and restrict out-of-range tests to values with bit 7 set (>= 128), which is the common contract all backends agree on and the only range used by callers in this codebase.
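The difference between the backends, and the range on which they agree, can be checked with two scalar models. These are illustrative sketches with hypothetical names (`swizzle_ssse3_model`, `swizzle_neon_model`), not code from vsimd.

```rust
/// Model of the SSSE3 byte-shuffle rule: a lane is zeroed only when bit 7
/// of its index is set; otherwise the low 4 bits select a data byte.
pub fn swizzle_ssse3_model(data: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if idx[i] & 0x80 != 0 { 0 } else { data[(idx[i] & 0x0f) as usize] };
    }
    out
}

/// Model of the NEON/WASM rule: any index >= 16 produces zero.
pub fn swizzle_neon_model(data: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if idx[i] < 16 { data[idx[i] as usize] } else { 0 };
    }
    out
}
```

The two models agree for indices 0-15 and for indices >= 128, and disagree for 16-127 (for example, index 16 yields the first data byte in the SSSE3 model and zero in the NEON/WASM model), which is why a portable test can only exercise the bit-7 sentinel range.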

Nugine

The CI test failed. I'm not familiar with the PowerPC architecture.
Could you take a look?

runlevel5 · 2026-02-25T21:37:58Z

> The CI test failed. I'm not familiar with the PowerPC architecture.
> Could you take a look?

Sure

runlevel5 · 2026-02-25T21:39:38Z

@Nugine could you please also apply for a native ppc64le CI runner by lodging a request at https://github.com/IBM/actionspz?

- Split stdarch_powerpc_feature_detection feature gate to require feature="detect" (E0635: unknown feature in no_std builds)
- Replace vec_cmpge with vec_cmpgt for vector_unsigned_char comparison (vec_cmpge only supports vector_float)
- Add powerpc64le-unknown-linux-gnu to CI test matrix and QEMU setup
- Add powerpc mode to testgen.py with VSX rustflags
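The commit message does not say how the greater-or-equal comparison was rewritten, so the following is only a sketch of two standard ways to express it with a greater-than primitive on unsigned bytes. `cmpgt_u8` and `cmpge_via_gt` are hypothetical scalar models, not the intrinsics themselves.

```rust
/// Scalar model of an unsigned byte-wise "greater than" compare:
/// each lane becomes 0xff where a > b and 0x00 otherwise.
pub fn cmpgt_u8(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if a[i] > b[i] { 0xff } else { 0x00 };
    }
    out
}

/// a >= b is the lane-wise complement of b > a.
pub fn cmpge_via_gt(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    cmpgt_u8(b, a).map(|m| !m)
}
```

When the right-hand side is a non-zero constant, the complement can be avoided by lowering the threshold by one, since `x >= 16` is the same test as `x > 15` for unsigned bytes.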

runlevel5 marked this pull request as draft February 9, 2026 02:08

runlevel5 added 2 commits February 9, 2026 13:20

style: apply cargo fmt and fix clippy redundant pointer cast warnings

3820ef0

runlevel5 marked this pull request as ready for review February 9, 2026 05:58

Nugine requested review from Nugine and Copilot February 9, 2026 08:26

Copilot started reviewing on behalf of Nugine February 9, 2026 08:27 View session

Copilot AI reviewed Feb 9, 2026


runlevel5 added 4 commits February 10, 2026 19:03

style: apply cargo fmt to swizzle test

c769f03

style: apply cargo fmt

1bd6417

Nugine mentioned this pull request Feb 11, 2026

Add powerpc64le VSX coverage in CI #61

Merged

Merge branch 'main' into feat/powerpc64-vsx-base64

99a39af

Nugine requested changes Feb 25, 2026


runlevel5 and others added 2 commits February 26, 2026 17:55

Merge branch 'main' into feat/powerpc64-vsx-base64

6afe652

Nugine mentioned this pull request Feb 26, 2026

New Project: base64-simd IBM/actionspz#86

Open

2 tasks

Nugine self-requested a review February 26, 2026 16:03

runlevel5 wants to merge 10 commits into Nugine:main from runlevel5:feat/powerpc64-vsx-base64


3 participants
