PR-X2 Worker B: soa_struct! #[soa(pad_to_lanes = N)] field attribute#169
Conversation
Worker B of PR-X2 per .claude/knowledge/pr-x2-design.md § "Worker
decomposition" line 459. SIMD-staged kernels need each SoA field's
underlying Vec to be a multiple of the lane width N so the consumer
walks the buffer with one uniform N-lane loop — no scalar tail-case
branch. Pre-PR-X2 callers achieved this by hand (W3-W6
GaussianBatch::with_capacity + eager-zero fill); this PR makes it
declarative on the field.
Macro surface:
soa_struct! {
pub struct Cells {
#[soa(pad_to_lanes = 8)]
pub palette: u8,
pub label: u32, // unpadded
}
}
let mut c = Cells::new();
c.push(7, 100);
assert_eq!(c.len(), 1); // logical row count
assert_eq!(c.palette.len(), 8); // physical, rounded to lane 8
assert_eq!(c.label.len(), 1); // unpadded: physical == logical
Implementation:
- Added optional `$(#[soa(pad_to_lanes = $pad:literal)])?` per field
in the macro_rules! head — Rust 1.32+ optional-meta repetition.
- Generated struct grows a private `_logical_len: usize` so `len()` /
`is_empty()` return the **semantic** row count independent of any
field's lane padding.
- `push()` dispatches per-field through internal `@push_field` arms:
• padded arm grows the Vec to `(logical+1).div_ceil(N)*N` filling
with `<$ty as Default>::default()`, then writes the new value
at `[logical]`
• plain arm is the pre-PR-X2 `Vec::push` call
The dispatch uses macro_rules! tt-munching with literal-token
separators (`pad = $pad`) so a single repetition handles both shapes.
- Compile-time guard: `const { assert!($pad > 0) }` inside the padded
arm — `pad_to_lanes = 0` is rejected at expansion, not at runtime.
- `with_capacity(cap)` reserves `cap` on each field but does NOT
pre-pad; padding happens lazily on push (matches the original
`with_capacity` semantics modulo the lane-tail).
- `clear()` resets _logical_len + .clear() on each field. Re-pushing
rebuilds padding from scratch.
Breaking change: `len()` no longer mirrors `self.<field>.len()` after
direct field mutation (e.g. `s.x.push(...)` bypasses `_logical_len`).
The canonical entry point is the macro-generated `push`. Pre-existing
`macro_public_visibility_passthrough` test updated to use `push`.
New tests (`src/hpc/soa.rs`, 5 added):
- pad_to_lanes_single_push_grows_to_lane — mixed cadence 8+16+none
- pad_to_lanes_crosses_lane_boundary — 9 pushes against lane 8
- pad_to_lanes_clear_resets_both — clear() round-trips
- pad_to_lanes_uniform_cadence — all-padded variant
- pad_to_lanes_with_capacity_empty — empty state invariants
Plus a `# Example — #[soa(pad_to_lanes = N)] field attribute` doctest
on the `soa_struct!` macro itself.
Verified:
cargo test -p ndarray --lib hpc::soa 38 passed
cargo test --doc -p ndarray hpc::soa 14 passed
cargo fmt --check clean
cargo clippy --features approx,serde,rayon -- -D warnings clean
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ee172aa84
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P1 codex review on PR #169: unconditional `_logical_len` injection turned previously all-public generated structs into structs with a private field, breaking downstream code that constructs them via struct literals (`MyBatch { a: vec![], b: vec![] }`) and exhaustive pattern matches — even for callers who never use `#[soa(pad_to_lanes = N)]`. Fix: split `soa_struct!` into two arms. Arm 1 — unpadded (no `#[soa(...)]` anywhere): byte-for-byte the pre-PR-X2 emit. No `_logical_len` field. `len()` reads field lengths under `debug_assert`. Struct-literal construction works exactly as before. macro_rules! tries this arm first; if any field carries a `#[soa(pad_to_lanes = N)]` attribute, the no-attribute pattern fails to match and the macro falls through to arm 2. Arm 2 — padded (at least one `#[soa(...)]`): the new PR-X2 B emit with `_logical_len` + lane-padded `push`. Reached only when the user opted in by tagging a field — struct-literal construction was never going to work for these anyway because the user can't fill the lane-padded `Vec` slots themselves. The pre-existing `macro_public_visibility_passthrough` test is restored to its original form (direct `s.x.push(...)` on the unpadded `Soa3`) since arm 1 is byte-identical to pre-PR-X2. Verified: cargo test -p ndarray --lib hpc::soa 38 passed cargo test --doc -p ndarray hpc::soa 14 passed cargo fmt --check clean cargo clippy --features approx,serde,rayon -- -D warnings clean Resolves PR #169 P1 review thread r3272307622.
|
@codex review P1 thread Generated by Claude Code |
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
PR-X2 Worker B per
.claude/knowledge/pr-x2-design.md§ "Worker decomposition" line 459. Adds the#[soa(pad_to_lanes = N)]field attribute tosoa_struct!so SIMD-staged kernels can walk each padded field with one uniform N-lane loop — no scalar tail-case branch.1 file, +231 / -19 vs
master. Companion to Worker A (#168, merged atfb95cb32).Macro surface
Padded tail is filled with
<$ty as Default>::default()— caller-specified sentinels are out of scope (design § "Open questions Q4", deferred follow-up).Implementation
$(#[soa(pad_to_lanes = $pad:literal)])?per field in themacro_rules!head — stable since Rust 1.32._logical_len: usize.len()/is_empty()return the semantic row count, independent of any field's lane padding.push()dispatches per-field through internal@push_fieldarms:(logical + 1).div_ceil(N) * Nfilling withDefault::default(), then writes the value at[logical]Vec::push(val)callpad = $pad) so a single repetition handles both shapes.const { assert!($pad > 0) }inside the padded arm —pad_to_lanes = 0is rejected at expansion time, not runtime.with_capacity(cap)reservescapper field but does not pre-pad; padding happens lazily on push.clear()resets_logical_len+.clear()per field. Re-pushing rebuilds padding from scratch.Breaking change
len()no longer mirrorsself.<field>.len()after direct field mutation — e.g.s.x.push(...)bypasses_logical_lenand produces a stale logical count. The canonical entry point is the macro-generatedpush.The existing
macro_public_visibility_passthroughtest was updated to usepush(it had pushed directly into each field, which still works for visibility but no longer incrementslen()).New tests (5)
pad_to_lanes_single_push_grows_to_lane— mixed cadence (lane 8 + lane 16 + unpadded)pad_to_lanes_crosses_lane_boundary— 9 pushes against lane 8 → physical 16pad_to_lanes_clear_resets_both—clear()round-trippad_to_lanes_uniform_cadence— all-padded variantpad_to_lanes_with_capacity_empty— empty state invariantsPlus a
# Example — #[soa(pad_to_lanes = N)] field attributedoctest on thesoa_struct!macro itself.Verified locally
cargo test -p ndarray --lib hpc::soa— 38 passed (was 33; +5 padded)cargo test --doc -p ndarray hpc::soa— 14 passed (was 13; +1 padded doctest)cargo fmt --check— cleancargo clippy --features approx,serde,rayon -- -D warnings— cleanOut of scope
{field_name}_padded_len()convenience accessor (design § Worker decomposition line 459) — skipped because fields arepubandself.<field>.len()already gives the physical length without ceremony.🤖 Generated with Claude Code
Generated by Claude Code