fix(hpc/activations): sigmoid_f32 stride mismatch (orphan rescue from PR #154)#155

Merged
AdaWorldAPI merged 1 commit into master from claude/sigmoid-stride-mismatch-fix
May 18, 2026

Conversation

@AdaWorldAPI
Owner

Summary

Rescues an orphaned fix from PR #154. The Codex review of #154 flagged a stride-mismatch bug in sigmoid_f32. I committed the fix (b5a63fc7), but it landed after the merge cycle had completed, so the merged version of #154 still carries the bug.

The bug

Same-shaped contiguous views with different memory orders (C-order input + F-order output) both returned a slice from as_slice_memory_order, but the two slices enumerate logical coordinates in different orders. The flat SIMD primitive paired them position by position and wrote sigmoid values into the wrong output coordinates instead of falling back to the stride-aware Zip cold path.

Reproducer (also the regression test added in this PR):

let x: Array2<f32> = arr2(&[[0.0_f32, 100.0], [-100.0, 0.0]]); // C-order, strides [2, 1]
let mut out: Array2<f32> = Array::zeros((2, 2).f());             // F-order, strides [1, 2]
sigmoid_f32(x.view(), out.view_mut());
// Before fix: wrong values at out[[0,1]] and out[[1,0]] (logical indexing mismatch)
// After fix:  out[[0,0]]≈0.5, out[[0,1]]≈1.0, out[[1,0]]≈0.0, out[[1,1]]≈0.5
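
To make the mismatch concrete, here is a minimal std-only sketch that models the two 2x2 layouts as flat arrays. It does not use ndarray or the real SIMD primitive; `sigmoid`, `offset`, `sigmoid_flat`, and `flat_pass_c_in_f_out` are hypothetical names introduced only for illustration.

```rust
/// Scalar logistic sigmoid.
fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// Flat offset of logical index (i, j) for the given element strides.
fn offset(i: usize, j: usize, strides: [usize; 2]) -> usize {
    i * strides[0] + j * strides[1]
}

/// Stand-in for the flat primitive: pairs the buffers position by
/// position and never looks at strides.
fn sigmoid_flat(x: &[f32], out: &mut [f32]) {
    for (o, &v) in out.iter_mut().zip(x) {
        *o = sigmoid(v);
    }
}

/// Runs the flat pass on a C-order input and an F-order output, then
/// reads the output back at its logical coordinates.
fn flat_pass_c_in_f_out() -> [[f32; 2]; 2] {
    // Logical matrix [[0, 100], [-100, 0]] stored row-major (strides [2, 1]).
    let x = [0.0_f32, 100.0, -100.0, 0.0];
    // Output buffer interpreted column-major (strides [1, 2]).
    let mut out = [0.0_f32; 4];
    sigmoid_flat(&x, &mut out);

    let f = [1, 2];
    [
        [out[offset(0, 0, f)], out[offset(0, 1, f)]],
        [out[offset(1, 0, f)], out[offset(1, 1, f)]],
    ]
}
```

In this model the diagonal entries come out right, while the two off-diagonal entries are swapped: the value computed for input [[0,1]] is read back at output [[1,0]], and vice versa.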

The fix

Add the same x.strides() == out.strides() guard that hpc/vml.rs already uses in dispatch_unary_contig / dispatch_binary_contig (vml's unary and binary fns were never affected because they routed through these helpers). Mismatched-stride inputs now correctly route to the stride-aware Zip cold path.
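
The shape of the guard can be sketched without ndarray as follows. This is an illustration under stated assumptions, not the code in this PR: `sigmoid_2d` is a hypothetical function, both buffers are assumed contiguous, the hot path is a plain loop rather than a SIMD kernel, and the nested loop stands in for the Zip cold path.

```rust
/// Scalar logistic sigmoid.
fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// Hypothetical 2-D dispatcher over flat, contiguous buffers.
/// `shape` is [rows, cols]; strides are in elements.
fn sigmoid_2d(
    x: &[f32],
    x_strides: [usize; 2],
    out: &mut [f32],
    out_strides: [usize; 2],
    shape: [usize; 2],
) {
    if x_strides == out_strides {
        // Hot path: identical layouts, so flat position k is the same
        // logical coordinate in both buffers.
        for (o, &v) in out.iter_mut().zip(x) {
            *o = sigmoid(v);
        }
    } else {
        // Cold path: walk logical coordinates and compute each
        // buffer's offset from its own strides.
        for i in 0..shape[0] {
            for j in 0..shape[1] {
                let xi = i * x_strides[0] + j * x_strides[1];
                let oi = i * out_strides[0] + j * out_strides[1];
                out[oi] = sigmoid(x[xi]);
            }
        }
    }
}
```

With a C-order input (strides [2, 1]) and an F-order output (strides [1, 2]), the comparison fails and the cold path fills each output coordinate from the matching input coordinate.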

Why other W2 surfaces are unaffected

  • hpc/reductions.rs (sum / mean / max / min / nrm2 / argmax / argmin) — read-only commutative/associative reductions. Memory-order of the iteration doesn't change the scalar result.
  • hpc/vml.rs (16 unary + 4 binary fns) — already routed through dispatch_unary_contig / dispatch_binary_contig which carry the strides-equality guard.
  • activations::softmax_f32 / log_softmax_f32: ArrayView1 only; 1-D as_slice() returns None for non-unit stride, so the cold path engages for any strided 1-D view.

Only sigmoid_f32 was generic-D AND used the raw as_slice_memory_order calls without the guard.
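
The reductions argument can be illustrated with a small std-only sketch: the same logical matrix stored in C order and in F order yields the same sum and max when each buffer is walked in its own memory order. The helper names (`to_f_order`, `flat_sum`, `flat_max`) are hypothetical and are not the functions in hpc/reductions.rs. The test values are small integers, so the f32 sum is exact in either order.

```rust
/// Copies a row-major (C-order) buffer into column-major (F-order) layout.
fn to_f_order(c: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    let mut f = vec![0.0_f32; rows * cols];
    for i in 0..rows {
        for j in 0..cols {
            f[i + j * rows] = c[i * cols + j];
        }
    }
    f
}

/// Sum over a flat buffer in memory order.
fn flat_sum(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// Max over a flat buffer in memory order.
fn flat_max(data: &[f32]) -> f32 {
    data.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}
```

Both buffers hold the same multiset of elements, which is all a value-only reduction sees; an elementwise map such as sigmoid_f32 differs because it must also put each result at the right output coordinate.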

Test plan

  • cargo check -p ndarray --no-default-features --features std clean
  • cargo test -p ndarray --lib --no-default-features --features std hpc::activations — 17 passed / 0 failed (was 16 before this fix)
  • New test_sigmoid_f32_c_in_f_out_mismatched_strides exercises the exact Codex repro
  • cargo fmt --all -- --check clean
  • cargo clippy --no-default-features --features std -- -D warnings clean
  • CI matrix

Generated by Claude Code

Codex flagged: same-shaped contiguous views with different memory
orders (C-order input + F-order output) both succeeded at
as_slice_memory_order but with mismatched logical indexing — the flat
SIMD primitive wrote sigmoid values into the wrong output coordinates.

Fix: add the same strides-equality guard that hpc/vml.rs already uses
in dispatch_unary_contig / dispatch_binary_contig. Mismatched-stride
inputs now route to the stride-aware Zip cold path.

Adds test_sigmoid_f32_c_in_f_out_mismatched_strides regression:
2x2 C-order input, F-order zero-init output, asserts logical
coordinates carry correct sigmoid values. Activations test count:
16 -> 17.

Reductions are unaffected (read-only commutative/associative — memory
order doesn't change the scalar result). vml unary/binary already
guarded via dispatch_*_contig.
@AdaWorldAPI AdaWorldAPI merged commit f1d3303 into master May 18, 2026
15 checks passed

2 participants