feat(hpc): SIMD wishlist items #1, #5, #6, #10 (#26)

Merged: AdaWorldAPI merged 1 commit into master from claude/continue-session-0mAVa on Mar 23, 2026
@AdaWorldAPI

#1 VML wiring: Wire F32x16/F64x8 SIMD types into scalar VML loops

  • vsln: 16-wide via simd_ln_f32
  • vdsqrt: 8-wide via F64x8::sqrt()
  • vdabs: 8-wide via F64x8::abs()
  • vssin/vscos: batch load/store via F32x16 (sin/cos still computed per lane in scalar code; only the load/store uses the SIMD framework)
  • vspow: 16-wide via exp(b*ln(a)) using simd_exp_f32 + simd_ln_f32
  • 7 new tests covering SIMD paths
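
The vspow item relies on the identity a^b = exp(b·ln(a)). A minimal sketch of that shape follows, assuming a plain `[f32]` chunk of 16 elements as a stand-in for F32x16 and the standard library's scalar `ln`/`exp` in place of simd_ln_f32 / simd_exp_f32 (whose signatures are not shown in this PR). The name `vspow_sketch` is hypothetical. The identity only holds for a > 0; how the real kernel treats zero or negative bases is not stated here.

```rust
/// Number of f32 lanes in the stand-in for F32x16 (assumption for illustration).
const LANES: usize = 16;

/// Hypothetical sketch: out[i] = a[i]^b[i], computed as exp(b[i] * ln(a[i])).
/// Only meaningful for a[i] > 0, because ln of a non-positive value is not finite.
pub fn vspow_sketch(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), out.len());

    // Largest prefix whose length is a multiple of 16.
    let full = a.len() / LANES * LANES;

    // Main loop: one 16-element chunk per iteration. A real SIMD kernel would
    // load each chunk into a vector register and call vectorized ln/exp.
    for ((ca, cb), co) in a[..full]
        .chunks_exact(LANES)
        .zip(b[..full].chunks_exact(LANES))
        .zip(out[..full].chunks_exact_mut(LANES))
    {
        for lane in 0..LANES {
            co[lane] = (cb[lane] * ca[lane].ln()).exp();
        }
    }

    // Scalar tail for the remaining (len % 16) elements.
    for i in full..a.len() {
        out[i] = (b[i] * a[i].ln()).exp();
    }
}
```

Calling `vspow_sketch` with 19 elements exercises both the 16-wide loop and the 3-element tail.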

#5 columnar_view: Zero-copy Arrow interop

  • SoakingBuffer::as_columnar_slice() / as_columnar_slice_mut()
  • PlaneBuffer::as_binary_slice()
  • 3 new tests for zero-copy view correctness
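
To illustrate what "zero-copy view" means for these accessors, here is a minimal sketch. `ColumnBuffer` is a hypothetical stand-in for SoakingBuffer (its real fields and element type are not shown in this PR); the point is only that the accessor borrows the existing storage instead of cloning it, so a columnar consumer can read the same memory.

```rust
/// Hypothetical stand-in for SoakingBuffer: one contiguous column of f32 values.
pub struct ColumnBuffer {
    data: Vec<f32>,
}

impl ColumnBuffer {
    pub fn new(data: Vec<f32>) -> Self {
        Self { data }
    }

    /// Borrow the underlying storage as a read-only slice. No allocation, no copy.
    pub fn as_columnar_slice(&self) -> &[f32] {
        &self.data
    }

    /// Borrow the underlying storage mutably. Writes land in the buffer itself.
    pub fn as_columnar_slice_mut(&mut self) -> &mut [f32] {
        &mut self.data
    }
}
```

A zero-copy correctness test can then check that the slice pointer is stable across calls and that a write through the mutable view is visible through the read-only view.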

#6 simd_apply: Generic fused SIMD kernel

  • simd_apply(a, b, out, Fn(F32x16, F32x16) -> F32x16)
  • simd_apply_unary(x, out, Fn(F32x16) -> F32x16)
  • simd_apply_inplace(a, b, Fn(F32x16, F32x16) -> F32x16)
  • Tail handling: the final partial chunk is zero-padded to a full F32x16
  • 6 new tests (add, FMA, sqrt, inplace, empty, tail-only)
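
The kernel shape described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Lanes16` (a plain `[f32; 16]`) stands in for F32x16, and `simd_apply_sketch` and `add16` are hypothetical names. It shows the two parts the list mentions: a main loop over full 16-lane chunks, and a tail that is zero-padded to a full vector with only the valid lanes written back.

```rust
const LANES: usize = 16;

/// Stand-in for F32x16 (assumption): 16 f32 lanes in a plain array.
pub type Lanes16 = [f32; LANES];

/// Hypothetical sketch of a generic binary kernel: out = f(a, b), 16 lanes at a time.
pub fn simd_apply_sketch<F>(a: &[f32], b: &[f32], out: &mut [f32], f: F)
where
    F: Fn(Lanes16, Lanes16) -> Lanes16,
{
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), out.len());
    let n = a.len();
    let full = n / LANES * LANES;

    // Full chunks: load 16 lanes from each input, apply f, store 16 lanes.
    let mut i = 0;
    while i < full {
        let mut va = [0.0f32; LANES];
        let mut vb = [0.0f32; LANES];
        va.copy_from_slice(&a[i..i + LANES]);
        vb.copy_from_slice(&b[i..i + LANES]);
        let r = f(va, vb);
        out[i..i + LANES].copy_from_slice(&r);
        i += LANES;
    }

    // Tail: copy the remaining elements into zero-initialized vectors,
    // apply f to the padded vectors, and write back only the valid lanes.
    let rem = n - full;
    if rem > 0 {
        let mut va = [0.0f32; LANES];
        let mut vb = [0.0f32; LANES];
        va[..rem].copy_from_slice(&a[full..]);
        vb[..rem].copy_from_slice(&b[full..]);
        let r = f(va, vb);
        out[full..].copy_from_slice(&r[..rem]);
    }
}

/// Example kernel: lane-wise addition.
pub fn add16(x: Lanes16, y: Lanes16) -> Lanes16 {
    let mut r = [0.0f32; LANES];
    for k in 0..LANES {
        r[k] = x[k] + y[k];
    }
    r
}
```

Zero padding keeps the tail on the same code path as the main loop. The padded lanes are discarded, so a kernel that produces NaN or infinity on zero inputs (for example a division) does not corrupt the output.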

#10 prefetch: Explicit _mm_prefetch in cascade_query

  • prefetch_t0/t1 wrappers (x86_64 SSE, no-op elsewhere)
  • Stroke 1: prefetch PREFETCH_DISTANCE=4 candidates ahead (L1)
  • Stroke 2: prefetch next survivor's data (L2)
  • Stroke 3: prefetch next survivor's data (L1)
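
A minimal sketch of the wrapper pattern and the "prefetch N candidates ahead" loop follows. The wrappers use `core::arch::x86_64::_mm_prefetch` with the `_MM_HINT_T0` / `_MM_HINT_T1` hints and compile to no-ops on other targets. The `scan_with_prefetch` function, its row type, and the popcount workload are hypothetical; cascade_query's actual data layout is not shown in this PR.

```rust
/// How many candidates ahead to prefetch (value taken from the PR description).
pub const PREFETCH_DISTANCE: usize = 4;

/// T0 hint: request the cache line in all cache levels, including L1.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
pub fn prefetch_t0<T>(p: *const T) {
    use core::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // Prefetch is only a hint and does not dereference the pointer architecturally.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(p as *const i8) }
}

/// T1 hint: request the cache line in L2 and higher levels.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
pub fn prefetch_t1<T>(p: *const T) {
    use core::arch::x86_64::{_mm_prefetch, _MM_HINT_T1};
    unsafe { _mm_prefetch::<_MM_HINT_T1>(p as *const i8) }
}

/// No-op fallbacks for non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
pub fn prefetch_t0<T>(_p: *const T) {}

#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
pub fn prefetch_t1<T>(_p: *const T) {}

/// Hypothetical scan: visit rows in candidate order, summing the popcount of
/// each row, while prefetching the row PREFETCH_DISTANCE candidates ahead.
pub fn scan_with_prefetch(candidates: &[u32], rows: &[[u64; 8]]) -> u64 {
    let mut acc = 0u64;
    for (i, &idx) in candidates.iter().enumerate() {
        // Issue the prefetch for a future candidate before working on this one.
        if let Some(&ahead) = candidates.get(i + PREFETCH_DISTANCE) {
            if let Some(row) = rows.get(ahead as usize) {
                prefetch_t0(row.as_ptr());
            }
        }
        acc += rows[idx as usize]
            .iter()
            .map(|w| w.count_ones() as u64)
            .sum::<u64>();
    }
    acc
}
```

Prefetching does not change results, only memory latency, so the function returns the same value with or without the hints. The right distance depends on per-candidate work and memory latency and is best chosen by measurement.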

All 51 targeted tests pass. Scorecard: 4/10 → 6/10 done.

https://claude.ai/code/session_01CdqyUTUfjKZuk8YGJzv6LB

@AdaWorldAPI AdaWorldAPI merged commit 62f5574 into master Mar 23, 2026