refactor: aabb + p64 rewired, ALL 372 raw intrinsics eliminated #80

aabb (69→0): F32x16 operators + simd_min/max/le/ge, scalar SSE fallback
p64 (18→0): scalar array AND/XOR/popcount (LLVM auto-vectorizes, zero deps)

372/372 intrinsics eliminated across 12 HPC files + p64.
Only 1 remaining: _mm_prefetch in jitson_cranelift (JIT hint, not data path).

All 1510 ndarray tests + 23 p64 tests pass.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp

Merged
AdaWorldAPI merged 5 commits into master from claude/setup-embedding-pipeline-Fa65C on Apr 3, 2026

@AdaWorldAPI
Owner

No description provided.

claude added 5 commits April 3, 2026 18:35
Zero raw intrinsics remaining. All 3 SIMD functions rewired:
  hamming_avx512bw: U8x64 XOR + nibble_popcount_lut + shuffle_bytes + sum_bytes_u64
  popcount_avx512bw: same LUT-based popcount pattern via U8x64
  hamming_avx2: u64 XOR + count_ones (no U8x32, uses scalar popcount)

New U8x64 polyfill methods (all 3 tiers):
  shuffle_bytes(idx) — _mm512_shuffle_epi8 wrapper
  sum_bytes_u64() — SAD against zero + horizontal u64 sum
  nibble_popcount_lut() — 4-lane replicated popcount lookup table
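
The XOR + nibble-LUT popcount pattern named above can be modelled in scalar Rust. This is a minimal sketch, not the PR's code: `NIBBLE_POPCOUNT`, `popcount_lut` and `hamming_lut` are hypothetical names, and the per-byte table lookup stands in for what a byte shuffle would do across 64 lanes at once, with the final `sum` standing in for the horizontal byte sum.

```rust
/// Popcount of every 4-bit value 0..=15. A SIMD version would replicate
/// this table into each 128-bit lane and index it with a byte shuffle.
const NIBBLE_POPCOUNT: [u8; 16] = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

/// Count set bits by looking up the low and high nibble of each byte.
fn popcount_lut(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .map(|&b| {
            let lo = NIBBLE_POPCOUNT[(b & 0x0F) as usize];
            let hi = NIBBLE_POPCOUNT[(b >> 4) as usize];
            u64::from(lo + hi)
        })
        .sum()
}

/// Hamming distance: XOR the inputs byte by byte, then count the set bits.
fn hamming_lut(a: &[u8], b: &[u8]) -> u64 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let diff: Vec<u8> = a.iter().zip(b).map(|(&x, &y)| x ^ y).collect();
    popcount_lut(&diff)
}
```

A scalar model like this can also serve as a reference when testing a vectorized path against known inputs.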

20 bitwise tests pass. Zero _mm*_ calls outside simd polyfill files.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
cam_pq (6→0): F32x16::gather for VPGATHERDPS
packed (5→0): core::arch::asm prefetch
palette_codec (8→0): scalar nibble extraction
New F32x16::gather() polyfill method. 1510 tests pass.
155/372 intrinsics eliminated.
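
For readers unfamiliar with gather, the semantics behind a 16-lane `f32` gather can be sketched in scalar form. The signature of the actual `F32x16::gather` is not shown in this PR, so `gather_f32x16` below is a hypothetical stand-in; the use of `u32` indices and the bounds check are assumptions made for safety in the sketch.

```rust
/// Scalar model of a 16-lane f32 gather: lane i reads base[indices[i]].
/// Returns None if any index is out of bounds. A hardware gather performs
/// no such check, so this is stricter than the instruction it models.
fn gather_f32x16(base: &[f32], indices: &[u32; 16]) -> Option<[f32; 16]> {
    let mut out = [0.0f32; 16];
    for (lane, &idx) in out.iter_mut().zip(indices.iter()) {
        *lane = *base.get(idx as usize)?;
    }
    Some(out)
}
```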

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
byte_scan (15→0): U8x64::cmpeq_mask + scalar AVX2 fallback
distance (13→0): F32x8 arithmetic operators
spatial_hash (16→0): F32x8 + scalar comparison fallback
199/372 intrinsics eliminated. 1510 tests pass.
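
The compare-to-mask idea behind `U8x64::cmpeq_mask` can be illustrated with a scalar sketch. The functions below are hypothetical and only model the behaviour: one bit per byte lane, set where the byte equals the needle, with `trailing_zeros` locating the first match. The real method's signature may differ.

```rust
/// Scalar model of a byte compare over up to 64 lanes: bit i of the
/// result is set when chunk[i] == needle.
fn cmpeq_mask(chunk: &[u8], needle: u8) -> u64 {
    assert!(chunk.len() <= 64, "at most 64 lanes fit in a u64 mask");
    let mut mask = 0u64;
    for (i, &b) in chunk.iter().enumerate() {
        mask |= u64::from(b == needle) << i;
    }
    mask
}

/// Find the first occurrence of `needle`, scanning 64 bytes at a time.
fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    for (block, chunk) in haystack.chunks(64).enumerate() {
        let mask = cmpeq_mask(chunk, needle);
        if mask != 0 {
            // The lowest set bit is the first matching lane in this block.
            return Some(block * 64 + mask.trailing_zeros() as usize);
        }
    }
    None
}
```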

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
nibble (46→0): U8x64 for AVX-512, scalar arrays for SSE/AVX2
property_mask (40→0): U64x8 for AVX-512, scalar u64 for AVX2
285/372 intrinsics eliminated. 1510 tests pass.
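
As an illustration of what a scalar nibble path might look like, here is a minimal sketch of packing and unpacking 4-bit values. The function names and the low-nibble-first ordering are assumptions; the PR does not show the layout used by `nibble` or `palette_codec`.

```rust
/// Unpack 4-bit values from packed bytes, low nibble first
/// (an assumed ordering for this sketch).
fn unpack_nibbles(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &b in packed {
        out.push(b & 0x0F);
        out.push(b >> 4);
    }
    out
}

/// Pack 4-bit values two per byte, low nibble first. An odd trailing
/// value is padded with a zero high nibble.
fn pack_nibbles(values: &[u8]) -> Vec<u8> {
    values
        .chunks(2)
        .map(|pair| {
            let lo = pair[0] & 0x0F;
            let hi = pair.get(1).copied().unwrap_or(0) & 0x0F;
            lo | (hi << 4)
        })
        .collect()
}
```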

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
aabb (69→0): F32x16 operators + simd_min/max/le/ge, scalar SSE fallback
p64 (18→0): scalar array AND/XOR/popcount (LLVM auto-vectorizes, zero deps)
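
The min/max/le/ge operations mentioned for aabb map directly onto the two basic box operations. The sketch below is scalar and uses a hypothetical `Aabb` type with three axes; it only shows which comparison feeds which operation, not how the PR lays boxes out across F32x16 lanes.

```rust
/// Hypothetical axis-aligned bounding box with per-axis min and max corners.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Aabb {
    min: [f32; 3],
    max: [f32; 3],
}

impl Aabb {
    /// Overlap test: on every axis, self.min <= other.max (le)
    /// and self.max >= other.min (ge).
    fn intersects(&self, other: &Aabb) -> bool {
        (0..3).all(|i| self.min[i] <= other.max[i] && self.max[i] >= other.min[i])
    }

    /// Smallest box containing both inputs: per-axis min of the min
    /// corners and per-axis max of the max corners.
    fn union(&self, other: &Aabb) -> Aabb {
        let mut out = *self;
        for i in 0..3 {
            out.min[i] = self.min[i].min(other.min[i]);
            out.max[i] = self.max[i].max(other.max[i]);
        }
        out
    }
}
```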

372/372 intrinsics eliminated across 12 HPC files + p64.
Only 1 remaining: _mm_prefetch in jitson_cranelift (JIT hint, not data path).

All 1510 ndarray tests + 23 p64 tests pass.
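
The scalar-array approach described for p64 can be sketched as plain loops over fixed-size `u64` arrays. The function names and the const-generic length are assumptions for illustration. Whether the compiler actually vectorizes such loops depends on the target features and optimization level, so that is worth confirming in the generated assembly.

```rust
/// Popcount of the bitwise AND of two fixed-size u64 arrays.
fn and_popcount<const N: usize>(a: &[u64; N], b: &[u64; N]) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x & y).count_ones()).sum()
}

/// Popcount of the bitwise XOR, i.e. the Hamming distance in bits.
fn xor_popcount<const N: usize>(a: &[u64; N], b: &[u64; N]) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Fixed-size arrays give the optimizer a known trip count, which is one common reason to prefer them over slices in code like this.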

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
@AdaWorldAPI AdaWorldAPI merged commit 3345070 into master Apr 3, 2026