Describe the enhancement requested
`BlockSplitBloomFilter::FindHash` currently uses the scalar reference probe: an 8-iteration loop that short-circuits on the first unset bit.
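For reference, here is a minimal standalone sketch of that scalar probe. It assumes the split block Bloom filter layout from the Parquet spec: 32-byte blocks of eight 32-bit words, with one bit probed per word. The raw `uint32_t*` bitset and the function names (`BlockIndex`, `ProbeMask`, `FindHashScalar`, `InsertHash`) are hypothetical stand-ins for the `BlockSplitBloomFilter` members, not Arrow's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Salt constants from the Parquet split block Bloom filter specification.
constexpr uint32_t kSalt[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
                               0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

// Picks the 32-byte block (eight uint32_t words) from the upper 32 bits of the hash.
inline uint32_t BlockIndex(uint64_t hash, uint32_t num_blocks) {
  assert(num_blocks > 0);
  return static_cast<uint32_t>(((hash >> 32) * num_blocks) >> 32);
}

// Bit probed in word i: the top 5 bits of (key * salt[i]) select one of 32 bits.
inline uint32_t ProbeMask(uint32_t key, int i) {
  return UINT32_C(1) << ((key * kSalt[i]) >> 27);
}

// Scalar reference probe: up to 8 iterations, returns on the first missing bit.
bool FindHashScalar(const uint32_t* bitset, uint32_t num_blocks, uint64_t hash) {
  const uint32_t* block = bitset + 8 * BlockIndex(hash, num_blocks);
  const uint32_t key = static_cast<uint32_t>(hash);  // lower 32 bits
  for (int i = 0; i < 8; ++i) {
    if ((block[i] & ProbeMask(key, i)) == 0) return false;  // data-dependent exit
  }
  return true;
}

// Hypothetical insert helper for the usage example: sets the same 8 bits.
void InsertHash(uint32_t* bitset, uint32_t num_blocks, uint64_t hash) {
  uint32_t* block = bitset + 8 * BlockIndex(hash, num_blocks);
  const uint32_t key = static_cast<uint32_t>(hash);
  for (int i = 0; i < 8; ++i) block[i] |= ProbeMask(key, i);
}
```

Calling `InsertHash` and then `FindHashScalar` with the same hash returns `true`; an all-zero bitset returns `false` for any hash.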
Proposing a runtime-dispatched implementation: a branchless OR-accumulator reduction as the baseline (autovectorizes to SSE on x86 and NEON on aarch64), plus an xsimd kernel built with `-mavx2` as the runtime AVX2 dispatch target. There is no on-disk format change and no public API change, and results are bit-identical to the scalar reference.
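A minimal sketch of the shape of the two faster paths, under stated assumptions. The baseline drops the early exit and ORs together the probed bits that are *missing*, so the loop has a fixed trip count and no data-dependent branch. The AVX2 path here is written with raw intrinsics and the GCC/Clang `target("avx2")` function attribute rather than the xsimd kernel built with `-mavx2` that the proposal describes, and dispatch uses `__builtin_cpu_supports`. All names (`FindHashBranchless`, `FindHashAvx2`, `ResolveFindHash`, `AgreesWithReference`, `SKETCH_HAVE_AVX2`) are hypothetical, not Arrow's.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAVE_AVX2 1  // hypothetical guard: x86-64 with GCC/Clang only
#endif

// Salt constants from the Parquet split block Bloom filter specification.
constexpr uint32_t kSalt[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
                               0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

// Returns the 8-word block selected by the upper 32 bits of the hash.
inline const uint32_t* BlockFor(const uint32_t* bitset, uint32_t num_blocks, uint64_t hash) {
  assert(num_blocks > 0);
  return bitset + 8 * static_cast<uint32_t>(((hash >> 32) * num_blocks) >> 32);
}

inline uint32_t ProbeMask(uint32_t key, int i) { return UINT32_C(1) << ((key * kSalt[i]) >> 27); }

// Reference: short-circuits on the first missing bit.
bool FindHashReference(const uint32_t* bitset, uint32_t num_blocks, uint64_t hash) {
  const uint32_t* block = BlockFor(bitset, num_blocks, hash);
  for (int i = 0; i < 8; ++i)
    if ((block[i] & ProbeMask(static_cast<uint32_t>(hash), i)) == 0) return false;
  return true;
}

// Baseline: fixed 8 iterations, no early exit. Accumulates the probed bits that are
// NOT set; the key may be present only if nothing was accumulated.
bool FindHashBranchless(const uint32_t* bitset, uint32_t num_blocks, uint64_t hash) {
  const uint32_t* block = BlockFor(bitset, num_blocks, hash);
  const uint32_t key = static_cast<uint32_t>(hash);
  uint32_t missing = 0;
  for (int i = 0; i < 8; ++i) missing |= ~block[i] & ProbeMask(key, i);
  return missing == 0;
}

#ifdef SKETCH_HAVE_AVX2
// AVX2 path with raw intrinsics (not xsimd): all 8 words in one 256-bit register.
__attribute__((target("avx2"))) bool FindHashAvx2(const uint32_t* bitset,
                                                   uint32_t num_blocks, uint64_t hash) {
  const uint32_t* block = BlockFor(bitset, num_blocks, hash);
  const __m256i salt = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(kSalt));
  const __m256i key = _mm256_set1_epi32(static_cast<int32_t>(static_cast<uint32_t>(hash)));
  const __m256i shift = _mm256_srli_epi32(_mm256_mullo_epi32(key, salt), 27);
  const __m256i masks = _mm256_sllv_epi32(_mm256_set1_epi32(1), shift);
  const __m256i words = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(block));
  // testc returns 1 iff (~words & masks) == 0, i.e. every probed bit is set.
  return _mm256_testc_si256(words, masks) != 0;
}
#endif

using FindHashFn = bool (*)(const uint32_t*, uint32_t, uint64_t);

// Runtime dispatch: use the AVX2 kernel only if the running CPU supports it.
FindHashFn ResolveFindHash() {
#ifdef SKETCH_HAVE_AVX2
  if (__builtin_cpu_supports("avx2")) return FindHashAvx2;
#endif
  return FindHashBranchless;
}

// Differential check: every path must agree with the short-circuit reference.
bool AgreesWithReference(const uint32_t* bitset, uint32_t num_blocks, uint64_t hash) {
  const bool expected = FindHashReference(bitset, num_blocks, hash);
  return FindHashBranchless(bitset, num_blocks, hash) == expected &&
         ResolveFindHash()(bitset, num_blocks, hash) == expected;
}
```

Both fast paths return `true` exactly when all 8 probed bits are set, which is why the result can stay bit-identical to the reference. A check like `AgreesWithReference`, run over many hashes and bitsets, is one way to exercise that claim.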
Discussed on the dev list: https://lists.apache.org/thread/omof0fq47tndfd80g5hwp2bvjmzvpb40
The insert path uses the same loop shape; it will follow in a separate issue/PR to keep this change focused.
Component(s)
C++, Parquet