
IN LIST: add Float16 bitmap filter #23311

Open
geoffreyclaude wants to merge 1 commit into apache:main from geoffreyclaude:perf/in_list_float16_bitmap_filter


geoffreyclaude commented Jul 3, 2026


Which issue does this PR close?

Rationale for this change

#23299 extends the bitmap IN filter to the signed 1-byte and 2-byte integer types by handling each logical Arrow type directly. Float16 is the remaining 2-byte primitive type that can use the same compact bitmap idea: it has 65,536 possible bit patterns, so an 8 KiB bitmap can represent every possible value.

This PR follows the same direct typed shape as #23299. It does not reinterpret whole arrays as UInt16; instead, the Float16 bitmap filter maps each value to its IEEE-754 half-precision bit pattern with to_bits(). That keeps the logical array type intact while preserving bit-pattern equality semantics, including distinct NaN payloads and +0.0 versus -0.0.
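To illustrate the idea (this is a minimal sketch, not the `BitmapFilter` code from this PR), a set over all 16-bit patterns can be stored as 1,024 `u64` words. `BitPatternSet` and its methods are hypothetical names. The sketch takes `u16` bit patterns directly so it only needs the standard library; in the real path the `u16` would come from calling `to_bits()` on each Float16 value.

```rust
/// Hypothetical illustration type: one bit per possible 16-bit pattern.
/// 65,536 bits = 1,024 u64 words = 8 KiB.
struct BitPatternSet {
    words: Box<[u64; 1024]>,
}

impl BitPatternSet {
    fn new() -> Self {
        Self { words: Box::new([0u64; 1024]) }
    }

    /// Build the set from the constant IN list, given as f16 bit patterns.
    fn from_list(list: &[u16]) -> Self {
        let mut set = Self::new();
        for &bits in list {
            set.insert(bits);
        }
        set
    }

    /// The upper 10 bits select the word, the lower 6 bits select the bit.
    fn insert(&mut self, bits: u16) {
        self.words[(bits >> 6) as usize] |= 1u64 << (bits & 63);
    }

    /// Membership is a single word load plus a mask, independent of list size.
    fn contains(&self, bits: u16) -> bool {
        self.words[(bits >> 6) as usize] & (1u64 << (bits & 63)) != 0
    }
}
```

Because membership is keyed on the raw bit pattern, `0x0000` (+0.0) and `0x8000` (-0.0) occupy different bits, and so do two NaNs with different payloads such as `0x7E00` and `0x7E01`.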

What changes are included in this PR?

  • Adds Float16Type support to the existing BitmapFilter.
  • Routes DataType::Float16 constant-list filtering to that bitmap path.
  • Extends the existing type-combination coverage to include Float16.
  • Adds focused coverage for slices, nulls, NOT IN, +0.0 / -0.0, and NaN payload bit patterns.
  • Adds focused in_list_strategy benchmark rows for Float16.
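The semantics that the focused tests exercise can be sketched as follows. This is an assumption-laden illustration, not the test code from this PR: `in_list_bits` is a hypothetical helper, values are given as `Option<u16>` bit patterns with `None` standing in for a null slot, the list is assumed to contain no nulls, and a plain `Vec<bool>` lookup table replaces the packed bitmap for brevity.

```rust
// IEEE-754 half-precision bit patterns used in the example.
const POS_ZERO: u16 = 0x0000; // +0.0
const NEG_ZERO: u16 = 0x8000; // -0.0
const NAN_A: u16 = 0x7E00; // quiet NaN
const NAN_B: u16 = 0x7E01; // quiet NaN with a different payload

/// Hypothetical helper: evaluates `value IN (list)` or, when `negated` is
/// true, `value NOT IN (list)` using bit-pattern equality.
/// A null input stays null in both cases.
fn in_list_bits(values: &[Option<u16>], list: &[u16], negated: bool) -> Vec<Option<bool>> {
    // One flag per possible 16-bit pattern (unpacked for readability).
    let mut seen = vec![false; 1 << 16];
    for &bits in list {
        seen[bits as usize] = true;
    }
    values
        .iter()
        .map(|v| v.map(|bits| seen[bits as usize] != negated))
        .collect()
}
```

With the list `[POS_ZERO, NAN_A]`, the value `NEG_ZERO` does not match even though +0.0 and -0.0 compare equal as floats, and `NAN_A` does match even though NaN never compares equal to itself as a float.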

Are these changes tested?

Yes.

  • cargo fmt --all
  • cargo test -p datafusion-physical-expr bitmap_filter_f16 --lib
  • cargo test -p datafusion-physical-expr test_in_list_from_array_type_combinations --lib
  • cargo test -p datafusion-physical-expr --bench in_list_strategy --no-run
  • cargo clippy --all-targets --all-features -- -D warnings

Are there any user-facing changes?

No. This is an internal performance optimization only.

Local benchmark snapshot

Built and run with the release-nonlto profile, filtered to the new Float16 rows:

cargo bench -p datafusion-physical-expr --profile release-nonlto --bench in_list_strategy -- narrow_integer/f16 --save-baseline <baseline>

Compared baselines: #23299 -> #23311

Method: directly compared Criterion's raw sample minima (min(time / iterations)) from sample.json. Lower is better; changes within +/-5% are treated as noise.
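The comparison method can be sketched as below. This is a rough illustration rather than the script actually used: the function names are hypothetical, and the two slices are assumed to be the per-sample iteration counts and total times already extracted from each baseline's sample.json (parsing is left out).

```rust
/// Smallest per-iteration time across all samples, i.e. min(time / iterations).
/// Returns None when there is no sample with a positive iteration count.
fn min_time_per_iter(iters: &[f64], times: &[f64]) -> Option<f64> {
    iters
        .iter()
        .zip(times)
        .filter(|(n, _)| **n > 0.0)
        .map(|(n, t)| t / n)
        .fold(None::<f64>, |acc, x| Some(acc.map_or(x, |m| m.min(x))))
}

/// Relative change of `new` against `base`, in percent (negative is faster).
fn percent_change(base: f64, new: f64) -> f64 {
    (new - base) / base * 100.0
}

/// Changes within +/-5% are treated as noise.
fn classify(change_pct: f64) -> &'static str {
    if change_pct < -5.0 {
        "faster"
    } else if change_pct > 5.0 {
        "slower"
    } else {
        "noise"
    }
}
```

For example, minima of 19.582 us and 3.911 us give a change of about -80.0%, which classifies as faster, and a ratio of about 5.01x.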

Summary: 6 relevant rows, 6 faster, 0 slower, 0 within +/-5%.

| Benchmark | #23299 | #23311 | Change |
|---|---|---|---|
| narrow_integer/f16/list=4/match=0% | 19.582 us | 3.911 us | -80.0% (5.01x faster) |
| narrow_integer/f16/list=4/match=50% | 44.138 us | 3.871 us | -91.2% (11.40x faster) |
| narrow_integer/f16/list=64/match=0% | 19.977 us | 3.878 us | -80.6% (5.15x faster) |
| narrow_integer/f16/list=64/match=50% | 55.792 us | 3.903 us | -93.0% (14.29x faster) |
| narrow_integer/f16/list=256/match=0% | 21.727 us | 3.885 us | -82.1% (5.59x faster) |
| narrow_integer/f16/list=256/match=50% | 51.737 us | 3.918 us | -92.4% (13.21x faster) |

@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_float16_bitmap_filter branch from 2630cbd to 6f7992e Compare July 3, 2026 18:17

geoffreyclaude commented Jul 3, 2026


@alamb I forgot to mention Float16 in your "Optimize Int8 and Int16 integer IN filters". This should cover the gap.
