Optimize check_short_circuit with early-exit bit scanning #22580

Open: telleroutlook wants to merge 2 commits into apache:main from telleroutlook:optimize-short-circuit-early-exit

@telleroutlook

Closes #15631

What changes

Added any_bit_set() and any_bit_unset() helpers in check_short_circuit() that scan the BooleanBuffer word-by-word and exit on the first word that determines the answer, instead of always doing a full count_set_bits() popcount scan.
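As a rough illustration of the word-by-word scan described above, here is a minimal sketch that uses only the standard library. The `Bits` struct is a hypothetical stand-in for a packed boolean buffer and is not the arrow `BooleanBuffer` API. The helper names mirror the ones in this PR, but the bodies are an assumption about how such helpers could look, including the masking of padding bits in the last word.

```rust
/// Hypothetical stand-in for a packed boolean buffer (NOT arrow's BooleanBuffer):
/// `len` bits stored least-significant-bit first in 64-bit words.
/// Assumes `words` holds at least `ceil(len / 64)` entries.
pub struct Bits<'a> {
    pub words: &'a [u64],
    pub len: usize,
}

impl Bits<'_> {
    /// Number of words that contain at least one valid bit.
    fn used_words(&self) -> usize {
        (self.len + 63) / 64
    }

    /// Mask selecting the valid bits of word `i`; only the last word can be partial.
    fn mask(&self, i: usize) -> u64 {
        let remaining = self.len - i * 64;
        if remaining >= 64 {
            u64::MAX
        } else {
            (1u64 << remaining) - 1
        }
    }

    /// True if any valid bit is 1. `Iterator::any` stops at the first such word.
    pub fn any_bit_set(&self) -> bool {
        (0..self.used_words()).any(|i| (self.words[i] & self.mask(i)) != 0)
    }

    /// True if any valid bit is 0. Stops at the first word with a cleared valid bit.
    pub fn any_bit_unset(&self) -> bool {
        (0..self.used_words()).any(|i| (!self.words[i] & self.mask(i)) != 0)
    }
}
```

The mask matters for the last word: bits beyond `len` are padding and must not influence either answer.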

For AND: check any_bit_set first; if it is false (no true values), return ReturnLeft. Otherwise check any_bit_unset; if it is false (all true), return ReturnRight. Only fall through to count_set_bits() when the exact count is needed for the pre-selection threshold.

For OR: check any_bit_unset first; if it is false (no false values), return ReturnLeft. Otherwise check any_bit_set; if it is false (all false), return ReturnRight.
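The ordering of checks for AND and OR can be sketched as follows. This is a simplified model, not the code in this PR: it works on a plain `&[bool]` and ignores nulls. The `Strategy` and `Op` enums are hypothetical names. The 0.2 threshold is an assumed value for illustration only.

```rust
/// Hypothetical outcome type; variant names follow the PR text where possible.
#[derive(Debug, PartialEq)]
pub enum Strategy {
    ReturnLeft,
    ReturnRight,
    PreSelection,
    NoShortCircuit,
}

/// Hypothetical operator selector.
#[derive(Clone, Copy)]
pub enum Op {
    And,
    Or,
}

/// Assumed value, for illustration only.
const PRE_SELECTION_THRESHOLD: f64 = 0.2;

pub fn check_short_circuit(lhs: &[bool], op: Op) -> Strategy {
    // `Iterator::any` returns at the first match, modelling the early-exit scans.
    let any_set = || lhs.iter().any(|&b| b);
    let any_unset = || lhs.iter().any(|&b| !b);

    match op {
        Op::And => {
            if !any_set() {
                return Strategy::ReturnLeft; // all false: AND result is lhs
            }
            if !any_unset() {
                return Strategy::ReturnRight; // all true: AND result is rhs
            }
            // Mixed values: only now pay for the exact count.
            let true_count = lhs.iter().filter(|&&b| b).count();
            if (true_count as f64) <= PRE_SELECTION_THRESHOLD * lhs.len() as f64 {
                return Strategy::PreSelection;
            }
            Strategy::NoShortCircuit
        }
        Op::Or => {
            if !any_unset() {
                return Strategy::ReturnLeft; // all true: OR result is lhs
            }
            if !any_set() {
                return Strategy::ReturnRight; // all false: OR result is rhs
            }
            Strategy::NoShortCircuit
        }
    }
}
```

In this model the full count runs only on the mixed AND path, which matches the fall-through described above.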

Why

count_set_bits() scans the entire buffer regardless of the data. For cases where the first few words already tell us the answer (all zeros, all ones, or even just "has at least one set bit"), we can skip the full scan.
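To make the cost difference concrete, the sketch below counts how many 64-bit words each approach inspects. Both functions are illustrative stand-ins written for this comparison. They are not the arrow or DataFusion implementations, and they assume the buffer has no padding bits. An 8192-row buffer packs into 128 words.

```rust
/// Full scan: popcount every word, as an exact count requires.
/// Returns (set bits, words inspected).
pub fn count_set_bits(words: &[u64]) -> (usize, usize) {
    let count: usize = words.iter().map(|w| w.count_ones() as usize).sum();
    (count, words.len())
}

/// Early exit: stop at the first non-zero word.
/// Returns (answer, words inspected).
pub fn any_bit_set(words: &[u64]) -> (bool, usize) {
    match words.iter().position(|&w| w != 0) {
        Some(i) => (true, i + 1),
        None => (false, words.len()),
    }
}
```

With a single set bit in the first word, `any_bit_set` inspects 1 word while `count_set_bits` inspects all 128. With the set bit in the last word, both inspect 128, so the saving depends on where the deciding word sits.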

Benchmark results

8192 rows, existing binary_op bench:

| Scenario | Baseline | Optimized | Change |
|---|---|---|---|
| and/one_true_first | 544 µs | 450 µs | -17% |
| and/one_true_last | 469 µs | 401 µs | -15% |
| and/one_true_middle | 428 µs | 389 µs | -9% |
| and/one_true_middle_left | 413 µs | 371 µs | -10% |
| and/one_true_middle_right | 408 µs | 391 µs | -4% |
| and/all_false | 161 ns | 163 ns | ~same |
| and/all_true_in_and | 1.25 ms | 1.25 ms | ~same |
| or/all_true | 139 ns | 171 ns | ~same |

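The biggest wins (9% to 17%) are in the mixed cases where early exit avoids a full buffer scan before falling through to the pre-selection path. The "all same" cases see no significant change: and/all_false and and/all_true_in_and are flat, and or/all_true moves from 139 ns to 171 ns, a difference of about 30 ns in absolute terms.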

Test plan

  • All 83 existing binary expression tests pass
  • Ran cargo bench --bench binary_op -- short_circuit
  • CI

Replace unconditional count_set_bits() with any_bit_set/any_bit_unset
checks that exit on the first non-trivial word. For AND/OR short-circuit
paths where the answer is obvious from a few words, this avoids a full
buffer scan. Falls back to count_set_bits() only when the exact count
is needed for the pre-selection threshold.

Benchmarks (8192 rows):
- and/one_true_first: 544µs → 450µs (-17%)
- and/one_true_last: 469µs → 401µs (-15%)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions bot added the physical-expr (Changes to the physical-expr crates) label on May 28, 2026
Linked issue (may be closed when this pull request merges): Improve the performance of early exit evaluation in binary_expr