Add extra case_when benchmarks #18097

pepijnve · 2025-10-16T11:36:45Z

Which issue does this PR close?

None

Rationale for this change

More microbenchmarks make it easier to assess the performance impact of CaseExpr implementation changes.

What changes are included in this PR?

Add microbenchmarks for case expressions that are more representative of real-world queries.

Are these changes tested?

n/a

Are there any user-facing changes?

no

alamb

Thanks @pepijnve -- I ran these benchmarks locally and it looks good

I also did a brief profile and it looks like it is measuring what we expect.

alamb · 2025-10-16T17:38:57Z

datafusion/physical-expr/benches/case_when.rs

-
-fn criterion_benchmark(c: &mut Criterion) {
-    // create input data
+fn make_batch(row_count: usize, column_count: usize) -> RecordBatch {


Is it worth a comment saying that the column count is always 3 or more?

I'll add a description and assertion
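A minimal sketch of what such a documented precondition could look like. It uses plain `Vec<i32>` columns as a stand-in for Arrow arrays so that it runs without external crates; `make_columns` and the contents of the three fixed columns are hypothetical and are not the benchmark's actual code.

```rust
/// Hypothetical stand-in for the benchmark's `make_batch`.
///
/// The first three columns are always created, so `column_count` must be
/// at least 3. Any further columns are padding.
fn make_columns(row_count: usize, column_count: usize) -> Vec<Vec<i32>> {
    assert!(
        column_count >= 3,
        "column_count must be at least 3, got {}",
        column_count
    );

    // Placeholder contents for the three fixed columns (assumption).
    let c1: Vec<i32> = (0..row_count as i32).collect();
    let c2: Vec<i32> = vec![1; row_count];
    let c3: Vec<i32> = vec![2; row_count];

    let mut columns = vec![c1, c2, c3];
    // Pad up to the requested width, mirroring the `3..column_count` loop in the diff.
    for _ in 3..column_count {
        columns.push(vec![0; row_count]);
    }
    columns
}

fn main() {
    let columns = make_columns(4, 5);
    println!("{} columns of {} rows", columns.len(), columns[0].len());
}
```

Failing fast in the helper keeps a call such as `make_columns(8192, 2)` from silently producing a three-column batch.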

alamb · 2025-10-16T17:40:20Z

datafusion/physical-expr/benches/case_when.rs

-    ));
+    let mut columns: Vec<ArrayRef> = vec![c1, c2, c3];
+    for _ in 3..column_count {
+        columns.push(Arc::new(Int32Array::from_value(0, row_count)));


do you think it would be useful to use different values for the columns? Maybe since it is a benchmark (not correctness test) all zeros will be fine

The values themselves don't really matter for the benchmark. Maybe it's safer to not use identical values, just to be sure that the array never gets run-end encoded (REE).
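One way to make the padding columns non-constant, sketched with plain `Vec<i32>` columns so it runs without Arrow. The helper names and the value formula are assumptions chosen for illustration; the actual change in the PR may generate values differently.

```rust
/// Hypothetical helper: values vary by row and by column index, so no
/// padding column is constant and no two padding columns are identical.
fn padding_column(column_index: usize, row_count: usize) -> Vec<i32> {
    (0..row_count)
        .map(|row| (row + column_index) as i32)
        .collect()
}

/// True when every value in the column is the same (or the column is empty).
fn is_constant(column: &[i32]) -> bool {
    column.windows(2).all(|pair| pair[0] == pair[1])
}

fn main() {
    for column_index in 3..6 {
        let column = padding_column(column_index, 4);
        println!("column {}: {:?}, constant: {}", column_index, column, is_constant(&column));
    }
}
```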

alamb · 2025-10-16T17:40:57Z

datafusion/physical-expr/benches/case_when.rs

+    );
+
+    // No expression, when/then/else, column reference values
+    c.bench_function(


These are nice labels:

Benchmarking case_when 8192x3: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 EN...: Warming up for 3.0000 s

alamb · 2025-10-16T17:41:52Z

datafusion/physical-expr/benches/case_when.rs

+
+    // Many when/then branches where all are effectively reachable
+    c.bench_function(format!("case_when {}x{}: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END", batch.num_rows(), batch.num_columns()).as_str(), |b| {
+        let when_thens = (0..batch.num_rows() as i32).map(|i| (make_x_cmp_y(&c1, Operator::Eq, i), lit(i))).collect();


that is a lot of when_thens!

Intentionally so. This is a torture test benchmark to really stress the code.

The first 'all reachable' one is really a worst case scenario test case. This is intended to be able to measure improvements in the processing that's being done in each loop iteration. Filtering, scattering, etc.

The second 'few reachable' one is intended to measure the short circuiting behaviour.
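The difference between the two scenarios can be illustrated with a simplified model of branch-by-branch CASE evaluation. This is a sketch under stated assumptions, not DataFusion's `CaseExpr` implementation: `eval_case`, `eq_branches` and `lt_branches` are hypothetical names, the column is a plain `i32` slice, and the `<` predicates are illustrative rather than the benchmark's exact expressions.

```rust
/// A WHEN predicate paired with its THEN value.
type Branch = (Box<dyn Fn(i32) -> bool>, i32);

/// Evaluates `CASE WHEN p0 THEN v0 WHEN p1 THEN v1 ... ELSE e END` over a column.
/// Returns the result column and the number of branches that were evaluated.
fn eval_case(column: &[i32], branches: &[Branch], else_value: i32) -> (Vec<i32>, usize) {
    let mut result = vec![else_value; column.len()];
    // Rows that no earlier branch has claimed yet.
    let mut remaining: Vec<usize> = (0..column.len()).collect();
    let mut evaluated = 0;

    for (predicate, value) in branches {
        // Short circuit: once every row has a result, skip the remaining branches.
        if remaining.is_empty() {
            break;
        }
        evaluated += 1;
        // Filter the remaining rows by the predicate, scatter the THEN value
        // into the matching positions, and carry the rest to the next branch.
        let mut still_remaining = Vec::with_capacity(remaining.len());
        for &row in &remaining {
            if predicate(column[row]) {
                result[row] = *value;
            } else {
                still_remaining.push(row);
            }
        }
        remaining = still_remaining;
    }
    (result, evaluated)
}

/// "All reachable": WHEN c1 == 0 THEN 0 ... WHEN c1 == n - 1 THEN n - 1.
fn eq_branches(n: i32) -> Vec<Branch> {
    (0..n)
        .map(|i| (Box::new(move |x: i32| x == i) as Box<dyn Fn(i32) -> bool>, i))
        .collect()
}

/// "Few reachable": WHEN c1 < step THEN 0 WHEN c1 < 2 * step THEN 1 ...
fn lt_branches(n: i32, step: i32) -> Vec<Branch> {
    (0..n)
        .map(|i| {
            let bound = (i + 1) * step;
            (Box::new(move |x: i32| x < bound) as Box<dyn Fn(i32) -> bool>, i)
        })
        .collect()
}

fn main() {
    let column: Vec<i32> = (0..8).collect();
    let (_, all) = eval_case(&column, &eq_branches(8), 8);
    let (_, few) = eval_case(&column, &lt_branches(8, 4), 8);
    println!("all reachable: {} branches, few reachable: {} branches", all, few);
}
```

In this model the equality branches each claim exactly one row, so every branch pays the filter and scatter cost, while the range branches exhaust the rows after two iterations and the rest are never evaluated.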

alamb · 2025-10-17T16:47:56Z

Thank you @pepijnve

## Which issue does this PR close?

- Followup to #18097

## Rationale for this change

The last benchmark was, by mistake, essentially identical to the second-to-last one. The actual predicate was using `=` instead of `<`.

## What changes are included in this PR?

- Adjust the operator in the case predicates to `<`
- Adds two additional benchmarks covering `case x when ...`

## Are these changes tested?

Verified with debugger.

## Are there any user-facing changes?

No

github-actions bot added the physical-expr Changes to the physical-expr crates label Oct 16, 2025

pepijnve mentioned this pull request Oct 16, 2025

Short circuit complex case evaluation modes as soon as possible #17898

Open

Extend case_when benchmark

7585b62

pepijnve force-pushed the case_bench branch from b459df2 to 7585b62 Compare October 16, 2025 16:49

alamb approved these changes Oct 16, 2025

pepijnve added 2 commits October 17, 2025 09:43

Document requirement that column_count >= 3

4b4bdea

Make additional columns not constant valued

c02cf9a

alamb added this pull request to the merge queue Oct 17, 2025

Merged via the queue into apache:main with commit 765f2b9 Oct 17, 2025
32 checks passed

pepijnve mentioned this pull request Oct 17, 2025

Use < instead of = in case benchmark predicates, use Integers #18144

Merged
