@andygrove commented on Jan 1, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

I ran some microbenchmarks comparing DataFusion and DuckDB (see apache/datafusion-benchmarks#28) and found that CASE WHEN expressions were much slower in DataFusion, so I asked Claude to make them faster. Note that this particular optimization doesn't help with the specific benchmark that I was running. I will create another PR for that, but this optimization seems valid too.

| Batch Size | BEFORE (sequential) | AFTER (HashMap) | Speedup |
| --- | --- | --- | --- |
| 8192x3 | 156.87 µs | 52.22 µs | 3.0x |
| 8192x50 | 158.25 µs | 54.33 µs | 2.9x |
| 8192x100 | 164.49 µs | 55.08 µs | 3.0x |

What changes are included in this PR?

Optimize CASE expr WHEN literal THEN non-literal-expression by using an O(1) HashMap lookup per row for branch selection instead of O(n) sequential comparisons, where n is the number of branches.

Problem

When a CASE expression has many branches, such as:

CASE status
  WHEN 'active' THEN col_a
  WHEN 'pending' THEN col_b
  ...20+ more branches...
END

The existing code falls back to sequential evaluation because the THEN expressions aren't literals, even though the WHEN values are. This results in O(branches × rows) comparisons.
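To make the cost concrete, here is a minimal scalar model of sequential branch selection that also counts comparisons. It is only a sketch: `select_branches_sequential` is a hypothetical name, and plain string slices stand in for the Arrow arrays that DataFusion actually operates on.

```rust
/// Hypothetical model of sequential CASE evaluation (not DataFusion code).
/// For each row, the input value is compared against the WHEN literals in
/// order until one matches. Returns the matched branch index per row
/// (`None` when no branch matches) and the total number of comparisons.
fn select_branches_sequential(
    when_literals: &[&str],
    rows: &[&str],
) -> (Vec<Option<usize>>, usize) {
    let mut comparisons = 0;
    let selected: Vec<Option<usize>> = rows
        .iter()
        .map(|value| {
            // `position` stops at the first match, so a row that matches an
            // early branch is cheap; a row matching a late branch (or no
            // branch at all) pays for every literal.
            when_literals.iter().position(|lit| {
                comparisons += 1;
                lit == value
            })
        })
        .collect();
    (selected, comparisons)
}
```

In this model, rows that match a late branch or no branch are compared against every literal, which is where the branches × rows worst case comes from.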

Solution

Added a new EvalMethod::WithExprLookupTable that:

  1. Uses existing WhenLiteralIndexMap HashMap infrastructure for O(1) branch lookup per row
  2. Groups rows by matching branch
  3. Evaluates THEN expressions only for matching rows (preserving short-circuit semantics)
  4. Merges results using existing ResultBuilder
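The four steps above can be sketched in plain Rust as follows. This is an illustration under assumptions, not the code in this PR: `eval_case_with_lookup` is a hypothetical name, string slices and `i64` values stand in for Arrow arrays, THEN expressions are modelled as closures over a row index, and neither `WhenLiteralIndexMap` nor `ResultBuilder` is used. NULL inputs and ELSE branches are not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of `CASE input WHEN lit_i THEN then_i ... END`
/// evaluated with a literal -> branch lookup table (not DataFusion code).
/// Rows that match no WHEN literal produce `None`.
fn eval_case_with_lookup(
    input: &[&str],
    when_literals: &[&str],
    then_exprs: &[&dyn Fn(usize) -> i64],
) -> Vec<Option<i64>> {
    assert_eq!(when_literals.len(), then_exprs.len());

    // Step 1: build the lookup table. If a literal is repeated, the first
    // branch wins, matching the top-to-bottom semantics of CASE.
    let mut lookup: HashMap<&str, usize> = HashMap::new();
    for (branch, lit) in when_literals.iter().enumerate() {
        lookup.entry(*lit).or_insert(branch);
    }

    // Step 2: group row indices by the branch they match (one hash lookup
    // per row instead of one comparison per branch).
    let mut rows_per_branch: Vec<Vec<usize>> = vec![Vec::new(); when_literals.len()];
    for (row, value) in input.iter().enumerate() {
        if let Some(&branch) = lookup.get(*value) {
            rows_per_branch[branch].push(row);
        }
    }

    // Steps 3 and 4: evaluate each THEN expression only on its own rows and
    // scatter the values back into their original positions.
    let mut result = vec![None; input.len()];
    for (branch, rows) in rows_per_branch.iter().enumerate() {
        for &row in rows {
            result[row] = Some((then_exprs[branch])(row));
        }
    }
    result
}
```

Because each THEN closure is called only for the rows grouped under its branch, an expression that would fail or be expensive on non-matching rows is never evaluated there, which is the short-circuit property the list above refers to.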

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the physical-expr (Changes to the physical-expr crates) and functions (Changes to functions implementation) labels on Jan 1, 2026
@github-actions bot removed the functions (Changes to functions implementation) label on Jan 1, 2026
@andygrove changed the title from "perf: Improve performance of CaseExpr with many branches [WIP]" to "perf: Improve performance of CaseExpr with many branches and non-literal THEN expressions [WIP]" on Jan 1, 2026