Skip to content

perf: optimize date subtraction to avoid intermediate array allocation#22591

Open
lyne7-sc wants to merge 5 commits into
apache:mainfrom
lyne7-sc:perf/date_sub
Open

perf: optimize date subtraction to avoid intermediate array allocation#22591
lyne7-sc wants to merge 5 commits into
apache:mainfrom
lyne7-sc:perf/date_sub

Conversation

@lyne7-sc
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

When evaluating Date32 - Date32 or Date64 - Date64, DataFusion needs to return an Int64 representing the difference in days(after #19563). The current implementation went through Arrow's sub_wrapping, then converted to Int64 days. This allocates an unnecessary intermediate array and bypasses vectorized kernels.

Since Date32/Date64 are just i32/i64 under the hood, we can compute the day difference directly on the native values.

What changes are included in this PR?

  • Replace apply_date_subtraction + duration_to_days with subtract_date_to_days, which operates directly on primitive values using binary()/unary()
  • Add date subtraction benchmarks

Benchmarks

group                               baseline                               optimized
-----                               --------                               ---------
date32_subtract/20_percent_nulls    21.16    30.9±4.15µs        ? ?/sec    1.00  1462.0±129.72ns        ? ?/sec
date32_subtract/no_nulls            18.32    21.9±2.72µs        ? ?/sec    1.00  1196.5±123.89ns        ? ?/sec
date64_subtract/20_percent_nulls    9.97     34.3±2.26µs        ? ?/sec    1.00      3.4±0.12µs        ? ?/sec
date64_subtract/no_nulls            7.65     25.3±2.77µs        ? ?/sec    1.00      3.3±0.09µs        ? ?/sec

Are these changes tested?

Yes, new UTs and exist slt

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label May 28, 2026
{
/// Extract the native value from a scalar by converting to a single-element
/// array and reading index 0. Returns `None` for null scalars.
fn scalar_to_native<P: ArrowPrimitiveType>(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we're better off destructuring the scalar value directly instead of needing to roundtrip through an array just to extract the scalar value 🤔

Tradeoff being that it'd be more verbose in terms of code

let result: Int64Array = left.unary(|l| day_diff_fn(l, right_val));
Ok(ColumnarValue::Array(Arc::new(result)))
}
None => Ok(ColumnarValue::Array(new_null_array(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can return a null scalar here instead of a null array

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants