[SPARK-55318] Performance Optimizations for vector_avg/vector_sum #54097

zhidongqu-db · 2026-02-02T20:57:50Z

What changes were proposed in this pull request?

Implement a set of performance optimizations for the recently added vector aggregation functions (vector_avg and vector_sum):

Reuse the binary buffer in place instead of allocating a new ByteBuffer on each update/merge call.
Hoist division out of the loop: compute invCount = 1.0f / newCount once before the loop instead of dividing per element.
Hoist weight calculations out of the loop: compute leftWeight and rightWeight once before the loop instead of performing two divisions per element.
Skip null checks when unnecessary: check ArrayType.containsNull at initialization and skip the per-element null check entirely when the array type cannot contain nulls.
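
For illustration only, the four optimizations above can be sketched as a small standalone Java class. This is not the code in this PR: the class name VectorAvgSketch, its methods, the Float[] input type, the little-endian float layout, and the choice to skip a vector that contains a null element are all assumptions made for the example. Only the names invCount, newCount, leftWeight, rightWeight, and the containsNull flag come from the description.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of a running-average vector aggregate; not Spark's actual classes.
final class VectorAvgSketch {
    private final int dim;
    private final boolean containsNull; // decided once, mirroring ArrayType.containsNull
    private final ByteBuffer view;      // wraps the state bytes once and is reused by every call
    private long count = 0;

    VectorAvgSketch(int dim, boolean containsNull) {
        this.dim = dim;
        this.containsNull = containsNull;
        // Assumed layout: dim little-endian floats holding the running average.
        this.view = ByteBuffer.wrap(new byte[dim * Float.BYTES]).order(ByteOrder.LITTLE_ENDIAN);
    }

    void update(Float[] input) {
        // The per-element null check runs only when the element type may hold nulls.
        if (containsNull) {
            for (Float f : input) {
                if (f == null) {
                    return; // assumption: a vector with a null element is skipped
                }
            }
        }
        long newCount = count + 1;
        float invCount = 1.0f / newCount; // one division per call, not one per element
        for (int i = 0; i < dim; i++) {
            int off = i * Float.BYTES;
            float old = view.getFloat(off);
            // Incremental mean, written back into the same buffer.
            view.putFloat(off, old + (input[i] - old) * invCount);
        }
        count = newCount;
    }

    void merge(VectorAvgSketch other) {
        if (other.count == 0) {
            return;
        }
        long total = count + other.count;
        // Both weights are computed once, replacing two divisions per element.
        float leftWeight = (float) count / total;
        float rightWeight = (float) other.count / total;
        for (int i = 0; i < dim; i++) {
            int off = i * Float.BYTES;
            float merged = view.getFloat(off) * leftWeight
                    + other.view.getFloat(off) * rightWeight;
            view.putFloat(off, merged); // in place, no new buffer
        }
        count = total;
    }

    long count() {
        return count;
    }

    float[] result() {
        float[] out = new float[dim];
        for (int i = 0; i < dim; i++) {
            out[i] = view.getFloat(i * Float.BYTES);
        }
        return out;
    }
}
```

In this sketch the only allocation on the hot path is the one made in the constructor, so repeated update and merge calls leave no discarded buffers behind for the garbage collector.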

Why are the changes needed?

The existing implementation allocates a new binary buffer on every update and discards the old one, which can cause excessive garbage collection. This is particularly problematic when running large aggregations over high-dimensional vectors.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Covered by existing tests.

Was this patch authored or co-authored using generative AI tooling?

Yes, code assistance with Claude Opus 4.5 in combination with manual editing by the author.

optimizations

47a84c5

zhidongqu-db changed the title ~~Performance Optimizations for vector_avg/vector_sum~~ [SPARK-55318] Performance Optimizations for vector_avg/vector_sum Feb 2, 2026

Merge remote-tracking branch 'upstream' into vector-agg-reuse-buffer

c3c73d4
