@andygrove
Member

@andygrove andygrove commented Dec 26, 2025

Which issue does this PR close?

N/A

Rationale for this change

Reuse a single string buffer across input values instead of allocating a new string for each one.

| Benchmark | Main (µs) | Optimized (µs) | Improvement |
|---|---|---|---|
| **size=1024, repeat=3** | | | |
| repeat_string_view | 76.51 | 70.14 | -8.3% |
| repeat_string | 78.63 | 71.41 | -9.2% |
| repeat_large_string | 76.40 | 71.08 | -7.0% |
| **size=1024, repeat=30** | | | |
| repeat_string_view | 109.02 | 93.51 | -14.2% |
| repeat_string | 108.46 | 92.12 | -15.1% |
| repeat_large_string | 105.99 | 91.66 | -13.5% |
| **size=4096, repeat=3** | | | |
| repeat_string_view | 139.44 | 113.95 | -18.3% |
| repeat_string | 133.62 | 112.25 | -16.0% |
| repeat_large_string | 131.94 | 108.41 | -17.8% |
| **size=4096, repeat=30** | | | |
| repeat_string_view | 251.77 | 193.95 | -23.0% |
| repeat_string | 250.58 | 191.86 | -23.4% |
| repeat_large_string | 248.88 | 188.43 | -24.3% |
| **overflow tests** | | | |
| size=1024 | 58.14 | 58.02 | ~0% (no change) |
| size=4096 | 58.26 | 58.08 | ~0% (no change) |
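
The buffer reuse described in the rationale might look roughly like the sketch below. This is not the code from the PR: `SimpleStringBuilder` and `repeat_all` are hypothetical names, and the builder is a standard-library stand-in for an Arrow string builder so that the example stays self-contained. The point it illustrates is that `String::clear` keeps the allocated capacity, so one buffer can serve every row.

```rust
/// Hypothetical stand-in for an Arrow string builder (not the real API):
/// one contiguous values buffer plus offsets marking each row boundary.
pub struct SimpleStringBuilder {
    values: Vec<u8>,
    offsets: Vec<usize>,
}

impl SimpleStringBuilder {
    pub fn new() -> Self {
        // The first offset is always 0, so row i spans offsets[i]..offsets[i + 1].
        Self { values: Vec::new(), offsets: vec![0] }
    }

    /// Copy the bytes of `s` into the values buffer and record where the row ends.
    pub fn append_value(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(self.values.len());
    }

    /// Number of rows appended so far.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Read row `i` back as a string slice.
    pub fn value(&self, i: usize) -> &str {
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        std::str::from_utf8(bytes).expect("rows are built from valid UTF-8")
    }
}

/// Repeat every input `n` times, reusing one scratch buffer for all rows
/// instead of allocating a fresh `String` per row.
pub fn repeat_all(inputs: &[&str], n: usize) -> SimpleStringBuilder {
    let mut builder = SimpleStringBuilder::new();
    let mut buf = String::new(); // allocated once, reused for every row
    for s in inputs {
        buf.clear(); // resets the length but keeps the capacity
        for _ in 0..n {
            buf.push_str(s);
        }
        builder.append_value(&buf);
    }
    builder
}
```

For example, `repeat_all(&["ab", "xyz"], 3)` produces two rows, `ababab` and `xyzxyzxyz`, while growing the scratch buffer at most a handful of times rather than once per row.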

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the `functions` label (Changes to functions implementation) Dec 26, 2025
@andygrove added the `performance` label (Make DataFusion faster) Dec 26, 2025
@Jefffrey
Contributor

We could even try creating the values, offsets, and null buffers manually, copying the strings directly into the values buffer instead of an intermediate buffer and skipping the builder API completely, but perhaps that gets too far into the weeds.

@andygrove
Member Author

> We could even try creating the values/offsets/null buffers manually, in order to copy the strings directly into the values buffer instead of the intermediate buffer to skip builder API completely, but perhaps gets too into the weeds

Thanks @Jefffrey, I will explore this as a separate PR. Thanks for the review.
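
For readers following the thread, the direct-write idea suggested above could be sketched as follows. This is only an illustration under stated assumptions, not code from this PR or a follow-up: `RawStringColumn` and `repeat_direct` are hypothetical names, validity is kept as a `Vec<bool>` for simplicity (Arrow stores it as a bitmap), and offsets are `i32` as in Arrow's `Utf8` layout.

```rust
/// Hypothetical raw column layout (not the Arrow API): value bytes,
/// row offsets, and one validity flag per row.
pub struct RawStringColumn {
    pub values: Vec<u8>,
    pub offsets: Vec<i32>,
    pub validity: Vec<bool>,
}

/// Repeat every non-null input `n` times, writing the bytes straight into
/// the final values buffer with no intermediate string and no builder.
pub fn repeat_direct(inputs: &[Option<&str>], n: usize) -> RawStringColumn {
    // Pre-size the values buffer so it is allocated only once.
    let total: usize = inputs.iter().flatten().map(|s| s.len() * n).sum();
    let mut values: Vec<u8> = Vec::with_capacity(total);
    let mut offsets: Vec<i32> = Vec::with_capacity(inputs.len() + 1);
    let mut validity: Vec<bool> = Vec::with_capacity(inputs.len());

    offsets.push(0);
    for input in inputs {
        match input {
            Some(s) => {
                for _ in 0..n {
                    values.extend_from_slice(s.as_bytes());
                }
                validity.push(true);
            }
            // A null row contributes no bytes; its offset repeats the previous one.
            None => validity.push(false),
        }
        // i32 offsets cannot address more than i32::MAX bytes.
        assert!(values.len() <= i32::MAX as usize, "offset overflow");
        offsets.push(values.len() as i32);
    }

    RawStringColumn { values, offsets, validity }
}
```

The trade-off is the one raised in the comment: this avoids one copy per row, but the caller takes over the bookkeeping (offsets, nulls, overflow checks) that a builder would otherwise handle.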

@andygrove andygrove added this pull request to the merge queue Dec 28, 2025
Merged via the queue into apache:main with commit 5b90cee Dec 28, 2025
27 checks passed