perf: Optimize substr for Utf8, LargeUtf8 #21366

Open
neilconway wants to merge 3 commits into apache:main from neilconway:neilc/optimize-substr-zerocopy

@neilconway
Contributor

Which issue does this PR close?

Rationale for this change

For Utf8 and LargeUtf8 inputs, we can optimize substr to avoid copying the output strings; instead, we can return a StringViewArray that points into the input value buffer.
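The idea above can be illustrated with a minimal, std-only sketch. It is not the code in this PR and does not use the arrow crate: `Utf8Array`, `View`, `ViewArray`, and `substr_views` are hypothetical names, and the model is simplified (real Arrow string views are 16-byte entries that store strings of 12 bytes or fewer inline). It only covers the "no count" form of substr, and shows the output holding (offset, length) pairs into the input's value buffer rather than copied bytes.

```rust
/// Hypothetical, simplified model of a Utf8 array:
/// one contiguous value buffer plus row offsets (rows + 1 entries).
struct Utf8Array {
    values: Vec<u8>,
    offsets: Vec<usize>,
}

impl Utf8Array {
    fn from_strs(rows: &[&str]) -> Self {
        let mut values = Vec::new();
        let mut offsets = vec![0];
        for r in rows {
            values.extend_from_slice(r.as_bytes());
            offsets.push(values.len());
        }
        Self { values, offsets }
    }
}

/// Hypothetical, simplified view: a byte range inside a shared buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// Output array that borrows the input's value buffer instead of owning a copy.
struct ViewArray<'a> {
    buffer: &'a [u8],
    views: Vec<View>,
}

impl ViewArray<'_> {
    fn get(&self, i: usize) -> &str {
        let v = self.views[i];
        std::str::from_utf8(&self.buffer[v.offset..v.offset + v.len]).expect("valid UTF-8")
    }
}

/// substr(input, start) with a 1-based character position and no count.
/// No string bytes are copied; only one small view is written per row.
fn substr_views(input: &Utf8Array, start: i64) -> ViewArray<'_> {
    // A start of 1 or less means "from the first character".
    let skip = (start.max(1) - 1) as usize;
    let mut views = Vec::with_capacity(input.offsets.len().saturating_sub(1));
    for w in input.offsets.windows(2) {
        let (begin, end) = (w[0], w[1]);
        let s = std::str::from_utf8(&input.values[begin..end]).expect("valid UTF-8");
        // Byte index of the character at position `skip`, or the end of the
        // string when the row has fewer characters than that.
        let byte_skip = s.char_indices().nth(skip).map_or(s.len(), |(i, _)| i);
        views.push(View {
            offset: begin + byte_skip,
            len: s.len() - byte_skip,
        });
    }
    ViewArray {
        buffer: &input.values,
        views,
    }
}
```

In this sketch, calling `substr_views(&Utf8Array::from_strs(&["hello world"]), 3)` produces a single view whose `get(0)` reads `"llo world"` directly out of the input buffer. The per-row cost is then mostly locating the character boundary, which is consistent with the string and large_string timings below moving closer to the string_view ones.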

Benchmarks (M4 Max):

Array args, no count, short strings (strlen=12):

  • string_view: 25.8 → 25.2 (-2.5%)
  • string: 32.9 → 23.8 (-27.9%)
  • large_string: 35.0 → 24.0 (-31.4%)

Array args, with count, long strings (count=64, strlen=128):

  • string_view: 41.5 → 41.9 (+1.0%)
  • string: 47.4 → 37.0 (-22.0%)
  • large_string: 47.6 → 37.5 (-21.2%)

Array args, short count, long strings (count=6, strlen=128):

  • string_view: 48.8 → 48.7 (-0.2%)
  • string: 59.5 → 49.1 (-17.6%)
  • large_string: 65.1 → 49.2 (-24.4%)

Scalar start, no count, short strings (strlen=12):

  • string_view: 26.3 → 25.9 (-1.3%)
  • string: 32.8 → 24.3 (-25.9%)

Scalar start, no count, long strings (strlen=128):

  • string_view: 41.4 → 41.7 (+0.7%)
  • string: 47.2 → 37.2 (-21.2%)

Scalar start=1, no count, long strings (strlen=128):

  • string_view: 42.0 → 41.9 (-0.2%)
  • string: 47.7 → 37.3 (-21.8%)

Scalar args, short strings (count=6, strlen=12):

  • string_view: 47.3 → 47.8 (+1.1%)
  • string: 61.1 → 50.4 (-17.5%)

Scalar args, long strings (count=64, strlen=128):

  • string_view: 42.4 → 42.6 (+0.6%)
  • string: 47.7 → 37.8 (-20.8%)
  • large_string: 51.5 → 38.4 (-25.5%)

What changes are included in this PR?

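  • Implement the optimization described above: for Utf8 and LargeUtf8 inputs, substr returns a StringViewArray that points into the input value buffer
  • Other minor code cleanup
  • Add a benchmark (only loosely related to this optimization, but relevant to future optimization work)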

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the functions (Changes to functions implementation) label Apr 4, 2026
Development

Successfully merging this pull request may close these issues.

Optimize substr() to avoid copying for Utf8, LargeUtf8