
perf: improve StringArray builder append paths #331

Merged
CurtHagenlocher merged 4 commits into apache:main from InCerryGit:perf/string-array-builder-append on Apr 26, 2026

@InCerryGit
Contributor

Summary

  • Avoid temporary byte-array allocations for small StringArray.Builder.Append(string) values by encoding into stack memory before appending.
  • Pre-reserve offsets, validity, and value-buffer capacity for known-count AppendRange inputs.
  • Add focused correctness coverage for nulls, empty strings, custom encodings, large-string fallback, collection inputs, and non-collection enumerables.
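A minimal C# sketch of the small-string path in the first bullet. It assumes a hypothetical `StackThreshold` cutoff (the PR does not state the real limit) and uses a `List<byte>` as a stand-in for the builder's value buffer; `AppendEncoded` is an illustrative name, not Arrow's API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical cutoff; the real threshold used by StringArray.Builder is not stated in the PR.
const int StackThreshold = 256;

// Hypothetical helper: appends the encoded bytes of `s` to `buffer`,
// a List<byte> standing in for the builder's value buffer.
void AppendEncoded(List<byte> buffer, string s, Encoding encoding)
{
    // Worst-case byte count for this many chars; cheap to compute.
    int maxBytes = encoding.GetMaxByteCount(s.Length);
    if (maxBytes <= StackThreshold)
    {
        // Small value: encode into stack memory, so no temporary byte[] is allocated.
        Span<byte> scratch = stackalloc byte[StackThreshold];
        int written = encoding.GetBytes(s.AsSpan(), scratch);
        foreach (byte b in scratch.Slice(0, written))
        {
            buffer.Add(b);
        }
    }
    else
    {
        // Large value: fall back to a temporary heap array.
        buffer.AddRange(encoding.GetBytes(s));
    }
}

var demo = new List<byte>();
AppendEncoded(demo, "h\u00e9llo", Encoding.UTF8);
Console.WriteLine(demo.Count); // 6: U+00E9 encodes to two bytes in UTF-8
```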

AppendRange(ICollection<string>) now performs a counting prepass to reserve value-buffer capacity before appending, so collection inputs are enumerated twice by design.
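To show why a known count matters, here is a hedged C# sketch of a two-pass append over an `ICollection<string>`. The first pass sums `Encoding.GetByteCount` so every buffer can be sized once, and the second pass encodes each value into place. The flat `byte[]`/`int[]`/`bool[]` layout (a `bool[]` instead of Arrow's validity bitmap) and the name `AppendRangeSketch` are simplifying assumptions, not the Arrow implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical two-pass append returning Arrow-style value bytes, offsets and validity.
// offsets[i + 1] - offsets[i] is the encoded length of item i; nulls have length 0.
(byte[] Values, int[] Offsets, bool[] Valid) AppendRangeSketch(ICollection<string> items, Encoding encoding)
{
    // Pass 1 (counting prepass): total encoded size of all non-null items.
    int totalBytes = 0;
    foreach (string s in items)
    {
        if (s != null) totalBytes += encoding.GetByteCount(s);
    }

    // Reserve once: values, offsets (Count + 1 entries) and validity (Count entries).
    var values = new byte[totalBytes];
    var offsets = new int[items.Count + 1];
    var valid = new bool[items.Count];

    // Pass 2: encode each item directly into its slot; no buffer ever grows.
    int index = 0, position = 0;
    foreach (string s in items)
    {
        if (s != null)
        {
            position += encoding.GetBytes(s, 0, s.Length, values, position);
            valid[index] = true;
        }
        index++;
        offsets[index] = position;
    }
    return (values, offsets, valid);
}

var (_, demoOffsets, _) =
    AppendRangeSketch(new List<string> { "ab", null, "", "cde" }, Encoding.UTF8);
Console.WriteLine(string.Join(",", demoOffsets)); // 0,2,2,2,5
```

The cost is the second enumeration noted above. For a lazily computed sequence, a prepass like this would re-run the source, which fits the PR's restriction of pre-reservation to known-count inputs.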

Benchmark

BenchmarkDotNet ShortRun, StringBuilderAppendBenchmark, 10,000 ASCII strings of length 32:

| Method | Before (time / allocated) | After (time / allocated) |
|---|---|---|
| AppendSmallStrings | 432.0 us / 1.66 MB | 341.0 us / 1157.5 KB |
| AppendRangeSmallStrings | 426.2 us / 1.66 MB | 311.8 us / 353.68 KB |

Validation

  • dotnet test test/Apache.Arrow.Tests/Apache.Arrow.Tests.csproj -c Release --filter "FullyQualifiedName~Apache.Arrow.Tests.StringArrayTests"
  • rtk dotnet build "Apache.Arrow.sln" -c Release
  • LSP diagnostics clean on changed files
  • Code review completed before commit; no blockers found

Write encoded string bytes directly into the builder value buffer after reserving capacity, avoiding both stackalloc staging and an extra copy while keeping offsets and validity updates unchanged.

BenchmarkDotNet (StringBuilderAppendBenchmark): AppendSmallStrings 393.1 us / 1157.5 KB; AppendRangeSmallStrings 290.5 us / 353.68 KB.
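The direct-write step from this commit can be sketched in C# as reserve, encode into the unused tail, then advance. The `byte[]` plus `length` pair is a hypothetical stand-in for the builder's value buffer, and reserving `GetMaxByteCount` bytes (rather than an exact `GetByteCount`) is an assumption; the commit message does not say which bound is used.

```csharp
using System;
using System.Text;

// Hypothetical value buffer: backing array plus the number of bytes in use.
byte[] buffer = new byte[16];
int length = 0;

void AppendDirect(string s, Encoding encoding)
{
    // 1. Reserve: ensure the worst-case encoded size fits after `length`.
    int maxBytes = encoding.GetMaxByteCount(s.Length);
    if (length + maxBytes > buffer.Length)
    {
        Array.Resize(ref buffer, Math.Max(buffer.Length * 2, length + maxBytes));
    }

    // 2. Encode straight into the free tail: no stackalloc staging, no second copy.
    int written = encoding.GetBytes(s.AsSpan(), buffer.AsSpan(length));

    // 3. Advance by the bytes actually written (often far fewer than maxBytes).
    length += written;
}
```

Reserving the worst case trades some slack capacity for a single pass over the characters; an exact `GetByteCount` would size the buffer tightly at the cost of scanning each string twice.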

Encapsulate the reserve/get-span/advance sequence in BinaryArray.BuilderBase so StringArray.Builder can encode directly into the value buffer without owning the low-level buffer length bookkeeping.

BenchmarkDotNet (StringBuilderAppendBenchmark): AppendSmallStrings 383.7 us / 1157.5 KB; AppendRangeSmallStrings 294.4 us / 353.68 KB.
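The PR text does not show the name or signature of the new `BinaryArray.BuilderBase` helper. As a hedged analogy, .NET's `System.Buffers.IBufferWriter<byte>` exposes the same reserve/get-span/advance contract, so a caller such as the hypothetical `AppendString` below can encode in place while the writer owns all length bookkeeping.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical caller: encodes one string through the IBufferWriter<byte> contract.
// GetSpan(sizeHint) reserves space and returns it; Advance(count) commits the bytes written.
// The caller never reads or updates the writer's internal length.
int AppendString(IBufferWriter<byte> writer, string s, Encoding encoding)
{
    int maxBytes = encoding.GetMaxByteCount(s.Length);
    Span<byte> destination = writer.GetSpan(maxBytes);
    int written = encoding.GetBytes(s.AsSpan(), destination);
    writer.Advance(written);
    return written;
}

var valueBuffer = new ArrayBufferWriter<byte>();
AppendString(valueBuffer, "arrow", Encoding.UTF8);
AppendString(valueBuffer, "\u00f1", Encoding.UTF8); // U+00F1 is two bytes in UTF-8
Console.WriteLine(valueBuffer.WrittenCount); // 7
```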

@CurtHagenlocher left a comment


Thanks!

@CurtHagenlocher merged commit c0e20b6 into apache:main on Apr 26, 2026
14 checks passed