perf: speed up constant string joins #825

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/constant-array-join

He-Pin commented May 7, 2026

Summary

  • Keep large constant std.makeArray(... literal) arrays as an O(1) constant array view.
  • Fast-path std.join over constant/repeated string arrays with exact StringBuilder sizing.
  • Mark ASCII JSON-safe joined/repeated strings so byte rendering can skip escape scanning and UTF-8 encoding.
  • Add JVM/Native UTF-16-lane SWAR ASCII-safe checks; Scala.js keeps a scalar fallback.
  • Drop remaining private[this] in touched Scala sources.
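
For readers unfamiliar with the techniques named in the first four bullets, here is a minimal Scala sketch of how they could fit together. It is not the code from this PR: `AsciiJoinSketch`, `ConstStrArray`, `join`, `isSafeScalar`, and `isSafeSwar` are hypothetical names, and "JSON-safe" is assumed to mean ASCII characters that JSON never requires escaping (nothing below 0x20, no `"`, no `\`).

```scala
object AsciiJoinSketch {
  // Per-lane masks for four 16-bit UTF-16 lanes packed into one Long.
  private final val High = 0x8000800080008000L     // top bit of each lane
  private final val Low15 = 0x7fff7fff7fff7fffL    // low 15 bits of each lane
  private final val NonAscii = 0xff80ff80ff80ff80L // bits that are set only for chars >= 0x80

  /** Hypothetical O(1) view of an array whose elements are all the same string. */
  final class ConstStrArray(val elem: String, val length: Int) {
    def apply(i: Int): String = {
      if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
      elem
    }
  }

  /** Joins a constant array with a StringBuilder sized exactly once up front. */
  def join(sep: String, arr: ConstStrArray): String = {
    if (arr.length <= 0) return ""
    val total = arr.elem.length.toLong * arr.length + sep.length.toLong * (arr.length - 1)
    require(total <= Int.MaxValue, "joined string too large")
    val sb = new java.lang.StringBuilder(total.toInt)
    sb.append(arr.elem)
    var i = 1
    while (i < arr.length) {
      sb.append(sep).append(arr.elem)
      i += 1
    }
    sb.toString
  }

  // Assumed definition of "ASCII JSON-safe" for a single char.
  private def charSafe(c: Char): Boolean =
    c >= 0x20 && c < 0x80 && c != '"' && c != '\\'

  /** Scalar fallback: checks one char at a time. */
  def isSafeScalar(s: String): Boolean = {
    var i = 0
    while (i < s.length) {
      if (!charSafe(s.charAt(i))) return false
      i += 1
    }
    true
  }

  // Checks four packed chars at once. After the ASCII test every lane is <= 0x7f,
  // so the per-lane additions below cannot carry into a neighbouring lane.
  private def lanesSafe(w: Long): Boolean = {
    if ((w & NonAscii) != 0L) return false
    val geSpace = w + 0x7fe07fe07fe07fe0L                  // top bit set iff lane >= 0x20
    val notQuote = (w ^ 0x0022002200220022L) + Low15       // top bit set iff lane != '"'
    val notBackslash = (w ^ 0x005c005c005c005cL) + Low15   // top bit set iff lane != '\\'
    (geSpace & notQuote & notBackslash & High) == High
  }

  /** SWAR check: four UTF-16 code units per step, scalar loop for the tail. */
  def isSafeSwar(s: String): Boolean = {
    val n = s.length
    var i = 0
    while (i + 4 <= n) {
      val w = s.charAt(i).toLong |
        (s.charAt(i + 1).toLong << 16) |
        (s.charAt(i + 2).toLong << 32) |
        (s.charAt(i + 3).toLong << 48)
      if (!lanesSafe(w)) return false
      i += 4
    }
    while (i < n) {
      if (!charSafe(s.charAt(i))) return false
      i += 1
    }
    true
  }
}
```

With these definitions, `AsciiJoinSketch.join(",", new AsciiJoinSketch.ConstStrArray("ab", 3))` yields `"ab,ab,ab"` without materializing a three-element array. A string that passes `isSafeSwar` could then be written out as bytes one char at a time, which is the kind of shortcut the third bullet describes.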

Benchmarks

Baseline: latest upstream/master at 8b67cb1e. This PR: 9e7c7f13.

JMH + GC profiler for bench/resources/cpp_suite/large_string_join.jsonnet, lower is better:

| Metric | upstream/master | This PR | Delta |
| --- | --- | --- | --- |
| runtime | 0.540 ms/op | 0.271 ms/op | -49.8% |
| gc.alloc.rate.norm | 1,531,272 B/op | 632,976 B/op | -58.7% |
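
The Delta values above are consistent with the relative change against the baseline, (this PR − baseline) / baseline. A small Scala sketch to reproduce them, where `DeltaSketch` and `deltaPercent` are hypothetical helper names rather than part of the benchmark tooling:

```scala
object DeltaSketch {
  /** Relative change of `pr` versus `baseline`, in percent, rounded to one decimal. */
  def deltaPercent(baseline: Double, pr: Double): Double =
    math.round((pr - baseline) / baseline * 1000.0) / 10.0

  def main(args: Array[String]): Unit = {
    println(deltaPercent(0.540, 0.271))         // runtime: -49.8
    println(deltaPercent(1531272.0, 632976.0))  // gc.alloc.rate.norm: -58.7
  }
}
```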

Scala Native hyperfine against latest source-built jrsonnet master 5b43fa8 (jrsonnet 0.5.0-pre98), lower is better:

| Command | Mean |
| --- | --- |
| sjsonnet native, upstream/master | 5.3 +/- 1.6 ms |
| sjsonnet native, this PR | 4.6 +/- 1.1 ms |
| latest jrsonnet | 8.0 +/- 1.3 ms |

The Native CLI run is short and outlier-heavy, but the current mean has this PR ahead of latest jrsonnet on large_string_join. The stronger signal is the JMH runtime/allocation reduction above.

Validation

  • ./mill -i __.checkFormat
  • git diff --check
  • ./mill -i 'sjsonnet.jvm[3.3.7].test'
  • ./mill -i 'sjsonnet.js[3.3.7].test'
  • ./mill -i 'sjsonnet.js[3.3.7].compile' 'sjsonnet.native[3.3.7].compile'
  • ./mill -i 'sjsonnet.native[3.3.7].nativeLink'
  • ./mill -i bench.runJmh sjsonnet.bench.RegressionBenchmark.main -p path=bench/resources/cpp_suite/large_string_join.jsonnet -prof gc
  • hyperfine --warmup 10 --min-runs 50

References

  • Head: 9e7c7f1330230966f65dbdd4a5765015b6826366
  • Latest source-built jrsonnet: 5b43fa88b8c43856dd5a2daa9c5c251153c5e14d

He-Pin marked this pull request as draft May 7, 2026 19:20
He-Pin marked this pull request as ready for review May 7, 2026 19:53
He-Pin force-pushed the perf/constant-array-join branch from 7dd77cb to 9e7c7f1 May 7, 2026 19:54

He-Pin commented May 8, 2026

Closing as superseded by #828, which carries the TOML and repeated join work with current docs-aligned benchmark data.

He-Pin closed this May 8, 2026
He-Pin reopened this May 8, 2026
He-Pin marked this pull request as draft May 8, 2026 05:21

He-Pin commented May 8, 2026

Reopened and moved back to draft rather than closing. The change is not a negative result, but it is currently superseded by #828, so it should not be marked ready for review unless we decide to keep it as a separate, smaller join-only PR.

He-Pin marked this pull request as ready for review May 8, 2026 05:22
He-Pin marked this pull request as draft May 8, 2026 05:23
He-Pin marked this pull request as ready for review May 8, 2026 05:23
He-Pin marked this pull request as draft May 8, 2026 05:23
He-Pin marked this pull request as ready for review May 8, 2026 05:24