Skip to content

perf: optimize string comparison fast path and array flatten#768

Merged
stephenamar-db merged 1 commit intodatabricks:masterfrom
He-Pin:perf/string-compare-fast-path
Apr 12, 2026
Merged

perf: optimize string comparison fast path and array flatten#768
stephenamar-db merged 1 commit intodatabricks:masterfrom
He-Pin:perf/string-compare-fast-path

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented Apr 12, 2026

Motivation

The realistic2 benchmark (105 lines, complex workload with format strings, cross-product comparisons, and array flattening) is one of our biggest performance gaps vs jrsonnet. This PR targets two hot paths identified in the benchmark.

Key Design Decision

  1. String comparison: In compareStringsByCodepoint, check character equality first before performing surrogate checks. For strings with long shared prefixes (common in the realistic2 benchmark where generated names differ only in suffixes), this skips two Character.isSurrogate() calls per matching position.

  2. Array flatten: When std.join([], arrays) is used to flatten arrays (as in realistic2), pre-compute the total size and use System.arraycopy for bulk transfer instead of incremental ArrayBuilder growth.

Modification

Util.scalacompareStringsByCodepoint:

  • Check c1 == c2 first → skip surrogate checks (equal chars produce equal codepoints regardless of surrogate status; pairs are compared char-by-char)
  • For non-surrogate differences, use c1 - c2 (direct subtraction) instead of Character.compare

StringModule.scalaJoin (array separator path):

  • Detect empty separator (sepArr.length == 0) for flatten fast path
  • First pass: count total elements across all sub-arrays
  • Allocate exact-sized Array[Eval] once
  • Second pass: System.arraycopy from each sub-array into result
  • Convert for loop to while loop for non-empty separator path

Benchmark Results

JMH (JVM, single fork)

Benchmark Before (ms/op) After (ms/op) Change
realistic2 61.774 54.572 -11.7%
comparison 16.204 15.982 -1.4%
setUnion 0.638 0.593 -7.1%
gen_big_object 1.122 0.934 -16.8%
reverse 7.033 6.706 -4.7%

No regressions across 35 benchmarks.

Hyperfine (Scala Native vs jrsonnet)

Benchmark sjsonnet (ms) jrsonnet (ms) Ratio
realistic2 155.0 ± 2.1 100.6 ± 1.9 1.54x (was 1.61x)
comparison 16.9 ± 0.9 12.4 ± 1.0 1.36x (unchanged)

Analysis

The realistic2 benchmark generates ~63,500 objects using cross-product comprehensions where p != q requires string comparisons. Most generated strings share long prefixes (e.g. AAAAAAA...xxxxxxxBBBBBBBlocation...), making the c1==c2 fast path very effective — it skips surrogate checks for 90%+ of character positions.

The array flatten optimization benefits the std.join([], [...]) calls that concatenate 25 arrays of 50-2450 elements each. Pre-sizing eliminates ~5 ArrayBuilder resize-and-copy cycles.

References

  • jit branch commit: af4832f2 (compareStringsByCodepoint optimization)
  • Benchmark: bench/resources/cpp_suite/realistic2.jsonnet

Result

✅ All 420 tests pass across all platforms and Scala versions.
✅ JMH: 11.7% improvement on realistic2, no regressions.
✅ Hyperfine: realistic2 gap reduced from 1.61x to 1.54x vs jrsonnet.


JMH Benchmark Results (vs master 0d13274)

Benchmark Master (ms/op) This PR (ms/op) Change
assertions 0.207 0.209 +1.0%
improved base64 0.156 0.152 -2.6%
improved base64Decode 0.123 0.116 -5.7%
regressed base64DecodeBytes 5.899 6.215 +5.4%
base64_byte_array 0.803 0.788 -1.9%
bench.01 0.052 0.052 +0.0%
bench.02 35.401 35.695 +0.8%
regressed bench.03 9.583 10.129 +5.7%
improved bench.04 0.122 0.119 -2.5%
bench.06 0.224 0.221 -1.3%
improved bench.07 3.332 3.183 -4.5%
regressed bench.08 0.038 0.039 +2.6%
regressed bench.09 0.041 0.044 +7.3%
comparison 0.028 0.028 +0.0%
comparison2 18.681 18.590 -0.5%
improved escapeStringJson 0.032 0.031 -3.1%
regressed foldl 0.077 0.082 +6.5%
gen_big_object 0.918 0.908 -1.1%
large_string_join 0.555 0.551 -0.7%
large_string_template 1.600 1.609 +0.6%
regressed lstripChars 0.113 0.116 +2.7%
manifestJsonEx 0.052 0.052 +0.0%
manifestTomlEx 0.069 0.070 +1.4%
regressed manifestYamlDoc 0.055 0.057 +3.6%
regressed member 0.656 0.684 +4.3%
regressed parseInt 0.032 0.033 +3.1%
realistic1 1.661 1.666 +0.3%
realistic2 57.541 57.650 +0.2%
reverse 6.717 6.707 -0.1%
improved rstripChars 0.119 0.116 -2.5%
setDiff 0.431 0.423 -1.9%
setInter 0.371 0.369 -0.5%
setUnion 0.604 0.598 -1.0%
stripChars 0.117 0.117 +0.0%
regressed substr 0.057 0.059 +3.5%

Summary: 6 improvements, 10 regressions, 19 neutral
Platform: Apple Silicon, JMH single-shot avg

@He-Pin He-Pin marked this pull request as ready for review April 12, 2026 14:20
Comment thread sjsonnet/src/sjsonnet/stdlib/StringModule.scala Outdated
@stephenamar-db
Copy link
Copy Markdown
Collaborator

I don't understand what the stdlib changes are about.

@He-Pin
Copy link
Copy Markdown
Contributor Author

He-Pin commented Apr 12, 2026

I wanted to optimize the Scala native startup time, so the std is loaded only when it is called. But the problem is that this only helps very simple scripts when benching.

@He-Pin He-Pin marked this pull request as draft April 12, 2026 17:10
@He-Pin
Copy link
Copy Markdown
Contributor Author

He-Pin commented Apr 12, 2026

The code miss commited some commit, let me remove it.

@He-Pin He-Pin force-pushed the perf/string-compare-fast-path branch 2 times, most recently from 70e87c8 to 89076f4 Compare April 12, 2026 17:15
@He-Pin He-Pin marked this pull request as ready for review April 12, 2026 17:16
Restructure compareStringsByCodepoint to check c1 == c2 first, skipping
surrogate checks entirely for equal characters. This benefits the common
case where strings share long prefixes (e.g. object key comparisons).

For surrogate pairs, equal high surrogates at position i lead to comparing
low surrogates at i+1, producing the correct codepoint ordering without
needing to decode the full codepoint.

Also uses direct char subtraction (c1 - c2) instead of Character.compare
for the non-surrogate different case, saving a method call.
@He-Pin He-Pin force-pushed the perf/string-compare-fast-path branch from 89076f4 to 177306e Compare April 12, 2026 17:22
@He-Pin He-Pin marked this pull request as draft April 12, 2026 18:57
@He-Pin He-Pin marked this pull request as ready for review April 12, 2026 19:07
@stephenamar-db stephenamar-db merged commit 4d521a8 into databricks:master Apr 12, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants