perf: optimize string comparison fast path and array flatten by He-Pin · Pull Request #768 · databricks/sjsonnet

He-Pin · 2026-04-12T14:18:04Z

Motivation

The realistic2 benchmark (105 lines, complex workload with format strings, cross-product comparisons, and array flattening) is one of our biggest performance gaps vs jrsonnet. This PR targets two hot paths identified in the benchmark.

Key Design Decision

String comparison: In compareStringsByCodepoint, check character equality before performing surrogate checks. For strings with long shared prefixes (common in the realistic2 benchmark, where generated names differ only in their suffixes), this skips two Character.isSurrogate() calls per matching position.
Array flatten: When std.join([], arrays) is used to flatten arrays (as in realistic2), pre-compute the total size and use System.arraycopy for bulk transfer instead of incremental ArrayBuilder growth.

Modification

Util.scala — compareStringsByCodepoint:

Check c1 == c2 first → skip surrogate checks (equal chars produce equal codepoints regardless of surrogate status; surrogate pairs are compared char-by-char, so surrogate checks only run once the chars differ)
For non-surrogate differences, use c1 - c2 (direct subtraction) instead of Character.compare
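A minimal Scala sketch of the restructured loop, written as a standalone helper. The object name `CodepointCompare` is hypothetical, and the actual `Util.compareStringsByCodepoint` in sjsonnet may be structured differently. It shows the order of checks described above: equality first, then plain subtraction when neither char is a surrogate, and full codepoint decoding only when a surrogate is involved.

```scala
object CodepointCompare {
  // Hypothetical standalone version of the "equal chars first" fast path.
  def compareStringsByCodepoint(s1: String, s2: String): Int = {
    val n1 = s1.length
    val n2 = s2.length
    val n = math.min(n1, n2)
    var i = 0
    while (i < n) {
      val c1 = s1.charAt(i)
      val c2 = s2.charAt(i)
      if (c1 == c2) {
        // Equal code units: no surrogate check needed, move to the next char.
        i += 1
      } else if (!Character.isSurrogate(c1) && !Character.isSurrogate(c2)) {
        // Both are BMP chars, so code-unit order equals codepoint order.
        return c1 - c2
      } else {
        // At least one surrogate: decode codepoints so that supplementary
        // characters sort after U+E000..U+FFFF.
        return Integer.compare(s1.codePointAt(i), s2.codePointAt(i))
      }
    }
    // Common prefix exhausted: the shorter string sorts first.
    Integer.compare(n1, n2)
  }
}
```

The surrogate branch is what keeps ordering correct: U+1F600 is stored as the code units 0xD83D 0xDE00, which sort below U+FFFF as raw chars but above it as codepoints.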

StringModule.scala — Join (array separator path):

Detect empty separator (sepArr.length == 0) for flatten fast path
First pass: count total elements across all sub-arrays
Allocate exact-sized Array[Eval] once
Second pass: System.arraycopy from each sub-array into result
Convert for loop to while loop for non-empty separator path
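A rough Scala sketch of the two-pass flatten, written as a generic helper over `Array[T]`. The name `FlattenSketch.flatten` is hypothetical; the real Join code operates on `Array[Eval]` and also handles the non-empty separator path, which this sketch omits.

```scala
import scala.reflect.ClassTag

object FlattenSketch {
  // Hypothetical helper illustrating the empty-separator fast path.
  def flatten[T: ClassTag](arrays: Array[Array[T]]): Array[T] = {
    // First pass: count the total number of elements.
    var total = 0
    var i = 0
    while (i < arrays.length) {
      total += arrays(i).length
      i += 1
    }
    // Allocate the result once, at its exact final size.
    val out = new Array[T](total)
    // Second pass: bulk-copy each sub-array into place.
    var pos = 0
    i = 0
    while (i < arrays.length) {
      val a = arrays(i)
      System.arraycopy(a, 0, out, pos, a.length)
      pos += a.length
      i += 1
    }
    out
  }
}
```

Compared with appending through a growable builder, this never copies an element more than once and never allocates intermediate backing arrays.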

Benchmark Results

JMH (JVM, single fork)

Benchmark	Before (ms/op)	After (ms/op)	Change
realistic2	61.774	54.572	-11.7% ✅
comparison	16.204	15.982	-1.4%
setUnion	0.638	0.593	-7.1%
gen_big_object	1.122	0.934	-16.8%
reverse	7.033	6.706	-4.7%

No regressions across 35 benchmarks.

Hyperfine (Scala Native vs jrsonnet)

Benchmark	sjsonnet (ms)	jrsonnet (ms)	Ratio
realistic2	155.0 ± 2.1	100.6 ± 1.9	1.54x (was 1.61x)
comparison	16.9 ± 0.9	12.4 ± 1.0	1.36x (unchanged)

Analysis

The realistic2 benchmark generates ~63,500 objects using cross-product comprehensions where p != q requires string comparisons. Most generated strings share long prefixes (e.g. AAAAAAA...xxxxxxxBBBBBBBlocation...), making the c1==c2 fast path very effective — it skips surrogate checks for 90%+ of character positions.

The array flatten optimization benefits the std.join([], [...]) calls that concatenate 25 arrays of 50-2450 elements each. Pre-sizing eliminates ~5 ArrayBuilder resize-and-copy cycles.

References

jit branch commit: af4832f2 (compareStringsByCodepoint optimization)
Benchmark: bench/resources/cpp_suite/realistic2.jsonnet

Result

✅ All 420 tests pass across all platforms and Scala versions.
✅ JMH: 11.7% improvement on realistic2, no regressions.
✅ Hyperfine: realistic2 gap reduced from 1.61x to 1.54x vs jrsonnet.

JMH Benchmark Results (vs master `0d13274`)

Benchmark	Master (ms/op)	This PR (ms/op)	Change	Status
assertions	0.207	0.209	+1.0%	neutral
base64	0.156	0.152	-2.6%	improved
base64Decode	0.123	0.116	-5.7%	improved
base64DecodeBytes	5.899	6.215	+5.4%	regressed
base64_byte_array	0.803	0.788	-1.9%	neutral
bench.01	0.052	0.052	+0.0%	neutral
bench.02	35.401	35.695	+0.8%	neutral
bench.03	9.583	10.129	+5.7%	regressed
bench.04	0.122	0.119	-2.5%	improved
bench.06	0.224	0.221	-1.3%	neutral
bench.07	3.332	3.183	-4.5%	improved
bench.08	0.038	0.039	+2.6%	regressed
bench.09	0.041	0.044	+7.3%	regressed
comparison	0.028	0.028	+0.0%	neutral
comparison2	18.681	18.590	-0.5%	neutral
escapeStringJson	0.032	0.031	-3.1%	improved
foldl	0.077	0.082	+6.5%	regressed
gen_big_object	0.918	0.908	-1.1%	neutral
large_string_join	0.555	0.551	-0.7%	neutral
large_string_template	1.600	1.609	+0.6%	neutral
lstripChars	0.113	0.116	+2.7%	regressed
manifestJsonEx	0.052	0.052	+0.0%	neutral
manifestTomlEx	0.069	0.070	+1.4%	neutral
manifestYamlDoc	0.055	0.057	+3.6%	regressed
member	0.656	0.684	+4.3%	regressed
parseInt	0.032	0.033	+3.1%	regressed
realistic1	1.661	1.666	+0.3%	neutral
realistic2	57.541	57.650	+0.2%	neutral
reverse	6.717	6.707	-0.1%	neutral
rstripChars	0.119	0.116	-2.5%	improved
setDiff	0.431	0.423	-1.9%	neutral
setInter	0.371	0.369	-0.5%	neutral
setUnion	0.604	0.598	-1.0%	neutral
stripChars	0.117	0.117	+0.0%	neutral
substr	0.057	0.059	+3.5%	regressed

Summary: 6 improvements, 10 regressions, 19 neutral
Platform: Apple Silicon, JMH single-shot avg

stephenamar-db · 2026-04-12T16:46:24Z

I don't understand what the stdlib changes are about.

He-Pin · 2026-04-12T17:10:03Z

I wanted to optimize Scala Native startup time by loading the stdlib lazily, only when a function is called. The problem is that this only helps very simple scripts in benchmarks.

He-Pin · 2026-04-12T17:11:00Z

Some unrelated commits were included by mistake; let me remove them.

Restructure compareStringsByCodepoint to check c1 == c2 first, skipping surrogate checks entirely for equal characters. This benefits the common case where strings share long prefixes (e.g. object key comparisons). For surrogate pairs, equal high surrogates at position i lead to comparing low surrogates at i+1, producing the correct codepoint ordering without needing to decode the full codepoint. Also uses direct char subtraction (c1 - c2) instead of Character.compare for the non-surrogate different case, saving a method call.
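One way to sanity-check the surrogate-pair argument is to compare the fast path against a slow reference that decodes whole codepoints with `String.codePointAt`. The sketch below is such a reference (`ReferenceCompare` is a hypothetical name, not part of sjsonnet); for any two strings, the sign of its result should match the sign returned by compareStringsByCodepoint.

```scala
object ReferenceCompare {
  // Hypothetical slow reference: decode full codepoints and compare them in order.
  def compareByDecodedCodepoints(s1: String, s2: String): Int = {
    var i = 0
    while (i < s1.length && i < s2.length) {
      val cp1 = s1.codePointAt(i)
      val cp2 = s2.codePointAt(i)
      if (cp1 != cp2) return Integer.compare(cp1, cp2)
      // Equal codepoints occupy the same number of chars (1 or 2) in both strings.
      i += Character.charCount(cp1)
    }
    // Common prefix exhausted: the shorter string sorts first.
    Integer.compare(s1.length, s2.length)
  }
}
```

Feeding both functions randomly generated strings that mix BMP characters, surrogate pairs, and chars in U+E000..U+FFFF is a cheap property test for the restructured loop.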

He-Pin marked this pull request as ready for review April 12, 2026 14:20

He-Pin commented Apr 12, 2026


Review comment on sjsonnet/src/sjsonnet/stdlib/StringModule.scala (outdated)

He-Pin marked this pull request as draft April 12, 2026 17:10

He-Pin force-pushed the perf/string-compare-fast-path branch 2 times, most recently from 70e87c8 to 89076f4 April 12, 2026 17:15

He-Pin marked this pull request as ready for review April 12, 2026 17:16

He-Pin force-pushed the perf/string-compare-fast-path branch from 89076f4 to 177306e April 12, 2026 17:22

He-Pin mentioned this pull request Apr 12, 2026 in performance optimization #666 (open)

He-Pin marked this pull request as draft April 12, 2026 18:57

He-Pin marked this pull request as ready for review April 12, 2026 19:07

stephenamar-db merged commit 4d521a8 into databricks:master Apr 12, 2026
5 checks passed

stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/string-compare-fast-path
