perf: optimize string comparison fast path and array flatten #768
Merged
stephenamar-db merged 1 commit into databricks:master on Apr 12, 2026
He-Pin commented Apr 12, 2026
Collaborator
I don't understand what the stdlib changes are about.
Contributor
Author
I wanted to optimize Scala Native startup time by loading the std only when it is called. The problem is that this only helps very simple scripts when benchmarking.
Contributor
Author
Some commits were included by mistake; let me remove them.
70e87c8 to
89076f4
Restructure compareStringsByCodepoint to check c1 == c2 first, skipping surrogate checks entirely for equal characters. This benefits the common case where strings share long prefixes (e.g. object key comparisons). For surrogate pairs, equal high surrogates at position i lead to comparing low surrogates at i+1, producing the correct codepoint ordering without needing to decode the full codepoint. Also uses direct char subtraction (c1 - c2) instead of Character.compare for the non-surrogate different case, saving a method call.
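The ordering described in this commit message can be illustrated with a minimal sketch. It is not the code from the PR: the object name CodepointCompareSketch and the function compareByCodepoint are hypothetical, and the sketch assumes both inputs are well-formed UTF-16 (every surrogate is part of a valid pair).

```scala
object CodepointCompareSketch {
  // Hypothetical sketch: compares two strings by Unicode codepoint order,
  // assuming well-formed UTF-16 input (no unpaired surrogates).
  def compareByCodepoint(s1: String, s2: String): Int = {
    val n = math.min(s1.length, s2.length)
    var i = 0
    while (i < n) {
      val c1 = s1.charAt(i)
      val c2 = s2.charAt(i)
      // Fast path: equal chars need no surrogate checks at all.
      if (c1 != c2) {
        val sur1 = Character.isSurrogate(c1)
        val sur2 = Character.isSurrogate(c2)
        // Both plain BMP chars, or both surrogates in the same position of a
        // pair: char order matches codepoint order, so subtract directly.
        if (sur1 == sur2) return c1 - c2
        // Exactly one side is a surrogate, i.e. part of a supplementary
        // codepoint (>= U+10000), which sorts after every BMP char.
        return if (sur1) 1 else -1
      }
      i += 1
    }
    // One string is a prefix of the other: the shorter one sorts first.
    s1.length - s2.length
  }
}
```

The mismatch branch matters because a plain char-by-char comparison would order U+1F600 (encoded as the surrogate pair D83D DE00) before U+FB01, since 0xD83D < 0xFB01. The surrogate check restores codepoint order in that case, while equal characters never reach it.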
89076f4 to
177306e
Motivation
The realistic2 benchmark (105 lines, complex workload with format strings, cross-product comparisons, and array flattening) is one of our biggest performance gaps vs jrsonnet. This PR targets two hot paths identified in the benchmark.
Key Design Decision
String comparison: In compareStringsByCodepoint, check character equality before performing surrogate checks. For strings with long shared prefixes (common in the realistic2 benchmark, where generated names differ only in suffixes), this skips two Character.isSurrogate() calls per matching position.
Array flatten: When std.join([], arrays) is used to flatten arrays (as in realistic2), pre-compute the total size and use System.arraycopy for bulk transfer instead of incremental ArrayBuilder growth.
Modification
Util.scala (compareStringsByCodepoint):
- Check c1 == c2 first → skip surrogate checks (equal chars produce equal codepoints regardless of surrogate status; pairs are compared char-by-char)
- Use c1 - c2 (direct subtraction) instead of Character.compare
StringModule.scala (Join, array separator path):
- Check for an empty separator (sepArr.length == 0) to take the flatten fast path
- Pre-compute the total size and allocate the result Array[Eval] once
- System.arraycopy from each sub-array into the result
- Convert the for loop to a while loop for the non-empty separator path
Benchmark Results
JMH (JVM, single fork)
No regressions across 35 benchmarks.
Hyperfine (Scala Native vs jrsonnet)
Analysis
The realistic2 benchmark generates ~63,500 objects using cross-product comprehensions where p != q requires string comparisons. Most generated strings share long prefixes (e.g. AAAAAAA...xxxxxxxBBBBBBBlocation...), making the c1 == c2 fast path very effective: it skips surrogate checks for 90%+ of character positions.
The array flatten optimization benefits the std.join([], [...]) calls that concatenate 25 arrays of 50-2450 elements each. Pre-sizing eliminates ~5 ArrayBuilder resize-and-copy cycles.
References
- af4832f2 (compareStringsByCodepoint optimization)
- bench/resources/cpp_suite/realistic2.jsonnet
Result
✅ All 420 tests pass across all platforms and Scala versions.
✅ JMH: 11.7% improvement on realistic2, no regressions.
✅ Hyperfine: realistic2 gap reduced from 1.61x to 1.54x vs jrsonnet.
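For reference, the flatten fast path described under Modification (pre-compute the total size, allocate once, then bulk-copy each sub-array) might look roughly like the sketch below. This is an illustration rather than the PR's code: FlattenSketch and flatten are hypothetical names, and Array[AnyRef] stands in for sjsonnet's Array[Eval].

```scala
object FlattenSketch {
  // Hypothetical sketch: concatenates sub-arrays into one pre-sized array.
  // Array[AnyRef] is a stand-in for the element type used in the PR.
  def flatten(parts: Array[Array[AnyRef]]): Array[AnyRef] = {
    // Pass 1: compute the exact result size so the array is allocated once.
    var total = 0
    var i = 0
    while (i < parts.length) {
      total += parts(i).length
      i += 1
    }
    val out = new Array[AnyRef](total)
    // Pass 2: bulk-copy each sub-array to its offset in the result.
    var offset = 0
    i = 0
    while (i < parts.length) {
      val p = parts(i)
      System.arraycopy(p, 0, out, offset, p.length)
      offset += p.length
      i += 1
    }
    out
  }
}
```

Compared with appending to a growable builder, the two-pass approach trades one extra walk over the outer array for a single allocation and no intermediate resize-and-copy steps.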
JMH Benchmark Results (vs master 0d13274)
Summary: 6 improvements, 10 regressions, 19 neutral
Platform: Apple Silicon, JMH single-shot avg