perf: optimize sort allocation paths in std.sort and set operations #752
Merged
stephenamar-db merged 1 commit into databricks:master on Apr 11, 2026
Reduce allocation overhead in sort and set operations:

1. Key function path: reuse a single-element argument buffer across all key function calls, avoiding Array(1) allocation per element.
2. Result construction: use while-loop with pre-allocated array instead of sortedIndices.map() to avoid closure + iterator allocation.
3. Strict force: use while-loop instead of vs.map(_.value) to avoid closure and intermediate array allocation.
4. String sort (no key): in-place Comparator sort via Arrays.sort instead of .map(_.cast[Val.Str]).sortBy(_.asString), which creates two intermediate array copies.
5. Array sort (no key): in-place Comparator sort via Arrays.sort instead of .map(_.cast[Val.Arr]).sortBy(identity) with intermediate copies.

JMH (bench.06 sort, 1 iteration):
  Before: 0.359 ms/op
  After: 0.251 ms/op (-30.1%)

Scala Native hyperfine (bench.06, --warmup 10, -N, 30 runs):
  Before: 7.6 ± 0.4 ms
  After: 5.5 ± 0.2 ms (1.39x faster)

Set operations (native, 20 runs):
  setDiff: 8.8 → 7.7 ms (1.13x faster)
  setInter: 8.6 → 8.3 ms (neutral)

Upstream: he-pin/sjsonnet jit branch b1f64df
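As a rough illustration of items 1 to 3, the sketch below contrasts a per-element argument array with a single reused buffer and a pre-allocated result array. It is not the sjsonnet code: `KeyFn`, `keysNaive`, and `keysReused` are hypothetical names, and plain strings and integers stand in for sjsonnet's `Val` types.

```scala
object KeyBufferSketch {
  // Hypothetical stand-in for a key function that receives its arguments as an array.
  type KeyFn = Array[Any] => Int

  // Allocation-heavy shape: a fresh one-element array per element, plus the closure for map.
  def keysNaive(xs: Array[String], f: KeyFn): Array[Int] =
    xs.map(x => f(Array[Any](x)))

  // Buffer-reusing shape: one argument array and one result array for the whole pass.
  def keysReused(xs: Array[String], f: KeyFn): Array[Int] = {
    val argBuf = new Array[Any](1)      // shared across all key function calls
    val out = new Array[Int](xs.length) // pre-allocated result
    var i = 0
    while (i < xs.length) {
      argBuf(0) = xs(i)                 // overwrite the single slot instead of allocating
      out(i) = f(argBuf)
      i += 1
    }
    out
  }
}
```

Reusing the buffer is only safe under the assumption that the key function does not keep a reference to its argument array after returning; if it did, later iterations would overwrite what it stored.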
Motivation
std.sort and set operations (std.setDiff, std.setInter, std.setUnion) internally sort arrays but use allocation-heavy patterns: .map() for forcing lazy values, .map().sortBy() creating intermediate copies, and allocating a new Array(1) per key function call.
Key Design Decisions
- Although Arrays.sort(double[]) is faster on JVM, the Val.cachedNum reconstruction step creates GC pressure on Scala Native (measured 1.26x regression). In-place Comparator sort avoids any reconstruction.
- A single Array[Val](1) is shared across all iterations instead of allocating Array(v.value) per element.
- While-loops instead of .map(): eliminates closure allocation, iterator overhead, and intermediate array copies.
Modifications
sjsonnet/src/sjsonnet/stdlib/SetModule.scala:
- Key function path: reuse a single argBuf across all key function calls, use while-loop for key computation
- Result construction: pre-allocated Array[Eval] with while-loop instead of sortedIndices.map(i => vs(i))
- Strict force: while loop with pre-allocated Array[Val] instead of vs.map(_.value)
- String sort (no key): in-place Arrays.sort with Comparator instead of .map(_.cast[Val.Str]).sortBy(_.asString) (2 intermediate array copies)
- Array sort (no key): in-place Arrays.sort with Comparator instead of .map(_.cast[Val.Arr]).sortBy(identity) (2 intermediate copies)
Benchmark Results
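JMH (JVM, single iteration)
- bench.06 (sort): 0.359 ms/op before, 0.251 ms/op after (-30.1%)
Scala Native hyperfine (-N --warmup 10)
- bench.06 (sort, 30 runs): 7.6 ± 0.4 ms before, 5.5 ± 0.2 ms after (1.39x faster)
- setDiff (20 runs): 8.8 ms before, 7.7 ms after (1.13x faster)
- setInter (20 runs): 8.6 ms before, 8.3 ms after (neutral)
Analysis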
The allocation reduction benefits both JVM and Native, but the impact is more pronounced on Native where GC overhead is higher. The sort benchmark sees the largest improvement because it exercises all the optimized paths (force + sort + result construction). Set operations see moderate improvement since the merge-based intersection/difference only calls sort once on already-sorted inputs.
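To make the split between the sort step and the merge step concrete, here is a minimal sketch, assuming plain strings rather than sjsonnet's `Val.Str` and using hypothetical names (`SortedSetSketch`, `sortedCopy`, `sortedDiff`). It shows an in-place Comparator sort followed by a linear merge-based difference over already-sorted, duplicate-free inputs; only the first part corresponds to the kind of path this PR touches.

```scala
import java.util.{Arrays, Comparator}

object SortedSetSketch {
  // Sort one copy in place with a Comparator, instead of map(...).sortBy(...),
  // which would build intermediate arrays. String.compareTo is used purely for
  // illustration and is an assumption, not the ordering sjsonnet implements.
  def sortedCopy(xs: Array[String]): Array[String] = {
    val out = xs.clone()
    Arrays.sort(out, new Comparator[String] {
      def compare(a: String, b: String): Int = a.compareTo(b)
    })
    out
  }

  // Merge-based difference: a single linear pass over two sorted,
  // duplicate-free arrays, keeping the elements of a that are absent from b.
  def sortedDiff(a: Array[String], b: Array[String]): Array[String] = {
    val out = new Array[String](a.length) // upper bound on the result size
    var i = 0
    var j = 0
    var n = 0
    while (i < a.length) {
      if (j >= b.length) { out(n) = a(i); n += 1; i += 1 }
      else {
        val c = a(i).compareTo(b(j))
        if (c < 0) { out(n) = a(i); n += 1; i += 1 } // a(i) is not in b
        else if (c > 0) j += 1                       // advance b to catch up
        else { i += 1; j += 1 }                      // present in both: drop it
      }
    }
    out.take(n)
  }
}
```

In a shape like this the merge pass allocates only its result, so trimming allocations in the sort step can account for only part of a set operation's total time, which fits the smaller gains reported for setDiff and setInter.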
References
- Upstream: he-pin/sjsonnet jit branch b1f64df0
Result
Sort and set operations are faster on both JVM and Scala Native with zero semantic changes. All existing tests pass.