perf: optimize sort allocation paths in std.sort and set operations by He-Pin · Pull Request #752 · databricks/sjsonnet

He-Pin · 2026-04-11T20:02:41Z

Motivation

std.sort and set operations (std.setDiff, std.setInter, std.setUnion) internally sort arrays but use allocation-heavy patterns: .map() for forcing lazy values, .map().sortBy() creating intermediate copies, and allocating a new Array(1) per key function call.

Key Design Decisions

Keep in-place Comparator sort for numerics rather than primitive double sort + reconstruction. While primitive Arrays.sort(double[]) is faster on JVM, the Val.cachedNum reconstruction step creates GC pressure on Scala Native (measured 1.26x regression). In-place Comparator sort avoids any reconstruction.
Reuse argument buffer for key function calls: a single one-element Array[Val] is shared across all iterations instead of allocating Array(v.value) per element.
While-loops over .map(): eliminates closure allocation, iterator overhead, and intermediate array copies.
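
The buffer-reuse and while-loop decisions above can be illustrated with a minimal sketch. It is not the code from SetModule.scala: `KeyBufferSketch`, `KeyFn`, and both method names are hypothetical, and the key function is modelled as a plain Scala function taking an argument array rather than an sjsonnet function value.

```scala
object KeyBufferSketch {
  // Hypothetical stand-in for a key function that receives its arguments as an array.
  type KeyFn = Array[AnyRef] => String

  // Allocation-heavy variant: a closure for map plus one fresh argument array per element.
  def keysWithMap(values: Array[AnyRef], keyF: KeyFn): Array[String] =
    values.map(v => keyF(Array[AnyRef](v)))

  // Buffer-reusing variant: one argument array shared by every call.
  // Assumption: keyF does not keep a reference to the array after it returns.
  def keysWithSharedBuffer(values: Array[AnyRef], keyF: KeyFn): Array[String] = {
    val argBuf = new Array[AnyRef](1)
    val keys = new Array[String](values.length)
    var i = 0
    while (i < values.length) {
      argBuf(0) = values(i) // overwrite the single slot instead of allocating
      keys(i) = keyF(argBuf)
      i += 1
    }
    keys
  }
}
```

Sharing the buffer is only sound when the callee never retains the array, which is why the sketch states that as an explicit assumption.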

Modifications

sjsonnet/src/sjsonnet/stdlib/SetModule.scala:

Key function path: Reuse single-element argBuf across all key function calls, use while-loop for key computation
Result construction: Pre-allocated Array[Eval] with while-loop instead of sortedIndices.map(i => vs(i))
Strict force: while loop with pre-allocated Array[Val] instead of vs.map(_.value)
String sort (no key): In-place Arrays.sort with Comparator instead of .map(_.cast[Val.Str]).sortBy(_.asString) (2 intermediate array copies)
Array sort (no key): In-place Arrays.sort with Comparator instead of .map(_.cast[Val.Arr]).sortBy(identity) (2 intermediate copies)
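
As a rough illustration of the string-sort change listed above, the sketch below contrasts the copy-based pattern with an in-place `java.util.Arrays.sort` using a `Comparator`. The `Value` and `Str` types are hypothetical stand-ins for sjsonnet's `Val` hierarchy, and ordering by `String.compareTo` is an assumption made for the example, not a statement about how sjsonnet compares strings.

```scala
import java.util.{Arrays, Comparator}

object InPlaceSortSketch {
  // Hypothetical stand-ins for the interpreter's value types.
  sealed trait Value
  final case class Str(s: String) extends Value

  // Assumed ordering for the sketch: plain String.compareTo.
  private val byString: Comparator[Value] = new Comparator[Value] {
    def compare(a: Value, b: Value): Int =
      a.asInstanceOf[Str].s.compareTo(b.asInstanceOf[Str].s)
  }

  // Copy-based pattern: map allocates a typed array, sortBy allocates the sorted one.
  def sortCopying(vs: Array[Value]): Array[Str] =
    vs.map(_.asInstanceOf[Str]).sortBy(_.s)

  // In-place pattern: no intermediate arrays, but the input array is mutated.
  def sortInPlace(vs: Array[Value]): Unit =
    Arrays.sort(vs, byString)
}
```

Because the in-place variant mutates its argument, it is only a drop-in replacement when the array is a private copy, for example the freshly allocated array produced by a strict-force step.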

Benchmark Results

JMH (JVM, single iteration)

Benchmark	Before (ms/op)	After (ms/op)	Change
bench.06 (sort)	0.359	0.251	-30.1%
setDiff	0.533	0.446	-16.3%
setInter	0.367	0.386	+5.2% (reported as neutral)
setUnion	0.727	0.677	-6.9%

Scala Native hyperfine (`-N --warmup 10`)

Benchmark	Before (ms)	After (ms)	Speedup
bench.06 (sort, 30 runs)	7.6 ± 0.4	5.5 ± 0.2	1.39x
setDiff (20 runs)	8.8 ± 0.6	7.7 ± 0.6	1.13x
setInter (20 runs)	8.6 ± 1.3	8.3 ± 0.8	neutral

Analysis

The allocation reduction benefits both JVM and Native, but the impact is more pronounced on Native where GC overhead is higher. The sort benchmark sees the largest improvement because it exercises all the optimized paths (force + sort + result construction). Set operations see moderate improvement since the merge-based intersection/difference only calls sort once on already-sorted inputs.
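
To make the "merge-based" remark concrete, here is a minimal sketch of a set difference over two already-sorted, duplicate-free arrays. It is an assumption about the general technique, not a copy of sjsonnet's implementation; `MergeDiffSketch` and `setDiff` are hypothetical names, and strings stand in for arbitrary values.

```scala
import scala.collection.mutable.ArrayBuffer

object MergeDiffSketch {
  // Elements of a that are not in b. Both inputs must be sorted ascending and duplicate-free.
  def setDiff(a: Array[String], b: Array[String]): Array[String] = {
    val out = new ArrayBuffer[String](a.length)
    var i = 0
    var j = 0
    while (i < a.length) {
      if (j >= b.length) { out += a(i); i += 1 } // b is exhausted, keep the rest of a
      else {
        val c = a(i).compareTo(b(j))
        if (c < 0) { out += a(i); i += 1 }       // a(i) cannot appear later in b
        else if (c > 0) j += 1                   // advance b until it catches up
        else { i += 1; j += 1 }                  // present in both, drop it
      }
    }
    out.toArray
  }
}
```

The merge pass is linear in the input sizes and does no sorting of its own, which is consistent with the set benchmarks gaining less from sort-path optimizations than the sort benchmark does.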

References

Upstream exploration: he-pin/sjsonnet jit branch b1f64df0

Result

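std.sort and std.setDiff are faster on both the JVM and Scala Native, std.setUnion is faster on the JVM (no Native figure is reported), and std.setInter is neutral, with zero semantic changes. All existing tests pass.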

Reduce allocation overhead in sort and set operations:

1. Key function path: reuse a single-element argument buffer across all key function calls, avoiding Array(1) allocation per element.
2. Result construction: use while-loop with pre-allocated array instead of sortedIndices.map() to avoid closure + iterator allocation.
3. Strict force: use while-loop instead of vs.map(_.value) to avoid closure and intermediate array allocation.
4. String sort (no key): in-place Comparator sort via Arrays.sort instead of .map(_.cast[Val.Str]).sortBy(_.asString) which creates two intermediate array copies.
5. Array sort (no key): in-place Comparator sort via Arrays.sort instead of .map(_.cast[Val.Arr]).sortBy(identity) with intermediate copies.

JMH (bench.06 sort, 1 iteration):
Before: 0.359 ms/op
After: 0.251 ms/op (-30.1%)

Scala Native hyperfine (bench.06, --warmup 10, -N, 30 runs):
Before: 7.6 ± 0.4 ms
After: 5.5 ± 0.2 ms (1.39x faster)

Set operations (native, 20 runs):
setDiff: 8.8 → 7.7 ms (1.13x faster)
setInter: 8.6 → 8.3 ms (neutral)

Upstream: he-pin/sjsonnet jit branch b1f64df

He-Pin force-pushed the perf/primitive-double-sort branch from b6906f6 to 42d465a Compare April 11, 2026 20:14

He-Pin force-pushed the perf/primitive-double-sort branch from 42d465a to aa7ee14 Compare April 11, 2026 20:53

He-Pin marked this pull request as ready for review April 11, 2026 21:47

stephenamar-db merged commit a17ec44 into databricks:master Apr 11, 2026
4 of 5 checks passed

stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/primitive-double-sort
