
perf: primitive double sort for numeric arrays in std.sort #766

Merged
stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/primitive-double-sort on Apr 12, 2026

He-Pin commented on Apr 12, 2026

Motivation

std.sort currently uses Ordering[Val], which routes every comparison through the boxed ev.compare(Val, Val). For arrays that are entirely numeric (a common case in Jsonnet), we can instead extract the raw double values into a primitive Array[Double], sort it with java.util.Arrays.sort, and reconstruct the result, avoiding the per-comparison boxing overhead.

Key Design Decision

Add a fast path that detects all-numeric arrays at the start of std.sort. When every element is Val.Num, extract the values to a double[], sort them natively, and wrap them back into Val.Num. The fast path costs an O(n) detection pass plus an O(n log n) primitive sort, versus an O(n log n) boxed sort; the linear detection pass is dominated by the sort, so its cost is amortized.
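A minimal sketch of the detection step, assuming a simplified value hierarchy: Val, Num, Str, NumericDetect and extractDoubles are hypothetical stand-ins, not sjsonnet's API. It fuses detection with extraction into a single O(n) pass that stops at the first non-number, so a non-numeric array only pays for the prefix it scans.

```scala
// Hypothetical, simplified stand-ins for sjsonnet's Val hierarchy (assumption, not the real API).
sealed trait Val
final case class Num(value: Double) extends Val
final case class Str(value: String) extends Val

object NumericDetect {
  // One O(n) pass that both checks "every element is a Num" and copies the
  // raw doubles into a primitive Array[Double]. Stops at the first non-number.
  def extractDoubles(arr: Array[Val]): Option[Array[Double]] = {
    val out = new Array[Double](arr.length)
    var i = 0
    var allNumeric = true
    while (allNumeric && i < arr.length) {
      arr(i) match {
        case Num(d) => out(i) = d; i += 1
        case _      => allNumeric = false
      }
    }
    // None signals "use the boxed Ordering[Val] path instead".
    if (allNumeric) Some(out) else None
  }
}
```

For a mixed array such as [1, "a", 2], the scan gives up at index 1 and the caller falls back to the boxed sort; for an all-numeric array, the extracted double[] is ready to hand straight to the primitive sort.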

Modification

  • SetModule.scala: Added primitiveDoubleSort method that:

    1. Checks whether all array elements are Val.Num
    2. Extracts their values into a primitive Array[Double]
    3. Sorts using java.util.Arrays.sort (dual-pivot quicksort on primitives)
    4. Rebuilds the Val.Num result array from the sorted doubles via Val.cachedNum
  • Integrated into std.sort and std.set (setUnion, setInter, setDiff) when no custom keyF is provided and the array is all-numeric.
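A minimal sketch of how the fast path and the boxed fallback could fit together in one sort entry point. SortSketch, sortVals, boxedOrdering and the Option[Val => Val] shape of keyF are hypothetical, as is the simplified Val hierarchy; the sketch rebuilds Num values from the sorted doubles, which is what the commit message and the reviewed snippet further down describe.

```scala
import java.util.Arrays

// Hypothetical, simplified stand-ins for sjsonnet's Val types (not the real API).
sealed trait Val
final case class Num(value: Double) extends Val
final case class Str(value: String) extends Val

object SortSketch {
  // Placeholder boxed ordering for the fallback path (assumption: the real
  // std.sort compares through the evaluator's ev.compare instead).
  val boxedOrdering: Ordering[Val] = new Ordering[Val] {
    def compare(a: Val, b: Val): Int = (a, b) match {
      case (Num(x), Num(y)) => java.lang.Double.compare(x, y)
      case (Str(x), Str(y)) => x.compareTo(y)
      case (Num(_), Str(_)) => -1
      case (Str(_), Num(_)) => 1
    }
  }

  def sortVals(arr: Array[Val], keyF: Option[Val => Val], fallback: Ordering[Val]): Array[Val] =
    if (keyF.isEmpty && arr.forall(_.isInstanceOf[Num])) {
      // Fast path: copy the raw doubles into a primitive array...
      val doubles: Array[Double] = arr.map(_.asInstanceOf[Num].value)
      // ...sort them with no Comparator (dual-pivot quicksort on double[])...
      Arrays.sort(doubles)
      // ...and rebuild Num values from the sorted doubles.
      doubles.map(d => Num(d): Val)
    } else {
      // Slow path: every comparison goes through the boxed Ordering[Val].
      keyF match {
        case Some(f) => arr.sortBy(f)(fallback)
        case None    => arr.sorted(fallback)
      }
    }
}
```

One behavioral detail worth checking with this design: Arrays.sort(double[]) uses the total order of Double.compare, so -0.0 sorts before 0.0 and NaN sorts last, whereas a comparison that treats -0.0 and 0.0 as equal leaves their relative order to the sort's stability. Rebuilding from the sorted doubles also means each result element is a fresh Num rather than the original object.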

Benchmark Results

JMH (JVM, single iteration, lower is better)

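| Benchmark   | Before (ms/op) | After (ms/op) | Change |
|-------------|----------------|---------------|--------|
| comparison  | 16.204         | 15.582        | -3.8%  |
| comparison2 | 17.969         | 17.904        | -0.4%  |
| reverse     | 7.033          | 6.494         | -7.7%  |
| setDiff     | 0.426          | 0.414         | -2.8%  |
| setInter    | 0.377          | 0.372         | -1.3%  |
| setUnion    | 0.638          | 0.607         | -4.9%  |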

Hyperfine (Scala Native vs jrsonnet, Apple Silicon)

| Benchmark   | sjsonnet (ms) | jrsonnet (ms) | Ratio        |
|-------------|---------------|---------------|--------------|
| comparison  | 17.2          | 12.9          | 1.33x slower |
| comparison2 | 35.6          | 203.9         | 5.73x faster |
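The Change and Ratio columns appear to be computed as below. These formulas are inferred from the timings (they reproduce every row in both tables) rather than stated in the PR:

$$
\text{Change} = \frac{t_{\text{after}} - t_{\text{before}}}{t_{\text{before}}} \times 100\%, \qquad \text{e.g. reverse: } \frac{6.494 - 7.033}{7.033} \times 100\% \approx -7.7\%
$$

$$
\text{Ratio} = \frac{\max(t_{\text{sjsonnet}},\, t_{\text{jrsonnet}})}{\min(t_{\text{sjsonnet}},\, t_{\text{jrsonnet}})}, \qquad \text{e.g. comparison2: } \frac{203.9}{35.6} \approx 5.73
$$

The ratio is labeled "faster" when sjsonnet has the smaller time (comparison2) and "slower" otherwise (comparison: 17.2 / 12.9 ≈ 1.33).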

Analysis

The biggest impact is on comparison2 under Scala Native, where the primitive sort avoids boxing overhead that the JVM JIT can optimize but Scala Native cannot. comparison2 does heavy numeric array sorting, and with this change sjsonnet on Scala Native runs it 5.73x faster than jrsonnet.

On the JVM, the improvement is smaller because HotSpot already handles boxing well, but reverse (-7.7%) and setUnion (-4.9%) still show measurable improvements.

References

Ported from jit branch commit b1f64df (primitive double sort for numeric arrays).

Result

All 420 tests pass across JVM/JS/WASM/Native × Scala 3.3.7/2.13.18/2.12.21. Numeric sort workloads gain the most under Scala Native.

Replace Comparator-based java.util.Arrays.sort with primitive double
sort (DualPivotQuicksort) for numeric arrays. Extracts doubles into
primitive array, sorts with intrinsic DualPivotQuicksort, reconstructs
Val.Num array via cachedNum. Eliminates Comparator virtual dispatch
and Double boxing per comparison.

Upstream: jit branch commit b1f64df
He-Pin marked this pull request as ready for review on April 12, 2026 13:36
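```scala
// Excerpt from SetModule.scala under review (surrounding code not shown)
while (di < n) {
  // Rebuild each Val.Num from the sorted primitive doubles
  strict(di) = Val.cachedNum(pos, doubles(di)); di += 1
}
} else if (keyType == classOf[Val.Arr]) {
```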
He-Pin (contributor, PR author) commented on this line:

Minor: Val.cachedNum(pos, doubles(di)) reconstructs Val.Num objects. This is the same total number of allocations as before, just deferred until after the sort. Consider whether Val.Num caching (e.g., for common values) could further reduce allocation here.
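One possible reading of that suggestion, as a hypothetical sketch: keep preallocated Num instances for small non-negative integers and hand them out instead of allocating. The Num class, the NumCache object and the 0 to 255 range are assumptions for illustration only; nothing here reflects how Val.cachedNum is actually implemented.

```scala
// Hypothetical simplified Num; the real Val.Num also carries a source position.
final class Num(val value: Double)

object NumCache {
  // Assumption: cache only integers 0..255; the range is arbitrary.
  private val MaxCached = 255
  private val cache: Array[Num] = Array.tabulate(MaxCached + 1)(i => new Num(i.toDouble))

  private val NegZeroBits = java.lang.Double.doubleToRawLongBits(-0.0)

  def num(d: Double): Num = {
    val i = d.toInt // NaN maps to 0, large values saturate; both fail the checks below
    // Reuse a shared instance only for exact small integers. -0.0 is excluded:
    // it compares == to 0.0 but is a distinct value and must not become +0.0.
    if (i >= 0 && i <= MaxCached && i.toDouble == d &&
        java.lang.Double.doubleToRawLongBits(d) != NegZeroBits) cache(i)
    else new Num(d)
  }
}
```

A shared instance cannot carry a per-call position, which this sketch sidesteps by dropping positions entirely, whereas Val.cachedNum takes a pos argument; any real cache has to reconcile those two.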

stephenamar-db merged commit 7a6144a into databricks:master on Apr 12, 2026
5 checks passed