Optimize default set operations #813

Merged
stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/set-default-key-fastpaths
May 1, 2026

@He-Pin He-Pin commented Apr 30, 2026

Summary

Optimize default-key std.setUnion, std.setInter, and std.setDiff by adding a dedicated default keyF=id merge path.

The new path keeps the current forced key at each merge cursor, compares numbers/strings directly, and writes into a pre-sized Array[Eval] before trimming. Custom keyF keeps the previous implementation, so function-call semantics are unchanged.
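
The merge shape described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: plain `Array[Double]` stands in for sjsonnet's `Array[Eval]`, the object and method names (`SortedSetMerge`, `union`, `inter`, `diff`) are hypothetical, and both inputs are assumed to be already sorted and duplicate-free.

```scala
import java.util.Arrays

// Hypothetical sketch: two-cursor merges over sorted, duplicate-free arrays.
// Doubles stand in for evaluated elements; in this sketch "reading the key"
// is just an array load, whereas the real path would hold a forced value.
object SortedSetMerge {

  def union(a: Array[Double], b: Array[Double]): Array[Double] = {
    val out = new Array[Double](a.length + b.length) // pre-sized for the worst case
    var i = 0; var j = 0; var n = 0
    while (i < a.length && j < b.length) {
      val x = a(i); val y = b(j) // current key at each cursor
      if (x < y) { out(n) = x; i += 1 }
      else if (x > y) { out(n) = y; j += 1 }
      else { out(n) = x; i += 1; j += 1 } // equal keys: emit once
      n += 1
    }
    while (i < a.length) { out(n) = a(i); i += 1; n += 1 }
    while (j < b.length) { out(n) = b(j); j += 1; n += 1 }
    Arrays.copyOf(out, n) // trim to the number of elements written
  }

  def inter(a: Array[Double], b: Array[Double]): Array[Double] = {
    val out = new Array[Double](math.min(a.length, b.length))
    var i = 0; var j = 0; var n = 0
    while (i < a.length && j < b.length) {
      val x = a(i); val y = b(j)
      if (x < y) i += 1
      else if (x > y) j += 1
      else { out(n) = x; n += 1; i += 1; j += 1 } // keep only shared keys
    }
    Arrays.copyOf(out, n)
  }

  def diff(a: Array[Double], b: Array[Double]): Array[Double] = {
    val out = new Array[Double](a.length)
    var i = 0; var j = 0; var n = 0
    while (i < a.length && j < b.length) {
      val x = a(i); val y = b(j)
      if (x < y) { out(n) = x; n += 1; i += 1 } // only in a: keep
      else if (x > y) j += 1
      else { i += 1; j += 1 } // in both: drop
    }
    while (i < a.length) { out(n) = a(i); n += 1; i += 1 }
    Arrays.copyOf(out, n)
  }
}
```

For example, `SortedSetMerge.union(Array(1.0, 3.0, 5.0), Array(2.0, 3.0, 6.0))` yields the elements 1, 2, 3, 5, 6. Sizing the output for the worst case up front and trimming once avoids growing a buffer inside the loop.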

This is intentionally narrower than jrsonnet's whole set implementation: the goal is a JVM/JIT-friendly hot path without the large per-type method explosion that did not improve results enough to justify the code size.

JMH

Command:

./mill -i bench.runJmh sjsonnet.bench.RegressionBenchmark.main \
  -p 'path=bench/resources/sjsonnet_suite/setUnion.jsonnet,bench/resources/sjsonnet_suite/setInter.jsonnet,bench/resources/sjsonnet_suite/setDiff.jsonnet' \
  -wi 3 -i 5 -w 1s -r 2s -f 1 -tu ms

Environment:

  • JMH 1.37
  • JVM: Zulu/OpenJDK 21.0.10 aarch64
  • VM options from environment: --enable-native-access=ALL-UNNAMED -Xmx4G -XX:+UseG1GC

| Benchmark | master | this PR | delta |
| --- | --- | --- | --- |
| setUnion.jsonnet | 0.594 ± 0.039 ms/op | 0.500 ± 0.036 ms/op | 15.8% faster |
| setInter.jsonnet | 0.354 ± 0.024 ms/op | 0.334 ± 0.015 ms/op | 5.6% faster |
| setDiff.jsonnet | 0.416 ± 0.053 ms/op | 0.365 ± 0.011 ms/op | 12.3% faster |
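
The delta figures appear to be the relative reduction in mean time, (master − this PR) / master. A small sketch that rechecks that arithmetic on the means reported above, ignoring the error bars; the names `JmhDelta` and `percentFaster` are made up for this illustration:

```scala
object JmhDelta {
  // Relative reduction in mean time per op, as a percentage of the baseline.
  def percentFaster(masterMs: Double, prMs: Double): Double =
    (masterMs - prMs) / masterMs * 100.0

  def main(args: Array[String]): Unit = {
    // (benchmark, master mean ms/op, this-PR mean ms/op) from the JMH results
    val rows = Seq(
      ("setUnion.jsonnet", 0.594, 0.500),
      ("setInter.jsonnet", 0.354, 0.334),
      ("setDiff.jsonnet", 0.416, 0.365)
    )
    for ((name, master, pr) <- rows)
      println(f"$name%-18s ${percentFaster(master, pr)}%.1f%% faster")
  }
}
```

Rounded to one decimal place, this reproduces the 15.8%, 5.6%, and 12.3% figures.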

JMH GC profiler

Command:

./mill -i bench.runJmh sjsonnet.bench.RegressionBenchmark.main \
  -p 'path=bench/resources/sjsonnet_suite/setUnion.jsonnet,bench/resources/sjsonnet_suite/setInter.jsonnet,bench/resources/sjsonnet_suite/setDiff.jsonnet' \
  -wi 2 -i 3 -w 1s -r 1s -f 1 -tu ms -prof gc

Normalized allocation is effectively unchanged, at most about 0.2% lower:

| Benchmark | master alloc.norm | this PR alloc.norm |
| --- | --- | --- |
| setUnion.jsonnet | 1,512,845.660 B/op | 1,510,378.717 B/op |
| setInter.jsonnet | 1,268,889.087 B/op | 1,268,595.813 B/op |
| setDiff.jsonnet | 1,329,900.452 B/op | 1,327,755.458 B/op |

Native hyperfine

Command shape:

hyperfine --shell=none --warmup 20 --runs 100 \
  '<master native> -J 1 -o /dev/null bench/resources/sjsonnet_suite/setUnion.jsonnet' \
  '<this PR native> -J 1 -o /dev/null bench/resources/sjsonnet_suite/setUnion.jsonnet' \
  '<jrsonnet release> -o /dev/null bench/resources/sjsonnet_suite/setUnion.jsonnet' \
  # repeated for setInter/setDiff

| Benchmark | master native | this PR native | jrsonnet release |
| --- | --- | --- | --- |
| setUnion.jsonnet | 4.7 ± 0.3 ms | 4.4 ± 0.1 ms | 2.8 ± 0.1 ms |
| setInter.jsonnet | 4.2 ± 0.2 ms | 4.2 ± 0.3 ms | 2.1 ± 0.1 ms |
| setDiff.jsonnet | 4.2 ± 0.1 ms | 4.2 ± 0.2 ms | 2.4 ± 0.1 ms |

Native CLI time remains dominated by startup and whole-program costs on these small inputs. By the mean times above, jrsonnet is still roughly 1.6-2.0x faster than this PR's native binary on these set cases.

Correctness / validation

./mill -i 'sjsonnet.jvm[3.3.7].test.testOnly' \
  sjsonnet.StdWithKeyFTests \
  sjsonnet.StdSetUnionTests \
  sjsonnet.StdSetDiffTests \
  sjsonnet.StdLibOfficialCompatibilityTests \
  sjsonnet.PreserveOrderTests

./mill -i 'sjsonnet.jvm[3.3.7]'.test
./mill -i 'sjsonnet.js[3.3.7]'.compile 'sjsonnet.native[3.3.7]'.compile '__.checkFormat'
./mill -i 'sjsonnet.native[3.3.7].nativeLink'

@stephenamar-db stephenamar-db merged commit 3586e07 into databricks:master May 1, 2026
5 checks passed