
perf: convert stdlib hot paths from Scala collections to while-loops #765

Merged
stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/stdlib-while-loops-v2 on Apr 12, 2026


He-Pin commented Apr 12, 2026

Motivation

Scala collection methods such as forall, exists, map, mkString, and indexWhere, along with Java's forEach, allocate closures, iterators, and intermediate collections that are harder for JIT and AOT compilers to optimize. In hot stdlib paths, these can be replaced with imperative while-loops for better performance.

Key Design Decision

Replace functional collection operations with while-loops only in hot stdlib paths where profiling shows measurable impact. This improves JIT inlining, reduces allocation pressure, and produces more predictable native code under Scala Native LTO.
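To make the trade-off concrete, here is a minimal sketch (not code from this PR) contrasting a closure-based forall with an early-exit while-loop over a plain Array. The predicate isTruthy and the Int element type are hypothetical stand-ins for sjsonnet's Val handling; only the loop shape mirrors what the all/any changes describe.

```scala
// Hypothetical sketch: closure-based forall vs. early-exit while-loops.
object AllAnySketch {
  // Stand-in predicate; sjsonnet would inspect a Val instead.
  def isTruthy(x: Int): Boolean = x != 0

  // Functional style: concise, but passes a Function1 instance to forall.
  def allFunctional(xs: Array[Int]): Boolean = xs.forall(isTruthy)

  // Imperative style: index loop that stops at the first failing element.
  def allLoop(xs: Array[Int]): Boolean = {
    var i = 0
    while (i < xs.length) {
      if (!isTruthy(xs(i))) return false
      i += 1
    }
    true // vacuously true for an empty array, like forall
  }

  // Same shape for "any": stop at the first matching element.
  def anyLoop(xs: Array[Int]): Boolean = {
    var i = 0
    while (i < xs.length) {
      if (isTruthy(xs(i))) return true
      i += 1
    }
    false // false for an empty array, like exists
  }
}
```

Both loop versions keep the same empty-array results as forall/exists, so the change is purely about how the iteration is expressed.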

Modification

ObjectModule (6 methods):

  • objectFields, objectFieldsAll, objectFieldsEx: iterator → while-loop over allKeysArray
  • getObjValuesFromKeys: for-comprehension → while-loop
  • objectKeysValues, objectKeysValuesAll: for-comprehension → while-loop with pre-allocated array
  • mergePatch.distinctKeys: (lKeys ++ rKeys).distinct → LinkedHashSet-based dedup
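A rough sketch of the objectKeysValues-style change, assuming a plain Array[String] of keys and a hypothetical lookup function in place of sjsonnet's field evaluation: because the result size equals the number of keys, the output array can be allocated once and filled by index instead of being built by a for-comprehension.

```scala
// Hypothetical sketch: pre-allocated array filled by a while-loop.
object KeysValuesSketch {
  // `lookup` stands in for evaluating a field's value; it is illustrative only.
  def keysValues(keys: Array[String], lookup: String => Int): Array[(String, Int)] = {
    val out = new Array[(String, Int)](keys.length) // exact size known up front
    var i = 0
    while (i < keys.length) {
      val k = keys(i)
      out(i) = (k, lookup(k))
      i += 1
    }
    out
  }
}
```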

ArrayModule (7 methods):

  • all: forall → early-exit while-loop returning Val.staticTrue/False
  • any: exists → early-exit while-loop returning Val.staticTrue/False
  • member (array path): indexWhere → while-loop with found flag
  • contains: indexWhere → while-loop with found flag
  • remove: indexWhere → while-loop with idx sentinel
  • minArray/maxArray (keyF path): map.zipWithIndex.min/max → single-pass while-loop
  • repeat: ArrayBuilder ++= lazyArray → System.arraycopy
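The repeat change can be sketched as follows. This is an illustrative version over Array[AnyRef], not sjsonnet's implementation: it sizes the output once and copies the source block n times with System.arraycopy. Math.multiplyExact is an addition of this sketch so that an oversized result fails loudly instead of silently overflowing.

```scala
// Hypothetical sketch: repeat an array n times with bulk copies.
object RepeatSketch {
  def repeat(src: Array[AnyRef], n: Int): Array[AnyRef] = {
    require(n >= 0, "repeat count must be non-negative")
    val len = src.length
    // multiplyExact throws ArithmeticException on Int overflow.
    val out = new Array[AnyRef](Math.multiplyExact(len, n))
    var i = 0
    while (i < n) {
      // Copy the whole source block into the i-th slot of the output.
      System.arraycopy(src, 0, out, i * len, len)
      i += 1
    }
    out
  }
}
```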

ManifestModule (1 method):

  • manifestYamlStream: map.mkString → StringBuilder with while-loop
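A minimal sketch of replacing map(...).mkString(sep) with a single StringBuilder pass. The render function and separator are hypothetical parameters; the exact document markers sjsonnet writes for YAML streams are not reproduced here.

```scala
// Hypothetical sketch: fuse map + mkString into one StringBuilder loop.
object JoinSketch {
  def joinDocs[A](docs: Array[A], render: A => String, sep: String): String = {
    val sb = new java.lang.StringBuilder
    var i = 0
    while (i < docs.length) {
      if (i > 0) sb.append(sep) // separator only between documents
      sb.append(render(docs(i)))
      i += 1
    }
    sb.toString
  }
}
```

This avoids materializing the intermediate array of rendered strings that map would create before mkString joins them.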

StringModule (1 method):

  • escapeStringXML: for (c <- str) → while with charAt
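For escapeStringXML, the index-and-charAt loop looks roughly like this. The entity set below (the five standard XML escapes) is an assumption for illustration, not copied from sjsonnet.

```scala
// Hypothetical sketch: XML escaping with an index loop over charAt.
object XmlEscapeSketch {
  def escapeXml(str: String): String = {
    val sb = new java.lang.StringBuilder(str.length) // at least input size
    var i = 0
    while (i < str.length) {
      str.charAt(i) match {
        case '<'  => sb.append("&lt;")
        case '>'  => sb.append("&gt;")
        case '&'  => sb.append("&amp;")
        case '"'  => sb.append("&quot;")
        case '\'' => sb.append("&apos;")
        case c    => sb.append(c) // everything else is copied unchanged
      }
      i += 1
    }
    sb.toString
  }
}
```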

Val.Obj (1 method):

  • visibleKeyNames: forEach lambda → entrySet iterator while-loop
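Sketch of walking entrySet with an explicit iterator instead of a forEach lambda. Representing each field's visibility as a java.lang.Boolean in a LinkedHashMap is a simplifying assumption made for this sketch.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: entrySet iterator loop instead of map.forEach((k, v) => ...).
object VisibleKeysSketch {
  def visibleKeyNames(fields: java.util.LinkedHashMap[String, java.lang.Boolean]): Array[String] = {
    val out = new ArrayBuffer[String](fields.size)
    val it = fields.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      // Keep only keys whose (assumed) visibility flag is true.
      if (e.getValue.booleanValue()) out += e.getKey
    }
    out.toArray // LinkedHashMap preserves insertion order
  }
}
```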

Benchmark Results

JMH (JVM, single iteration, lower is better)

| Benchmark | Before (ms/op) | After (ms/op) | Change |
|---|---|---|---|
| gen_big_object | 1.122 | 0.933 | -16.8% |
| large_string_template | 2.432 | 1.659 | -31.8% |
| realistic2 | 61.774 | 56.379 | -8.7% |
| large_string_join | 0.582 | 0.561 | -3.6% |
| member | 0.665 | 0.661 | -0.6% |
| setDiff | 0.426 | 0.416 | -2.3% |
| setInter | 0.377 | 0.371 | -1.6% |
| setUnion | 0.638 | 0.623 | -2.3% |
| bench.02 | 35.330 | 34.461 | -2.5% |

No regressions observed (bench.07 variance is JMH single-fork noise).

Hyperfine (Scala Native vs jrsonnet, Apple Silicon)

| Benchmark | sjsonnet (ms) | jrsonnet (ms) | sjsonnet vs jrsonnet |
|---|---|---|---|
| gen_big_object | 10.1 | 13.2 | 1.31x faster |
| large_string_join | 9.0 | 7.7 | 1.17x slower |
| large_string_template | 14.3 | 6.9 | 2.07x slower |
| realistic2 | 161.2 | 100.4 | 1.61x slower |
| member | 8.4 | 5.8 | 1.45x slower |
| comparison | 18.3 | 13.5 | 1.36x slower |

Analysis

The biggest JMH win is large_string_template (-31.8%), which benefits from the ObjectModule while-loop conversions because format rendering triggers object field enumeration. The gen_big_object improvement (-16.8%) directly measures ObjectModule throughput. realistic2 (-8.7%) shows the compound benefit of all modules.

On Scala Native, gen_big_object is now definitively faster than jrsonnet (1.31x). The remaining gaps in string-heavy benchmarks require rope strings (#761) and format-level optimizations.

References

Ported from jit branch exploration commits:

  • af4832f (std.all/any/member while-loops)
  • cd612df (escapeStringXML, std.contains/remove while-loops)
  • d1629fd (manifestYamlStream StringBuilder)
  • 8cd8d31 (minArray/maxArray single-pass)
  • 9149654 (visibleKeyNames entrySet iterator)

Result

All 420 tests pass across JVM/JS/WASM/Native × Scala 3.3.7/2.13.18/2.12.21. Consistent improvements on object-heavy and realistic workloads with no regressions.

Replace functional collection operations (forall, exists, map, mkString,
indexWhere, ++=, for-each) with imperative while-loops across stdlib modules
for better JIT/AOT optimization and reduced allocation:

- ObjectModule: objectFields, objectFieldsAll, objectFieldsEx, getObjValuesFromKeys,
  objectKeysValues, objectKeysValuesAll, mergePatch distinctKeys (LinkedHashSet)
- ArrayModule: all, any, member, contains, remove, minArray/maxArray (keyF path),
  repeat (System.arraycopy)
- ManifestModule: manifestYamlStream (StringBuilder)
- StringModule: escapeStringXML
- Val.Obj: visibleKeyNames (entrySet iterator)

Ported from jit branch commits: af4832f, cd612df, d1629fd, 8cd8d31, 9149654
He-Pin marked this pull request as ready for review April 12, 2026 13:20

```scala
// Excerpt from the diff hunk under review (minArray/maxArray, keyF path)
var i = 1
while (i < strict.length) {
  val v = func.apply1(strict(i), pos.fileScope.noOffsetPos)(ev, TailstrictModeDisabled)
  if (ev.compare(v, bestVal) < 0) {
```

Contributor Author

minArray/maxArray with keyF throws ArrayIndexOutOfBoundsException on empty arrays (func.apply1(strict(0), ...)). The original .map(...).min also threw, but with an unclear error. Suggest adding an explicit check: if (strict.isEmpty) Error.fail("min/maxArray cannot be called on an empty array with keyF").
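The single-pass keyF path with the suggested empty-array guard might look like the sketch below. It is generic over an Ordering rather than using sjsonnet's ev.compare, and throws IllegalArgumentException where sjsonnet would call Error.fail; both substitutions are assumptions for illustration.

```scala
// Hypothetical sketch: single-pass "min by key" with an explicit empty check.
object MinByKeySketch {
  def minByKey[A, B](xs: Array[A], key: A => B)(implicit ord: Ordering[B]): A = {
    if (xs.isEmpty)
      throw new IllegalArgumentException("minArray cannot be called on an empty array with keyF")
    var best = xs(0)
    var bestVal = key(best) // key evaluated once per element
    var i = 1
    while (i < xs.length) {
      val v = key(xs(i))
      if (ord.lt(v, bestVal)) { // strict < keeps the first of equal minima
        best = xs(i)
        bestVal = v
      }
      i += 1
    }
    best
  }
}
```

Using a strict less-than keeps the first of several equal minima, which is also the element map.zipWithIndex.min would select.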

```scala
// Excerpt from the diff hunk under review
val la = arr.asLazyArray
var i = 0
var found = false
while (i < la.length && !found) {
```
Contributor Author

contains and remove use a var found = false flag, which is inconsistent with the early-return style of all/any above. Suggest using an early return to eliminate the flag:

```scala
var i = 0
while (i < la.length) {
  if (ev.equal(la(i).value, elem)) return true
  i += 1
}
false
```

Same for remove.
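Applied to both methods, the early-return style from the suggestion could look like the sketch below. Scala == stands in for ev.equal and a plain Array for the lazy array; indexOfFirst and removeFirst are hypothetical names. removeFirst returns a new array without the first match and returns the input unchanged when there is none.

```scala
// Hypothetical sketch: early-return loops for contains and remove.
object EarlyReturnSketch {
  def contains[A](xs: Array[A], elem: A): Boolean = {
    var i = 0
    while (i < xs.length) {
      if (xs(i) == elem) return true
      i += 1
    }
    false
  }

  // Index of the first match, or -1 as a sentinel.
  def indexOfFirst[A](xs: Array[A], elem: A): Int = {
    var i = 0
    while (i < xs.length) {
      if (xs(i) == elem) return i
      i += 1
    }
    -1
  }

  def removeFirst(xs: Array[AnyRef], elem: AnyRef): Array[AnyRef] = {
    val idx = indexOfFirst(xs, elem)
    if (idx < 0) xs
    else {
      val out = new Array[AnyRef](xs.length - 1)
      System.arraycopy(xs, 0, out, 0, idx)                          // prefix before the match
      System.arraycopy(xs, idx + 1, out, idx, xs.length - idx - 1)  // suffix after it
      out
    }
  }
}
```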

```scala
// Excerpt from the diff hunk under review (mergePatch distinctKeys)
var li = 0
while (li < lKeys.length) { allKeys.add(lKeys(li)); li += 1 }
var ri = 0
while (ri < rKeys.length) { allKeys.add(rKeys(ri)); ri += 1 }
```
Contributor Author

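LinkedHashSet initial capacity lKeys.length + rKeys.length over-allocates whenever the two key sets overlap (the deduplicated count is at most the sum, and equals it only when the keys are disjoint). Suggest using math.max(lKeys.length, rKeys.length) as a smaller initial capacity; since that is a lower bound on the result size, disjoint key sets may trigger one resize.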
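A standalone sketch of the LinkedHashSet-based distinctKeys using the smaller capacity hint from the suggestion. Insertion order is preserved, so the result matches (lKeys ++ rKeys).distinct; the capacity argument only affects when the set resizes, not correctness. The function shape is hypothetical, not the merged code.

```scala
// Hypothetical sketch: order-preserving union of two key arrays.
object DistinctKeysSketch {
  def distinctKeys(lKeys: Array[String], rKeys: Array[String]): Array[String] = {
    // Capacity is a sizing hint only; a too-small value just causes a resize.
    val set = new java.util.LinkedHashSet[String](math.max(lKeys.length, rKeys.length))
    var li = 0
    while (li < lKeys.length) { set.add(lKeys(li)); li += 1 }
    var ri = 0
    while (ri < rKeys.length) { set.add(rKeys(ri)); ri += 1 }
    // Copy out in insertion (first-occurrence) order.
    val out = new Array[String](set.size)
    val it = set.iterator()
    var i = 0
    while (it.hasNext) { out(i) = it.next(); i += 1 }
    out
  }
}
```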

stephenamar-db merged commit 89c64be into databricks:master Apr 12, 2026
5 checks passed