perf: convert stdlib hot paths from Scala collections to while-loops #765
Replace functional collection operations (forall, exists, map, mkString, indexWhere, ++=, for-each) with imperative while-loops across stdlib modules for better JIT/AOT optimization and reduced allocation:
- ObjectModule: objectFields, objectFieldsAll, objectFieldsEx, getObjValuesFromKeys, objectKeysValues, objectKeysValuesAll, mergePatch distinctKeys (LinkedHashSet)
- ArrayModule: all, any, member, contains, remove, minArray/maxArray (keyF path), repeat (System.arraycopy)
- ManifestModule: manifestYamlStream (StringBuilder)
- StringModule: escapeStringXML
- Val.Obj: visibleKeyNames (entrySet iterator)
Ported from jit branch commits: af4832f, cd612df, d1629fd, 8cd8d31, 9149654
var i = 1
while (i < strict.length) {
  val v = func.apply1(strict(i), pos.fileScope.noOffsetPos)(ev, TailstrictModeDisabled)
  if (ev.compare(v, bestVal) < 0) {
minArray/maxArray with keyF throws ArrayIndexOutOfBoundsException on empty arrays (func.apply1(strict(0), ...)). The original .map(...).min also threw, but with an unclear error. Suggest adding an explicit check: if (strict.isEmpty) Error.fail("min/maxArray cannot be called on an empty array with keyF").
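A minimal sketch of the suggested guard, assuming a plain array and an Int-valued key function. The names MinByKeySketch and minByKey are hypothetical, and IllegalArgumentException stands in for the Error.fail call proposed above; none of this is the actual sjsonnet code.

```scala
// Hypothetical stand-alone sketch; not the sjsonnet implementation.
object MinByKeySketch {
  def minByKey[A](xs: Array[A], key: A => Int): A = {
    // Fail early with a descriptive message instead of an index error on xs(0).
    if (xs.length == 0)
      throw new IllegalArgumentException("minByKey cannot be called on an empty array")
    var best = xs(0)
    var bestKey = key(best)
    var i = 1
    // Single pass: each key is computed exactly once.
    while (i < xs.length) {
      val k = key(xs(i))
      if (k < bestKey) { // strict '<' keeps the first minimum on ties
        best = xs(i)
        bestKey = k
      }
      i += 1
    }
    best
  }
}
```

With the guard in place, the empty-array case is reported before any element is read, so the loop can safely start at index 1.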
val la = arr.asLazyArray
var i = 0
var found = false
while (i < la.length && !found) {
contains and remove use a var found = false flag, which is inconsistent with the early-return style of all/any above. Suggest using an early return to eliminate the flag:
var i = 0
while (i < la.length) {
  if (ev.equal(la(i).value, elem)) return true
  i += 1
}
false
Same for remove.
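For remove, the same early-return shape might look like the sketch below. It assumes a plain Array[Int] and removal of the first match only; RemoveSketch and removeFirst are hypothetical names, not the ArrayModule code.

```scala
// Hypothetical stand-alone sketch; not the sjsonnet implementation.
object RemoveSketch {
  // Returns a copy of xs without the first element equal to elem,
  // or xs itself when elem is absent.
  def removeFirst(xs: Array[Int], elem: Int): Array[Int] = {
    var i = 0
    while (i < xs.length) {
      if (xs(i) == elem) {
        val out = new Array[Int](xs.length - 1)
        System.arraycopy(xs, 0, out, 0, i)                     // elements before the match
        System.arraycopy(xs, i + 1, out, i, xs.length - i - 1) // elements after the match
        return out                                             // early return, no flag or sentinel
      }
      i += 1
    }
    xs
  }
}
```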
var li = 0
while (li < lKeys.length) { allKeys.add(lKeys(li)); li += 1 }
var ri = 0
while (ri < rKeys.length) { allKeys.add(rKeys(ri)); ri += 1 }
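LinkedHashSet initial capacity lKeys.length + rKeys.length can over-allocate (the deduplicated key count never exceeds the sum and is usually smaller). Suggest using math.max(lKeys.length, rKeys.length) as a tighter initial capacity; the set still grows on demand if the two key sets barely overlap.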
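A minimal sketch of the LinkedHashSet-based dedup with the smaller capacity hint, assuming string keys. DistinctKeysSketch and distinctKeys are hypothetical names chosen for illustration; only java.util.LinkedHashSet is a real API here.

```scala
import java.util.LinkedHashSet

// Hypothetical stand-alone sketch; not the sjsonnet mergePatch code.
object DistinctKeysSketch {
  def distinctKeys(lKeys: Array[String], rKeys: Array[String]): Array[String] = {
    // The capacity is only a starting hint; the set resizes itself if needed.
    val all = new LinkedHashSet[String](math.max(lKeys.length, rKeys.length))
    var i = 0
    while (i < lKeys.length) { all.add(lKeys(i)); i += 1 }
    i = 0
    while (i < rKeys.length) { all.add(rKeys(i)); i += 1 }
    // Copy out in insertion order, which matches first-occurrence order.
    val out = new Array[String](all.size)
    val it = all.iterator()
    var j = 0
    while (it.hasNext) { out(j) = it.next(); j += 1 }
    out
  }
}
```

Because LinkedHashSet preserves insertion order, the result should match (lKeys ++ rKeys).distinct without building the concatenated intermediate array.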
Motivation
Scala collection methods like forall, exists, map, mkString, indexWhere, and forEach create closures, iterators, and intermediate collections that are harder for JIT/AOT compilers to optimize. In hot stdlib paths, these can be replaced with imperative while-loops for better performance.
Key Design Decision
Replace functional collection operations with while-loops only in hot stdlib paths where profiling shows measurable impact. This improves JIT inlining, reduces allocation pressure, and produces more predictable native code under Scala Native LTO.
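As an illustration of the kind of rewrite described here, the sketch below contrasts a closure-based forall with an early-exit while-loop over a plain Array[Boolean]. AllSketch and its methods are hypothetical names; the real stdlib code operates on sjsonnet values rather than raw booleans.

```scala
// Hypothetical stand-alone sketch; not the sjsonnet ArrayModule code.
object AllSketch {
  // Functional form: goes through a closure and the collections API.
  def allFunctional(xs: Array[Boolean]): Boolean = xs.forall(x => x)

  // Imperative form: a plain indexed loop that exits on the first false element.
  def allLoop(xs: Array[Boolean]): Boolean = {
    var i = 0
    while (i < xs.length) {
      if (!xs(i)) return false
      i += 1
    }
    true
  }
}
```

Both forms return the same result; the loop version simply gives the compiler a direct indexed traversal with no lambda in the hot path.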
Modification
ObjectModule (6 methods):
- objectFields, objectFieldsAll, objectFieldsEx: iterator → while-loop over allKeysArray
- getObjValuesFromKeys: for-comprehension → while-loop
- objectKeysValues, objectKeysValuesAll: for-comprehension → while-loop with pre-allocated array
- mergePatch.distinctKeys: (lKeys ++ rKeys).distinct → LinkedHashSet-based dedup
ArrayModule (7 methods):
- all: forall → early-exit while-loop returning Val.staticTrue/False
- any: exists → early-exit while-loop returning Val.staticTrue/False
- member (array path): indexWhere → while-loop with found flag
- contains: indexWhere → while-loop with found flag
- remove: indexWhere → while-loop with idx sentinel
- minArray/maxArray (keyF path): map.zipWithIndex.min/max → single-pass while-loop
- repeat: ArrayBuilder ++= lazyArray → System.arraycopy
ManifestModule (1 method):
- manifestYamlStream: map.mkString → StringBuilder with while-loop
StringModule (1 method):
- escapeStringXML: for (c <- str) → while with charAt
Val.Obj (1 method):
- visibleKeyNames: forEach lambda → entrySet iterator while-loop
Benchmark Results
JMH (JVM, single iteration, lower is better)
No regressions observed (bench.07 variance is JMH single-fork noise).
Hyperfine (Scala Native vs jrsonnet, Apple Silicon)
Analysis
The biggest JMH win is large_string_template (-31.8%), which benefits from the ObjectModule while-loop conversions since format rendering triggers object field enumeration. The gen_big_object improvement (-16.8%) directly measures ObjectModule throughput. realistic2 (-8.7%) shows compound benefits from all modules.
On Scala Native, gen_big_object is now definitively faster than jrsonnet (1.31x). The remaining gaps in string-heavy benchmarks require rope strings (#761) and format-level optimizations.
References
Ported from jit branch exploration commits:
Result
All 420 tests pass across JVM/JS/WASM/Native × Scala 3.3.7/2.13.18/2.12.21. Consistent improvements on object-heavy and realistic workloads with no regressions.