perf: lazy range arrays for O(1) std.range creation by He-Pin · Pull Request #771 · databricks/sjsonnet

He-Pin · 2026-04-12T16:10:11Z

Motivation

std.range(from, to) eagerly allocates an Array[Eval] of Val.Num objects. For large ranges like std.range(1, 1000000), this creates 1M objects upfront even when only a subset of the elements is ever accessed. This is particularly wasteful in patterns like array comparison, where two large ranges share a common prefix and only the tail elements differ.

Key Design Decision

Encode lazy range state directly in Val.Arr using a boolean _isRange flag and an integer _rangeFrom field. When _isRange is true, arr is null and elements are computed on demand via Val.cachedNum(pos, _rangeFrom + i). This avoids introducing a new subclass and keeps the hot-path value(i) dispatch to a single boolean check.

A separate _isRange boolean flag is used instead of a sentinel value (e.g., Int.MinValue) to avoid collisions with valid range start values.

Reversed ranges store the high end as _rangeFrom and count down (_rangeFrom - i), with correct double-reverse handling that restores the original forward range.
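The design above can be illustrated with a minimal, self-contained sketch. It is not sjsonnet's actual Val.Arr: the names LazyRangeSketch, isReversed, materialize and eager are hypothetical, and elements are plain Int values rather than Val.Num objects obtained through Val.cachedNum.

```scala
// Hypothetical, simplified model of the design; not sjsonnet's actual Val.Arr.
object LazyRangeSketch {
  final class Arr private (
      private val arr: Array[Int],   // backing storage; null while the range is lazy
      private val isRange: Boolean,  // explicit flag instead of a sentinel start value
      private val rangeFrom: Int,    // first element (the high end when reversed)
      private val isReversed: Boolean,
      val length: Int) {

    // Hot path: one boolean check decides between lazy and eager lookup.
    def value(i: Int): Int = {
      if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
      if (isRange) { if (isReversed) rangeFrom - i else rangeFrom + i }
      else arr(i)
    }

    // Reversing a range is O(1): start from the current last element and flip
    // the direction. Reversing twice therefore restores the forward range.
    def reversed: Arr =
      if (isRange) {
        if (length == 0) this
        else new Arr(null, true, value(length - 1), !isReversed, length)
      } else new Arr(arr.reverse, false, 0, false, length)

    // Bulk access: build a flat array only when it is actually needed.
    def materialize: Array[Int] = Array.tabulate(length)(value)
  }

  object Arr {
    // Inclusive on both ends, like std.range(from, to); O(1), no element allocation.
    def range(from: Int, to: Int): Arr =
      new Arr(null, true, from, false, math.max(0, to - from + 1))

    def eager(xs: Array[Int]): Arr = new Arr(xs, false, 0, false, xs.length)
  }
}
```

For example, LazyRangeSketch.Arr.range(1, 1000000) allocates a single object, and value(999999) computes the last element without touching the other 999,999.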

Modification

Val.scala: Add _isRange boolean + _rangeFrom field to Val.Arr. Update value(i), eval(i), asLazyArray, reversed() to handle range state. Add materializeRange() for bulk access. Add Arr.range() factory.
ArrayModule.scala: Replace eager Array[Eval] allocation in std.range with Val.Arr.range(pos, from, size) for O(1) creation.
New test: lazy_range_correctness.jsonnet — covers iteration, concat+comparison, slicing, sort, reverse, double-reverse, member, map, foldl, empty ranges, large ranges.

Benchmark Results

JMH (JVM, Scala 3.3.7)

Benchmark	Before (ms/op)	After (ms/op)	Change
comparison	16.204	0.028	579x faster 🔥
reverse	7.033	6.823	3% faster ✅
bench.06	0.226	0.218	4% faster ✅
setUnion	0.638	0.623	No regression ✅
gen_big_object	1.122	0.973	No regression ✅
All others	—	—	No regression ✅

Hyperfine (Scala Native vs jrsonnet)

Benchmark	sjsonnet (ms)	jrsonnet (ms)	Ratio
comparison	6.5 ± 1.4	13.4 ± 1.9	sjsonnet 2.07x faster ✅
reverse	—	—	~1.04x faster (no regression)
bench.06	~6.8	~5.3	jrsonnet 1.29x faster (startup-dominated, unchanged)

Previous: comparison was 1.36x slower than jrsonnet → Now 2.07x faster 🎉

Analysis

The comparison benchmark (std.range(1, 1000000) + [1] < std.range(1, 1000000) + [2]) previously allocated 2M Val.Num objects upfront, then compared element by element. With lazy ranges:

O(1) range creation — only stores start + length, no allocation
O(1) concat — ConcatView wraps range + singleton without copying
O(1) comparison — sharedConcatPrefixLength detects shared range prefix (same object reference), skips to the differing tail element

This turns an O(n) operation into O(1) end-to-end.
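The three steps can be sketched roughly as follows. This is an illustrative model only: SharedPrefixSketch, Flat and compare are hypothetical names, the real sharedConcatPrefixLength may differ in signature and behavior, and the sketch assumes both operands wrap the very same range object (for example, one bound to a local).

```scala
// Hypothetical model of the shared-prefix shortcut; not sjsonnet's implementation.
object SharedPrefixSketch {
  sealed trait Arr { def length: Int; def value(i: Int): Int }

  // Lazy range: stores only start and length.
  final class RangeArr(from: Int, val length: Int) extends Arr {
    def value(i: Int): Int = from + i
  }

  final class Flat(xs: Array[Int]) extends Arr {
    def length: Int = xs.length
    def value(i: Int): Int = xs(i)
  }

  // O(1) concat: wraps both sides without copying.
  final class ConcatView(val left: Arr, val right: Arr) extends Arr {
    val length: Int = left.length + right.length
    def value(i: Int): Int =
      if (i < left.length) left.value(i) else right.value(i - left.length)
  }

  // If both views share the same left operand (reference equality),
  // every element of that prefix is known to be equal.
  def sharedPrefixLength(a: Arr, b: Arr): Int = (a, b) match {
    case (x: ConcatView, y: ConcatView) if x.left eq y.left => x.left.length
    case _ => 0
  }

  // Lexicographic comparison that starts after the shared prefix.
  def compare(a: Arr, b: Arr): Int = {
    var i = sharedPrefixLength(a, b)
    val n = math.min(a.length, b.length)
    while (i < n) {
      val c = Integer.compare(a.value(i), b.value(i))
      if (c != 0) return c
      i += 1
    }
    Integer.compare(a.length, b.length)
  }
}
```

With a shared 1,000,000-element range on the left of both views, compare inspects only the single tail element on each side. Without the shared reference, the loop falls back to an ordinary element-by-element scan.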

No regressions detected across all 35+ JMH benchmarks and full test suite (420 tests × 3 Scala versions × 4 platforms).

References

The same optimization exists in jrsonnet and in Scala, too.

Result

All tests pass (./mill __.test). Comparison benchmark flipped from 1.36x slower to 2.07x faster than jrsonnet.

Replace eager Array[Eval] allocation in std.range with a lazy representation that stores only the start value and length. Elements are computed on demand via Val.cachedNum, making std.range(1, 1000000) O(1) instead of O(n).

Key changes:
- Add _isRange boolean flag and _rangeFrom field to Val.Arr
- Val.Arr.range() factory creates lazy ranges without allocation
- value(i)/eval(i) compute range elements on the fly
- reversed() creates reversed ranges without materialization
- materializeRange() converts to flat array when bulk access needed
- Double-reverse correctly restores original range direction

The comparison benchmark (std.range(1,1M) + [1] < std.range(1,1M) + [2]) improves from 16.2ms to 0.028ms (579x faster) on JMH, and sjsonnet native is now 2x faster than jrsonnet on this benchmark.

Inspired by jrsonnet's RangeArray (arr/spec.rs).

He-Pin · 2026-04-12T16:50:34Z

I think I want to introduce a RangeArr, which would remove two fields from the common Arr. I will submit it later.

Move lazy range state (_isRange, _rangeFrom) from inline fields in Arr to a dedicated RangeArr subclass. This saves ~9 bytes per non-range Arr instance (boolean + int + alignment padding), benefiting the common case where arrays are not created via std.range.

Key changes:
- Arr is no longer final; RangeArr extends Arr with range-specific fields
- arr and _length visibility widened to private[Val] for subclass access
- isConcatView made final for Scala 2.x @inline compatibility
- Range-specific branches removed from Arr.value/eval/asLazyArray/reversed
- RangeArr overrides value/eval/asLazyArray/reversed with range logic
- Arr.range() factory now returns RangeArr instances

Upstream: follow-up to databricks#771 (lazy range arrays)
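A rough sketch of the subclass layout described in this follow-up, under stated assumptions: RangeArrSketch, step, size and asArray are hypothetical names, elements are plain Int values, and a signed step stands in for the reversed-range bookkeeping. It shows only the shape of the refactor (range fields live in the subclass, and the base class carries none), not the actual sjsonnet code.

```scala
// Hypothetical illustration of the refactor; not the actual sjsonnet classes.
object RangeArrSketch {
  // Common case: no range-specific fields, so ordinary arrays stay small.
  class Arr(protected val arr: Array[Int], protected val len: Int) {
    def length: Int = len
    def value(i: Int): Int = arr(i)
    def reversed: Arr = new Arr(arr.reverse, len)
    def asArray: Array[Int] = arr
  }

  // Range state lives only here. step is +1 for forward and -1 for reversed ranges.
  final class RangeArr(from: Int, step: Int, size: Int) extends Arr(null, size) {
    override def value(i: Int): Int = {
      if (i < 0 || i >= size) throw new IndexOutOfBoundsException(i.toString)
      from + step * i
    }
    // O(1) reverse: start at the current last element and negate the step.
    override def reversed: Arr =
      if (size == 0) this else new RangeArr(value(size - 1), -step, size)
    // Materialize into a flat array only on bulk access.
    override def asArray: Array[Int] = Array.tabulate(size)(value)
  }

  object Arr {
    // The factory returns the subclass; callers still see a plain Arr.
    def range(from: Int, size: Int): Arr = new RangeArr(from, 1, math.max(0, size))
  }
}
```

The trade-off is that value(i) becomes a virtual call overridden in the subclass instead of a boolean branch inside a final class, in exchange for a smaller footprint on every non-range array.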

He-Pin marked this pull request as ready for review April 12, 2026 16:12

stephenamar-db approved these changes Apr 12, 2026

stephenamar-db merged commit 661b8b2 into databricks:master Apr 12, 2026
4 of 5 checks passed

This was referenced Apr 12, 2026

refactor: extract RangeArr subclass from Arr to reduce memory footprint #772

Closed

performance optimization #666

Open

perf: lazy stdlib initialization with shared members and unsynchronized lookup #769

Draft

stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/lazy-range-array
