perf: lazy range arrays for O(1) std.range creation #771

Merged
stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/lazy-range-array
Apr 12, 2026

He-Pin commented Apr 12, 2026

Motivation

std.range(from, to) eagerly allocates an Array[Eval] of Val.Num objects. For large ranges like std.range(1, 1000000), this creates 1M objects upfront even when only a subset of the elements is ever accessed. This is particularly wasteful in patterns like array comparison, where two large ranges share a common prefix and only the tail elements differ.

Key Design Decision

Encode lazy range state directly in Val.Arr using a boolean _isRange flag and an integer _rangeFrom field. When _isRange is true, arr is null and elements are computed on demand via Val.cachedNum(pos, _rangeFrom + i). This avoids introducing a new subclass and keeps the hot-path value(i) dispatch to a single boolean check.

A separate _isRange boolean flag is used instead of a sentinel value (e.g., Int.MinValue) to avoid collisions with valid range start values.

Reversed ranges store the high end as _rangeFrom and count down (_rangeFrom - i), with correct double-reverse handling that restores the original forward range.
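
To make the design concrete, here is a minimal sketch of the flag-based encoding. It is not the sjsonnet source: SketchArr, the `descending` field, and the use of plain Double elements instead of Val.Num are assumptions made for illustration. Only _isRange and _rangeFrom are names taken from this PR.

```scala
// Illustrative stand-in for Val.Arr, not the sjsonnet source. Only _isRange and
// _rangeFrom are names from the PR; SketchArr and `descending` are assumptions.
final class SketchArr private (
    private val arr: Array[Double], // backing storage; null while the range is lazy
    private val _isRange: Boolean,
    private val _rangeFrom: Int,
    private val descending: Boolean,
    val length: Int
) {
  /** Element i: computed from the range state, or read from the backing array. */
  def value(i: Int): Double = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    if (!_isRange) arr(i)
    else if (descending) (_rangeFrom - i).toDouble
    else (_rangeFrom + i).toDouble
  }

  /** Reversing a lazy range flips the direction and moves the start to the other end. */
  def reversed(): SketchArr =
    if (!_isRange) new SketchArr(arr.reverse, false, 0, false, length)
    else if (length == 0) this
    else {
      val otherEnd =
        if (descending) _rangeFrom - (length - 1) else _rangeFrom + (length - 1)
      new SketchArr(null, true, otherEnd, !descending, length)
    }
}

object SketchArr {
  /** O(1): stores the start and the length, allocates no elements. */
  def range(from: Int, size: Int): SketchArr =
    new SketchArr(null, true, from, false, math.max(size, 0))

  /** Ordinary eager array, for comparison. */
  def of(xs: Double*): SketchArr =
    new SketchArr(xs.toArray, false, 0, false, xs.length)
}
```

In this sketch, reversing twice lands back on the original start value and direction, which mirrors the double-reverse behaviour described above.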

Modification

  • Val.scala: Add _isRange boolean + _rangeFrom field to Val.Arr. Update value(i), eval(i), asLazyArray, reversed() to handle range state. Add materializeRange() for bulk access. Add Arr.range() factory.
  • ArrayModule.scala: Replace eager Array[Eval] allocation in std.range with Val.Arr.range(pos, from, size) for O(1) creation.
  • New test: lazy_range_correctness.jsonnet — covers iteration, concat+comparison, slicing, sort, reverse, double-reverse, member, map, foldl, empty ranges, large ranges.
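
As a rough illustration of the materializeRange() idea from the list above, the sketch below converts a lazy range to a flat array the first time bulk access is requested. The class and method names are hypothetical, and caching the materialized array is an assumption; the PR does not say whether the result is retained.

```scala
// Hypothetical sketch; the PR's corresponding method is materializeRange() on Val.Arr.
final class MaterializingRange(from: Int, val length: Int) {
  // Filled on first bulk access. Keeping it afterwards is an assumption.
  private var flat: Array[Double] = null

  /** Single-element access never forces materialization. */
  def value(i: Int): Double = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    if (flat != null) flat(i) else (from + i).toDouble
  }

  def isMaterialized: Boolean = flat != null

  /** Bulk access (sorting, copying, and so on) pays the O(n) cost once. */
  def materialize(): Array[Double] = {
    if (flat == null) flat = Array.tabulate(length)(i => (from + i).toDouble)
    flat
  }
}
```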

Benchmark Results

JMH (JVM, Scala 3.3.7)

| Benchmark | Before (ms/op) | After (ms/op) | Change |
|---|---|---|---|
| comparison | 16.204 | 0.028 | 579x faster 🔥 |
| reverse | 7.033 | 6.823 | 3% faster ✅ |
| bench.06 | 0.226 | 0.218 | 4% faster ✅ |
| setUnion | 0.638 | 0.623 | No regression ✅ |
| gen_big_object | 1.122 | 0.973 | No regression ✅ |
| All others | | | No regression ✅ |

Hyperfine (Scala Native vs jrsonnet)

| Benchmark | sjsonnet (ms) | jrsonnet (ms) | Ratio |
|---|---|---|---|
| comparison | 6.5 ± 1.4 | 13.4 ± 1.9 | sjsonnet 2.07x faster |
| reverse | | | ~1.04x faster (no regression) |
| bench.06 | ~6.8 | ~5.3 | jrsonnet 1.29x (startup-dominated, unchanged) |

Previous: comparison was 1.36x slower than jrsonnet → Now 2.07x faster 🎉

Analysis

The comparison benchmark (std.range(1, 1000000) + [1] < std.range(1, 1000000) + [2]) previously allocated 2M Val.Num objects upfront, then compared element by element. With lazy ranges:

  1. O(1) range creation: only the start and length are stored, with no per-element allocation
  2. O(1) concat: ConcatView wraps range + singleton without copying
  3. O(1) comparison: sharedConcatPrefixLength detects the shared range prefix (same object reference) and skips to the differing tail element

This turns an O(n) operation into O(1) end-to-end.
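
The three steps can be sketched as follows. ConcatView and sharedConcatPrefixLength are names from this PR, but the bodies here are an illustration rather than the sjsonnet implementation: the Elems trait, RangeElems, FlatElems, and the Compare object are hypothetical, elements are plain Int values, and the shared prefix is detected purely by reference equality, as the description states.

```scala
// Hypothetical types for illustration only.
sealed trait Elems {
  def length: Int
  def at(i: Int): Int
}

/** Lazy range: stores start and length, computes elements on demand. */
final class RangeElems(from: Int, val length: Int) extends Elems {
  def at(i: Int): Int = from + i
}

/** Ordinary eager array. */
final class FlatElems(xs: Array[Int]) extends Elems {
  def length: Int = xs.length
  def at(i: Int): Int = xs(i)
}

/** Concatenation as a view: no copying, O(1) to build. */
final class ConcatView(val left: Elems, val right: Elems) extends Elems {
  val length: Int = left.length + right.length
  def at(i: Int): Int =
    if (i < left.length) left.at(i) else right.at(i - left.length)
}

object Compare {
  /** Number of leading elements known to be equal without reading them. */
  def sharedConcatPrefixLength(a: Elems, b: Elems): Int = (a, b) match {
    case (x: ConcatView, y: ConcatView) if x.left eq y.left => x.left.length
    case _                                                  => 0
  }

  /** Lexicographic comparison that starts after the shared prefix. */
  def compare(a: Elems, b: Elems): Int = {
    val n = math.min(a.length, b.length)
    var i = sharedConcatPrefixLength(a, b)
    var result = 0
    while (result == 0 && i < n) {
      result = Integer.compare(a.at(i), b.at(i))
      i += 1
    }
    if (result != 0) result else Integer.compare(a.length, b.length)
  }
}
```

With one RangeElems of a million elements shared as the left side of two ConcatView values, compare in this sketch reads a single element from each side before returning.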

No regressions detected across all 35+ JMH benchmarks and full test suite (420 tests × 3 Scala versions × 4 platforms).

References

The same optimization exists in jrsonnet, and in Scala too.

Result

All tests pass (./mill __.test). Comparison benchmark flipped from 1.36x slower to 2.07x faster than jrsonnet.

Replace eager Array[Eval] allocation in std.range with a lazy
representation that stores only the start value and length. Elements
are computed on demand via Val.cachedNum, making std.range(1, 1000000)
O(1) instead of O(n).

Key changes:
- Add _isRange boolean flag and _rangeFrom field to Val.Arr
- Val.Arr.range() factory creates lazy ranges without allocation
- value(i)/eval(i) compute range elements on the fly
- reversed() creates reversed ranges without materialization
- materializeRange() converts to flat array when bulk access needed
- Double-reverse correctly restores original range direction

The comparison benchmark (std.range(1,1M) + [1] < std.range(1,1M) + [2])
improves from 16.2ms to 0.028ms (579x faster) on JMH, and sjsonnet
native is now 2x faster than jrsonnet on this benchmark.

Inspired by jrsonnet's RangeArray (arr/spec.rs).
@He-Pin He-Pin marked this pull request as ready for review April 12, 2026 16:12
@stephenamar-db stephenamar-db merged commit 661b8b2 into databricks:master Apr 12, 2026
4 of 5 checks passed

He-Pin commented Apr 12, 2026

I think I want to introduce a RangeArr subclass, which would remove the two extra fields from the common Arr. I will submit it later.

He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request Apr 12, 2026
Move lazy range state (_isRange, _rangeFrom) from inline fields in Arr
to a dedicated RangeArr subclass. This saves ~9 bytes per non-range Arr
instance (boolean + int + alignment padding), benefiting the common case
where arrays are not created via std.range.

Key changes:
- Arr is no longer final; RangeArr extends Arr with range-specific fields
- arr and _length visibility widened to private[Val] for subclass access
- isConcatView made final for Scala 2.x @inline compatibility
- Range-specific branches removed from Arr.value/eval/asLazyArray/reversed
- RangeArr overrides value/eval/asLazyArray/reversed with range logic
- Arr.range() factory now returns RangeArr instances

Upstream: follow-up to databricks#771 (lazy range arrays)
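
A rough sketch of the subclass layout this follow-up commit describes, under simplifying assumptions: elements are plain Double values, there is no caching or ConcatView, and the fields are `protected` rather than `private[Val]`. The `descending` parameter is hypothetical; Arr, RangeArr, and Arr.range are the names used in the commit message.

```scala
// Simplified illustration, not the sjsonnet source.
class Arr(protected val arr: Array[Double], protected val _length: Int) {
  def length: Int = _length
  // No range branch here: plain arrays carry no range-specific fields.
  def value(i: Int): Double = arr(i)
  def reversed(): Arr = new Arr(arr.reverse, _length)
}

object Arr {
  /** Factory returns the lazy subclass; callers only see Arr. */
  def range(from: Int, size: Int): Arr =
    new RangeArr(from, math.max(size, 0), false)
}

/** Range state lives only in this subclass (`descending` is an assumed field). */
final class RangeArr(rangeFrom: Int, size: Int, descending: Boolean)
    extends Arr(null, size) {

  override def value(i: Int): Double = {
    if (i < 0 || i >= size) throw new IndexOutOfBoundsException(i.toString)
    if (descending) (rangeFrom - i).toDouble else (rangeFrom + i).toDouble
  }

  override def reversed(): Arr =
    if (size == 0) this
    else {
      val otherEnd =
        if (descending) rangeFrom - (size - 1) else rangeFrom + (size - 1)
      new RangeArr(otherEnd, size, !descending)
    }
}
```

The trade-off relative to the flag-based design is that value(i) becomes a virtual call instead of a boolean check, in exchange for smaller non-range Arr instances.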