perf: comprehension fuse scope+eval and inline BinaryOp(ValidId,ValidId) fast path#686

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/comprehension-binop-inline

Conversation

@He-Pin (Contributor) commented Apr 5, 2026

Motivation

Array comprehensions like [x+y for x in arr for y in arr if x==y] previously collected all valid scopes into an intermediate Array[ValScope], then mapped body evaluation over them. For nested comprehensions, this allocates O(n²) intermediate scopes even when only O(n) results survive filtering. Additionally, the body evaluation dispatches through visitExpr for every element, which has significant overhead for simple expressions like BinaryOp(ValidId, ValidId).

Key Design Decisions

  1. Fused scope+eval: Instead of two passes (collect scopes → map body), visitCompFused directly appends body results during scope traversal, eliminating intermediate array allocation.

  2. BinaryOp(ValidId,ValidId) fast path: For the innermost ForSpec with a binary-op body of two variable references, inline scope lookups and numeric dispatch to avoid 3× visitExpr overhead per iteration. Falls back to general visitExpr for non-numeric types — no code duplication.

  3. Eager evaluation: The fast path evaluates eagerly (not lazy). Both go-jsonnet and jrsonnet evaluate comprehensions eagerly, and eagerness is required for safe mutable scope reuse.

  4. Lean implementation: Unlike the original version, this eliminates the visitBinaryOpValues fallback method (~60 lines), reducing code addition from ~225 to ~200 lines. This avoids native binary size inflation that caused instruction cache regression.
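The fused traversal described above can be sketched in isolation. This is a minimal Scala model, not the actual sjsonnet Evaluator code: `twoPass` mirrors the old collect-scopes-then-map shape, while `fused` appends body results during the nested loop so the intermediate scope array never exists. All names here are illustrative.

```scala
import scala.collection.mutable.ArrayBuilder

// Toy model of [x+y for x in arr for y in arr if x==y].
object FusedComp {
  // Old shape: materialize the array of surviving (x, y) scopes, then map the body.
  def twoPass(arr: Array[Int]): Array[Int] = {
    val scopes = for { x <- arr; y <- arr; if x == y } yield (x, y) // intermediate array
    scopes.map { case (x, y) => x + y }
  }

  // Fused shape: evaluate the body the moment a scope survives the filter,
  // appending straight into an ArrayBuilder. No intermediate scope array.
  def fused(arr: Array[Int]): Array[Int] = {
    val out = ArrayBuilder.make[Int]
    var i = 0
    while (i < arr.length) {
      val x = arr(i)
      var j = 0
      while (j < arr.length) {
        val y = arr(j)
        if (x == y) out += x + y // body evaluated inline during traversal
        j += 1
      }
      i += 1
    }
    out.result()
  }
}
```

The real evaluator walks `ForSpec`/`IfSpec` nodes and extends a `ValScope` per binding, but the control-flow shape is the same: the append happens inside the innermost loop rather than after a full scope-collection pass.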

Modification

sjsonnet/src/sjsonnet/Evaluator.scala:

  • Replace visitComp(Comp) with fused version using ArrayBuilder
  • Add visitCompFused — recursive fused scope+eval loop
  • Add evalBinaryOpNumNum — @switch-dispatched Num×Num fast path covering arithmetic, comparison, bitwise, and shift ops
  • Op guard filters out OP_in, OP_&&, OP_|| from numeric fast path (these need type dispatch / short-circuit semantics)
  • Overflow checks match existing evaluator: none for OP_+, isInfinite check for OP_-, OP_*, OP_/
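A `@switch`-dispatched Num×Num fast path of this kind might look like the following sketch. The opcode constants and method shape are assumptions for illustration, not sjsonnet's actual `Expr.BinaryOp` encoding; the real evaluator falls back to the general `visitExpr` path for anything outside the guarded numeric ops.

```scala
import scala.annotation.switch

object NumFastPath {
  // Hypothetical opcode constants; sjsonnet defines its own binary-op ids.
  final val OP_ADD = 0; final val OP_SUB = 1; final val OP_MUL = 2; final val OP_DIV = 3
  final val OP_LT  = 4; final val OP_EQ  = 5

  // @switch asks the compiler to emit a tableswitch over the constant patterns,
  // avoiding the megamorphic dispatch of the general expression visitor.
  def evalBinaryOpNumNum(op: Int, l: Double, r: Double): Any =
    (op: @switch) match {
      case OP_ADD => l + r              // no overflow check for +, matching the evaluator
      case OP_SUB => checkFinite(l - r) // isInfinite check for -, *, /
      case OP_MUL => checkFinite(l * r)
      case OP_DIV => checkFinite(l / r)
      case OP_LT  => l < r
      case OP_EQ  => l == r
      case _      => throw new UnsupportedOperationException(s"op $op needs the general path")
    }

  private def checkFinite(d: Double): Double =
    if (d.isInfinite) throw new ArithmeticException("numeric overflow") else d
}
```

Ops like `in`, `&&`, and `||` stay out of this table because they need type dispatch or short-circuit semantics, as the op guard above describes.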

sjsonnet/test/resources/new_test_suite/comprehension_binop_types.jsonnet:

  • Regression test covering all binary operators in comprehensions: string concat, numeric arithmetic, comparison, bitwise, string formatting, array concat, in operator

Benchmark Results

JMH (35 benchmarks, 0 regressions)

| Benchmark | Master (ms/op) | This PR (ms/op) | Change |
| --- | --- | --- | --- |
| comparison2 | 74.373 | 37.018 | -50.2% |
| realistic1 | 3.014 | 2.794 | -7.3% |
| realistic2 | 71.861 | 67.872 | -5.5% |
| large_string_template | 2.765 | 2.538 | -8.2% |
| large_string_join | 2.408 | 2.310 | -4.1% |
| bench.03 | 13.634 | 13.526 | -0.8% |
| bench.04 | 33.975 | 33.500 | -1.4% |
| reverse | 11.032 | 10.654 | -3.4% |
| bench.02 | 45.790 | 45.754 | ~0% |

No significant regressions across all 35 benchmarks.

Scala Native Hyperfine (vs jrsonnet 0.4.2)

| Benchmark | Master | This PR | jrsonnet | vs Master | vs jrsonnet |
| --- | --- | --- | --- | --- | --- |
| comparison2 | 173.1 ms | 81.0 ms | 250.1 ms | 2.14× faster | 3.09× faster |
| realistic1 (wall) | 14.4 ms | 17.7 ms | 16.1 ms | +23%¹ | +10% |
| realistic1 (user) | 10.5 ms | 10.3 ms | 13.4 ms | -2% | 1.30× faster |
| realistic2 | 312.3 ms | 318.4 ms | 629.6 ms | +2%² | 1.97× faster |

¹ realistic1 wall time increase is startup/binary-loading overhead, not computation — user time is unchanged (10.3 vs 10.5 ms)
² realistic2 +2% is within noise range

Analysis

The 50% improvement on comparison2 comes from two complementary optimizations:

  • Structural (fused scope+eval): eliminates O(n²) intermediate scope array for [x+y for x in arr for y in arr if x==y] with n=5000 — contributes ~6%
  • BinaryOp inlining: for the 5001 body evaluations that survive the if x==y filter, inline scope lookups + @switch numeric dispatch avoids 3× visitExpr overhead — contributes ~44%

The lean implementation avoids the instruction cache regression seen with the original version (which added ~225 lines including a duplicated visitBinaryOpValues method), keeping realistic1 user time flat.

References

  • Upstream: jit branch commits 3466461a (fuse scope+eval) + 71545ba8 (inline BinaryOp)

Result

Array comprehension evaluation is 2-3× faster for comprehension-heavy workloads, with no regressions on other benchmarks. sjsonnet native now beats jrsonnet by 3.09× on comparison2.

He-Pin marked this pull request as ready for review April 5, 2026 09:44
Fuse comprehension scope building with body evaluation, eliminating
intermediate scope array allocation. For nested comprehensions like
[x+y for x in arr for y in arr if x==y], this avoids allocating O(n²)
intermediate scopes — only the O(n) matching results are materialized.

When the innermost body is BinaryOp(ValidId,ValidId), inline scope
lookups and numeric binary-op dispatch to avoid 3× visitExpr overhead
per iteration. Falls back to general visitExpr for non-numeric types.

Key changes:
- visitCompFused: recursive fused scope+eval loop with ArrayBuilder
- evalBinaryOpNumNum: @switch-dispatched Num×Num fast path
- Non-numeric fallback uses existing visitExpr (no code duplication)

Upstream: jit branch commits 3466461 (fuse) + 71545ba (inline)
He-Pin force-pushed the perf/comprehension-binop-inline branch from 9b5caef to 62c6ef6 on April 6, 2026 05:30