
perf: pre-cached indent arrays for bulk newline+spaces #676

Closed
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/renderer-indent-cache

Conversation

@He-Pin
Contributor

@He-Pin He-Pin commented Apr 4, 2026

Motivation

The JSON renderer generates indentation strings (newline + spaces) for every nested element. For deeply nested Jsonnet output, Renderer.visitKey and visitEnd repeatedly construct identical indent strings. The current implementation calls elemBuilder.append('\n') followed by a while loop appending spaces one at a time — O(depth) appends per indent operation.

Key Design Decision

Pre-cache indent strings (newline + spaces) in a companion object array up to depth 64 (MaxCachedDepth). For depths ≤64, indent operations become a single array lookup + bulk write. For depths >64 (rare in practice), fall through to the original loop.

Modification

sjsonnet/src/sjsonnet/Renderer.scala:

  • Added companion object with MaxCachedDepth = 64 constant and indentCache: Array[Array[Char]]
  • Cache stores pre-computed "\n" + " " * (depth * indent) as char arrays for depths 0–64
  • flushBuffer() fast path: when depth ≤ MaxCachedDepth, uses elemBuilder.appendAll(cachedArray, len) instead of character-by-character loop
  • Original loop preserved as fallback for depths > 64
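The cache-plus-fallback shape described above can be sketched as follows. This is a simplified, hypothetical illustration — `IndentWriter` and its `StringBuilder` stand in for the real `Renderer`/`CharBuilder`; only the names `MaxCachedDepth` and `indentCache` come from the PR.

```scala
// Hypothetical sketch of the cached-indent approach; names MaxCachedDepth
// and indentCache mirror the PR, the surrounding class is simplified.
object IndentCache {
  final val MaxCachedDepth = 64
  // indentCache(d) holds "\n" followed by d * indentWidth spaces.
  def build(indentWidth: Int): Array[Array[Char]] =
    Array.tabulate(MaxCachedDepth + 1) { d =>
      ("\n" + " " * (d * indentWidth)).toCharArray
    }
}

final class IndentWriter(indentWidth: Int) {
  private val cache = IndentCache.build(indentWidth)
  private val sb = new StringBuilder

  def writeIndent(depth: Int): Unit =
    if (depth <= IndentCache.MaxCachedDepth) {
      // Fast path: one bulk append of the pre-computed array.
      sb.appendAll(cache(depth))
    } else {
      // Fallback: the original per-character loop for very deep nesting.
      sb.append('\n')
      var i = 0
      while (i < depth * indentWidth) { sb.append(' '); i += 1 }
    }

  def result(): String = sb.toString
}
```

The arrays are built once per indent width, so each indent operation on the fast path is a bounds check plus a bulk copy.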

Benchmark Results

JMH — Full Suite (35 benchmarks, 1+1 warmup)

No regressions detected. All benchmarks within noise margin.

Note

The indentation cache optimization primarily benefits:

  1. Deeply nested JSON output — common in Jsonnet configurations (Kubernetes manifests, CI configs)
  2. std.manifestJsonEx — uses indentation for pretty-printing
  3. Scala Native — no JIT to optimize the loop; pre-cached arrays enable System.arraycopy

Analysis

  • Memory: One-time allocation of 64 char arrays (total ~2KB) — negligible.
  • Thread safety: Cache is in a companion object, initialized once. Arrays are read-only after initialization.
  • Threshold: 64 levels covers virtually all real-world Jsonnet output (even deeply nested Kubernetes manifests rarely exceed 20 levels).

References

  • upickle.core.CharBuilder.appendAll(char[], int) for bulk writes
  • Original character-by-character indent loop in Renderer.flushBuffer

Result

Pre-cached indent arrays eliminate per-character overhead for nested JSON rendering. No regressions. Benefits deeply nested output and Scala Native.

@He-Pin He-Pin marked this pull request as ready for review April 5, 2026 00:28
@He-Pin He-Pin force-pushed the perf/renderer-indent-cache branch 5 times, most recently from f2e7618 to 9db668d Compare April 9, 2026 04:46
@He-Pin
Contributor Author

He-Pin commented Apr 9, 2026

Good catch — extracted the magic 16 into a named constant Renderer.MaxCachedDepth in a new companion object. The comparison now reads depth < MaxCachedDepth instead of depth < indentCache.length.

Note that the indent cache content is instance-specific (depends on the indent constructor parameter — commonly 2, 3, or 4), but the size (16 depth levels) is a fixed constant shared across all instances.

@He-Pin He-Pin force-pushed the perf/renderer-indent-cache branch 2 times, most recently from 7ec85ce to abfe59a Compare April 9, 2026 15:18
@He-Pin He-Pin force-pushed the perf/renderer-indent-cache branch 2 times, most recently from f336323 to 75e9d8e Compare April 10, 2026 03:44
Extract MaxCachedDepth=16 to Renderer companion object constant per review.
Pre-compute indentCache arrays for depths 0..15 to replace per-character
space emission with a single bulk appendAll in flushBuffer.
@He-Pin
Contributor Author

He-Pin commented Apr 10, 2026

Superseded by #730 which combines this optimization with the other renderer throughput improvements (indent cache + bulk copy + direct long rendering) into a single coherent PR with comprehensive benchmarks.

@He-Pin He-Pin closed this Apr 10, 2026
stephenamar-db pushed a commit that referenced this pull request Apr 10, 2026
…rect long rendering) (#730)

## Motivation

The materialization/rendering pipeline is the primary bottleneck for
large-output workloads. For `realistic2` (28.6 MB output, 568K lines,
125K objects, 380K strings), `--debug-stats` shows 99.8% of wall time is
spent in materialization. The previous implementation used per-character
loops for indent rendering and intermediate `String` allocation for
number formatting, leaving significant throughput on the table.

## Key Design Decisions

1. **Indent cache scope**: Lives in `BaseCharRenderer` (not `Renderer`)
so all renderer subclasses (`Renderer`, `MaterializeJsonRenderer`,
`PythonRenderer`) benefit automatically.
2. **MaxCachedDepth = 32**: Covers virtually all real-world Jsonnet
(realistic2 max depth ~5). Beyond this, falls back to the original
per-character loop.
3. **Negative accumulator** in `appendLong`: Handles `Long.MinValue`
correctly without overflow (negating `Long.MinValue` overflows `Long`).
4. **Zero-allocation number rendering**: For integer-valued doubles (the
common case in Jsonnet), digits are written directly into `CharBuilder`
instead of going through `Long.toString` → `String` → char-by-char copy.
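The negative-accumulator trick from points 3 and 4 can be illustrated with a small sketch. This writes into a plain `StringBuilder` rather than upickle's `CharBuilder`, and the helper name `appendLong` is taken from the PR while the body here is my own illustration: the accumulator stays non-positive throughout, so `Long.MinValue` never has to be negated (negating it would overflow).

```scala
// Sketch: render a Long digit-by-digit with a non-positive accumulator,
// then reverse the emitted digits in place.
def appendLong(sb: StringBuilder, value: Long): Unit = {
  if (value < 0) sb.append('-')
  // Keep n <= 0 so Long.MinValue is handled without negation/overflow.
  var n = if (value > 0) -value else value
  val start = sb.length
  while (n <= -10) {
    sb.append(('0' - (n % 10)).toChar) // n % 10 is in -9..0
    n /= 10
  }
  sb.append(('0' - n).toChar)
  // Digits were emitted least-significant first; reverse them in place.
  var i = start
  var j = sb.length - 1
  while (i < j) {
    val t = sb.charAt(i); sb.setCharAt(i, sb.charAt(j)); sb.setCharAt(j, t)
    i += 1; j -= 1
  }
}
```

The reversal only covers the digit span (`start` onward), so a leading `-` sign is left untouched.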

## Modifications

### `BaseCharRenderer.scala`
- Added companion object with `MaxCachedDepth = 32`
- Added `indentCache` field: pre-computed `Array[Array[Char]]` with
`newline + indent*d spaces` for each depth level, constructed once at
renderer creation
- Updated `renderIndent()` to use cached arrays via `appendAll` (single
`System.arraycopy`) for depths < 32
- Updated `appendString()` to use `String.getChars` bulk copy instead of
char-by-char loop
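The `String.getChars` bulk-copy change can be sketched like this. `CharBuf` is a hypothetical stand-in for upickle's `CharBuilder`; the point is that one `getChars` call (backed by `System.arraycopy`) replaces a per-character loop.

```scala
// Minimal growable char buffer illustrating bulk string append.
final class CharBuf(initial: Int = 32) {
  private var buf = new Array[Char](initial)
  private var len = 0

  private def ensure(extra: Int): Unit =
    if (len + extra > buf.length)
      buf = java.util.Arrays.copyOf(buf, math.max(buf.length * 2, len + extra))

  def appendString(s: String): Unit = {
    ensure(s.length)
    // One bulk copy instead of appending each character in a loop.
    s.getChars(0, s.length, buf, len)
    len += s.length
  }

  def result(): String = new String(buf, 0, len)
}
```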

### `Renderer.scala`
- Updated `visitFloat64()` to render integers directly via
`RenderUtils.appendLong()`
- Updated `flushBuffer()` to use `indentCache` for bulk indent rendering
- Added `RenderUtils.appendLong()`: renders `Long` directly into
`CharBuilder` using negative accumulator + reverse-in-place algorithm

### `RendererTests.scala`
- Added `appendLong` edge case tests: 0, positive, negative, large,
`Long.MaxValue`, `Long.MinValue`
- Added `visitFloat64Integers` tests for end-to-end integer rendering
- Added `indentZero` test for `indent=0` edge case

## Benchmark Results

### JMH (JVM, isolated runs, lower is better)

| Benchmark | Before (ms/op) | After (ms/op) | Change |
|-----------|----------------|---------------|--------|
| **realistic2** | 68.749 | 58.001 | **-15.6%** ✅ |
| **reverse** | 10.494 | 8.436 | **-19.6%** ✅ |
| gen_big_object | 1.066 | 1.000 | -6.2% ✅ |
| bench.02 | 39.832 | 39.322 | -1.3% ≈ |
| comparison | 20.216 | 21.060 | +4.2% (noise — eval-only, output is `true`) |
| realistic1 | 2.015 | 2.133 | within noise |

No regressions across the full 35-benchmark JMH suite.

### Hyperfine (Scala Native, `--warmup 3 --min-runs 10`)

**realistic2** (28.6 MB output):
| Implementation | Time (ms) | vs jrsonnet |
|---|---|---|
| sjsonnet-native (master) | 264.9 ± 4.2 | 2.48x slower |
| sjsonnet-native (this PR) | 262.2 ± 2.9 | 2.45x slower |
| jrsonnet 0.5.0-pre98 | 106.8 ± 16.3 | baseline |

**reverse** (large array output):
| Implementation | Time (ms) | vs jrsonnet |
|---|---|---|
| sjsonnet-native (master) | 53.1 ± 2.8 | 2.22x slower |
| sjsonnet-native (this PR) | 38.0 ± 2.3 | **1.59x slower** |
| jrsonnet 0.5.0-pre98 | 24.0 ± 1.7 | baseline |

Gap closed from 2.22x → 1.59x (**-28.4%** improvement).

**gen_big_object**:
| Implementation | Time (ms) | vs jrsonnet |
|---|---|---|
| sjsonnet-native (master) | 12.1 ± 1.5 | 1.16x slower |
| sjsonnet-native (this PR) | 10.4 ± 1.1 | **1.01x — tied!** |
| jrsonnet 0.5.0-pre98 | 10.5 ± 1.3 | baseline |

**realistic1**:
| Implementation | Time (ms) | vs jrsonnet |
|---|---|---|
| sjsonnet-native (master) | 12.9 ± 1.4 | — |
| sjsonnet-native (this PR) | 12.0 ± 1.4 | **1.15x faster** |
| jrsonnet 0.5.0-pre98 | 13.9 ± 2.1 | baseline |

sjsonnet already **beats** jrsonnet on realistic1 (1.15x faster).

## Analysis

The JVM improvement is larger (15.6% on realistic2) because the JIT
compiler was still leaving performance on the table with the
char-by-char loops. On Scala Native, LLVM already partially optimizes
these loops, so the native improvement is smaller for realistic2 but
significant for reverse (28.4%), where the output contains many
integer-valued doubles that benefit from the zero-allocation
`appendLong` path.

The `gen_big_object` benchmark is now **tied with jrsonnet** (10.4ms vs
10.5ms), and `realistic1` beats jrsonnet by 1.15x.

## Result

- ✅ All 141 test suites pass (JVM 3.3.7)
- ✅ Compiles on all platforms (JVM, JS, Native)
- ✅ No regressions across the full benchmark suite
- ✅ Comprehensive new test coverage for edge cases

This PR supersedes #676 (renderer-indent-cache), #681
(renderer-bulk-append), and #685 (direct-long-rendering) which
implemented subsets of these optimizations individually.