perf: render escaped byte strings in one pass#845
Conversation
Motivation: Long JSON strings that contain escape characters used to scan the UTF-8 byte array twice in ByteRenderer: once to find escapes and once to pre-compute the exact escaped length. The large_string_template benchmark spends visible time in this path. Modification: Render escaped long strings with one copy/escape pass, growing ByteBuilder incrementally and refreshing its backing array after capacity checks. Add a regression test that compares ByteRenderer output with the char Renderer for long escaped strings including two-byte escapes, six-byte control escapes, and a trailing plain tail. Result: The large_string_template JMH target improves from roughly 0.70-0.72 ms/op to roughly 0.58-0.65 ms/op in local runs, while full tests and formatting checks remain green. References: bench/resources/cpp_suite/large_string_template.jsonnet
|
Closing after review — the change reverses a deliberate design decision from #809 and the benefit is too narrow to justify it. Why this contradicts #809PR #809 explicitly stated:
This PR puts those per-iteration Benchmark numbers don't hold up
The only positive signal is a single JVM workload, and even there the variance widens significantly. Native shows no movement and allocation is flat — both consistent with "one fewer SWAR pass over warm cache" rather than a structural improvement. Untested regression risk: dense-escape workloads`large_string_template.jsonnet` is a sparse-escape workload (one `\n` per ~70 bytes of plain text). The conservative initial buffer `bLen + 2 + (bLen >>> 5)` (~3% slack) is fine for that shape, but for dense `\u00XX` control-character escaping (6 bytes out per byte in):
No benchmark in this PR exercises that case, so the reported win could turn into a regression on real-world strings with many control characters. Better direction if we revisit thisIf we want to keep one pass without giving up on the precompute, the right shape is to make `findFirstEscapeChar` return both the position and an extra-bytes counter in a single SWAR pass — preserving #809's "reserve once, no per-iter ensureLength" property without needing a second scan. That's strictly better than either current master or this PR. Closing for now. Reopen-worthy with that shape, or with a benchmark suite that includes a dense-escape case. |
Motivation:
large_string_templaterenders a ~550 KB JSON string with many escaped newlines. ByteRenderer's long-string path already finds the first escaped byte, but the escaped branch then rescanned the rest of the byte array to compute the exact output length before doing the actual copy/escape pass.Key Design Decision:
Avoid the second escape scan and keep the existing byte-oriented renderer path. The renderer now reserves a conservative initial buffer size, updates
ByteBuilder.lengthbefore capacity checks, and refreshes the backing byte array after each possible growth.Modification:
BaseByteRenderer.visitLongString.findFirstEscapeChar.\u00XXcontrol escaping, and trailing plain text.Benchmark Results:
JMH, JVM 21, single-threaded
bench.runRegressionsonbench/resources/cpp_suite/large_string_template.jsonnet:large_string_templaterepeat_formatguardlarge_string_joinguardJMH GC profile on
large_string_template:Scala Native hyperfine, 10 runs,
large_string_template:ca61a7a3jrsonnet upstream benchmark document lists this same case as Rust 2.1 ms and Scala Native 14.5 ms on its machine; local absolute numbers differ, but the remaining local gap is still about 2.1x in favor of jrsonnet.
Analysis:
Stack profiling showed
BaseByteRenderer.visitLongStringand escape scanning on the hot path, while lazy format concatenation itself did not show enough top-stack weight. A rejected side experiment that marked formatted strings as ASCII-safe was incorrect for newline-containing JSON strings; a correct ASCII-only char loop was much slower. The retained change preserves the existing UTF-8 byte path and only removes duplicated escape scanning.Validation:
./mill --no-server --ticker false --color false -j 1 'sjsonnet.jvm[3.3.7]'.test.testOnly sjsonnet.RendererTestspassed../mill --no-server --ticker false --color false -j 1 'sjsonnet.jvm[3.3.7]'.testpassed../mill --no-server --ticker false --color false -j 1 __.testpassed../mill --no-server --ticker false --color false -j 1 __.checkFormatpassed../mill --no-server --ticker false --color false -j 1 __.reformatwas run before commit../mill --no-server --ticker false --color false -j 1 'sjsonnet.native[3.3.7]'.nativeLinkpassed for both baseline and this branch.References:
bench/resources/cpp_suite/large_string_template.jsonnet.jrsonnet/docs/benchmarks.adoc, section "Large string template".Result:
Long escaped JSON strings render with one escape/copy pass in ByteRenderer. JVM JMH improves on the target workload; Scala Native is slightly positive but within hyperfine noise, so this remains draft for review.