perf: direct-to-stdout rendering, bypass StringWriter allocation#746

Closed
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/direct-stdout-native

He-Pin commented Apr 11, 2026

Motivation

When outputting to stdout (no --output-file), sjsonnet renders JSON into a StringWriter backed by a StringBuffer, calls toString() to create the full output string, then calls println() to encode and write it. For realistic2 (28.5MB JSON output), this creates ~100MB of intermediate allocations:

  1. StringBuffer growth: 2× growth factor means peak internal buffer is ~57MB for 28.5MB output
  2. toString() copy: Creates a full 28.5MB String copy of the char array
  3. println encoding: Re-encodes the entire 28.5MB String from UTF-16 to UTF-8

Total: 3 traversals of the full output data before it reaches stdout.
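To make the three passes concrete, here is a simplified sketch of the pre-PR shape. The `render` function and the `BufferedStdoutSketch` object are hypothetical stand-ins, not sjsonnet's Renderer/CharBuilder; only the StringWriter, toString, println sequence is taken from the description above.

```scala
import java.io.{ByteArrayOutputStream, PrintStream, StringWriter, Writer}

object BufferedStdoutSketch {
  // Hypothetical renderer stand-in: writes a JSON array of `n` small objects
  // into whatever Writer it is handed.
  def render(out: Writer, n: Int): Unit = {
    out.write("[")
    var i = 0
    while (i < n) {
      if (i > 0) out.write(",")
      out.write("{\"k\":\"v\"}")
      i += 1
    }
    out.write("]")
  }

  // Pre-PR shape, simplified: buffer everything, copy it, then encode it.
  def emitBuffered(stdout: PrintStream, n: Int): Unit = {
    val sw = new StringWriter() // pass 1: chars accumulate in a StringBuffer that grows by copying
    render(sw, n)
    val s = sw.toString         // pass 2: the whole buffer is copied into a new String
    stdout.println(s)           // pass 3: the whole String is encoded with the PrintStream's charset
  }
}
```

Every pass touches the full output, so the peak footprint is proportional to the output size even though stdout itself only needs a small window at a time.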

Key Design Decision

Stream directly to stdout via BufferedOutputStream (64KB buffer) + OutputStreamWriter, eliminating all intermediate allocations:

BEFORE (3 passes over 28.5MB):
  Renderer → CharBuilder → StringWriter(StringBuffer) → .toString [28MB COPY]
  → println(string) [28MB UTF-16→UTF-8 ENCODE] → stdout

AFTER (single streaming pass, 64KB buffer):
  Renderer → CharBuilder → OutputStreamWriter → BufferedOutputStream(64KB) → stdout
  • Memory: ~64KB buffer vs ~142MB peak (StringBuffer + String + encoding buffer)
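  • On errors raised before materialization begins (e.g. parse failures), nothing is written; errors raised while materializing may leave partial output on stdout, which is flushed without the trailing newline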
  • Trailing newline handling moved into writeToFile (previously in main0 via println)
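A minimal sketch of the streaming shape, assuming a renderer that accepts any java.io.Writer (the `render` stand-in and the `StreamingStdoutSketch` name are hypothetical). It keeps only the pieces named above: an OutputStreamWriter encoding straight into a 64KB BufferedOutputStream over stdout.

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, OutputStream, OutputStreamWriter, Writer}
import java.nio.charset.StandardCharsets

object StreamingStdoutSketch {
  // Hypothetical renderer stand-in; includes a non-ASCII char to exercise UTF-8 encoding.
  def render(out: Writer): Unit = out.write("{\"greeting\":\"h\u00e9llo\"}")

  // Chars are encoded to UTF-8 as they arrive and staged in a 64KB byte buffer,
  // so no full-size String or StringBuffer is ever built.
  def emitStreaming(stdout: OutputStream): Unit = {
    val writer = new OutputStreamWriter(
      new BufferedOutputStream(stdout, 65536), StandardCharsets.UTF_8)
    render(writer)
    writer.write("\n") // trailing newline, written only after rendering returned normally
    writer.flush()     // drain the encoder and the 64KB buffer into stdout
  }
}
```

The writer is flushed rather than closed: closing an OutputStreamWriter closes the streams beneath it, which here would also close System.out.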

Supersedes #680 with a simpler approach: true streaming instead of buffering into CompactByteArrayOutputStream.

Modification

sjsonnet/src-jvm-native/sjsonnet/SjsonnetMainBase.scala:

  • writeToFile: When outputFile is None, renders directly through OutputStreamWriter → BufferedOutputStream(stdout, 65536) instead of StringWriter
  • Trailing newline written conditionally on success (matching previous println behavior)
  • On error, flushes any partial content but skips trailing newline
  • Threads stdout: PrintStream through mainConfigured → renderNormal → writeToFile
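The success/failure rules in this list can be sketched as follows. This assumes, hypothetically, that rendering reports failure as Left(message); the helper name `emitToStdout` is illustrative, and the real writeToFile signature is not reproduced here.

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, OutputStreamWriter, PrintStream, Writer}
import java.nio.charset.StandardCharsets

object StdoutErrorPathSketch {
  // Hypothetical helper: `render` streams into the writer and returns
  // Left(message) on failure (an assumption made for this sketch).
  def emitToStdout(stdout: PrintStream)(render: Writer => Either[String, Unit]): Either[String, Unit] = {
    val w = new OutputStreamWriter(new BufferedOutputStream(stdout, 65536), StandardCharsets.UTF_8)
    val result = render(w)
    if (result.isRight) w.write("\n") // trailing newline only on success
    w.flush()                         // partial output is flushed even on failure
    result
  }
}
```

Taking stdout as a parameter instead of reading System.out inside the helper keeps the path testable: a caller can pass a PrintStream wrapping a ByteArrayOutputStream and inspect exactly what was written, including partial output from a failed render.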

Benchmark Results

Environment: Apple M3 Max, macOS 15.4

Scala Native — hyperfine (warmup 3, runs 10, > /dev/null)

Benchmark                Master (ms)    PR (ms)        Δ vs master    Ratio vs jrsonnet
realistic2 (28.5MB)      257.8 ± 7.7    226.4 ± 2.5    −12.2%         2.21× (was 2.52×)
comparison2               90.0 ± 2.6     63.9 ± 2.3    −29.0%
comparison_primitives     87.1 ± 1.6     65.7 ± 2.9    −24.6%
reverse                   38.7 ± 1.6     30.6 ± 2.1    −20.9%
base64DecodeBytes         32.6 ± 2.2     26.8 ± 1.6    −17.8%
comparison                30.4 ± 1.9     31.9 ± 1.5    ~neutral
realistic1                12.5 ± 0.2     12.1 ± 1.0    ~neutral
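The Δ column is the relative change of the PR mean against the master mean. The snippet below reproduces that arithmetic; the overlap test (do the mean ± stddev ranges intersect?) is a rough heuristic chosen for illustration, not hyperfine's own analysis, though it happens to agree with the two ~neutral rows. Dividing each realistic2 time by its jrsonnet ratio gives roughly 102 ms in both cases, which suggests both ratios refer to the same jrsonnet baseline.

```scala
object BenchDeltaSketch {
  // Relative change of the PR mean versus the master mean, in percent.
  def deltaPct(master: Double, pr: Double): Double = (pr - master) / master * 100.0

  // Rough noise heuristic: true if [m1 - s1, m1 + s1] and [m2 - s2, m2 + s2] intersect.
  def overlaps(m1: Double, s1: Double, m2: Double, s2: Double): Boolean =
    m1 - s1 <= m2 + s2 && m2 - s2 <= m1 + s1

  // Baseline time implied by "our time is `ratio`× the baseline".
  def impliedBaseline(time: Double, ratio: Double): Double = time / ratio
}
```

For example, deltaPct(257.8, 226.4) is about −12.18, matching the −12.2% in the table.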

JMH (JVM, ms/op, lower is better)

Benchmark      ms/op
realistic2     58.6 (was ~63.7, −8.1%)
realistic1     1.8
comparison     20.7
comparison2    30.0

Analysis

  • Output-heavy benchmarks improved 12-29%; the largest absolute saving is on realistic2, the 28.5MB-output case (257.8ms → 226.4ms, 31.4ms saved)
  • System time dropped significantly (e.g. realistic2: 29.2ms → 16.4ms) due to reduced memory pressure and GC
  • No regression beyond run-to-run noise on small-output benchmarks (comparison, realistic1)
  • All 46 tests pass; output is byte-identical to master (verified via md5)
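The byte-identical check can be illustrated by hashing the output of both paths. The sketch below runs a hypothetical renderer through a StringWriter-based path and a streaming path and compares MD5 digests via java.security.MessageDigest; the object and function names are illustrative, and the actual verification compared real CLI output from master and this branch.

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, OutputStreamWriter, StringWriter, Writer}
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object Md5ParitySketch {
  // Hypothetical renderer stand-in.
  def render(out: Writer): Unit = {
    var i = 0
    while (i < 1000) {
      out.write(s"""{"i":$i}""")
      i += 1
    }
  }

  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5").digest(bytes).map(b => f"${b & 0xff}%02x").mkString

  // Old shape: materialize the whole String, then encode it once ('\n' appended for this sketch).
  def viaStringWriter(): Array[Byte] = {
    val sw = new StringWriter()
    render(sw)
    (sw.toString + "\n").getBytes(StandardCharsets.UTF_8)
  }

  // New shape: stream through a 64KB-buffered UTF-8 writer.
  def viaStreaming(): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val w = new OutputStreamWriter(new BufferedOutputStream(bos, 65536), StandardCharsets.UTF_8)
    render(w)
    w.write("\n")
    w.flush()
    bos.toByteArray
  }
}
```

The output here exceeds 64KB's worth of buffer refills only if the renderer is scaled up, but the digests match at any size because both paths produce the same UTF-8 byte sequence.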

Result

Streaming stdout rendering eliminates ~100MB of intermediate allocations for large outputs. It improves realistic2 by 12.2% on Scala Native, with 17-29% improvements on other output-heavy benchmarks. Successful output is unchanged (byte-identical to master).

Eliminate the intermediate StringWriter/StringBuffer path when rendering
to stdout. Previously, all CLI output was:
  Renderer → CharBuilder → StringWriter(StringBuffer) → .toString → println

For large outputs (e.g. realistic_2 at 28.5MB), this created ~100MB of
intermediate allocations: StringBuffer growth + String copy + encoding.

Now renders directly to BufferedOutputStream wrapping stdout:
  Renderer → CharBuilder → OutputStreamWriter → BufferedOutputStream → stdout

This is a single-pass streaming write with only a 64KB buffer.

Upstream: jit branch commits b09647c (direct-write stdout concept)
He-Pin force-pushed the perf/direct-stdout-native branch from a01dc4f to 902c214 on April 11, 2026 20:14

He-Pin commented Apr 11, 2026

Closing: functionality is fully subsumed by PR #745 (byte[] rendering pipeline), which includes the direct-to-stdout rendering from this PR plus additional optimizations (fused materializer, NativeOutputStream for Scala Native, SWAR escape scanning).

He-Pin closed this Apr 11, 2026