
perf: buffer native stdout writes #869

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/native-stdout-buffering


@He-Pin He-Pin commented May 23, 2026

Motivation:

Scala Native kube-prometheus rendering still showed write/output overhead after the renderer and strict JSON import changes in the stack. NativeOutputStream already bypasses the JVM-compatible PrintStream path by writing through C fwrite, but stdout still used the platform default stdio buffering.

Key Design Decision:

Keep this optimization Native-only and local to stdout buffering. Instead of changing renderer flush thresholds or ByteBuilder behavior globally, configure the C stdio stream with full buffering before any NativeOutputStream writes occur. Passing a null buffer lets libc own the buffer lifetime, so the Scala object does not need to retain native memory.
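A minimal C sketch of the buffering call described above, not the PR's actual Scala Native code. The helper name `configure_full_buffering` and the constant `STDOUT_BUFFER_BYTES` are hypothetical; only `setvbuf`, `_IOFBF`, and the 256 KiB size come from the description. Note that with a null buffer the C standard says the size argument "may" determine the allocated buffer size, so the effective size can differ between libc implementations.

```c
#include <stdio.h>

/* Hypothetical constant; 256 KiB mirrors the size named in the PR text. */
enum { STDOUT_BUFFER_BYTES = 256 * 1024 };

/* Hypothetical helper: request full buffering on a stdio stream.
 * Call it before any other operation on the stream.
 * Returns 0 on success, nonzero if the request was not honored. */
int configure_full_buffering(FILE *stream) {
    /* NULL buffer: the C library allocates and releases the buffer,
     * so the caller keeps no native memory alive. */
    return setvbuf(stream, NULL, _IOFBF, STDOUT_BUFFER_BYTES);
}
```

A caller would invoke `configure_full_buffering(stdout)` once at startup and fall back to the default buffering if it returns nonzero.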

Modification:

  • Configure NativeOutputStream with setvbuf(file, null, _IOFBF, 256 KiB) during construction.
  • Leave JVM, JS, YAML, expect-string, and file-output code paths unchanged.
  • Preserve existing explicit flush() behavior for trailing newline and close handling.

Benchmark Results:

Workload: jrsonnet/tests/realworld/entry-kube-prometheus.jsonnet -J vendor

Candidate was benchmarked on the Scala Native 0.5.12 stacked exploration branch against clean cf7b8af9.

| Order | Clean | Candidate | Result |
|---|---|---|---|
| Forward mean | 218.848 ms | 188.528 ms | -13.9% |
| Forward median | 215.517 ms | 187.368 ms | -13.1% |
| Reverse mean | 224.045 ms | 183.701 ms | -18.0% |
| Reverse median | 224.281 ms | 182.914 ms | -18.4% |
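For readers checking the Result column, the figures are consistent with a plain relative change of candidate versus clean. The sketch below is illustrative only; the function name `percent_change` is hypothetical and not part of the benchmark tooling.

```c
/* Hypothetical helper: relative change of candidate versus clean, in percent.
 * A negative value means the candidate ran faster than the clean build. */
double percent_change(double clean_ms, double candidate_ms) {
    return (candidate_ms - clean_ms) / clean_ms * 100.0;
}
```

For example, `percent_change(218.848, 188.528)` is roughly -13.9, which matches the forward mean row.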

Output equality matched by cmp.

Validation:

  • ./mill --no-server --ticker false --color false __.reformat
  • ./mill --no-server --ticker false --color false -j 1 __.test — 444 passed, 0 failed
  • ./mill --no-server --ticker false --color false bench.runRegressions

Analysis:

This is a lower-risk write/flush optimization than increasing ByteBuilder thresholds: it does not alter rendering order, JSON escaping, object materialization, or JVM/JS behavior. It only changes the buffering policy of the Native stdout FILE*, and explicit flushes still happen at the same public boundaries.
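The flush contract mentioned above can be illustrated in C, under the assumption that the stream wrapper reduces to `fwrite` for writes and `fflush` at explicit boundaries. The helper names `write_bytes` and `flush_bytes` are hypothetical and do not reflect the actual NativeOutputStream source.

```c
#include <stdio.h>

/* Hypothetical helper: append len bytes to a stdio stream.
 * With full buffering the bytes may stay in the stdio buffer
 * until it fills or the stream is flushed.
 * Returns 0 if every byte was accepted, -1 otherwise. */
int write_bytes(FILE *stream, const void *data, size_t len) {
    if (len == 0) {
        return 0;
    }
    return fwrite(data, 1, len, stream) == len ? 0 : -1;
}

/* Hypothetical helper: push any buffered bytes to the underlying file.
 * Calling this at the same points as before keeps the observable
 * flush boundaries unchanged, whatever the buffer size is. */
int flush_bytes(FILE *stream) {
    return fflush(stream) == 0 ? 0 : -1;
}
```

Because the buffering mode only changes when bytes leave the process, not which bytes are written or in what order, output produced through these two calls stays byte-identical.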

Result:

Native stdout rendering issues fewer, larger buffered writes for large JSON output while preserving byte-identical output and the existing flush contract.

Motivation:
Scala Native kube-prometheus rendering still showed write overhead after the renderer/import stack. NativeOutputStream already writes through fwrite, but stdout used the platform default stdio buffer.

Modification:
Configure NativeOutputStream with full C stdio buffering via setvbuf before rendering, using a 256 KiB buffer size.

Result:
Native kube-prometheus A/B on the 0.5.12 stack improved in both orders: forward mean 218.848 ms clean vs 188.528 ms candidate; reverse mean 224.045 ms clean vs 183.701 ms candidate. Output matched by cmp; on the clean PR branch, reformat, full tests, and bench regressions passed.
@He-Pin He-Pin marked this pull request as ready for review May 23, 2026 16:56