benchmarks: add prost bytes-feature variant + MediaFrame for zero-copy comparison #61

Merged
iainmcgin merged 4 commits into main from feat/prost-bytes-benchmarks on Apr 23, 2026
Conversation

@iainmcgin
Collaborator

Adds a benchmark variant that enables prost's bytes::Bytes substitution (prost_build::Config::bytes(["."])) and a bytes-heavy MediaFrame message so the feature has something to work with. Motivated by issue #56 ("Clarify zero-copy semantics"): the commenter correctly pointed out that prost has a similar-in-spirit feature and asked for a fair comparison; this PR puts one in the README.

What's new

  • benchmarks/prost-bytes/ — mirrors benchmarks/prost/ but builds with .bytes(["."]) and decodes from bytes::Bytes input (so Bytes::copy_to_bytes is the ref-count slice, not a copy). Only decode / merge are measured — the substitution does not affect the encode path, so those benches would be redundant with the existing prost crate.
  • bench.MediaFrame — primary bytes body (1–10 KB) + repeated bytes chunks (2–6 × 0.2–2 KB) + map<string, bytes> attachments (0–4 × 50–500 B). Plumbed through gen-datasets, bench-buffa (owned + view + json), bench-prost, bench-prost-bytes.
  • task bench-prost-bytes and task bench-cross-prost-bytes; task bench-cross now runs all three variants and produces benchmarks/results/prost-bytes.json. benchmarks/charts/generate.py gains a prost (bytes) series and a MediaFrame row; the README Performance section is updated.
  • Small cleanup in gen-datasets: random_bytes now uses rng.fill_bytes instead of an elementwise rng.random() loop.
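
To make the "ref-count slice, not a copy" distinction concrete, here is a minimal std-only sketch. `SharedSlice` is a hypothetical toy type written for this illustration; it is not how `bytes::Bytes` is implemented, it only mimics the idea of a shared allocation plus a window into it.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Toy stand-in for a ref-counted byte buffer (NOT the real `bytes::Bytes`):
/// a shared allocation plus the window this handle exposes.
#[derive(Clone, Debug)]
pub struct SharedSlice {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSlice {
    /// Moves `data` into the shared allocation once.
    pub fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        SharedSlice { buf: Arc::from(data), range: 0..len }
    }

    /// Returns a sub-window that shares the allocation: one ref-count
    /// increment, no payload copy. Panics if `sub` is out of bounds.
    pub fn slice(&self, sub: Range<usize>) -> Self {
        assert!(sub.start <= sub.end && sub.end <= self.range.len());
        let start = self.range.start + sub.start;
        SharedSlice {
            buf: Arc::clone(&self.buf),
            range: start..start + (sub.end - sub.start),
        }
    }

    /// Copying alternative: allocates a fresh Vec and copies the payload.
    pub fn copy_out(&self, sub: Range<usize>) -> Vec<u8> {
        self.as_bytes()[sub].to_vec()
    }

    /// The bytes visible through this handle.
    pub fn as_bytes(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }

    /// Number of handles currently sharing the allocation.
    pub fn handles(&self) -> usize {
        Arc::strong_count(&self.buf)
    }
}
```

Calling `input.slice(2..5)` yields a handle whose bytes live at the same address as the input, while `input.copy_out(2..5)` lands in a new allocation. The second path is the allocator traffic that the bytes substitution avoids for proto bytes fields.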

Headline numbers (MiB/s, task bench-cross, Docker)

Binary decode

| Message | buffa | buffa (view) | prost | prost (bytes) |
|---|---:|---:|---:|---:|
| ApiResponse | 862 | 1,475 | 756 | 676 |
| LogRecord | 722 | 1,984 | 712 | 676 |
| AnalyticsEvent | 199 | 320 | 254 | 194 |
| GoogleMessage1 | 1,014 | 1,341 | 956 | 931 |
| MediaFrame | 16,816 | 73,004 | 9,648 | 23,516 |

On the four non-bytes messages, prost (bytes) tracks default prost within noise — the substitution only affects proto bytes fields, and these messages have none. On MediaFrame it's ~2.4× default prost, confirming the feature lands when it has bytes fields to work with. buffa (view) stays well ahead of both prost variants because it borrows strings, messages, and map/repeated scaffolding from the input buffer too, not just bytes payloads.
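
The ratios quoted above can be re-derived from the MediaFrame row. This is a small worked sketch; the constant and helper names (`MEDIA_FRAME_MIB_S`, `throughput`, `speedup`) are made up for the example and do not exist in the repository.

```rust
/// MediaFrame binary-decode throughput in MiB/s, copied from the table above.
pub const MEDIA_FRAME_MIB_S: [(&str, f64); 4] = [
    ("buffa", 16_816.0),
    ("buffa (view)", 73_004.0),
    ("prost", 9_648.0),
    ("prost (bytes)", 23_516.0),
];

/// Looks up a variant's throughput by name.
pub fn throughput(variant: &str) -> Option<f64> {
    MEDIA_FRAME_MIB_S
        .iter()
        .find(|(name, _)| *name == variant)
        .map(|(_, v)| *v)
}

/// How many times faster `a` is than `b`; None if either name is unknown.
pub fn speedup(a: &str, b: &str) -> Option<f64> {
    Some(throughput(a)? / throughput(b)?)
}
```

`speedup("prost (bytes)", "prost")` comes out at roughly 2.44, matching the ~2.4× figure, and `speedup("buffa (view)", "prost (bytes)")` at roughly 3.1.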

perf stat evidence (MediaFrame decode, native)

Native run with perf stat -e task-clock,cycles,instructions,L1-dcache-load-misses,branch-misses around --measurement-time 6 --sample-size 10; per-decoded-message rates (non-hermetic, ±few-% run-to-run):

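| variant | GiB/s | cyc/msg | ins/msg | L1-miss/msg | IPC |
|---|---:|---:|---:|---:|---:|
| prost | 9.1 | 3,929 | 10,470 | 113 | 2.67 |
| buffa owned | 16.6 | 2,101 | 5,679 | 91 | 2.70 |
| prost (bytes) | 21.9 | 1,632 | 4,625 | 5 | 2.83 |
| buffa (view) | 69.8 | 508 | 1,675 | 4 | 3.30 |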

prost (bytes) closes the cache-miss gap (113 → 5 L1 misses/msg — allocator traffic from cloning bytes payloads is gone). It still runs ~2.8× the instructions of views because String::from_utf8 / HashMap::insert / Vec::push work is still happening for the non-bytes fields (frame_id, content_type, map keys, repeated scaffolding). Views skip that scaffolding — strings stay &'a str, attachments is a MapView, chunks is a RepeatedView.
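
The owned-versus-view difference can be sketched without any protobuf machinery. The example below uses a made-up wire format (a one-byte length prefix per field) and hypothetical types `OwnedFrame` / `FrameView`; it is not buffa's or prost's generated code, only an illustration of where the extra per-field work comes from.

```rust
use std::str;

/// Owned decode: every field is copied into its own heap allocation.
#[derive(Debug, PartialEq)]
pub struct OwnedFrame {
    pub frame_id: String,
    pub body: Vec<u8>,
}

/// View decode: fields borrow from the input buffer for lifetime `'a`.
#[derive(Debug, PartialEq)]
pub struct FrameView<'a> {
    pub frame_id: &'a str,
    pub body: &'a [u8],
}

/// Splits one `[len: u8][len bytes]` field off the front of `buf`.
fn take_field(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&len, rest) = buf.split_first()?;
    if rest.len() < len as usize {
        return None;
    }
    Some(rest.split_at(len as usize))
}

/// Decodes by borrowing: the string is validated in place, nothing is copied.
pub fn decode_view(buf: &[u8]) -> Option<FrameView<'_>> {
    let (id, rest) = take_field(buf)?;
    let (body, _) = take_field(rest)?;
    Some(FrameView { frame_id: str::from_utf8(id).ok()?, body })
}

/// Decodes into owned fields: one allocation and one copy per field.
pub fn decode_owned(buf: &[u8]) -> Option<OwnedFrame> {
    let view = decode_view(buf)?;
    Some(OwnedFrame {
        frame_id: view.frame_id.to_owned(),
        body: view.body.to_vec(),
    })
}
```

In this toy, the bytes substitution corresponds to making only `body` cheap; `frame_id` would still be allocated and copied, which is the kind of residual work the instruction counts above point at.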

Files touched

  • new benchmarks/prost-bytes/ + benchmarks/Dockerfile.bench-prost-bytes
  • benchmarks/proto/bench_messages.proto — MediaFrame
  • benchmarks/gen-datasets/src/main.rs — gen_media_frame, random_bytes → fill_bytes
  • benchmarks/buffa/benches/protobuf.rs — MediaFrame owned + view + json
  • benchmarks/prost/benches/protobuf.rs — MediaFrame decode + json
  • benchmarks/charts/generate.py: prost (bytes) series, MediaFrame row
  • Cargo.toml, Taskfile.yml, README.md, chart SVGs, tables.md

Notes for reviewers

  • The encode-side numbers for prost (bytes) are deliberately absent (same codepath as default prost — see the module docstring on benchmarks/prost-bytes/src/lib.rs).
  • protobuf-v4 and Go don't have a MediaFrame row — those suites have a fixed set of messages we didn't extend for this experiment. bench-cross ignores missing rows cleanly.
  • gen-datasets' module boilerplate gained #[allow(clippy::enum_variant_names, clippy::upper_case_acronyms, ...)] attributes: on current main the generated log_record::Severity enum variants (UNSPECIFIED/DEBUG/...) and the oneof Value* names trip those lints. The attributes match what's applied to generated code elsewhere in the workspace.

…e message

Adds a bench-prost-bytes crate that builds prost types with
prost-build's .bytes(['.']) substitution (bytes::Bytes for every bytes
field) and decodes from bytes::Bytes input so prost's zero-copy
copy_to_bytes slicing path is actually exercised. Only decode/merge are
measured; the substitution does not affect the encode path.

Introduces a MediaFrame benchmark message (single large bytes body +
repeated bytes chunks + map<string, bytes> attachments) so the new
variant has bytes fields to work with. The existing four messages are
string-heavy and leave the feature inert — MediaFrame exercises it.

Also:
- random_bytes in gen-datasets uses rng.fill_bytes (was elementwise).
- charts/generate.py gains a 'prost (bytes)' series and a MediaFrame
  row; README performance tables and explanatory note are updated.
- Taskfile.yml adds bench-prost-bytes + bench-cross-prost-bytes and
  integrates the new variant into bench-cross.

Throughput on MediaFrame decode (MiB/s, native, 50 payloads, ~11 KB
each): buffa (view) 73,004 / prost (bytes) 23,516 / buffa 16,816 /
prost 9,648. perf-stat evidence in the PR description.
@iainmcgin iainmcgin marked this pull request as ready for review April 22, 2026 23:40
@iainmcgin iainmcgin requested a review from asacamano April 22, 2026 23:40
@@ -6,314 +6,314 @@

Is this a generated file?

Collaborator Author


oh whoops yeah let me check if I can deterministically produce these at the start of runs instead of committing them

Collaborator Author


actually will keep it this way for now, and replace with a pre-generation step via taskfile later

asacamano previously approved these changes Apr 23, 2026
MediaFrame's ~70 GiB/s binary-decode throughput compressed the other four
messages' bars into a few pixels on the shared-scale composite charts.
Replace each composite SVG with one chart per (chart-type × message) so
each chart picks its own nice-max, making the smaller throughput
differences readable again.

- charts/generate.py: loop over (chart, message) pairs; drop series with
  no value for the current message so empty bars don't render.
- Delete the four composite SVGs; add 20 per-message SVGs.
- README: list the 5 per-message charts vertically under each section.
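
A rough sketch of the per-chart scaling idea follows. The real logic lives in the Python script charts/generate.py and is not shown in this PR page, so the rounding rule here (next 1, 2, or 5 times a power of ten) and the names `nice_max` / `chart_axis_max` are assumptions chosen for illustration.

```rust
/// Rounds `v` up to the next 1, 2, or 5 times a power of ten.
/// (Assumed rule; the actual script may pick its nice-max differently.)
pub fn nice_max(v: u64) -> u64 {
    // Largest power of ten that does not exceed v (1 for v < 10).
    let mut base: u64 = 1;
    while v / 10 >= base {
        base *= 10;
    }
    for m in [1u64, 2, 5, 10] {
        let candidate = m.saturating_mul(base);
        if candidate >= v {
            return candidate;
        }
    }
    u64::MAX // only reachable for values near u64::MAX
}

/// Axis maximum for one (chart, message) pair. Series with no value for
/// this message are dropped, so they neither render nor affect the scale.
pub fn chart_axis_max(series: &[(&str, Option<u64>)]) -> Option<u64> {
    series.iter().filter_map(|(_, v)| *v).max().map(nice_max)
}
```

With the binary-decode numbers from the PR description, an ApiResponse chart would top out at 2,000 MiB/s and a MediaFrame chart at 100,000 MiB/s under this rule, instead of both sharing the larger scale.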
Both bench_test.go and google/benches/protobuf.rs had a fixed list of
four messages. Add MediaFrame so the new bytes-heavy dataset is
exercised across all four implementations, not just buffa + prost
variants.

Updated numbers (MiB/s binary decode of MediaFrame): buffa 16,816,
buffa (view) 73,004, prost 9,648, prost (bytes) 23,516, protobuf-v4
17,633, Go 1,241. protobuf-v4's arena allocator lands near buffa
owned on decode but trails on encode (~10 GiB/s vs buffa's 46).

Ancillary drift from a fresh cross-impl run refreshes the other four
messages too.
…ts to GiB/s

- Axis labels use plain integers with commas up to 9,999 — '1.2k' for 1,200
  was hard to read at a glance when '1,200' fit in the same space.
- When a chart's max value exceeds 10 GiB/s (10,240 MiB/s), rescale the
  whole chart to GiB/s so the axis doesn't need 'k' at all. MediaFrame
  binary decode / encode and the buffa (view) LogRecord benches trip this
  — '73.0' GiB/s is cleaner than '73k' or '75,000' MiB/s.
- Bar-inline values follow the same unit as the axis: integer MiB/s with
  thousands-separator commas, or two-decimal GiB/s.
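
The unit rule above can be sketched as follows. This is a Rust illustration of the described behaviour, not the code in charts/generate.py (which is Python); the names `Unit`, `chart_unit`, `with_commas`, and `bar_label` are hypothetical.

```rust
/// Threshold from the commit message: a chart whose max exceeds
/// 10 GiB/s (10,240 MiB/s) is rescaled to GiB/s.
const GIB_THRESHOLD_MIB: f64 = 10_240.0;

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Unit {
    MiBps,
    GiBps,
}

/// Picks the unit for a whole chart from its maximum value in MiB/s.
pub fn chart_unit(max_mib: f64) -> Unit {
    if max_mib > GIB_THRESHOLD_MIB {
        Unit::GiBps
    } else {
        Unit::MiBps
    }
}

/// Formats an integer with thousands separators, e.g. 1200 -> "1,200".
pub fn with_commas(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, ch) in digits.chars().enumerate() {
        // Insert a comma whenever a multiple of three digits remains.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

/// Bar-inline label in the chart's unit: integer MiB/s with commas,
/// or two-decimal GiB/s.
pub fn bar_label(value_mib: f64, unit: Unit) -> String {
    match unit {
        Unit::MiBps => with_commas(value_mib.round() as u64),
        Unit::GiBps => format!("{:.2}", value_mib / 1024.0),
    }
}
```

Choosing the unit once per chart, from the chart's maximum, keeps every bar label and the axis in the same unit even when some bars fall below the threshold.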
@iainmcgin iainmcgin merged commit a0c668c into main Apr 23, 2026
7 checks passed
@iainmcgin iainmcgin deleted the feat/prost-bytes-benchmarks branch April 23, 2026 03:48
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 23, 2026