benchmarks: add prost bytes-feature variant + MediaFrame for zero-copy comparison by iainmcgin · Pull Request #61 · anthropics/buffa

iainmcgin · 2026-04-22T23:15:29Z

Adds a benchmark variant that enables prost's bytes::Bytes substitution (prost_build::Config::bytes(["."])) and a bytes-heavy MediaFrame message so the feature has something to work with. Motivated by issue #56 ("Clarify zero-copy semantics"): the commenter correctly pointed out that prost has a similar-in-spirit feature and asked for a fair comparison, and this PR puts one in the README.

What's new

benchmarks/prost-bytes/: mirrors benchmarks/prost/ but builds with .bytes(["."]) and decodes from bytes::Bytes input, so copy_to_bytes on that input is a ref-counted slice rather than a copy. Only decode / merge are measured. The substitution does not affect the encode path, so encode benches would be redundant with the existing prost crate.
bench.MediaFrame — primary bytes body (1–10 KB) + repeated bytes chunks (2–6 × 0.2–2 KB) + map<string, bytes> attachments (0–4 × 50–500 B). Plumbed through gen-datasets, bench-buffa (owned + view + json), bench-prost, bench-prost-bytes.
task bench-prost-bytes and task bench-cross-prost-bytes; task bench-cross now runs all three variants and produces benchmarks/results/prost-bytes.json. benchmarks/charts/generate.py gains a prost (bytes) series and a MediaFrame row; the README Performance section is updated.
Small cleanup in gen-datasets: random_bytes now uses rng.fill_bytes instead of an elementwise rng.random() loop.
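
For readers unfamiliar with why decoding from a bytes::Bytes input matters, here is a minimal std-only sketch of the idea. `SharedBuf` is a hypothetical stand-in for bytes::Bytes (an `Arc<[u8]>` plus a window), not the crate's actual implementation; it only shows that a ref-counted slice shares the input allocation while a `Vec<u8>` field needs a copy.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for `bytes::Bytes`: a shared allocation plus a window into it.
#[derive(Clone, Debug)]
pub struct SharedBuf {
    data: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBuf {
    /// Copies `bytes` once into a shared allocation (the "input buffer").
    pub fn new(bytes: &[u8]) -> Self {
        SharedBuf { data: Arc::from(bytes), range: 0..bytes.len() }
    }

    /// Ref-counted slice: bumps the Arc count and copies no payload bytes.
    /// Panics if `sub` is inverted or out of bounds.
    pub fn slice(&self, sub: Range<usize>) -> SharedBuf {
        assert!(sub.start <= sub.end && sub.end <= self.range.len());
        let start = self.range.start + sub.start;
        let end = start + (sub.end - sub.start);
        SharedBuf { data: Arc::clone(&self.data), range: start..end }
    }

    /// Copying alternative: what an owned `Vec<u8>` field requires.
    pub fn to_vec(&self) -> Vec<u8> {
        self.as_slice().to_vec()
    }

    /// The bytes visible through this window.
    pub fn as_slice(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }

    /// Number of live handles sharing the allocation.
    pub fn ref_count(&self) -> usize {
        Arc::strong_count(&self.data)
    }
}
```

Slicing a field out of the input leaves the pointer inside the original allocation and raises the ref count; `to_vec` produces a new allocation per field, which is the traffic the bytes substitution avoids.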

Headline numbers (MiB/s, `task bench-cross`, Docker)

Binary decode

Message	buffa	buffa (view)	prost	prost (bytes)
ApiResponse	862	1,475	756	676
LogRecord	722	1,984	712	676
AnalyticsEvent	199	320	254	194
GoogleMessage1	1,014	1,341	956	931
MediaFrame	16,816	73,004	9,648	23,516

On the four non-bytes messages, prost (bytes) tracks default prost within noise — the substitution only affects proto bytes fields, and these messages have none. On MediaFrame it's ~2.4× default prost, confirming the feature lands when it has bytes fields to work with. buffa (view) stays well ahead of both prost variants because it borrows strings, messages, and map/repeated scaffolding from the input buffer too, not just bytes payloads.
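
The ~2.4× figure can be re-derived from the MediaFrame row. The snippet below is only a sanity-check sketch: the constant and function names are made up for illustration, and the numbers are copied from the table above.

```rust
/// Binary-decode throughput in MiB/s, copied from the MediaFrame row of the table above.
pub const MEDIA_FRAME_MIB_S: [(&str, f64); 4] = [
    ("buffa", 16_816.0),
    ("buffa (view)", 73_004.0),
    ("prost", 9_648.0),
    ("prost (bytes)", 23_516.0),
];

/// Throughput of `variant` divided by throughput of `baseline`.
/// Returns None if either name is not in the table.
pub fn speedup(variant: &str, baseline: &str) -> Option<f64> {
    let lookup = |name: &str| {
        MEDIA_FRAME_MIB_S
            .iter()
            .find(|(n, _)| *n == name)
            .map(|(_, v)| *v)
    };
    Some(lookup(variant)? / lookup(baseline)?)
}
```

`speedup("prost (bytes)", "prost")` comes out at roughly 2.44, and `speedup("buffa (view)", "prost (bytes)")` at roughly 3.10, which is the remaining gap the last sentence refers to.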

`perf stat` evidence (MediaFrame decode, native)

Native run with perf stat -e task-clock,cycles,instructions,L1-dcache-load-misses,branch-misses around --measurement-time 6 --sample-size 10; per-decoded-message rates (non-hermetic, ±few-% run-to-run):

variant	GiB/s	cyc/msg	ins/msg	L1-miss/msg	IPC
prost	9.1	3,929	10,470	113	2.67
buffa owned	16.6	2,101	5,679	91	2.70
prost (bytes)	21.9	1,632	4,625	5	2.83
buffa (view)	69.8	508	1,675	4	3.30
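
The IPC column is instructions per message divided by cycles per message, so it can be cross-checked against the two columns before it. A small sketch with illustrative names (`PerfRow`, `ROWS` are not part of the benchmark code), using the values from the table above:

```rust
/// One row of the perf-stat table above (rates are per decoded message).
pub struct PerfRow {
    pub variant: &'static str,
    pub cycles: f64,
    pub instructions: f64,
}

/// Values copied from the table; nothing here is measured by this snippet.
pub const ROWS: [PerfRow; 4] = [
    PerfRow { variant: "prost", cycles: 3_929.0, instructions: 10_470.0 },
    PerfRow { variant: "buffa owned", cycles: 2_101.0, instructions: 5_679.0 },
    PerfRow { variant: "prost (bytes)", cycles: 1_632.0, instructions: 4_625.0 },
    PerfRow { variant: "buffa (view)", cycles: 508.0, instructions: 1_675.0 },
];

impl PerfRow {
    /// Instructions retired per cycle.
    pub fn ipc(&self) -> f64 {
        self.instructions / self.cycles
    }
}

/// How many times more instructions `a` executes per message than `b`.
pub fn instruction_ratio(a: &PerfRow, b: &PerfRow) -> f64 {
    a.instructions / b.instructions
}
```

With these inputs the derived IPC values match the table to two decimals, and `instruction_ratio(&ROWS[2], &ROWS[3])` is about 2.76.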

prost (bytes) closes the cache-miss gap (113 → 5 L1 misses/msg — allocator traffic from cloning bytes payloads is gone). It still runs ~2.8× the instructions of views because String::from_utf8 / HashMap::insert / Vec::push work is still happening for the non-bytes fields (frame_id, content_type, map keys, repeated scaffolding). Views skip that scaffolding — strings stay &'a str, attachments is a MapView, chunks is a RepeatedView.
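
To make the owned-versus-view distinction concrete, here is a rough std-only sketch of decoding one length-prefixed string field both ways. It is not buffa's or prost's code: the function names are hypothetical, and the one-byte length prefix is a simplification (it coincides with a protobuf varint only for lengths below 128).

```rust
/// Splits one length-prefixed field off the front of `input`.
/// Assumes a one-byte length, which equals a protobuf varint only for lengths < 128.
fn split_field(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&len, rest) = input.split_first()?;
    let len = len as usize;
    if len >= 128 || rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// Owned decode: validates UTF-8 and copies the payload into a fresh heap allocation.
pub fn decode_owned(input: &[u8]) -> Option<(String, &[u8])> {
    let (payload, rest) = split_field(input)?;
    let s = String::from_utf8(payload.to_vec()).ok()?;
    Some((s, rest))
}

/// View decode: validates UTF-8 and borrows the payload from `input`, with no allocation.
pub fn decode_view<'a>(input: &'a [u8]) -> Option<(&'a str, &'a [u8])> {
    let (payload, rest) = split_field(input)?;
    let s = std::str::from_utf8(payload).ok()?;
    Some((s, rest))
}
```

The borrowed `&'a str` points into the input buffer, so the cost per string field is validation only; the owned path adds an allocation and a copy on top. Map and repeated fields follow the same pattern, with the owned path additionally paying for container inserts.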

Files touched

new benchmarks/prost-bytes/ + benchmarks/Dockerfile.bench-prost-bytes
benchmarks/proto/bench_messages.proto — MediaFrame
benchmarks/gen-datasets/src/main.rs — gen_media_frame, random_bytes → fill_bytes
benchmarks/buffa/benches/protobuf.rs — MediaFrame owned + view + json
benchmarks/prost/benches/protobuf.rs — MediaFrame decode + json
benchmarks/charts/generate.py — prost (bytes) series, MediaFrame row
Cargo.toml, Taskfile.yml, README.md, chart SVGs, tables.md

Notes for reviewers

The encode-side numbers for prost (bytes) are deliberately absent (same codepath as default prost — see the module docstring on benchmarks/prost-bytes/src/lib.rs).
protobuf-v4 and Go don't have a MediaFrame row — those suites have a fixed set of messages we didn't extend for this experiment. bench-cross ignores missing rows cleanly.
gen-datasets' module boilerplate gained #[allow(clippy::enum_variant_names, clippy::upper_case_acronyms, ...)] attributes: on current main the generated log_record::Severity enum variants (UNSPECIFIED/DEBUG/...) and the oneof Value* names trip those lints. The attributes match what's applied to generated code elsewhere in the workspace.

…e message

Adds a bench-prost-bytes crate that builds prost types with prost-build's .bytes(['.']) substitution (bytes::Bytes for every bytes field) and decodes from bytes::Bytes input so prost's zero-copy copy_to_bytes slicing path is actually exercised. Only decode/merge are measured; the substitution does not affect the encode path.

Introduces a MediaFrame benchmark message (single large bytes body + repeated bytes chunks + map<string, bytes> attachments) so the new variant has bytes fields to work with. The existing four messages are string-heavy and leave the feature inert; MediaFrame exercises it.

Also:

- random_bytes in gen-datasets uses rng.fill_bytes (was elementwise).
- charts/generate.py gains a 'prost (bytes)' series and a MediaFrame row; README performance tables and explanatory note are updated.
- Taskfile.yml adds bench-prost-bytes + bench-cross-prost-bytes and integrates the new variant into bench-cross.

Throughput on MediaFrame decode (MiB/s, native, 50 payloads, ~11 KB each): buffa (view) 73,004 / prost (bytes) 23,516 / buffa 16,816 / prost 9,648. perf-stat evidence in the PR description.

asacamano · 2026-04-23T00:09:31Z

[Review comment on diff hunk @@ -6,314 +6,314 @@; the hunk content is unreadable in this capture and is omitted]

Is this a generated file?

oh whoops yeah let me check if I can deterministically produce these at the start of runs instead of committing them

actually will keep it this way for now, and replace with a pre-generation step via taskfile later

MediaFrame's ~70 GiB/s binary-decode throughput compressed the other four messages' bars into a few pixels on the shared-scale composite charts. Replace each composite SVG with one chart per (chart-type × message) so each chart picks its own nice-max, making the smaller throughput differences readable again.

- charts/generate.py: loop over (chart, message) pairs; drop series with no value for the current message so empty bars don't render.
- Delete the four composite SVGs; add 20 per-message SVGs.
- README: list the 5 per-message charts vertically under each section.
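
The commit message does not define "nice-max", and generate.py is a Python script, so the following Rust sketch is only one plausible reading: round each chart's largest value up to the next 1, 2, or 5 times a power of ten. The function name and the 1-2-5 rule are assumptions.

```rust
/// Rounds `max_value` up to the next 1, 2, or 5 times a power of ten.
/// The 1-2-5 rule is an assumption; the PR does not say how generate.py picks its axis maximum.
pub fn nice_max(max_value: u64) -> u64 {
    if max_value == 0 {
        return 1;
    }
    // Largest power of ten that does not exceed `max_value`.
    let mut magnitude: u64 = 1;
    while let Some(next) = magnitude.checked_mul(10) {
        if next > max_value {
            break;
        }
        magnitude = next;
    }
    // The step of 10 always satisfies the comparison, so the loop returns.
    for step in [1u64, 2, 5, 10] {
        let candidate = magnitude.saturating_mul(step);
        if candidate >= max_value {
            return candidate;
        }
    }
    u64::MAX // not reached; kept so the function has a value on every path
}
```

Under this rule a chart topping out at 73,004 gets an axis maximum of 100,000 while one topping out at 320 gets 500. On a shared 100,000 axis the 320 bar would span well under 1% of the width, which is the compression the commit describes.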

Both bench_test.go and google/benches/protobuf.rs had a fixed list of four messages. Add MediaFrame so the new bytes-heavy dataset is exercised across all four implementations, not just buffa + prost variants. Updated numbers (MiB/s binary decode of MediaFrame): buffa 16,816, buffa (view) 73,004, prost 9,648, prost (bytes) 23,516, protobuf-v4 17,633, Go 1,241. protobuf-v4's arena allocator lands near buffa owned on decode but trails on encode (~10 GiB/s vs buffa's 46). Ancillary drift from a fresh cross-impl run refreshes the other four messages too.

…ts to GiB/s

- Axis labels use plain integers with commas up to 9,999; '1.2k' for 1,200 was hard to read at a glance when '1,200' fit in the same space.
- When a chart's max value exceeds 10 GiB/s (10,240 MiB/s), rescale the whole chart to GiB/s so the axis doesn't need 'k' at all. MediaFrame binary decode / encode and the buffa (view) LogRecord benches trip this; '73.0' GiB/s is cleaner than '73k' or '75,000' MiB/s.
- Bar-inline values follow the same unit as the axis: integer MiB/s with thousands-separator commas, or two-decimal GiB/s.
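
A sketch of the labelling rule described above, written in Rust for consistency with the rest of the repo rather than transcribed from generate.py. All names are illustrative; only the 10,240 MiB/s threshold and the two label formats come from the commit message.

```rust
/// Threshold from the commit message: 10 GiB/s expressed in MiB/s.
pub const GIB_THRESHOLD_MIB: f64 = 10_240.0;

/// Unit for a whole chart, chosen from the chart's largest value.
pub fn chart_unit(chart_max_mib: f64) -> &'static str {
    if chart_max_mib > GIB_THRESHOLD_MIB { "GiB/s" } else { "MiB/s" }
}

/// Formats a non-negative integer with thousands-separator commas.
pub fn with_commas(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // Insert a comma whenever a multiple of three digits remains.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

/// Bar-inline label in the same unit as the chart's axis:
/// two-decimal GiB/s above the threshold, integer MiB/s with commas otherwise.
pub fn bar_label(value_mib: f64, chart_max_mib: f64) -> String {
    if chart_max_mib > GIB_THRESHOLD_MIB {
        format!("{:.2}", value_mib / 1024.0)
    } else {
        with_commas(value_mib.round() as u64)
    }
}
```

Because the unit is decided per chart rather than per bar, every bar in a MediaFrame decode chart is labelled in GiB/s, including the slower variants, while a chart whose maximum is 1,984 MiB/s keeps integer MiB/s labels throughout.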

iainmcgin mentioned this pull request Apr 22, 2026

Clarify zero copy semantics #56

Open

iainmcgin marked this pull request as ready for review April 22, 2026 23:40

iainmcgin requested a review from asacamano April 22, 2026 23:40

asacamano reviewed Apr 23, 2026

asacamano previously approved these changes Apr 23, 2026

iainmcgin dismissed asacamano’s stale review via 75faac8 April 23, 2026 03:14

iainmcgin added 2 commits April 23, 2026 03:26

iainmcgin merged commit a0c668c into main Apr 23, 2026
7 checks passed

iainmcgin deleted the feat/prost-bytes-benchmarks branch April 23, 2026 03:48

github-actions Bot locked and limited conversation to collaborators Apr 23, 2026

iainmcgin merged 4 commits into main from feat/prost-bytes-benchmarks

2 participants
