codegen: emit impl Reflectable for owned messages (bridge mode) #136

Merged
iainmcgin merged 11 commits into main from reflect/07-codegen on May 21, 2026

@iainmcgin
Collaborator

What

Adds generate_reflection: bool to CodeGenConfig (default false). When enabled:

  • Each owned message gets impl ::buffa_descriptor::reflect::Reflectable whose reflect() round-trips through DynamicMessage::from_message (encode → decode → ReflectCow::Owned). The impl resolves the message index from the package's lazily-built DescriptorPool by Self::FULL_NAME, with a panic message that names the type when unregistered (the cross-crate-pool case).
  • The __buffa::reflect submodule embeds FILE_DESCRIPTOR_SET_BYTES (the full transitive closure, serialized once per generate() call so build-time CPU does not scale with package count) and a descriptor_pool() accessor backed by std::sync::OnceLock.
  • A package-root pub use __buffa::reflect::descriptor_pool re-export so the accessor is reachable at pkg::descriptor_pool() without routing through the reserved __buffa sentinel.
  • Map-entry synthetic messages are skipped.
  • When gate_impls_on_crate_features is also on, the impls are wrapped in #[cfg(feature = "reflect")].
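
As a rough illustration of the lazily-built pool and the lookup by full name described in the bullets above, here is a minimal std-only sketch. `Pool`, its toy line-based `decode`, `message_index`, and `index_for` are hypothetical stand-ins, not the buffa-descriptor API; only `FILE_DESCRIPTOR_SET_BYTES`, `descriptor_pool()`, and the use of `std::sync::OnceLock` come from the description.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Stand-in for the embedded bytes. The real constant holds a serialized
/// FileDescriptorSet; this toy version holds one full name per line.
pub static FILE_DESCRIPTOR_SET_BYTES: &[u8] = b"pkg.Foo\npkg.Bar";

/// Hypothetical stand-in for DescriptorPool: full message name -> index.
pub struct Pool {
    by_name: HashMap<String, usize>,
}

impl Pool {
    /// Toy decoder (assumption): the real pool decodes protobuf wire bytes.
    fn decode(bytes: &[u8]) -> Pool {
        let text = std::str::from_utf8(bytes).expect("toy descriptor bytes are UTF-8");
        let by_name = text
            .lines()
            .enumerate()
            .map(|(i, name)| (name.to_string(), i))
            .collect();
        Pool { by_name }
    }

    pub fn message_index(&self, full_name: &str) -> Option<usize> {
        self.by_name.get(full_name).copied()
    }
}

/// Built on first call, then shared for the rest of the process.
pub fn descriptor_pool() -> &'static Pool {
    static POOL: OnceLock<Pool> = OnceLock::new();
    POOL.get_or_init(|| Pool::decode(FILE_DESCRIPTOR_SET_BYTES))
}

/// Roughly what a generated impl could do with Self::FULL_NAME: resolve the
/// index, and panic with a message naming the type if it is not registered.
pub fn index_for(full_name: &str) -> usize {
    descriptor_pool()
        .message_index(full_name)
        .unwrap_or_else(|| panic!("message `{full_name}` is not registered in this package's descriptor pool"))
}
```

Every call to `descriptor_pool()` returns the same `&'static Pool`, so the decode cost is paid once per process rather than once per `reflect()` call.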

buffa-build gains .generate_reflection(true); protoc-gen-buffa gains the reflection=true option.

Runtime requirements

The consuming crate must depend on buffa-descriptor with the reflect feature and on std. The buffa-build doc shows the Cargo.toml feature pattern and warns about the silent-no-impl footgun when gate_impls_on_crate_features is on but the consumer crate doesn't declare a reflect feature.

Performance

reflect() is one full encode/decode round-trip plus a heap allocation per call — hold onto the returned handle for repeated field reads. Benchmarks (PR 8) put the bridge round-trip at ~1–4× the cost of a typed decode. The vtable mode (zero-copy reflective access on view types) is a deferred follow-up; the call-site contract is the same either way, so flipping modes later requires no consumer diff.
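
To make the "hold onto the returned handle" advice concrete, the sketch below counts simulated round-trips. `Msg`, `Handle`, and both `sum_*` functions are hypothetical; the clone stands in for the encode/decode round-trip and says nothing about real costs.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a generated message; counts simulated round-trips.
pub struct Msg {
    fields: Vec<i64>,
    round_trips: Cell<u32>,
}

/// Hypothetical stand-in for the owned reflective handle.
pub struct Handle {
    fields: Vec<i64>,
}

impl Handle {
    pub fn get(&self, i: usize) -> i64 {
        self.fields[i]
    }
}

impl Msg {
    pub fn new(fields: Vec<i64>) -> Msg {
        Msg { fields, round_trips: Cell::new(0) }
    }

    pub fn round_trips(&self) -> u32 {
        self.round_trips.get()
    }

    /// Simulates bridge-mode reflect(): one copy plus one allocation per call.
    pub fn reflect(&self) -> Handle {
        self.round_trips.set(self.round_trips.get() + 1);
        Handle { fields: self.fields.clone() }
    }
}

/// One round-trip, however many fields are read.
pub fn sum_reusing_handle(m: &Msg) -> i64 {
    let h = m.reflect();
    (0..m.fields.len()).map(|i| h.get(i)).sum()
}

/// One round-trip per field read: the pattern to avoid.
pub fn sum_reflecting_each_time(m: &Msg) -> i64 {
    (0..m.fields.len()).map(|i| m.reflect().get(i)).sum()
}
```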

Verification

  • buffa-test/build.rs enables reflection for basic.proto; tests/reflectable.rs asserts the codegen output works through &dyn ReflectMessage, including the generic-interceptor pattern.
  • Codegen unit tests parse the emitted impls as syn::ItemImpl and round-trip the embedded FDS bytes.
  • All 5 conformance modes pass.

Net change

+496/-6 (9 files).

iainmcgin added 7 commits May 20, 2026 04:15
Adds a #[doc(hidden)] buffa::json_helpers::wkt module with the shared
formatting and parsing primitives for the well-known types' JSON forms:
Timestamp RFC 3339 (fmt_timestamp, parse_timestamp), Duration decimal
seconds (fmt_duration, parse_duration, validate_duration), FieldMask
camelCase (snake_to_camel, camel_to_snake, field_mask_path_round_trips),
and the Howard Hinnant civil-calendar helpers (days_to_date,
date_to_days).
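
For readers unfamiliar with the civil-calendar helpers, this is a sketch of Howard Hinnant's published days-from-civil and civil-from-days algorithms. The function names match the commit message, but the signatures here are assumptions and may differ from the ones in buffa::json_helpers::wkt.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (month 1..=12, day 1..=31).
pub fn date_to_days(year: i64, month: u32, day: u32) -> i64 {
    // Shift the year so it starts in March; the leap day then falls at the end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = i64::from(if month > 2 { month - 3 } else { month + 9 }); // March = 0
    let doy = (153 * mp + 2) / 5 + i64::from(day) - 1; // day of shifted year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, 0..=146_096
    era * 146_097 + doe - 719_468
}

/// Inverse of `date_to_days`: (year, month, day) for a day count since 1970-01-01.
pub fn days_to_date(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the next civil year after the March shift.
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}
```

Both directions use only integer arithmetic and floor division, which makes the pair easy to test as an exact round-trip.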

buffa-types' typed serde impls (Timestamp, Duration, FieldMask) now call
into this module rather than carrying their own implementations. The
adapters preserve the Option-returning private API the existing test
suite (~50 call sites) targets, so no test churn.

Sharing the implementation is load-bearing: the conformance suite
exercises the typed JSON path today, and a forthcoming reflective JSON
codec on DynamicMessage will exercise the same forms. A divergence
between the two (one accepting a fractional-second precision the other
rejects, or two civil-calendar implementations disagreeing on a
leap-year edge) would be a user-visible inconsistency. With the
implementation shared, drift is impossible.
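
The FieldMask path conversion is one place where two implementations could plausibly disagree. A minimal sketch of the round-trip check, using the helper names from the commit message with assumed signatures (the real helpers may return Option or handle more cases):

```rust
/// snake_case -> lowerCamelCase: drop each '_' and uppercase the next character.
pub fn snake_to_camel(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    let mut upper_next = false;
    for c in path.chars() {
        if c == '_' {
            upper_next = true;
        } else if upper_next {
            out.push(c.to_ascii_uppercase());
            upper_next = false;
        } else {
            out.push(c);
        }
    }
    out
}

/// lowerCamelCase -> snake_case: each ASCII uppercase letter becomes '_' + lowercase.
pub fn camel_to_snake(path: &str) -> String {
    let mut out = String::with_capacity(path.len() + 4);
    for c in path.chars() {
        if c.is_ascii_uppercase() {
            out.push('_');
            out.push(c.to_ascii_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}

/// A path is safe to emit as JSON only if converting there and back is lossless.
pub fn field_mask_path_round_trips(path: &str) -> bool {
    camel_to_snake(&snake_to_camel(path)) == path
}
```

Paths such as `foo__bar` or `foo_1` lose information in the camelCase form, so the check rejects them instead of emitting JSON that parses back to a different path.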

The module is #[doc(hidden)] because the supported entry points are the
typed serde impls and (forthcoming) DynamicMessage's JSON codec — these
helpers operate on raw scalars and have no semver contract.

Two changes that lay the foundation for runtime reflection.

Linked descriptor types (buffa-descriptor/src/desc.rs):

MessageDescriptor, FieldDescriptor, FieldKind, SingularKind,
OneofDescriptor, EnumDescriptor, EnumValueDescriptor, ServiceDescriptor,
MethodDescriptor — the processed, feature-resolved form of the raw
FileDescriptorProto tree. Where the raw protos use string type_name
references and unresolved FeatureSet options, these types use pool
indices (MessageIndex, EnumIndex, ServiceIndex) and pre-resolved
edition features (presence, packed, delimited, enum openness).

FieldKind flattens protobuf's orthogonal type x label x map-entry axes
into a single Copy discriminant that maps 1:1 to runtime
representation, the same approach protobuf-es takes with its fieldKind
union.
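
The flattening can be pictured with the sketch below. The variant shapes, the `RawField` struct, and `classify` are illustrative assumptions; only the names FieldKind and SingularKind, and the singular/list/map split, come from this commit message.

```rust
/// Hypothetical subset of scalar kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SingularKind {
    Int32,
    Int64,
    Bool,
    String,
    Bytes,
    Message,
    Enum,
}

/// One Copy discriminant instead of three independent axes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FieldKind {
    Singular(SingularKind),
    List(SingularKind),
    Map { key: SingularKind, value: SingularKind },
}

/// Hypothetical raw form: type, label, and map-entry flag kept separately.
pub struct RawField {
    pub ty: SingularKind,
    pub repeated: bool,
    /// Some((key, value)) when the field's message type is a synthetic map entry.
    pub map_entry: Option<(SingularKind, SingularKind)>,
}

/// Collapse the three axes into the single runtime-facing kind.
pub fn classify(raw: &RawField) -> FieldKind {
    match (raw.repeated, raw.map_entry) {
        (true, Some((key, value))) => FieldKind::Map { key, value },
        (true, None) => FieldKind::List(raw.ty),
        (false, _) => FieldKind::Singular(raw.ty),
    }
}
```

Consumers then match once on `FieldKind` rather than re-deriving "repeated message whose type is a map entry" at every call site.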

Fields are private with #[inline] accessor methods, matching the buffa
convention for hand-written API types (SizeCache, UnknownFields, Tag).
Construction is gated to DescriptorPool (forthcoming) — downstream
test fixtures go through DescriptorPool::decode from FDS bytes, so
they don't skip the feature-resolution and validation passes.

Field indices within a message are u16, capping fields-per-message at
65,535. Field numbers stay u32 per the protobuf spec.

Feature resolution dedup:

The shared core (file/message/enum/oneof feature resolution, edition
defaults, FeatureSet merge) moves from buffa-codegen/src/features.rs to
buffa-descriptor/src/features.rs and is re-exported from buffa-codegen.
buffa-codegen retains the codegen-only resolve_field, which overlays
the referenced enum's enum_type from CodeGenContext::is_enum_closed —
a lookup built during codegen and not available to a runtime pool.

A divergence between codegen and the runtime pool would mean generated
code and reflective code disagree on packed encoding, presence, or
enum openness; sharing the implementation makes that impossible.

DescriptorPool builds the linked descriptor types from a
FileDescriptorSet. Construction is three-pass:

1. Register: walk every file, recording the fully-qualified name of
   every message and enum (including nested ones) and assigning each a
   pool index. Forward references and cross-file references resolve in
   pass 2.
2. Link: walk again, building the linked MessageDescriptor for each
   message — resolving type_name strings to indices, classifying fields
   as singular/list/map, resolving editions features down the
   file -> message -> field chain, validating field numbers and the
   u16 field-count cap.
3. Link services: services reference message types by name for their
   input/output, so they link after the type passes.
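
A toy version of the first two passes, showing why registering every name before linking lets forward references resolve. All types and functions here are hypothetical simplifications: no nesting, no enums, no services, no feature resolution, and no duplicate-name handling.

```rust
use std::collections::HashMap;

pub struct RawField {
    pub name: String,
    /// Fully-qualified message type name, if the field is message-typed.
    pub type_name: Option<String>,
}

pub struct RawMessage {
    pub full_name: String,
    pub fields: Vec<RawField>,
}

#[derive(Debug, PartialEq)]
pub struct LinkedField {
    pub name: String,
    /// Pool index of the referenced message, replacing the string reference.
    pub message: Option<usize>,
}

#[derive(Debug, PartialEq)]
pub enum PoolError {
    Dangling(String),
}

/// Convenience constructor for the toy input.
pub fn message(full_name: &str, fields: &[(&str, Option<&str>)]) -> RawMessage {
    RawMessage {
        full_name: full_name.to_string(),
        fields: fields
            .iter()
            .map(|&(name, ty)| RawField {
                name: name.to_string(),
                type_name: ty.map(|t| t.to_string()),
            })
            .collect(),
    }
}

pub fn build(raw: &[RawMessage]) -> Result<Vec<Vec<LinkedField>>, PoolError> {
    // Pass 1 (register): assign every message an index before resolving anything.
    let mut index: HashMap<&str, usize> = HashMap::new();
    for (i, m) in raw.iter().enumerate() {
        index.insert(m.full_name.as_str(), i);
    }
    // Pass 2 (link): resolve type names to indices; a missing name is an error.
    let mut linked = Vec::with_capacity(raw.len());
    for m in raw {
        let mut fields = Vec::with_capacity(m.fields.len());
        for f in &m.fields {
            let message = match &f.type_name {
                None => None,
                Some(t) => Some(
                    *index
                        .get(t.as_str())
                        .ok_or_else(|| PoolError::Dangling(t.clone()))?,
                ),
            };
            fields.push(LinkedField { name: f.name.clone(), message });
        }
        linked.push(fields);
    }
    Ok(linked)
}
```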

The pool retains the original FileDescriptorProtos after linking
(file_by_name() accessor) so gRPC server reflection can serve the raw
bytes.

DescriptorPool::decode treats its input as untrusted — it's the entry
point for consumers loading descriptors from a schema registry, gRPC
server reflection peer, or on-disk policy bundle. Malformed input
returns PoolError, never panics: out-of-range field numbers, negative
extension ranges, dangling type names, and unparseable wire bytes are
all handled. The pass-1/pass-2 walk-order invariant is asserted in
release builds because a desync silently corrupts every cross-reference
in the pool.
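
As an illustration of "returns an error, never panics" for two of the checks mentioned, here is a sketch. The `PoolError` variants and function names are hypothetical. The field-number range (1 through 2^29 - 1) is the protobuf limit, and descriptor field numbers arrive as int32, so negative values are possible in malformed input. The reserved 19000-19999 range is deliberately not checked here.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PoolError {
    FieldNumberOutOfRange(i32),
    TooManyFields(usize),
}

/// Largest field number protobuf permits: 2^29 - 1.
pub const MAX_FIELD_NUMBER: i32 = (1 << 29) - 1;

/// Validate a raw int32 field number from untrusted descriptor bytes.
pub fn check_field_number(raw: i32) -> Result<u32, PoolError> {
    if (1..=MAX_FIELD_NUMBER).contains(&raw) {
        Ok(raw as u32)
    } else {
        Err(PoolError::FieldNumberOutOfRange(raw))
    }
}

/// Enforce a u16 cap on fields per message via a checked conversion.
pub fn check_field_count(count: usize) -> Result<u16, PoolError> {
    u16::try_from(count).map_err(|_| PoolError::TooManyFields(count))
}
```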

Behind the new `reflect` feature (default-off). Tests in
tests/pool_e2e.rs run against a protoc-compiled FileDescriptorSet and
exercise proto3 presence, editions feature resolution, packed
encoding, map entries, oneofs (including synthetic), service
descriptors, idempotent re-add, and wrong-kind/missing lookups.

…ReflectCow

The reflection runtime in buffa-descriptor/src/reflect/. Behind the
`reflect` feature.

Value model (value.rs):
- Value (owned) / ValueRef<'_> (borrowed, ≤32B compile-time-asserted)
  with the wire-level scalar types plus List, Map, Message containers.
- MapValue: a sorted-Vec<(MapKey, Value)> newtype with a sorted/no-
  duplicates invariant. get_str(&str) is allocation-free (binary
  search, no MapKey constructed) — the CEL m["key"] hot path.
  MapValue::new() is const fn so the absent-map default is a real
  static — no leak pattern, no OnceLock, no unsafe.
- ReflectList / ReflectMap traits over the container variants, with
  &dyn trait objects in ValueRef::List/Map. This is the load-bearing
  shape decision: a future vtable-mode `impl ReflectMessage for
  FooView<'a>` holds RepeatedView<'a, T>/MapView<'a, K, V> which
  cannot yield a &[Value] without materializing. Trait objects let
  bridge mode (Vec<Value>/MapValue) and a future view impl share the
  same ValueRef shape. Both &dyn and &[Value] are 16-byte fat
  pointers, so the size budget holds.
- MapKeyRef<'a>: borrowed MapKey with the spec-restricted variant set
  so ReflectMap::for_each consumers match exhaustively over only the
  valid key types.
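
A sketch of the sorted-Vec map idea: lookups by `&str` binary-search without building a `MapKey`, and a `const fn new()` allows a plain static for the absent-map default. The `MapKey` and `Value` variants and the `insert` method are simplified assumptions, not the crate's actual definitions.

```rust
use std::cmp::Ordering;

/// Simplified key set (assumption); derived Ord sorts by variant, then by value.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum MapKey {
    Bool(bool),
    I64(i64),
    U64(u64),
    String(String),
}

/// Simplified value set (assumption).
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    I64(i64),
    String(String),
}

/// Invariant: entries are sorted by key and contain no duplicate keys.
pub struct MapValue(Vec<(MapKey, Value)>);

impl MapValue {
    /// Vec::new is const, so an empty map can live in a static.
    pub const fn new() -> Self {
        MapValue(Vec::new())
    }

    /// Insert while preserving the invariant; returns the replaced value, if any.
    pub fn insert(&mut self, key: MapKey, value: Value) -> Option<Value> {
        match self.0.binary_search_by(|(k, _)| k.cmp(&key)) {
            Ok(i) => Some(std::mem::replace(&mut self.0[i].1, value)),
            Err(i) => {
                self.0.insert(i, (key, value));
                None
            }
        }
    }

    /// Look up a string key without allocating a MapKey::String.
    pub fn get_str(&self, key: &str) -> Option<&Value> {
        let i = self
            .0
            .binary_search_by(|(k, _)| match k {
                MapKey::String(s) => s.as_str().cmp(key),
                // String is the last variant, so other kinds sort before any string.
                _ => Ordering::Less,
            })
            .ok()?;
        Some(&self.0[i].1)
    }
}

/// A real static default for an absent map field.
pub static EMPTY_MAP: MapValue = MapValue::new();
```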

Trait surface (message.rs):
- ReflectMessage: dyn-safe, storage-agnostic. Accessors take
  &FieldDescriptor, return ValueRef; for_each_set takes &mut dyn FnMut.
  which_oneof() is default-implemented and mirrors protoreflect.WhichOneof.
- ReflectMessageMut: set/clear with oneof-sibling clearing.
- ReflectCow::{Borrowed,Owned}: clone-on-write reflective handle. Owned
  is boxed to keep ReflectCow at 24 bytes (one fat pointer + tag), the
  budget that keeps ValueRef at 32.
- Reflectable: the codegen entry point. Bridge-mode reflect() round-
  trips through encode/decode; the call site `foo.reflect().get(fd)`
  is the same in vtable mode (deferred).
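
A minimal sketch of the ReflectCow shape and its size budget. The one-method `ReflectMessage` trait, the `Foo` type, and `as_message` are hypothetical; the real trait surface is larger. The compile-time assertion uses three pointer widths, which is 24 bytes on 64-bit targets.

```rust
/// Hypothetical one-method stand-in for the dyn-safe trait.
pub trait ReflectMessage {
    fn full_name(&self) -> &str;
}

/// Either a borrowed view of an existing reflective message or an owned, boxed one.
pub enum ReflectCow<'a> {
    Borrowed(&'a dyn ReflectMessage),
    Owned(Box<dyn ReflectMessage>),
}

// Both payloads are fat pointers, so the enum should fit in three pointer widths.
const _: () = assert!(
    std::mem::size_of::<ReflectCow<'static>>() <= 3 * std::mem::size_of::<usize>()
);

impl ReflectCow<'_> {
    /// Uniform read access regardless of which variant is held.
    pub fn as_message(&self) -> &dyn ReflectMessage {
        match self {
            ReflectCow::Borrowed(m) => *m,
            ReflectCow::Owned(m) => &**m,
        }
    }
}

/// Hypothetical message type for the usage example.
pub struct Foo;

impl ReflectMessage for Foo {
    fn full_name(&self) -> &str {
        "pkg.Foo"
    }
}
```

Because callers only go through `as_message()`, code written against the owned (bridge) variant keeps working if a borrowed implementation is supplied later.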

DynamicMessage (dynamic.rs):
- BTreeMap<u32, Value> + Arc<DescriptorPool> + MessageIndex. Descriptor-
  driven encode/decode preserving unknown fields and field-number
  ordering. Validates wire type before dispatch (mismatch → unknown
  field, per the protobuf spec). Singular message merge, oneof
  last-wins, oneof explicit presence.
- from_message/to_message bridge for generated types.

Tests in tests/dynamic_e2e.rs and buffa-test/tests/reflect_bridge.rs
(generated <-> dynamic round-trip).

Proto3 canonical JSON for DynamicMessage. Behind `reflect` + `json`.

Serialization is `impl serde::Serialize for DynamicMessage` — a
mechanical walk over the descriptor's fields with per-SingularKind
dispatch. json_name and presence are pre-resolved on FieldDescriptor,
so the walk requires no string formatting or feature lookups.
Deserialization needs the descriptor as input, so it's a
DeserializeSeed (DynamicMessageSeed) rather than a Deserialize; the
ergonomic wrapper is DynamicMessage::from_json.

Well-known types are special-cased by full_name and handled by
reflective WKT codecs in json_wkt.rs — they read fields by number
through the DynamicMessage surface rather than bridging through
buffa-types. The shared formatting primitives (Timestamp RFC 3339,
Duration decimal seconds, FieldMask camelCase) live in
buffa::json_helpers::wkt (PR #128) so the typed and reflective JSON
paths can't drift.

google.protobuf.Any resolves the inner @type against the same pool
and recurses — message types spread their fields alongside @type, WKT
inner types wrap their canonical form in a "value" key.

Strict-validation cases the spec requires (Timestamp/Duration bounds,
oneof duplicate detection, FieldMask round-trip safety, Value
NaN/Infinity rejection, integer/float overflow, lowercase RFC 3339
separators) are all enforced. Implicit-presence default omission is
respected: a field set to its type's default value is not serialized.

Routes binary and JSON conformance test cases through DynamicMessage
to validate the reflective codec against the protobuf conformance
corpus, independently of the generated typed codec.

build.rs gains emit_reflect_fds() which produces a FileDescriptorSet
from the conformance test message protos so the runner can build a
DescriptorPool. The runner builds the pool once via OnceLock and
dispatches the four input/output combinations:
- binary -> DynamicMessage::decode -> encode -> binary
- binary -> decode -> to_json -> JSON
- JSON -> from_json -> encode -> binary
- JSON -> from_json -> to_json -> JSON

Result: 2764 successes, 7 expected failures, 0 unexpected failures
across proto2/proto3/editions 2023. Expected failures triaged in
known_failures_reflect.txt:
- 6 Recommended duplicate-JSON-key tests: serde_json keeps the last
  value silently and the visitor never sees the first; the typed JSON
  path has the same limitation.
- 1 Recommended proto2 extension JSON key test: extensions are a
  phase-4 reflection deliverable.

The text-format suite is skipped (DynamicMessage has no TextFormat).

Adds generate_reflection: bool to CodeGenConfig (default false). When
enabled:

- Each owned message gets impl ::buffa_descriptor::reflect::Reflectable
  whose reflect() round-trips through DynamicMessage::from_message
  (encode -> decode -> ReflectCow::Owned). The impl resolves the
  message index from the package's lazily-built DescriptorPool by
  Self::FULL_NAME, with a panic message that names the type when
  unregistered (the cross-crate-pool case).
- The __buffa::reflect submodule embeds FILE_DESCRIPTOR_SET_BYTES
  (the full transitive closure, serialized once per generate() call so
  build-time CPU does not scale with package count) and a
  descriptor_pool() accessor backed by std::sync::OnceLock.
- A package-root `pub use __buffa::reflect::descriptor_pool` re-export
  so the accessor is reachable at pkg::descriptor_pool() without
  routing through the reserved __buffa sentinel.
- Map-entry synthetic messages are skipped (not registered in the pool
  by name and never reflected over directly).
- When gate_impls_on_crate_features is also on, the impls are wrapped
  in #[cfg(feature = "reflect")].

Runtime requirements: buffa-descriptor with the `reflect` feature, std.
The buffa-build doc shows the Cargo.toml feature pattern and warns
about the silent-no-impl footgun when gate_impls_on_crate_features is
on but the consumer crate does not declare a `reflect` feature.

buffa-build gains .generate_reflection(true); protoc-gen-buffa gains
the reflection=true option. buffa-test/build.rs enables reflection for
basic.proto, and buffa-test now depends on buffa-descriptor as a regular
dependency. tests/reflectable.rs asserts against the codegen output.

@github-actions

github-actions Bot commented May 20, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Base automatically changed from reflect/06-conformance to main May 20, 2026 23:54
@iainmcgin iainmcgin marked this pull request as ready for review May 20, 2026 23:54
@iainmcgin iainmcgin requested a review from azdagron May 21, 2026 00:17
@iainmcgin iainmcgin merged commit fe49a9d into main May 21, 2026
7 checks passed
@iainmcgin iainmcgin deleted the reflect/07-codegen branch May 21, 2026 00:18
@github-actions github-actions Bot locked and limited conversation to collaborators May 21, 2026
2 participants