codegen: string_type knob for configurable string field types (#127)#144

Merged
iainmcgin merged 5 commits into main from feat/string-type-codegen
May 22, 2026

Conversation

@iainmcgin
Collaborator

Summary

Implements #127 — the user-facing string_type codegen knob, letting proto string fields map to a small-string-optimized type instead of String.

Stacked on #143 (the runtime support). Review #143 first; this branch contains its commits plus the codegen + build API.

API

use buffa_build::StringRepr;
buffa_build::Config::new()
    .string_type(StringRepr::SmolStr)                              // broad default
    .string_type_in(StringRepr::CompactString, &[".my.pkg.Msg.body"]) // narrow override
    .files(&["proto/my_service.proto"])
    .includes(&["proto/"])
    .compile()?;

StringRepr is String (default), SmolStr, EcoString, or CompactString. Rules are ordered, last-match-wins (call the broad string_type first, then string_type_in overrides). The consumer enables the matching buffa feature (smol_str/ecow/compact_str). Mirrors the existing use_bytes_type machinery.
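
The ordering rule can be sketched as follows. This is a minimal illustration, not buffa's actual implementation: `resolve_repr` is a hypothetical helper, the enum is a local stand-in for `buffa_build::StringRepr`, and modelling the broad `string_type` call as a `"."` prefix rule is an assumption. The naive `starts_with` match is also a simplification of whatever path matching the codegen really uses.

```rust
/// Local stand-in for the build-time repr choice; variant names follow the PR text.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StringRepr {
    String,
    SmolStr,
    EcoString,
    CompactString,
}

/// Hypothetical resolver: walk the ordered rules and keep the last one whose
/// path prefix matches the fully-qualified field path.
fn resolve_repr(rules: &[(&str, StringRepr)], field_path: &str) -> StringRepr {
    let mut chosen = StringRepr::String; // default when no rule matches
    for (prefix, repr) in rules {
        if field_path.starts_with(*prefix) {
            chosen = *repr; // a later matching rule overrides an earlier one
        }
    }
    chosen
}

fn main() {
    // Broad rule first, narrow override second (the order the PR recommends).
    let rules = [
        (".", StringRepr::SmolStr),
        (".my.pkg.Msg.body", StringRepr::CompactString),
    ];
    println!("{:?}", resolve_repr(&rules, ".my.pkg.Msg.body"));
    println!("{:?}", resolve_repr(&rules, ".my.pkg.Msg.title"));
}
```

Under this model, registering the broad rule after the narrow one would shadow the override, which is why the broad `string_type` call should come first.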

What changes in generated code

  • Owned struct field type for singular / optional / repeated string fields and oneof string variants.
  • Decode: String keeps the in-place merge_string fast path; other reprs use decode_string_to. clear() resets non-default reprs to Default::default() (the SSO types may be immutable). View→owned builds via From<&str>. Text format and proto2 [default = "..."] honor the repr. EcoString fields get an arbitrary shim.
  • Unchanged: wire format, view types (still &str), and map<_, string> keys/values (always String). Encode/size paths need no change — the SSO types deref-coerce to &str.
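
A rough sketch of the shape these bullets describe, using only the standard library. `SmallStr`, `Msg`, `MsgView`, and `encoded_len` are hypothetical stand-ins invented for illustration; they are not buffa's generated output or any real crate's types. The sketch assumes only what the bullets state: the non-default repr implements `Default`, `From<&str>`, and `Deref<Target = str>`.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for an immutable small-string type.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct SmallStr(Box<str>);

impl From<&str> for SmallStr {
    fn from(s: &str) -> Self {
        SmallStr(s.into())
    }
}

impl Deref for SmallStr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

/// Stand-in for an encode/size helper that only needs `&str`.
fn encoded_len(s: &str) -> usize {
    s.len()
}

/// Hypothetical owned message: one default-repr field, one non-default field.
#[derive(Debug, Default)]
struct Msg {
    title: String,
    body: SmallStr,
}

impl Msg {
    fn clear(&mut self) {
        self.title.clear(); // String: cleared in place, allocation kept
        self.body = Default::default(); // immutable repr: reset by assignment
    }

    fn size(&self) -> usize {
        // Both &String and &SmallStr deref-coerce to &str, so this path is repr-agnostic.
        encoded_len(&self.title) + encoded_len(&self.body)
    }
}

/// Hypothetical view type: string fields stay `&str` whatever the owned repr is.
struct MsgView<'a> {
    title: &'a str,
    body: &'a str,
}

impl MsgView<'_> {
    fn to_owned_message(&self) -> Msg {
        // View→owned goes through From<&str> for either repr.
        Msg { title: self.title.into(), body: self.body.into() }
    }
}

fn main() {
    let view = MsgView { title: "hi", body: "hello" };
    let mut msg = view.to_owned_message();
    println!("size before clear: {}", msg.size());
    msg.clear();
    println!("size after clear: {}", msg.size());
}
```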

Default String output is byte-for-byte identical — verified by regenerating the checked-in WKT and bootstrap descriptor types (the latter is proto2 with string fields, oneofs, defaults, and text format): zero diff.

Testing

cargo test --workspace and clippy --workspace --all-targets -D warnings pass. An end-to-end buffa-test module compiles string_types.proto with a SmolStr default + CompactString/EcoString overrides and covers binary / JSON / text / view round-trips, clear(), field-type pinning across all string shapes (incl. oneof payload), and a proto2 [default] fixture. Built with and without the arbitrary feature.

Note: there is pre-existing repo-wide cargo fmt drift in this environment (every committed file differs under the local rustfmt versions); a canonical cargo fmt pass may be wanted before merge.

@github-actions

github-actions Bot commented May 22, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Base automatically changed from feat/configurable-string-type to main May 22, 2026 18:22
@iainmcgin iainmcgin force-pushed the feat/string-type-codegen branch from d14341d to aee1adb Compare May 22, 2026 22:35
iainmcgin added 4 commits May 22, 2026 22:37
Add a `StringRepr` knob (String default, plus SmolStr / EcoString /
CompactString) selectable per proto path through buffa_build's
`string_type` / `string_type_in` builder methods, mirroring the existing
`use_bytes_type` machinery.

Codegen:
- StringRepr enum + string_fields config (ordered path-prefix rules, last
  match wins) + a CodeGenContext::string_repr predicate.
- classify_field emits the chosen owned type for singular, optional, and
  repeated string fields. Map keys/values stay String, matching the bytes
  path. Encode/size paths are unchanged — &SmolStr/&EcoString/&CompactString
  deref-coerce to &str.
- Decode routes String through the in-place merge_string fast path and other
  reprs through decode_string_to; clear() resets non-default reprs to default
  (the small-string types may be immutable, like bytes::Bytes); view→owned
  builds via From<&str>; EcoString fields get the arbitrary_ecow shim.

Default String output is byte-for-byte unchanged (verified: regenerating the
checked-in WKT and bootstrap types produces no diff).

Tests: codegen integration assertions for each repr + the arbitrary shim, and
an end-to-end buffa-test module (string_types.proto compiled with a SmolStr
default and CompactString/EcoString overrides) covering binary/JSON/view
round-trips, clear, and the field-type pinning across all string shapes.

Three code paths still emitted String for non-default string representations:

- Oneof string variants: the variant type (and its JSON-deserialize seed type)
  ignored string_fields, so a configured SmolStr/EcoString/CompactString was
  silently dropped back to String. Fixed in oneof.rs and the custom-deserialize
  path in message.rs; EcoString oneof variants now also get the arbitrary shim.
- Text format: read_string()?.into_owned() is String — combined with a
  non-default repr it failed to compile. The text decoder now converts via
  From<String> for singular, optional, repeated, and oneof string fields.
- proto2 [default = "..."]: the default expression was hard-coded to
  String::from, breaking the generated Default impl and clear() for bare
  (required) string fields under a non-default repr.
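
The text-format and `[default]` fixes both come down to converting through a trait instead of naming `String` directly. A minimal sketch under that assumption: `text_field` and `default_field` are hypothetical helpers, not functions from the generated code, and `Box<str>` stands in for a non-default repr because it implements both `From<String>` and `From<&str>` in the standard library.

```rust
use std::borrow::Cow;

/// Hypothetical text-decoder step: the reader yields a `Cow<str>`, `into_owned()`
/// produces a `String`, and the field repr is built from it via `From<String>`.
fn text_field<T: From<String>>(parsed: Cow<'_, str>) -> T {
    T::from(parsed.into_owned())
}

/// Hypothetical proto2 default expression: build the repr from the literal
/// rather than hard-coding `String::from`.
fn default_field<T: for<'a> From<&'a str>>(literal: &str) -> T {
    T::from(literal)
}

fn main() {
    // Default repr: String.
    let a: String = text_field(Cow::Borrowed("from text"));
    // Stand-in for a non-default repr: Box<str>.
    let b: Box<str> = text_field(Cow::Owned("from text".to_string()));
    let d: Box<str> = default_field("fallback");
    println!("{a} / {b} / {d}");
}
```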

Default String output is unchanged (verified: regenerating the WKT and
bootstrap descriptor types produces no diff).

Tests: enable generate_text on the string_variant build and add a text
round-trip; pin the oneof payload type; add a proto2 fixture exercising
[default] + string_type. Also: mark StringRepr #[non_exhaustive], document the
map-stays-String contract and the last-match-wins ordering on the builder
methods, and note immutability on the SmolStr/EcoString variants.

smol_str 0.3.4 raised its MSRV to 1.89, above buffa's 1.85, so enabling the
`smol_str` string-type feature broke `cargo check` on the MSRV toolchain
(msrv-check CI). Pin to `>=0.3, <0.3.4` (resolves to 0.3.2, which declares no
MSRV) — it keeps the `serde` and `arbitrary` features the codegen relies on.
Re-pin Cargo.lock accordingly. Relax the cap when buffa's MSRV reaches 1.89.
@iainmcgin iainmcgin force-pushed the feat/string-type-codegen branch from aee1adb to ce14a65 Compare May 22, 2026 22:38
@iainmcgin iainmcgin marked this pull request as ready for review May 22, 2026 22:42
@iainmcgin iainmcgin requested a review from rpb-ant May 22, 2026 22:42
@iainmcgin iainmcgin enabled auto-merge (squash) May 22, 2026 23:55
@iainmcgin iainmcgin merged commit 794c1da into main May 22, 2026
7 checks passed
@iainmcgin iainmcgin deleted the feat/string-type-codegen branch May 22, 2026 23:58
@github-actions github-actions Bot locked and limited conversation to collaborators May 22, 2026