Skip to content

codegen: resolve AIP-192 cross-references to intra-doc links (#26)#94

Merged
iainmcgin merged 3 commits intoanthropics:mainfrom
tejas-dharani:docs/aip192-intra-doc-links
May 5, 2026
Merged

codegen: resolve AIP-192 cross-references to intra-doc links (#26)#94
iainmcgin merged 3 commits intoanthropics:mainfrom
tejas-dharani:docs/aip192-intra-doc-links

Conversation

@tejas-dharani
Copy link
Copy Markdown
Contributor

Closes #26.

Proto comments often reference other types using AIP-192 syntax —
[Book][google.example.v1.Book], [Book][], [Genre.GENRE_SCI_FI] — but
the generated Rust doc comments were emitting escaped literals instead of
rustdoc intra-doc links, making the cross-references unclickable.

Problem

doc_attrs passed comment text through a sanitizer that escaped [ and ]
uniformly. There was no awareness of AIP-192 reference syntax, so
[Book][google.example.v1.Book] became \[Book\]\[google.example.v1.Book\]
in the output — a dead literal rather than a navigable link.

Fix

The sanitizer now recognises [display][ref] and [display][] patterns and
attempts to resolve them against the codegen type map (the same
HashMap<String, String> that maps proto FQNs to Rust paths). Resolution
walks proto lexical scoping: it tries the fully-qualified key first, then
strips scope segments one at a time. A successful hit emits
[display](crate::rust::path); an unresolvable or extern-crate ref falls
back silently to the escaped literal so downstream -D warnings builds are
not broken.

Rust keyword segments in proto package names (e.g. google.type) are escaped
with r# before the path is embedded in the doc string, so
[LatLng](crate::google::r#type::LatLng) is emitted rather than the invalid
crate::google::type::LatLng.

Inline Markdown links [text](url) and backtick code spans are left
untouched. < and > in display text are escaped outside code spans to
satisfy rustdoc::invalid_html_tags.

Call sites in message.rs, enumeration.rs, view.rs, and oneof.rs were
updated to pass scope_fqn and the type map through to the new
doc_attrs_resolved / doc_attrs_with_tag_resolved functions.

Tests

Unit tests in buffa-codegen/src/tests/comments.rs cover fully-qualified
resolution, implied-form [Book][], unresolvable fallback, extern-crate
suppression, keyword-segment escaping, and angle-bracket handling inside
backtick spans. The resolve_type_fqn and resolve_proto_ref helpers have
dedicated unit tests for scope-walking, enum-value member refs, and
crate::-prefixed extern paths.

Regenerated

buffa-types/src/generated/google.protobuf.any.rs and its view variant were
regenerated. The Duration cross-reference in Any's doc comment now
resolves to [google.protobuf.Duration](crate::google::protobuf::Duration).

Doc comments in generated Rust code now translate AIP-192 cross-references
([Book][google.example.v1.Book], [Book][], [Genre.GENRE_SCI_FI]) into rustdoc
intra-doc links. Unknown or extern-crate references fall back silently to
escaped literals. Keyword segments in proto package names (e.g. google.type)
are escaped with r# so the links are valid under -D rustdoc::broken_intra_doc_links.

Closes anthropics#26
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@iainmcgin
Copy link
Copy Markdown
Collaborator

[claude code] Thanks for taking this on - the overall design here is right, and the conservative escape-fallback contract is exactly what #26 asked for. The keyword-segment escaping (google.type -> google::r#type), the backtick-code-span preservation in display text, and the extern-path suppression are all well thought through. cargo test -p buffa-codegen passes locally on the PR head; CI is pending maintainer approval and we'll kick that off shortly.

Two changes before merge, plus some smaller polish items below.

1. Drop Type.member resolution from this PR

The dot-split branch in resolve_proto_ref (comments.rs ~line 653) resolves [label][Type.member] refs to crate::path::Type::member. The problem is that it confirms the type half resolves through the type map, but never verifies the member half, and the member segment is interpolated raw - no keyword escaping, no snake-casing. Three concrete ways this produces a broken intra-doc link where the prior code emitted a safe escaped literal:

  • Keyword field names. A proto field named type is generated as r#type by make_field_ident, so [x][Foo.type] links to crate::pkg::Foo::type, which doesn't exist.
  • Stale or typo'd refs. [x][Book.titel] resolves the type half and emits a link to a nonexistent field.
  • Non-conventional field naming. [x][Book.displayName] links to displayName while the generated field is display_name.

Each of these fails downstream cargo doc builds under -D rustdoc::broken_intra_doc_links - the exact failure mode the rest of this change is engineered to avoid. The escape-on-failure safety net doesn't catch them because resolution succeeds on the type half.

We'd rather land type-level resolution now (the bulk of the value, and verified safe because the type map is authoritative) and revisit member resolution in a follow-up that threads descriptor info through so the member can be existence-checked and routed through make_field_ident. Concretely:

  • Delete the if let Some(dot_pos) = effective_ref.rfind('.') branch in resolve_proto_ref.
  • Remove the test_resolve_proto_ref_enum_value and test_resolve_proto_ref_field unit tests.
  • Adjust the CHANGELOG entry to say type-level refs only.

2. Doubled doc comment on escape_angle_brackets

comments.rs ~lines 519-527 has two consecutive /// paragraphs - the first three lines are a stale earlier draft of the same comment. Drop them.

Smaller items (would take these in this PR if you're willing)

  • scope_fqn should be the enclosing-message FQN, not the element's. message.rs:1228, enumeration.rs:140, oneof.rs:266, and view.rs:369,385 pass field_fqn / value_fqn / oneof_fqn. Per protobuf scoping rules, a comment on a field/oneof/enum-value resolves in the message's lexical scope, not a per-element scope. The extra trailing segment adds one always-failing lookup per ref, and in the rare case of a nested message named the same as the field it could resolve to the wrong target. Pass proto_fqn instead.
  • is_extern_path doc comment. comments.rs ~line 580 says crate::-prefixed paths are "another crate re-exported under this crate's root", but crate:: always denotes the current crate. The actual reason for rejection is that the link is constructed as crate::{p}, so an already-crate::-prefixed type-map value would mangle to crate::crate::.... Worth a one-line clarification.

Optional / advisory (fine to defer)

  • find_ref_link test coverage: [][], [][a], chained [a][b][c], nested-bracket rejection [a[b]c], line ending mid-ref [foo][.
  • Direct unit tests for escape_angle_brackets (currently only exercised transitively), especially the unclosed-backtick branch.
  • is_extern_path independently mirrors a check in context.rs; a shared helper or a cross-checking test would prevent drift.
  • The implied form [google.protobuf.Duration][] renders the full dotted FQN as the link text, which is verbose in rendered docs. Could shorten the display to the trailing segment for the implied form.

- Drop Type.member resolution from resolve_proto_ref; member refs fall
  back to escaped literals until member existence can be verified against
  descriptor info. Removes test_resolve_proto_ref_enum_value and
  test_resolve_proto_ref_field; adds negative test asserting member refs
  return None.
- Fix doubled doc comment on escape_angle_brackets.
- Pass enclosing proto_fqn (not element FQN) as scope_fqn at all call
  sites in message.rs, enumeration.rs, oneof.rs, and view.rs.
- Clarify is_extern_path doc comment with the actual crate::crate::...
  mangling reason; restore context.rs cross-reference.
- Update doc_attrs_resolved and resolve_proto_ref doc comments to
  reflect the enclosing-type scope_fqn contract.
- Update CHANGELOG to say type-level refs only.
@tejas-dharani
Copy link
Copy Markdown
Contributor Author

All items addressed, ready for re-review.

Copy link
Copy Markdown
Collaborator

@iainmcgin iainmcgin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude code] All four flagged items addressed in 762dc4dc:

  • Type.member resolution dropped from resolve_proto_ref; the two member-ref tests replaced with a negative test pinning the new behaviour. CHANGELOG scoped to type-level refs only.
  • ✅ Doubled doc comment on escape_angle_brackets fixed.
  • scope_fqn is now the enclosing-type FQN at all four call sites; doc_attrs_resolved/resolve_proto_ref docs updated to state the contract.
  • is_extern_path doc comment now explains the actual crate::crate::... mangling reason.

Verified locally on the PR head: cargo test -p buffa-codegen passes (346 tests), task gen-wkt-types produces no diff, RUSTDOCFLAGS=-D warnings cargo doc -p buffa-types builds clean, clippy clean.

The four advisory items (extra find_ref_link cases, direct escape_angle_brackets tests, shared is_extern_path helper, implied-form display shortening) weren't taken — that's fine, they were marked optional. Happy to see those as follow-ups if you're interested.

Thanks for the quick turnaround — merging.

@iainmcgin iainmcgin merged commit b57a106 into anthropics:main May 5, 2026
7 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 5, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Translate AIP-192 proto comment cross-references to rustdoc intra-doc links

2 participants