chore(tools): Fix rustdoc page parsing in web fetch tool #643

Merged: JeanMertz merged 1 commit into `main` from `prr214` on May 16, 2026.

@JeanMertz (Collaborator)

The HTML section extractor had two failure modes when used against
rustdoc-generated pages (docs.rs or local `cargo doc` output).

First, the section listing was polluted with page-chrome IDs
(`rustdoc_body_wrapper`, `rustdoc-toc`, `rustdoc-modnav`,
`main-content`). These were picked up by the ancestor-id fallback in
`resolve_heading_id` because rustdoc wraps the whole page in elements
carrying those IDs. A new `RUSTDOC_SCAFFOLDING_IDS` filter drops them
from the listing while still allowing explicit `sections=[...]` requests
to target `main-content` on non-rustdoc pages.
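A minimal sketch of where such a filter might sit, assuming candidate section IDs have already been collected as plain strings. Only `RUSTDOC_SCAFFOLDING_IDS` and its four entries come from this PR; `listable_section_ids` and `select_requested` are hypothetical names used to show that the filter applies to the listing only, not to explicit requests.

```rust
/// Page-chrome IDs that rustdoc wraps around every page (list taken from this PR).
const RUSTDOC_SCAFFOLDING_IDS: &[&str] = &[
    "rustdoc_body_wrapper",
    "rustdoc-toc",
    "rustdoc-modnav",
    "main-content",
];

/// Hypothetical helper: drops scaffolding IDs from the section *listing*.
fn listable_section_ids<'a>(candidates: &[&'a str]) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|id| !RUSTDOC_SCAFFOLDING_IDS.contains(id))
        .collect()
}

/// Hypothetical helper: explicit `sections=[...]` requests do not consult the
/// scaffolding list, so `main-content` stays reachable on pages where it
/// holds real content.
fn select_requested<'a>(candidates: &[&'a str], requested: &[&str]) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|id| requested.contains(id))
        .collect()
}
```

Keeping the filter out of the request path means a page that legitimately uses `main-content` as a content anchor is unaffected; only the auto-generated listing loses the rustdoc chrome.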

Second, method and variant documentation was never returned. Rustdoc
places the signature heading inside `<section id="method.X">` and puts
the prose in a sibling `<div class="docblock">` outside the section —
optionally further wrapped in a `<details>` toggle. The extractor was
walking the section's own children, finding nothing, and returning an
empty string. Two new helpers, `extract_rustdoc_section` and
`preview_from_section_docblock`, resolve the sibling docblock in both
the flat and toggle layouts. `extract_section_html_from_doc` now
short-circuits into this path for any `<section>` target, and the
preview walk gains a matching fallback so listing previews agree with
extracted bodies.
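The sibling lookup can be sketched against a toy tree. This assumes a hypothetical `Node` type in place of whatever HTML parser the tool actually uses, and `find_section_docblock` / `section_preview` are illustrative stand-ins for `extract_rustdoc_section` and `preview_from_section_docblock`, not their real signatures. The toggle shape modelled here (the section inside `<summary>`, the docblock following the summary inside `<details>`) is an assumption about current rustdoc output.

```rust
/// Hypothetical, minimal stand-in for a parsed HTML element.
/// Empty strings mean "attribute absent".
struct Node {
    tag: &'static str,
    id: &'static str,
    class: &'static str,
    text: &'static str,
    children: Vec<Node>,
}

fn node(
    tag: &'static str,
    id: &'static str,
    class: &'static str,
    text: &'static str,
    children: Vec<Node>,
) -> Node {
    Node { tag, id, class, text, children }
}

impl Node {
    fn has_class(&self, name: &str) -> bool {
        self.class.split_whitespace().any(|c| c == name)
    }

    fn is_section(&self, id: &str) -> bool {
        self.tag == "section" && self.id == id
    }

    fn is_docblock(&self) -> bool {
        self.tag == "div" && self.has_class("docblock")
    }
}

/// Hypothetical lookup for the docblock documenting `<section id=section_id>`.
/// Flat layout:   <section id=..>..</section><div class="docblock">..</div>
/// Toggle layout: <details><summary><section id=..>..</section></summary>
///                <div class="docblock">..</div></details>
/// In both, the prose is the next sibling of the element holding the section,
/// never a child of the section itself.
fn find_section_docblock<'a>(parent: &'a Node, section_id: &str) -> Option<&'a Node> {
    for (i, child) in parent.children.iter().enumerate() {
        let holds_section = child.is_section(section_id)
            || (child.tag == "summary"
                && child.children.iter().any(|c| c.is_section(section_id)));
        if holds_section {
            // Section found: its docs, if any, are the immediately following sibling.
            return parent.children.get(i + 1).filter(|n| n.is_docblock());
        }
        if let Some(found) = find_section_docblock(child, section_id) {
            return Some(found);
        }
    }
    None
}

/// Hypothetical preview built on the same lookup, so a listing preview and
/// the extracted body cannot disagree about where the prose lives.
fn section_preview(root: &Node, section_id: &str, max_chars: usize) -> String {
    find_section_docblock(root, section_id)
        .map(|doc| doc.text.chars().take(max_chars).collect::<String>())
        .unwrap_or_default()
}
```

Routing extraction and preview through one lookup is one way to get the agreement described above; the actual helpers may implement the two paths separately.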

Signed-off-by: Jean Mertz <git@jeanmertz.com>
JeanMertz merged commit 6cc7b31 into `main` on May 16, 2026.
14 checks passed.
JeanMertz deleted the `prr214` branch on May 16, 2026 at 07:05.