chore(tools): Add markdown-first strategy to web_fetch #570

Merged
JeanMertz merged 2 commits into main from prr137
Apr 16, 2026

@JeanMertz
Collaborator

The web_fetch tool previously fetched pages as HTML and converted them to markdown. Many documentation platforms (Mintlify, GitBook, Fumadocs) publish a clean markdown twin at {path}.md, which is cheaper to process and produces better output than HTML-to-markdown conversion.

The tool now supports three strategies, controlled via tool.options:

  • auto (default): probe {url}.md first; fall back to HTML
  • markdown: require the .md variant; error if unavailable
  • html: skip the probe and fetch HTML directly (previous behavior)
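
As a rough illustration of how the three strategies differ, the sketch below maps each one to the ordered list of requests it implies. The names `Strategy`, `Attempt`, and `plan` are hypothetical and not taken from this PR, and the probe URL is built by naively appending `.md`, which ignores query strings, fragments, and trailing slashes that the real tool may handle differently.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the three strategy names listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Strategy {
    /// Probe the `.md` twin first, then fall back to HTML.
    #[default]
    Auto,
    /// Require the `.md` twin; no fallback.
    Markdown,
    /// Skip the probe entirely.
    Html,
}

impl FromStr for Strategy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(Self::Auto),
            "markdown" => Ok(Self::Markdown),
            "html" => Ok(Self::Html),
            other => Err(format!("unknown strategy: {other}")),
        }
    }
}

/// One request the tool would issue (illustrative type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Attempt {
    Markdown(String),
    Html(String),
}

/// Returns the requests to try, in order, for a given strategy.
/// Assumption: the markdown twin lives at `{url}.md` with no further rewriting.
pub fn plan(strategy: Strategy, url: &str) -> Vec<Attempt> {
    let twin = format!("{url}.md");
    match strategy {
        Strategy::Auto => vec![Attempt::Markdown(twin), Attempt::Html(url.to_string())],
        Strategy::Markdown => vec![Attempt::Markdown(twin)],
        Strategy::Html => vec![Attempt::Html(url.to_string())],
    }
}
```

A caller would walk the returned list and stop at the first successful response; with `markdown`, an exhausted list with no success becomes the error.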

Per-domain overrides are supported via options.domains, e.g.:

options.domains."github.com" = "html"
options.domains."*.mintlify.app" = "markdown"

Defaults ship with github.com, docs.rs, and crates.io pinned to html since those sites don't follow the .md convention.
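
The per-domain lookup could work roughly as follows. This is a sketch under assumptions, not the PR's code: it assumes user rules are checked before the shipped defaults, that the first matching rule wins, and that `*.example.com` matches subdomains but not the bare domain. The names `matches`, `resolve`, and `DEFAULT_RULES` are hypothetical.

```rust
/// Shipped defaults as described above: these hosts are pinned to `html`.
const DEFAULT_RULES: &[(&str, &str)] = &[
    ("github.com", "html"),
    ("docs.rs", "html"),
    ("crates.io", "html"),
];

/// Returns true when `host` matches `pattern`.
/// Assumption: a leading `*.` matches one or more subdomain labels,
/// but not the bare domain itself.
fn matches(pattern: &str, host: &str) -> bool {
    match pattern.strip_prefix("*.") {
        Some(suffix) => host
            .strip_suffix(suffix)
            .map_or(false, |prefix| prefix.ends_with('.')),
        None => pattern == host,
    }
}

/// Resolves the strategy name for `host`.
/// Assumption: user rules take precedence over defaults, and hosts with
/// no matching rule fall back to `auto`.
pub fn resolve(user_rules: &[(&str, &str)], host: &str) -> String {
    user_rules
        .iter()
        .chain(DEFAULT_RULES.iter())
        .find(|(pattern, _)| matches(pattern, host))
        .map(|(_, strategy)| strategy.to_string())
        .unwrap_or_else(|| "auto".to_string())
}
```

Under these assumptions, `resolve(&[("*.mintlify.app", "markdown")], "docs.mintlify.app")` yields `markdown`, while `github.com` resolves to `html` unless a user rule overrides it.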

The implementation splits the monolithic fetch.rs into three sub-modules: html (old pipeline), markdown (new pipeline), and options (strategy/domain rule parsing). Section listing and extraction now work for both HTML and markdown sources, using GitHub-style heading slugs as anchor IDs in both cases.
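
For the markdown side, section listing with GitHub-style slugs might look like the sketch below. It approximates GitHub's anchor rules (lowercase, drop punctuation other than `-` and `_`, spaces become hyphens, repeated headings get a `-1`, `-2` suffix) and only recognizes ATX headings outside fenced code blocks. The function names `slugify` and `list_sections` are hypothetical, and the PR's actual slug rules may differ in edge cases.

```rust
use std::collections::HashMap;

/// Approximates a GitHub-style heading slug.
pub fn slugify(heading: &str) -> String {
    heading
        .trim()
        .to_lowercase()
        .chars()
        .filter_map(|c| match c {
            ' ' => Some('-'),
            c if c.is_alphanumeric() || c == '-' || c == '_' => Some(c),
            _ => None, // drop remaining punctuation
        })
        .collect()
}

/// Lists `(level, title, anchor)` for each ATX heading in `markdown`.
pub fn list_sections(markdown: &str) -> Vec<(usize, String, String)> {
    let mut seen: HashMap<String, usize> = HashMap::new();
    let mut sections = Vec::new();
    let mut in_fence = false;

    for line in markdown.lines() {
        let trimmed = line.trim_start();
        // Toggle on fence markers so `#` lines inside code are skipped.
        if trimmed.starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        let level = trimmed.chars().take_while(|&c| c == '#').count();
        if level == 0 || level > 6 {
            continue;
        }
        // A heading needs a space after the `#` run.
        let Some(title) = trimmed[level..].strip_prefix(' ') else {
            continue;
        };
        let title = title.trim();
        let base = slugify(title);
        // Disambiguate repeated headings: `foo`, `foo-1`, `foo-2`, ...
        let count = seen.entry(base.clone()).or_insert(0);
        let slug = if *count == 0 {
            base.clone()
        } else {
            format!("{base}-{count}")
        };
        *count += 1;
        sections.push((level, title.to_string(), slug));
    }
    sections
}
```

Extracting a section then amounts to finding the heading whose anchor matches and taking lines up to the next heading of the same or a higher level.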

Signed-off-by: Jean Mertz <git@jeanmertz.com>
@JeanMertz JeanMertz merged commit 78b7109 into main Apr 16, 2026
13 checks passed
@JeanMertz JeanMertz deleted the prr137 branch April 16, 2026 21:11