feat(tools): read_page — fetch URLs without a browser#29

Merged
askalf merged 1 commit into main from feat/read-page-tool on Apr 29, 2026
askalf (Owner) commented Apr 29, 2026

Summary

Adds a custom `read_page` tool to hands' SDK mode, alongside the existing `computer` / `bash` / `str_replace_based_edit_tool`. When the agent needs to read web content, it calls `read_page(url)` and gets back cleaned HTML + extracted page metadata directly — no browser, no screenshot, no JavaScript, no OCR.
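
For illustration, a custom tool of this kind could be declared roughly as follows. This is a sketch under assumptions, not the definition from this PR: the description text, the `ReadPageInput` type, and the `parseReadPageInput` helper are hypothetical. Only the tool name `read_page` and its `url` argument come from the summary above. The `name` / `description` / `input_schema` shape follows the custom tool format of the Anthropic Messages API.

```typescript
// Hypothetical sketch of a custom tool definition for read_page.
// Only the name "read_page" and the "url" argument come from the PR text.
interface ReadPageInput {
  url: string;
}

const readPageTool = {
  name: "read_page",
  description:
    "Fetch a URL and return cleaned HTML plus page metadata. No browser, no JavaScript.",
  input_schema: {
    type: "object" as const,
    properties: {
      url: { type: "string", description: "Absolute http(s) URL to fetch" },
    },
    required: ["url"],
  },
};

// Validate the raw tool input before fetching (assumed helper, not from the PR).
function parseReadPageInput(raw: unknown): ReadPageInput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("read_page input must be an object");
  }
  const url = (raw as { url?: unknown }).url;
  if (typeof url !== "string") {
    throw new Error("read_page input requires a string url");
  }
  const parsed = new URL(url); // throws on malformed URLs
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  return { url: parsed.href };
}
```

Rejecting non-http(s) schemes up front is a design choice of this sketch; the PR does not say how `read_page` treats other schemes.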

This is the "browser without a browser" thesis turned into product. Most agent web tasks don't need to interact with a page; they need to read it. For that 80% case, screenshot+OCR is expensive overkill.

Architecture

`read_page` returns cleaned HTML directly to the agent as a `tool_result`. The agent — already a Claude model — reads HTML natively. No nested LLM call to translate HTML → markdown. This makes read_page meaningfully cheaper than alternatives that pre-render server-side.

```
fetch(url, browser-UA, follow-redirects)
→ cheerio parse
→ drop scripts/styles/iframes/svg/canvas/video/audio
→ keep only signal-bearing metadata (title, OG, Twitter, JSON-LD)
→ drop class/style/aria/data-*/onclick attributes
→ resolve relative href/src to absolute URLs
→ prune cookie/consent banners by class/id selector
→ inline lazy-loaded image data-src → src
→ 80KB hard size cap with truncation marker
```
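
The final pipeline step, the size cap, can be sketched in isolation. This is a minimal illustration, assuming the 80KB limit is measured in UTF-8 bytes and using a made-up marker string; the real marker text and the exact cut-off rule in `src/util/page-cleanup.ts` are not shown in this PR description.

```typescript
// Assumption: the 80KB cap counts UTF-8 bytes. The marker text is hypothetical.
const MAX_BYTES = 80 * 1024;
const TRUNCATION_MARKER = "\n<!-- [truncated] -->";

function capSize(html: string, maxBytes: number = MAX_BYTES): string {
  const bytes = new TextEncoder().encode(html);
  if (bytes.length <= maxBytes) {
    return html; // under the cap: return unchanged
  }
  // Decode only the first maxBytes bytes. A multi-byte character split at the
  // boundary decodes to U+FFFD, so trailing replacement characters are stripped.
  const head = new TextDecoder("utf-8")
    .decode(bytes.slice(0, maxBytes))
    .replace(/\uFFFD+$/, "");
  return head + TRUNCATION_MARKER;
}
```

In this sketch the marker is appended after the cut, so the returned string can exceed `maxBytes` by the marker's length. Whether the real cap includes the marker is not stated.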

Cost comparison (live-tested, sonnet-4-6 via dario, OAuth subscription billing)

| Task | `read_page` | `computer` tool path (estimated) |
|---|---|---|
| Summarize Wikipedia article | 2 turns, 14,236 in / 307 out | 6-8 turns, ~50K+ in |
| Read Anthropic effort docs | 2 turns, 10,659 in / 315 out | 6-8 turns, ~50K+ in |
| Identify SPA shell | 2 turns, 1,971 in / 330 out | 4-6 turns, ~20K+ in |

Each `computer` screenshot costs ~1,500 tokens; `read_page` returns ~1-7K tokens of cleaned HTML for the same content. It also avoids browser cold-start, scroll loops, and OCR-style flattening errors.
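
As a quick check on the comparison above, the input-token ratios can be worked out directly. The `read_page` figures are the measured ones; the `computer` figures are the estimated lower bounds ("~50K+", "~20K+"), so the resulting ratios are lower bounds as well. The variable and function names here are illustrative only.

```typescript
// Input-token figures copied from the cost comparison above.
const rows = [
  { task: "Summarize Wikipedia article", readPageIn: 14236, computerInMin: 50000 },
  { task: "Read Anthropic effort docs", readPageIn: 10659, computerInMin: 50000 },
  { task: "Identify SPA shell", readPageIn: 1971, computerInMin: 20000 },
];

// Ratio of estimated computer-tool input tokens to measured read_page input
// tokens, rounded to one decimal place.
function savingsRatio(readPageIn: number, computerInMin: number): number {
  return Math.round((computerInMin / readPageIn) * 10) / 10;
}

for (const r of rows) {
  const ratio = savingsRatio(r.readPageIn, r.computerInMin);
  console.log(`${r.task}: at least ${ratio}x fewer input tokens`);
}
// Ratios: 3.5, 4.7, 10.1
```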

SPA handling

For pure-SPA URLs (empty body after cleaning, content fetched by JS), `read_page` returns a structured metadata summary plus an explicit marker telling the agent the body was JS-only. Verified live against `https://claude.ai/` — the agent correctly identifies it as a SPA shell and works from the metadata rather than hallucinating content.
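
The fallback described above might look something like the following. This is a sketch only: the `PageMetadata` shape, the marker wording, and the "no visible text left" test are assumptions, since the PR description does not spell out the actual summary format.

```typescript
// Hypothetical metadata shape; the real structure is not given in the PR text.
interface PageMetadata {
  title?: string;
  description?: string;
  openGraph: Record<string, string>;
}

// Hypothetical marker wording.
const JS_ONLY_MARKER =
  "[read_page: body is empty after cleaning; content is likely rendered by JavaScript]";

function formatResult(cleanedBodyHtml: string, meta: PageMetadata): string {
  // Strip tags and whitespace to see whether any visible text survived cleaning.
  const visibleText = cleanedBodyHtml.replace(/<[^>]*>/g, "").trim();
  if (visibleText.length > 0) {
    return cleanedBodyHtml; // normal case: hand the cleaned HTML to the agent
  }
  // SPA shell: return the marker plus whatever metadata was extracted.
  const lines = [JS_ONLY_MARKER];
  if (meta.title) lines.push(`title: ${meta.title}`);
  if (meta.description) lines.push(`description: ${meta.description}`);
  for (const [key, value] of Object.entries(meta.openGraph)) {
    lines.push(`og:${key}: ${value}`);
  }
  return lines.join("\n");
}
```

Putting the marker first means the agent sees the "JS-only" signal before any metadata, which matches the intent of steering it away from inventing body content.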

System prompt nudge

Added to `buildSdkSystemPrompt`:

For reading web pages: ALWAYS use the read_page tool, NEVER navigate to a URL with the computer tool. read_page fetches the URL and returns cleaned HTML directly to you — no browser, no screenshot, no JavaScript. Use it for: reading articles, browsing docs, GitHub READMEs, news pages, JSON APIs, RSS feeds. The computer tool is for clicking and typing into a UI; reading content does not need the UI.

Anti-pattern explicitly added: "Do NOT open a browser to read a URL — use read_page."

Test plan

  • Unit: `test/page-cleanup.test.mjs` — 17 cases covering script/style/iframe stripping, relative→absolute URL resolution (including protocol-relative + fragment cases), aria/data-attribute removal, comment stripping, lazy-image inlining, JSON-LD parse with malformed-skip, size cap with truncation marker, cookie/consent class+id pruning.
  • Local: `npm test` passes 66/66 (was 49 → 66; +17 from this PR).
  • Live E2E: 3 prompts (Wikipedia article, Anthropic docs, claude.ai SPA), all successfully used `read_page` instead of the computer tool, all routed through dario for OAuth subscription billing.
  • CI: standard hands checks (actionlint, build × Node, CodeQL).
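
To make the URL-resolution cases in the unit list concrete, here is a small sketch built on the standard `URL` constructor. The function name `resolveUrl` and the choice to leave pure `#fragment` links untouched are assumptions; the tests in `test/page-cleanup.test.mjs` may expect different fragment behavior.

```typescript
// Resolve an href/src value against the page URL (illustrative helper).
function resolveUrl(href: string, base: string): string {
  // Assumption: pure fragments stay as-is so in-page anchors remain in-page.
  if (href.startsWith("#")) {
    return href;
  }
  try {
    // Handles relative paths, root-relative paths, and protocol-relative URLs.
    return new URL(href, base).href;
  } catch {
    return href; // unparseable values are passed through unchanged
  }
}

const page = "https://example.com/b/c/d";
console.log(resolveUrl("/wiki/Foo", page));               // https://example.com/wiki/Foo
console.log(resolveUrl("//cdn.example.com/a.png", page)); // https://cdn.example.com/a.png
console.log(resolveUrl("../a", page));                    // https://example.com/b/a
console.log(resolveUrl("#top", page));                    // #top
```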

New dependencies

  • `cheerio` — HTML parser, ~70KB unpacked, no transitive web runtime. Used in `src/util/page-cleanup.ts`.

Notes

  • Custom tool definition (vs. Anthropic beta tool type) — `read_page` is plumbed in alongside the existing beta-typed tools. The SDK accepts mixed tool definitions cleanly; no special handling needed beyond the dispatcher branch.
  • Audit log already records every tool call — read_page entries land in `~/.hands/audit.jsonl` with `{tool: "read_page", args: {url}}` shape.
  • This pairs cleanly with the dario auto-detect work in #26: `hands` auto-routes through dario for OAuth subscription billing AND uses `read_page` for static web content. The whole web-research path is now subscription-priced and fast.
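
Given the audit-log shape noted above, pulling out the URLs the agent has read could look like this. It is a sketch that assumes one JSON object per line with at least `tool` and `args` fields; any other fields in real `~/.hands/audit.jsonl` entries are ignored, and the `readPageUrls` name is made up for the example.

```typescript
// Assumed minimal entry shape, based on {tool: "read_page", args: {url}}.
interface AuditEntry {
  tool: string;
  args: Record<string, unknown>;
}

// Extract the url of every read_page call from JSONL text.
function readPageUrls(jsonl: string): string[] {
  const urls: string[] = [];
  for (const line of jsonl.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    let entry: Partial<AuditEntry> | null;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than failing the whole scan
    }
    const url = entry?.args?.url;
    if (entry?.tool === "read_page" && typeof url === "string") {
      urls.push(url);
    }
  }
  return urls;
}

const sample = [
  '{"tool":"read_page","args":{"url":"https://example.com/docs"}}',
  '{"tool":"bash","args":{"command":"ls"}}',
  "not json",
].join("\n");
console.log(readPageUrls(sample)); // [ 'https://example.com/docs' ]
```

Reading the file itself (for example with `fs.readFileSync`) is left to the caller so the parsing logic stays easy to test.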

Adds a custom `read_page` tool to hands' SDK mode, alongside the
existing computer / bash / str_replace_based_edit_tool. When the
agent needs to read web content, it calls read_page(url) and gets
back cleaned HTML + extracted metadata directly — no browser, no
screenshot, no JavaScript, no OCR.

Architecture: read_page returns cleaned HTML *directly* to the agent
as a tool_result. The agent (already a Claude model) reads HTML
natively — no nested LLM call to translate HTML → markdown. This
makes read_page meaningfully cheaper than the browser path.

Cost comparison (live-tested, claude-sonnet-4-6 via dario):

  | Task                    | read_page          | computer tool (estimated) |
  |-------------------------|--------------------|---------------------------|
  | Summarize Wiki article  | 2 turns, 14K in   | 6-8 turns, ~50K+ in       |
  | Read Anthropic docs     | 2 turns, 11K in   | 6-8 turns, ~50K+ in       |
  | Identify SPA shell      | 2 turns, 2K in    | 4-6 turns, ~20K+ in       |

Each computer-tool screenshot costs ~1500 tokens; read_page returns
~1-7K tokens of cleaned HTML for the same content. Plus no browser
cold-start, no scroll loops, no OCR-style flattening errors.

Pipeline (src/util/page-cleanup.ts):
  fetch(url, browser-UA, follow-redirects)
    → cheerio parse
    → drop scripts/styles/iframes/svg/canvas/video/audio
    → keep only signal-bearing <head> metadata (title, OG, Twitter, JSON-LD)
    → drop class/style/aria/data-*/onclick attributes
    → resolve relative href/src to absolute URLs
    → prune cookie/consent banners by class/id selector
    → inline lazy-loaded image data-src → src
    → 80KB hard size cap with truncation marker

For pure-SPA URLs (empty body after cleaning, content fetched by JS),
read_page returns a structured metadata summary plus an explicit
marker telling the agent the body was JS-only. Verified live against
https://claude.ai/ — the agent correctly identifies it as a SPA shell
and works from the metadata rather than hallucinating content.

System prompt updated to nudge the agent toward read_page for URL
reading. Anti-pattern explicitly added: "Do NOT open a browser to
read a URL — use read_page."

New deps: cheerio (HTML parser, ~70KB unpacked, no transitive web
runtime). Test surface: 17 unit tests covering script/style/iframe
stripping, relative→absolute URL resolution (including protocol-
relative + fragment cases), aria/data-attribute removal, comment
stripping, lazy-image inlining, JSON-LD parse with malformed-skip,
size cap with truncation marker, cookie/consent class+id pruning.
Full hands suite goes 49 → 66 (+17 here, no regressions).

askalf force-pushed the feat/read-page-tool branch from 6d39bb7 to f49fe70 on April 29, 2026 23:42
askalf merged commit e44332b into main on Apr 29, 2026
5 checks passed
askalf deleted the feat/read-page-tool branch on April 29, 2026 23:44
askalf added a commit that referenced this pull request Apr 30, 2026
#30)

Bundles four feature PRs landed since v0.3.0 (#25-#29). All
additive; v0.3.0 users see no behavior change without opting into
the new flags or letting the new tool surface.

Headline change: hands gains a `read_page(url)` tool that fetches a
URL via plain fetch + cleans the HTML and hands it to the agent
directly. No browser, no JS execution, no screenshot+OCR. The
"browser without a browser" thesis turned into product.

Other features:
- Auto-detect dario at startup (probes localhost:3456/health, sets
  ANTHROPIC_BASE_URL when it responds — operator override wins via
  pre-set env var or --no-dario).
- --persona <name> / --system-prompt <path> flags swap in custom
  system prompts. Bundled set: minimal, thorough, concise,
  security-aware. User overrides at ~/.hands/personas/<name>.md.
  Safe per dario research (#172) — billing classifier doesn't
  fingerprint system prompt content.
- `hands audit list/show/replay` — inspect and re-execute audit-
  log entries. Default replay is dry-run; --execute fires after
  prompt-confirmation for state-changing actions.
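
The precedence rules for the dario auto-detect bullet above can be sketched as a pure function. Everything here is illustrative: the option names, the `http://` scheme, and the use of the server root as the base URL are assumptions. Only the port, the `/health` probe, `ANTHROPIC_BASE_URL`, and `--no-dario` come from the commit message.

```typescript
// Hypothetical inputs to the startup decision.
interface DarioOptions {
  env: Record<string, string | undefined>;
  noDario: boolean;  // true when --no-dario was passed
  healthOk: boolean; // true when the localhost:3456/health probe responded
}

// Assumption: dario's base URL is the root of the probed host and port.
const DARIO_BASE_URL = "http://localhost:3456";

// Returns the base URL to use, or undefined to leave the SDK default in place.
function resolveBaseUrl(opts: DarioOptions): string | undefined {
  if (opts.env.ANTHROPIC_BASE_URL) {
    return opts.env.ANTHROPIC_BASE_URL; // operator's pre-set env var wins
  }
  if (opts.noDario) {
    return undefined; // auto-detect explicitly disabled
  }
  return opts.healthOk ? DARIO_BASE_URL : undefined;
}
```

Keeping the probe result as an input, rather than probing inside the function, keeps the precedence logic testable without a running dario instance.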

Test count 49 → 92. All four PRs were rebased onto each other
with no semantic regressions; the integration was live-tested end-
to-end through dario for OAuth subscription billing before merge.

Auto-release workflow will fire on merge (version-changed gate
sees 0.3.0 → 0.4.0), build + smoke + tag + GitHub release + npm
publish with provenance attestation.

Co-authored-by: askalf <263217947+askalf@users.noreply.github.com>