feat(tools): read_page — fetch URLs without a browser by askalf · Pull Request #29 · askalf/hands

askalf · 2026-04-29T22:41:33Z

Summary

Adds a custom `read_page` tool to hands' SDK mode, alongside the existing `computer` / `bash` / `str_replace_based_edit_tool`. When the agent needs to read web content, it calls `read_page(url)` and gets back cleaned HTML + extracted page metadata directly — no browser, no screenshot, no JavaScript, no OCR.

This is the "browser without a browser" thesis turned into a product. Most agent web tasks don't need to interact with a page; they only need to read it. For that ~80% case, screenshot + OCR is expensive overkill.

Architecture

`read_page` returns cleaned HTML directly to the agent as a `tool_result`. The agent — already a Claude model — reads HTML natively. No nested LLM call to translate HTML → markdown. This makes read_page meaningfully cheaper than alternatives that pre-render server-side.
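For illustration, here is roughly what returning cleaned HTML "directly as a `tool_result`" looks like. This sketch assumes the standard Messages API `tool_result` content block (`type`, `tool_use_id`, `content`, optional `is_error`). `ToolResultBlock`, `ReadPageOutcome`, and `toReadPageResult` are hypothetical names, not hands' actual code; a real implementation would use the SDK's own types.

```typescript
// Local stand-in for the Messages API tool_result content block (hypothetical type name).
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

type ReadPageOutcome = { ok: true; html: string } | { ok: false; error: string };

// Wrap read_page's output for the next turn. Cleaned HTML is passed through verbatim:
// no second model call converts it to markdown first.
function toReadPageResult(toolUseId: string, outcome: ReadPageOutcome): ToolResultBlock {
  if (outcome.ok) {
    return { type: "tool_result", tool_use_id: toolUseId, content: outcome.html };
  }
  // Fetch failures go back to the agent as an error result it can react to.
  return {
    type: "tool_result",
    tool_use_id: toolUseId,
    content: `read_page failed: ${outcome.error}`,
    is_error: true,
  };
}
```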

```
fetch(url, browser-UA, follow-redirects)
→ cheerio parse
→ drop scripts/styles/iframes/svg/canvas/video/audio
→ keep only signal-bearing <head> metadata (title, OG, Twitter, JSON-LD)
→ drop class/style/aria/data-*/onclick attributes
→ resolve relative href/src to absolute URLs
→ prune cookie/consent banners by class/id selector
→ inline lazy-loaded image data-src → src
→ 80KB hard size cap with truncation marker
```
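Three of the per-element steps above (lazy-image inlining, attribute stripping, relative→absolute URL resolution) plus the size cap can be sketched against a simplified `{ tag, attrs }` element model instead of cheerio, so the example runs on its own. `El`, `cleanElement`, `capSize`, and the marker text are hypothetical; `src/util/page-cleanup.ts` may order and implement these steps differently.

```typescript
// Simplified element model standing in for a cheerio node (assumption for this sketch).
interface El {
  tag: string;
  attrs: Record<string, string>;
}

// class, style, aria-*, data-*, and on* event handlers (onclick, ...) carry no reading signal.
function isNoiseAttribute(name: string): boolean {
  const n = name.toLowerCase();
  return (
    n === "class" ||
    n === "style" ||
    n.startsWith("aria-") ||
    n.startsWith("data-") ||
    n.startsWith("on")
  );
}

function cleanElement(el: El, pageUrl: string): El {
  const attrs: Record<string, string> = { ...el.attrs };

  // Lazy-loaded images: promote data-src to src (overwriting any placeholder)
  // before data-* attributes are stripped.
  const lazySrc: string | undefined = attrs["data-src"];
  if (el.tag === "img" && lazySrc) attrs["src"] = lazySrc;

  // Strip noise attributes.
  for (const name of Object.keys(attrs)) {
    if (isNoiseAttribute(name)) delete attrs[name];
  }

  // Resolve relative href/src against the page URL; keep unparseable values as-is.
  for (const name of ["href", "src"]) {
    const value: string | undefined = attrs[name];
    if (value === undefined) continue;
    try {
      attrs[name] = new URL(value, pageUrl).href;
    } catch {
      // leave the original value in place
    }
  }
  return { tag: el.tag, attrs };
}

// Hard cap; reading "80KB" as 80 * 1024 characters is an assumption.
const MAX_CHARS = 80 * 1024;
const TRUNCATION_MARKER = "\n<!-- read_page: truncated -->"; // hypothetical marker text

function capSize(html: string, maxChars: number = MAX_CHARS): string {
  if (html.length <= maxChars) return html;
  // Reserve room so that the kept prefix plus the marker still fits within the cap.
  return html.slice(0, Math.max(0, maxChars - TRUNCATION_MARKER.length)) + TRUNCATION_MARKER;
}
```

One ordering detail the sketch makes explicit: `data-src` is promoted to `src` before `data-*` attributes are stripped, since stripping first would discard the lazy-loaded URL.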

Cost comparison (live-tested, sonnet-4-6 via dario, OAuth subscription billing)

| Task | read_page (input / output tokens) | computer tool path (estimated) |
| --- | --- | --- |
| Summarize Wikipedia article | 2 turns, 14,236 in / 307 out | 6-8 turns, ~50K+ in |
| Read Anthropic effort docs | 2 turns, 10,659 in / 315 out | 6-8 turns, ~50K+ in |
| Identify SPA shell | 2 turns, 1,971 in / 330 out | 4-6 turns, ~20K+ in |

Each `computer` screenshot costs ~1,500 tokens, while `read_page` returns ~1-7K tokens of cleaned HTML for the same content. It also avoids browser cold-start, scroll loops, and OCR-style flattening errors.
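A quick worked ratio from the cost table above. The `computer` figures are the PR's estimates ("~50K+", "~20K+"), so each factor is a rough lower bound rather than a measurement, and output tokens are ignored. `CostRow` and `savingsFactor` are illustrative names.

```typescript
// Input-token figures from the cost table; computer-path numbers are estimated lower bounds.
interface CostRow {
  task: string;
  readPageIn: number;
  computerInEstimate: number;
}

const rows: CostRow[] = [
  { task: "Summarize Wikipedia article", readPageIn: 14236, computerInEstimate: 50000 },
  { task: "Read Anthropic effort docs", readPageIn: 10659, computerInEstimate: 50000 },
  { task: "Identify SPA shell", readPageIn: 1971, computerInEstimate: 20000 },
];

// How many times more input tokens the computer-tool path would use, at minimum.
function savingsFactor(row: CostRow): number {
  return row.computerInEstimate / row.readPageIn;
}

for (const row of rows) {
  console.log(`${row.task}: ~${savingsFactor(row).toFixed(1)}x fewer input tokens with read_page (lower bound)`);
}
```

This gives roughly 3.5x, 4.7x, and 10.1x for the three rows.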

SPA handling

For pure-SPA URLs (empty body after cleaning, content fetched by JS), `read_page` returns a structured metadata summary plus an explicit marker telling the agent the body was JS-only. Verified live against `https://claude.ai/`: the agent correctly identifies it as a SPA shell and works from the metadata rather than hallucinating content.
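A minimal sketch of this fallback, assuming "empty body" means no visible text remains after cleaning. `PageMeta`, `readPageOutput`, and the marker wording are hypothetical; the PR only says the marker is explicit.

```typescript
// Head metadata kept by the pipeline (title, OpenGraph, Twitter); field names are hypothetical.
interface PageMeta {
  title?: string;
  ogTitle?: string;
  ogDescription?: string;
  twitterDescription?: string;
}

// Hypothetical marker wording.
const JS_ONLY_MARKER =
  "[read_page] Body was empty after cleaning; this page renders its content with JavaScript.";

// An SPA shell has markup (e.g. <div id="root"></div>) but no readable text.
function isSpaShell(cleanedBodyText: string): boolean {
  return cleanedBodyText.trim().length === 0;
}

function spaSummary(url: string, meta: PageMeta): string {
  const lines = [JS_ONLY_MARKER, `URL: ${url}`];
  if (meta.title) lines.push(`Title: ${meta.title}`);
  if (meta.ogTitle && meta.ogTitle !== meta.title) lines.push(`og:title: ${meta.ogTitle}`);
  const description = meta.ogDescription ?? meta.twitterDescription;
  if (description) lines.push(`Description: ${description}`);
  return lines.join("\n");
}

// Return the cleaned HTML normally, or the metadata summary for a JS-only shell.
function readPageOutput(url: string, cleanedHtml: string, bodyText: string, meta: PageMeta): string {
  return isSpaShell(bodyText) ? spaSummary(url, meta) : cleanedHtml;
}
```

Checking visible text rather than HTML length matters here: the shell's markup is non-empty even though it contains nothing to read.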

System prompt nudge

Added to `buildSdkSystemPrompt`:

> For reading web pages: ALWAYS use the read_page tool, NEVER navigate to a URL with the computer tool. read_page fetches the URL and returns cleaned HTML directly to you — no browser, no screenshot, no JavaScript. Use it for: reading articles, browsing docs, GitHub READMEs, news pages, JSON APIs, RSS feeds. The computer tool is for clicking and typing into a UI; reading content does not need the UI.

Anti-pattern explicitly added: "Do NOT open a browser to read a URL — use read_page."

Test plan

- Unit: `test/page-cleanup.test.mjs`, 17 cases covering script/style/iframe stripping, relative→absolute URL resolution (including protocol-relative and fragment cases), aria/data-attribute removal, comment stripping, lazy-image inlining, JSON-LD parsing with malformed-skip, size cap with truncation marker, and cookie/consent class+id pruning.
- Local: `npm test` passes 66/66 (up from 49; +17 from this PR).
- Live E2E: 3 prompts (Wikipedia article, Anthropic docs, claude.ai SPA); all used `read_page` instead of the computer tool, and all were routed through dario for OAuth subscription billing.
- CI: standard hands checks (actionlint, build × Node, CodeQL).
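The JSON-LD malformed-skip case can be sketched as a pure function over the raw `<script type="application/ld+json">` bodies; extracting those bodies is the parser's job and is left out. `parseJsonLd` is a hypothetical name, and flattening top-level arrays is a choice of this sketch, not necessarily what hands does.

```typescript
// Parse each JSON-LD script body, skipping malformed ones instead of failing the page.
function parseJsonLd(scriptBodies: string[]): unknown[] {
  const out: unknown[] = [];
  for (const body of scriptBodies) {
    try {
      const parsed: unknown = JSON.parse(body);
      // One script may hold a single object or an array of them; flatten arrays.
      if (Array.isArray(parsed)) out.push(...parsed);
      else out.push(parsed);
    } catch {
      // Malformed JSON-LD is common on real pages; drop it and keep going.
    }
  }
  return out;
}
```

For example, `parseJsonLd(['{"@type":"Article"}', '{not json'])` yields just the `Article` object.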

New dependencies

`cheerio` — HTML parser, ~70KB unpacked, no transitive web runtime. Used in `src/util/page-cleanup.ts`.

Notes

- Custom tool definition (vs. Anthropic beta tool type): `read_page` is plumbed in alongside the existing beta-typed tools. The SDK accepts mixed tool definitions cleanly; no special handling is needed beyond the dispatcher branch.
- Audit log already records every tool call: `read_page` entries land in `~/.hands/audit.jsonl` with `{tool: "read_page", args: {url}}` shape.
- This pairs cleanly with the dario auto-detect work in #26: `hands` auto-routes through dario for OAuth subscription billing AND uses `read_page` for static web content. The whole web-research path is now subscription-priced and fast.
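Roughly what the custom tool definition and its audit line could look like, assuming the Messages API custom-tool shape (`name`, `description`, `input_schema` as JSON Schema). The description text, `CustomTool`, and the `auditLine` helper are illustrative, not copied from hands; only the `{tool, args}` entry shape comes from the notes.

```typescript
// Custom (non-beta) tool definition in the Messages API shape.
interface CustomTool {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

const readPageTool: CustomTool = {
  name: "read_page",
  description: "Fetch a URL and return cleaned HTML plus page metadata. No browser, no JavaScript.",
  input_schema: {
    type: "object",
    properties: { url: { type: "string", description: "Absolute http(s) URL to read" } },
    required: ["url"],
  },
};

// One JSONL line for ~/.hands/audit.jsonl in the {tool, args} shape (helper is hypothetical).
function auditLine(tool: string, args: Record<string, unknown>): string {
  return JSON.stringify({ tool, args }) + "\n";
}
```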


#30) Bundles four feature PRs landed since v0.3.0 (#25-#29). All additive; v0.3.0 users see no behavior change without opting into the new flags or letting the new tool surface.

Headline change: hands gains a `read_page(url)` tool that fetches a URL via plain fetch, cleans the HTML, and hands it to the agent directly. No browser, no JS execution, no screenshot+OCR. The "browser without a browser" thesis turned into a product.

Other features:
- Auto-detect dario at startup (probes localhost:3456/health, sets ANTHROPIC_BASE_URL when it responds; operator override wins via pre-set env var or --no-dario).
- `--persona <name>` / `--system-prompt <path>` flags swap in custom system prompts. Bundled set: minimal, thorough, concise, security-aware. User overrides at `~/.hands/personas/<name>.md`. Safe per dario research (#172): the billing classifier doesn't fingerprint system prompt content.
- `hands audit list/show/replay`: inspect and re-execute audit-log entries. Default replay is dry-run; --execute fires after prompt-confirmation for state-changing actions.

Test count 49 → 92. All four PRs were rebased onto each other with no semantic regressions; the integration was live-tested end-to-end through dario for OAuth subscription billing before merge.

Auto-release workflow will fire on merge (version-changed gate sees 0.3.0 → 0.4.0): build + smoke + tag + GitHub release + npm publish with provenance attestation.

Co-authored-by: askalf <263217947+askalf@users.noreply.github.com>
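The dario auto-detect precedence described above can be sketched as a pure decision function plus a health probe. All names are hypothetical, as are the 500 ms timeout and the assumption that `ANTHROPIC_BASE_URL` is set to the dario origin `http://localhost:3456`.

```typescript
const DARIO_URL = "http://localhost:3456"; // assumed base URL; only the /health probe is documented

interface StartupInputs {
  envBaseUrl: string | undefined; // pre-set ANTHROPIC_BASE_URL, if any
  noDario: boolean; // --no-dario flag
  darioHealthy: boolean; // result of probeDario()
}

// Precedence: operator's env var, then --no-dario, then dario if it answered the probe.
function chooseBaseUrl(i: StartupInputs): string | undefined {
  if (i.envBaseUrl) return i.envBaseUrl;
  if (i.noDario) return undefined; // fall back to the SDK default endpoint
  return i.darioHealthy ? DARIO_URL : undefined;
}

// Probe dario's health endpoint; any error or timeout means "not running".
async function probeDario(timeoutMs: number = 500): Promise<boolean> {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), timeoutMs);
  try {
    const res = await fetch(`${DARIO_URL}/health`, { signal: ctrl.signal });
    return res.ok;
  } catch {
    return false; // connection refused or aborted by the timeout
  } finally {
    clearTimeout(timer);
  }
}
```

Probing only when it can matter keeps startup fast: compute `const healthy = !envBaseUrl && !noDario && (await probeDario());` and pass it as `darioHealthy`.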

askalf force-pushed the feat/read-page-tool branch from 6d39bb7 to f49fe70 April 29, 2026 23:42

askalf merged commit e44332b into main Apr 29, 2026
5 checks passed

askalf deleted the feat/read-page-tool branch April 29, 2026 23:44

askalf mentioned this pull request Apr 30, 2026

release: v0.4.0 — read_page, auto-detect dario, personas, audit replay #30

Merged


askalf merged 1 commit into main from feat/read-page-tool
