feat(tools): read_page — fetch URLs without a browser #29
Merged
Adds a custom `read_page` tool to hands' SDK mode, alongside the
existing computer / bash / str_replace_based_edit_tool. When the
agent needs to read web content, it calls read_page(url) and gets
back cleaned HTML + extracted metadata directly — no browser, no
screenshot, no JavaScript, no OCR.
Architecture: read_page returns cleaned HTML *directly* to the agent
as a tool_result. The agent (already a Claude model) reads HTML
natively — no nested LLM call to translate HTML → markdown. This
makes read_page meaningfully cheaper than the browser path.
Cost comparison (live-tested, claude-sonnet-4-6 via dario):
| Task | read_page | computer tool (estimated) |
|-------------------------|--------------------|---------------------------|
| Summarize Wiki article | 2 turns, 14K in | 6-8 turns, ~50K+ in |
| Read Anthropic docs | 2 turns, 11K in | 6-8 turns, ~50K+ in |
| Identify SPA shell | 2 turns, 2K in | 4-6 turns, ~20K+ in |
Each computer-tool screenshot costs ~1500 tokens; read_page returns
~1-7K tokens of cleaned HTML for the same content. Plus no browser
cold-start, no scroll loops, no OCR-style flattening errors.
Pipeline (src/util/page-cleanup.ts):
fetch(url, browser-UA, follow-redirects)
→ cheerio parse
→ drop scripts/styles/iframes/svg/canvas/video/audio
→ keep only signal-bearing <head> metadata (title, OG, Twitter, JSON-LD)
→ drop class/style/aria/data-*/onclick attributes
→ resolve relative href/src to absolute URLs
→ prune cookie/consent banners by class/id selector
→ inline lazy-loaded image data-src → src
→ 80KB hard size cap with truncation marker
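The "resolve relative href/src to absolute URLs" step can be sketched with the standard WHATWG `URL` constructor. This is a minimal illustration, not the code in src/util/page-cleanup.ts; the helper name `resolveUrl` is hypothetical.

```typescript
// Hypothetical helper: resolve an href/src value against the URL of the fetched page.
// Returns null when the value cannot be parsed, so a caller can leave the attribute untouched.
function resolveUrl(value: string, pageUrl: string): string | null {
  try {
    // The two-argument URL constructor handles path-relative, root-relative,
    // protocol-relative ("//host/path") and fragment-only ("#id") values.
    return new URL(value, pageUrl).href;
  } catch {
    return null;
  }
}

const page = "https://example.com/x/y.html";
console.log(resolveUrl("/docs/a", page));                 // https://example.com/docs/a
console.log(resolveUrl("//cdn.example.com/i.png", page)); // https://cdn.example.com/i.png
console.log(resolveUrl("#top", page));                    // https://example.com/x/y.html#top
```

Delegating to `URL` rather than concatenating strings covers the protocol-relative and fragment cases without special-casing each one.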
For pure-SPA URLs (empty body after cleaning, content fetched by JS),
read_page returns a structured metadata summary plus an explicit
marker telling the agent the body was JS-only. Verified live against
https://claude.ai/ — the agent correctly identifies it as a SPA shell
and works from the metadata rather than hallucinating content.
System prompt updated to nudge the agent toward read_page for URL
reading. Anti-pattern explicitly added: "Do NOT open a browser to
read a URL — use read_page."
New deps: cheerio (HTML parser, ~70KB unpacked, no transitive web
runtime). Test surface: 17 unit tests covering script/style/iframe
stripping, relative→absolute URL resolution (including protocol-
relative + fragment cases), aria/data-attribute removal, comment
stripping, lazy-image inlining, JSON-LD parse with malformed-skip,
size cap with truncation marker, cookie/consent class+id pruning.
Full hands suite goes 49 → 66 (+17 here, no regressions).
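The "JSON-LD parse with malformed-skip" behavior named in the test list might look roughly like the following. It is a sketch under the assumption that the raw text of each `<script type="application/ld+json">` block has already been collected; `parseJsonLdBlocks` is a hypothetical name, not an identifier from this PR.

```typescript
// Hypothetical helper: parse every JSON-LD block, silently skipping malformed ones.
function parseJsonLdBlocks(rawBlocks: string[]): unknown[] {
  const parsed: unknown[] = [];
  for (const raw of rawBlocks) {
    try {
      parsed.push(JSON.parse(raw));
    } catch {
      // Malformed JSON: drop this block and keep the rest of the metadata.
    }
  }
  return parsed;
}

const blocks = ['{"@type":"Article","headline":"Hi"}', "{oops", "[1,2]"];
console.log(parseJsonLdBlocks(blocks).length); // 2
```

Skipping rather than throwing means one bad block on a page does not cost the agent the remaining metadata.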
askalf added a commit that referenced this pull request on Apr 30, 2026:

#30)

Bundles four feature PRs landed since v0.3.0 (#25-#29). All additive; v0.3.0 users see no behavior change without opting into the new flags or letting the new tool surface.

Headline change: hands gains a `read_page(url)` tool that fetches a URL via plain fetch + cleans the HTML and hands it to the agent directly. No browser, no JS execution, no screenshot+OCR. The "browser without a browser" thesis turned into product.

Other features:
- Auto-detect dario at startup (probes localhost:3456/health, sets ANTHROPIC_BASE_URL when it responds; operator override wins via pre-set env var or --no-dario).
- --persona <name> / --system-prompt <path> flags swap in custom system prompts. Bundled set: minimal, thorough, concise, security-aware. User overrides at ~/.hands/personas/<name>.md. Safe per dario research (#172): the billing classifier doesn't fingerprint system prompt content.
- `hands audit list/show/replay`: inspect and re-execute audit-log entries. Default replay is dry-run; --execute fires after prompt-confirmation for state-changing actions.

Test count 49 → 92. All four PRs were rebased onto each other with no semantic regressions; the integration was live-tested end-to-end through dario for OAuth subscription billing before merge.

Auto-release workflow will fire on merge (version-changed gate sees 0.3.0 → 0.4.0): build + smoke + tag + GitHub release + npm publish with provenance attestation.

Co-authored-by: askalf <263217947+askalf@users.noreply.github.com>
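The dario auto-detect precedence described in that commit message (pre-set env var wins, then --no-dario, then the health probe) can be sketched as a pure decision function plus a small probe. All names here are hypothetical, and both the `http://` scheme and the value written to ANTHROPIC_BASE_URL are assumptions; the commit message only gives `localhost:3456/health`.

```typescript
interface StartupOptions {
  noDario: boolean;                      // true when --no-dario was passed
  env: { ANTHROPIC_BASE_URL?: string };  // snapshot of the relevant environment
}

// Assumption: dario is reachable over plain http on the documented port.
const DARIO_ORIGIN = "http://localhost:3456";

// Decide which base URL to use. Returning undefined means "leave the default alone".
function chooseBaseUrl(opts: StartupOptions, darioHealthy: boolean): string | undefined {
  if (opts.env.ANTHROPIC_BASE_URL) return opts.env.ANTHROPIC_BASE_URL; // operator override wins
  if (opts.noDario) return undefined;                                  // probing disabled
  return darioHealthy ? DARIO_ORIGIN : undefined;
}

// Minimal fetch-like signature so the probe can be exercised without a network.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

// Any network error or non-2xx response counts as "dario not present".
async function probeDario(fetchFn: FetchLike): Promise<boolean> {
  try {
    const res = await fetchFn(`${DARIO_ORIGIN}/health`);
    return res.ok;
  } catch {
    return false;
  }
}
```

Keeping the decision separate from the probe makes the override rules testable without a running dario instance.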
Summary
Adds a custom `read_page` tool to hands' SDK mode, alongside the existing `computer` / `bash` / `str_replace_based_edit_tool`. When the agent needs to read web content, it calls `read_page(url)` and gets back cleaned HTML + extracted page metadata directly — no browser, no screenshot, no JavaScript, no OCR.
This is the "browser without a browser" thesis turned into product. Most agent web tasks don't need to interact with a page; they need to read it. For that 80% case, screenshot+OCR is expensive overkill.
Architecture
`read_page` returns cleaned HTML directly to the agent as a `tool_result`. The agent — already a Claude model — reads HTML natively. No nested LLM call to translate HTML → markdown. This makes read_page meaningfully cheaper than alternatives that pre-render server-side.
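As a rough illustration of the shape involved, a custom tool in the Anthropic Messages API is declared with `name`, `description`, and `input_schema`, and answered with a `tool_result` content block. The sketch below declares those types locally instead of importing an SDK; the description wording and the `buildToolResult` helper are assumptions, not code from this PR.

```typescript
// Local stand-ins for the Messages API shapes; not imported from any SDK.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Assumed wording; the real description used by hands may differ.
const readPageTool: ToolDefinition = {
  name: "read_page",
  description: "Fetch a URL and return cleaned HTML plus extracted metadata.",
  input_schema: {
    type: "object",
    properties: {
      url: { type: "string", description: "Absolute http(s) URL to read" },
    },
    required: ["url"],
  },
};

// Hypothetical helper: the cleaned HTML is passed through as the tool_result
// content, with no intermediate model call to convert it to markdown.
function buildToolResult(toolUseId: string, cleanedHtml: string): ToolResultBlock {
  return { type: "tool_result", tool_use_id: toolUseId, content: cleanedHtml };
}
```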
```
fetch(url, browser-UA, follow-redirects)
→ cheerio parse
→ drop scripts/styles/iframes/svg/canvas/video/audio
→ keep only signal-bearing metadata (title, OG, Twitter, JSON-LD)
→ drop class/style/aria/data-*/onclick attributes
→ resolve relative href/src to absolute URLs
→ prune cookie/consent banners by class/id selector
→ inline lazy-loaded image data-src → src
→ 80KB hard size cap with truncation marker
```
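The final step, the hard size cap, could be implemented along these lines. This is a sketch only: the marker text is hypothetical, and treating "80KB" as 80 × 1024 bytes of UTF-8 is an assumption.

```typescript
const MAX_BYTES = 80 * 1024; // assumption: "80KB" means 81,920 UTF-8 bytes
const TRUNCATION_MARKER = "\n<!-- read_page: truncated at size cap -->"; // hypothetical wording

// Cap the cleaned HTML at maxBytes of UTF-8 and append a marker when anything was cut.
function capSize(html: string, maxBytes: number = MAX_BYTES): string {
  const bytes = new TextEncoder().encode(html);
  if (bytes.length <= maxBytes) return html;

  // bytes[end] is the first byte that would be dropped. If it is a UTF-8
  // continuation byte (10xxxxxx), the cut falls inside a multi-byte character,
  // so back up until the cut sits on a character boundary.
  let end = maxBytes;
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;

  return new TextDecoder().decode(bytes.subarray(0, end)) + TRUNCATION_MARKER;
}
```

In this sketch the marker is appended after the cap is applied, so the returned string can exceed `maxBytes` by the marker's length; a stricter version would reserve room for the marker first.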
Cost comparison (live-tested, sonnet-4-6 via dario, OAuth subscription billing)
Each `computer` screenshot costs ~1,500 tokens; `read_page` returns ~1-7K tokens of cleaned HTML for the same content. Plus no browser cold-start, no scroll loops, no OCR-style flattening errors.
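To see why turn count matters as much as per-result size, here is a simplified model of cumulative input tokens. It assumes every turn re-sends the base prompt plus all earlier tool results, and it ignores prompt caching and output tokens. The function name and the base-prompt figure are illustrative; the model is not meant to reproduce the measured numbers in this PR.

```typescript
// Simplified model: input tokens summed over all turns of one task.
// baseTokens is the system prompt plus the user request (illustrative value below).
function cumulativeInputTokens(baseTokens: number, resultTokensPerCall: number[]): number {
  let total = 0;
  let context = baseTokens;
  for (const resultTokens of resultTokensPerCall) {
    total += context;        // input for the turn that issues this tool call
    context += resultTokens; // the tool result joins the context for later turns
  }
  return total + context;    // final turn: the model answers with every result in context
}

// One read_page call returning 5,000 tokens of HTML (2 turns).
console.log(cumulativeInputTokens(1000, [5000])); // 7000

// Six screenshots at ~1,500 tokens each (7 turns).
console.log(cumulativeInputTokens(1000, new Array<number>(6).fill(1500))); // 38500
```

Under this model each extra screenshot is paid for again on every subsequent turn, which is why a handful of cheap screenshots can cost more than one larger HTML payload.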
SPA handling
For pure-SPA URLs (empty body after cleaning, content fetched by JS), `read_page` returns a structured metadata summary plus an explicit marker telling the agent the body was JS-only. Verified live against `https://claude.ai/`: the agent correctly identifies it as a SPA shell and works from the metadata rather than hallucinating content.
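A minimal sketch of that fallback, assuming the cleaned body and the extracted metadata are already available. The marker wording, the `PageMetadata` fields, and the function name are all hypothetical; the PR does not show the actual output format.

```typescript
interface PageMetadata {
  title?: string;
  description?: string;
  canonicalUrl?: string;
}

// Hypothetical marker text.
const JS_ONLY_MARKER = "[read_page: body is empty without JavaScript; this looks like an SPA shell]";

// Returns a metadata summary when no visible text survived cleaning, otherwise null
// (meaning the caller should return the cleaned HTML as usual).
function summarizeIfSpaShell(cleanedBodyHtml: string, meta: PageMetadata): string | null {
  // Strip tags and whitespace to check whether any visible text is left.
  const visibleText = cleanedBodyHtml.replace(/<[^>]*>/g, "").trim();
  if (visibleText.length > 0) return null;

  const lines = [JS_ONLY_MARKER];
  if (meta.title) lines.push(`title: ${meta.title}`);
  if (meta.description) lines.push(`description: ${meta.description}`);
  if (meta.canonicalUrl) lines.push(`canonical: ${meta.canonicalUrl}`);
  return lines.join("\n");
}
```

Stating explicitly that the body was JS-only gives the agent a reason to stop at the metadata instead of inventing page content.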
System prompt nudge
Added to `buildSdkSystemPrompt`:
Anti-pattern explicitly added: "Do NOT open a browser to read a URL — use read_page."
Test plan
New dependencies
Notes