Token-efficient web browser for LLM agents
A typical web page is 50,000+ tokens. The useful content? 2,000–5,000 tokens.
BotBrowser strips the bloat and gives your agents clean markdown — saving 90–95% of tokens.
Raw HTML: 52,000 tokens ████████████████████████████████████████████████████
BotBrowser: 3,200 tokens ██████
↑ 94% savings
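The savings figure in the chart is plain arithmetic on the two token counts. A minimal sketch of that calculation follows; `savings_percent` is a hypothetical helper name used for illustration, not part of the BotBrowser API.

```python
def savings_percent(raw_tokens: int, clean_tokens: int) -> int:
    """Percentage of tokens removed, rounded to the nearest whole number."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return round(100 * (raw_tokens - clean_tokens) / raw_tokens)


# Numbers taken from the chart above: 52,000 raw tokens vs. 3,200 clean tokens.
print(savings_percent(52_000, 3_200))  # 94
```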
npm install botbrowser # JavaScript / TypeScript
pip install botbrowser # Python
No API key. No server. No config. Just install and extract.
// JavaScript / TypeScript
import { extract } from 'botbrowser';
const result = await extract('https://example.com/article');
console.log(result.content); // clean markdown
console.log(result.metadata.tokenSavingsPercent); // 94
# Python
from botbrowser import extract
result = extract("https://example.com/article")
print(result.content) # clean markdown
print(result.metadata.token_savings_percent) # 94
{
"url": "https://example.com/article",
"title": "Article Title",
"description": "Meta description",
"content": "# Article Title\n\nClean markdown content...",
"textContent": "Plain text version...",
"links": [
{ "text": "Related Article", "href": "https://example.com/related" }
],
"metadata": {
"rawTokenEstimate": 52000,
"cleanTokenEstimate": 3200,
"tokenSavingsPercent": 94,
"wordCount": 1250,
"fetchedAt": "2026-02-26T10:30:00.000Z"
}
}
- Token-first: Built specifically to minimize LLM token usage. Every design decision optimizes for fewer tokens while preserving meaning.
- Dual native SDKs: Real implementations in both JS and Python, not thin wrappers. Use whichever fits your stack.
- Zero setup: npm install or pip install. No API key, no account, no server to run. Works offline.
- Battle-tested extraction: Mozilla Readability (JS) and Trafilatura (Python), the same engines powering Firefox Reader View and academic web research.
- Open source: MIT licensed. Self-host, fork, embed, do what you want.
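Because each result carries token estimates and extracted links, an agent can check a page against its context budget before using it. The sketch below assumes a result in the JSON shape shown earlier, for example as received from the REST API. `fits_budget` and `link_targets` are hypothetical helper names, not BotBrowser functions.

```python
import json

# Sample payload following the output shape shown earlier (fields trimmed).
payload = json.loads("""
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "content": "# Article Title\\n\\nClean markdown content...",
  "links": [
    { "text": "Related Article", "href": "https://example.com/related" }
  ],
  "metadata": {
    "rawTokenEstimate": 52000,
    "cleanTokenEstimate": 3200,
    "tokenSavingsPercent": 94
  }
}
""")


def fits_budget(result: dict, max_tokens: int) -> bool:
    """True if the cleaned page is estimated to fit within max_tokens."""
    return result["metadata"]["cleanTokenEstimate"] <= max_tokens


def link_targets(result: dict) -> list[str]:
    """Hrefs of the extracted links, in document order."""
    return [link["href"] for link in result.get("links", [])]


print(fits_budget(payload, 4_000))  # True
print(link_targets(payload))        # ['https://example.com/related']
```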
URL → Fetch → Extract → Clean → Markdown
- Fetch — Smart HTTP with user-agent rotation, redirect handling, timeouts
- Extract — Identifies main content using Readability (JS) / Trafilatura (Python)
- Clean — Strips scripts, styles, ads, nav, footers, cookie banners, tracking, hidden elements
- Convert — Clean Markdown preserving headings, lists, links, tables, code blocks
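To make the Clean step concrete, here is a minimal sketch that drops script, style, nav, and footer content using only Python's standard library. It is an illustrative stand-in, not BotBrowser's implementation: the real pipeline relies on Readability or Trafilatura, and the `TextCleaner` class, `clean_html` function, and `SKIP_TAGS` set are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags whose contents are dropped entirely (an illustrative subset).
SKIP_TAGS = {"script", "style", "nav", "footer"}


class TextCleaner(HTMLParser):
    """Collects visible text while ignoring everything inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def clean_html(html: str) -> str:
    """Return the visible text of html, one chunk per line."""
    parser = TextCleaner()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = """
<html><head><style>body { margin: 0 }</style></head>
<body>
  <nav><a href="/">Home</a></nav>
  <h1>Article Title</h1>
  <p>Useful content.</p>
  <script>track()</script>
  <footer>Copyright</footer>
</body></html>
"""
print(clean_html(page))
```

Only the heading and paragraph text survive. A production cleaner also has to handle ads, cookie banners, and hidden elements, which need heuristics beyond a fixed tag list.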
const result = await extract({
url: 'https://example.com',
format: 'text', // "markdown" (default) or "text"
timeout: 10000, // request timeout in ms (default: 15000)
includeLinks: false, // extract links (default: true)
});
# Python
result = extract(
"https://example.com",
format="text",
timeout=10000,
include_links=False,
)
For language-agnostic access or shared infrastructure, run the REST server:
docker compose up
# or: cd js && pnpm install && pnpm build && pnpm dev
curl -X POST http://localhost:3000/extract \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com"}'
Python client for the REST API:
from botbrowser import BotBrowserClient
client = BotBrowserClient("http://localhost:3000")
result = client.extract("https://example.com")
# JS
cd js && pnpm install && pnpm build && pnpm test
# Python
cd python && pip install -e ".[dev]" && pytest tests/ -v
MIT