Typed web tools for the Halfax AI agent. Replaces ad-hoc `bash` → `curl` calls
with safer primitives that are easier for smaller models to drive.
| Tool | Purpose |
|---|---|
| `web_get(url, max_bytes?, accept?, follow_redirects?, as_markdown?)` | Fetch a URL. HTML auto-converts to markdown by default. |
| `web_head(url, follow_redirects?)` | HEAD-style probe: status + content-type, no body. |
| `web_search(query, n?)` | Federated search via SearXNG. |
| `web_extract(url, max_chars?)` | Fetch + Readability-style article extraction. |
| `web_cache_clear(prefix?)` | Flush the response cache. |
| `web_config()` | Diagnostic: dump effective config + warmup state. |
- SSRF guard: every URL is DNS-resolved up front and rejected if any resolved IP is loopback / RFC1918 / link-local / CGNAT / IPv6-private, or if the resolution returns a mix of public and private IPs (a DNS-rebinding setup).
- Scheme allowlist: `http` and `https` only. `file:`, `javascript:`, `gopher:`, `ftp:`, and `data:` are all rejected.
- Size cap: streaming read with a byte counter; hard-kills the connection at the limit instead of trusting `Content-Length`.
- Redirect handling: bounded chain (default 5), each hop re-validated, https→http downgrade refused.
- Per-domain rate limit: 1 req/s, burst 5; defends against feedback loops where the agent hammers a single host.
- Cache: GET/HEAD only, honors `Cache-Control: no-store` and `private`, TTL + LRU bound.
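The SSRF, scheme, and redirect rules above can be sketched with the standard library alone (`ipaddress`, `socket`, `urllib.parse`, Python 3.9+). This is a minimal illustration, not the server's code: `check_scheme`, `is_private_ip`, `resolve`, `check_resolved`, `check_redirect`, and `BlockedURL` are hypothetical names. Because any private address is fatal here, the mixed public/private rebinding case is covered automatically; the sketch only labels it differently in the error.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical sketch of the checks described above; not the server's real module.
ALLOWED_SCHEMES = {"http", "https"}
CGNAT = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space


class BlockedURL(ValueError):
    """Raised when a URL, its resolved IPs, or a redirect hop fails validation."""


def check_scheme(url: str) -> str:
    """Return the (lowercased) hostname if the scheme is http/https, else raise."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme
    if parts.scheme not in ALLOWED_SCHEMES:
        raise BlockedURL(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise BlockedURL("URL has no host")
    return parts.hostname


def is_private_ip(ip_str: str) -> bool:
    """True for loopback, RFC 1918, link-local, CGNAT, and private IPv6 (e.g. fc00::/7)."""
    ip = ipaddress.ip_address(ip_str)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it can't sidestep the IPv4 rules.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if isinstance(ip, ipaddress.IPv4Address) and ip in CGNAT:
        return True
    return ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_unspecified


def resolve(host: str, port: int = 443) -> list[str]:
    """All addresses the host resolves to (A and AAAA), deduplicated."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def check_resolved(ips: list[str]) -> None:
    """Reject empty resolutions and any that include a private address."""
    if not ips:
        raise BlockedURL("host did not resolve")
    private = [ip for ip in ips if is_private_ip(ip)]
    if private:
        # Public+private mix is the classic rebinding setup; all-private is plain SSRF.
        kind = "mixed public/private" if len(private) < len(ips) else "private"
        raise BlockedURL(f"{kind} resolution: {ips}")


def check_redirect(prev_url: str, next_url: str, hop: int, max_redirects: int = 5) -> str:
    """Validate one redirect hop; returns the next hostname so it can be re-resolved."""
    if hop > max_redirects:
        raise BlockedURL(f"more than {max_redirects} redirects")
    if urlsplit(prev_url).scheme == "https" and urlsplit(next_url).scheme == "http":
        raise BlockedURL("https -> http downgrade refused")
    return check_scheme(next_url)
```

In this sketch a caller would run `check_resolved(resolve(host))` before the first request and again on the hostname returned by `check_redirect` for every hop.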
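The size cap amounts to a running byte counter over streamed chunks that never looks at `Content-Length`. A minimal sketch, assuming httpx (which the server already uses per the limitations below) and its `Client.stream()` / `Response.iter_bytes()` API; `read_capped`, `fetch_capped`, and `ResponseTooLarge` are hypothetical names, and the timeouts simply mirror the documented defaults.

```python
from typing import Iterable


class ResponseTooLarge(Exception):
    """Raised as soon as the streamed body would exceed the byte cap."""


def read_capped(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Accumulate chunks, aborting the moment the running total passes max_bytes.

    Content-Length is never consulted: a server can lie about it or omit it.
    """
    buf = bytearray()
    for chunk in chunks:
        if len(buf) + len(chunk) > max_bytes:
            raise ResponseTooLarge(f"body exceeded {max_bytes} bytes")
        buf.extend(chunk)
    return bytes(buf)


def fetch_capped(url: str, max_bytes: int = 2_000_000) -> bytes:
    """Stream a GET with httpx; leaving the `with` blocks closes the connection."""
    import httpx  # imported lazily so read_capped() has no third-party dependency

    timeout = httpx.Timeout(15.0, connect=5.0)  # total 15 s, connect 5 s
    with httpx.Client(timeout=timeout) as client:
        with client.stream("GET", url) as response:
            # iter_bytes() yields decoded (decompressed) data, so the cap also
            # bounds gzip bombs; an exception here exits both `with` blocks.
            return read_capped(response.iter_bytes(), max_bytes)
```

Raising inside the streaming context is what "hard-kills" the transfer in this sketch: httpx closes the response on exit instead of draining the rest of the body.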
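"1 req/s, burst 5" is the shape of a token bucket: each host starts with 5 tokens and regains one per second. The sketch below keys buckets by hostname and rejects rather than queues when a bucket is empty; both choices, and the `DomainRateLimiter` name, are assumptions for illustration. The clock is injectable so the behavior can be checked without sleeping.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float
    updated: float


class DomainRateLimiter:
    """Per-host token bucket: refills `rate` tokens/s, holds at most `burst`."""

    def __init__(self, rate: float = 1.0, burst: int = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so tests can fake time
        self._buckets: Dict[str, _Bucket] = {}

    def try_acquire(self, host: str) -> bool:
        """Spend one token for `host`; False means the caller should back off."""
        now = self.clock()
        bucket = self._buckets.get(host)
        if bucket is None:
            # A new host starts full, i.e. `burst` immediate requests are allowed.
            bucket = self._buckets[host] = _Bucket(tokens=float(self.burst), updated=now)
        # Refill for the elapsed time, never beyond the burst size.
        bucket.tokens = min(float(self.burst), bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

A burst of five lets a model fetch a page and a few of its assets quickly, while a runaway loop against one host settles to one request per second; other hosts keep their own buckets.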
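The cache rule (GET/HEAD only, skip `no-store` and `private`, TTL plus an LRU byte quota) can be illustrated in memory. The real cache is on disk under `HALFAX_WEB_CACHE_DIR`; this `TTLLRUCache`, along with `cache_directives` and `is_cacheable`, is a hypothetical stand-in that shows only the eviction logic.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Set, Tuple


def cache_directives(header: Optional[str]) -> Set[str]:
    """Lowercased directive names from a Cache-Control value ("max-age=60" -> "max-age")."""
    if not header:
        return set()
    return {p.split("=", 1)[0].strip().lower() for p in header.split(",") if p.strip()}


def is_cacheable(method: str, cache_control: Optional[str], ttl: int) -> bool:
    """GET/HEAD only, TTL > 0, and neither no-store nor private present."""
    if ttl <= 0 or method.upper() not in ("GET", "HEAD"):
        return False
    return not (cache_directives(cache_control) & {"no-store", "private"})


class TTLLRUCache:
    """In-memory stand-in for the disk cache: per-entry TTL plus a total-bytes LRU bound."""

    def __init__(self, max_bytes: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_bytes = max_bytes
        self.ttl = ttl
        self.clock = clock
        self._size = 0
        self._data: "OrderedDict[str, Tuple[float, bytes]]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        item = self._data.get(key)
        if item is None:
            return None
        expires, body = item
        if self.clock() >= expires:
            self._drop(key)  # expired: evict lazily on read
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return body

    def put(self, key: str, body: bytes) -> None:
        if len(body) > self.max_bytes:
            return  # larger than the whole quota; never cache
        if key in self._data:
            self._drop(key)
        self._data[key] = (self.clock() + self.ttl, body)
        self._size += len(body)
        while self._size > self.max_bytes:
            self._drop(next(iter(self._data)))  # front of the dict = least recently used

    def _drop(self, key: str) -> None:
        _, body = self._data.pop(key)
        self._size -= len(body)
```

Here the quota is enforced on total body bytes rather than entry count, which matches a byte-denominated limit like `HALFAX_WEB_CACHE_MAX_BYTES`.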
```bash
cd hal-web-mcp
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/python -m pytest tests/  # 31 tests
```

The Halfax extension's MCP manager auto-spawns this server. To make search work, point the env at a SearXNG instance.
The production instance runs on Cygnus (http://192.168.0.241:8888, Netbird
http://100.87.21.129:8888; moved from Betelgeuse 2026-05-30). To stand one up
elsewhere:
```yaml
# docker-compose.yml
services:
  searxng:
    # Digest-pinned tag, NEVER :latest (DECISIONS §12): engine plugins break on
    # upstream changes. Bumps are quarterly + manual; test a new tag in a sidecar
    # against the JSON probe below before swapping production.
    image: searxng/searxng:2026.5.6-330d56bba
    container_name: searxng
    restart: unless-stopped
    ports:
      - "8888:8080"
    volumes:
      - ./searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://<host-ip>:8888/
      - INSTANCE_NAME=halfax
```

After first run, edit `./searxng/settings.yml`:

- Add `json` to `search.formats` (the default ships HTML-only).
- Set `server.secret_key` to a random hex string.
Sanity-check:

```bash
curl 'http://<host>:8888/search?q=test&format=json' | jq '.results[0].title'
```

Then set `HALFAX_WEB_SEARXNG_URL` (see env table). The MCP fires a warmup ping at startup and every 10 minutes, so the agent's first real search doesn't pay the cold-start cost.
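The same probe and the warmup cadence can be sketched with the standard library. The README only says the ping happens at startup and every 10 minutes; the background-thread loop, the `test` query, the logger name, and every function name below are assumptions, not the server's implementation.

```python
import json
import logging
import threading
import urllib.request
from typing import Callable, Optional
from urllib.parse import urlencode

log = logging.getLogger("halfax-web.warmup")  # hypothetical logger name


def searxng_search_url(base_url: str, query: str) -> str:
    """Same request as the curl sanity check: /search?q=...&format=json."""
    return f"{base_url.rstrip('/')}/search?{urlencode({'q': query, 'format': 'json'})}"


def first_title(payload: str) -> Optional[str]:
    """Python equivalent of jq '.results[0].title' (None when there are no results)."""
    results = json.loads(payload).get("results") or []
    return results[0].get("title") if results else None


def probe(base_url: str, timeout: float = 5.0) -> Optional[str]:
    """One JSON search; raises HTTPError/URLError or a JSON error if the instance isn't ready."""
    with urllib.request.urlopen(searxng_search_url(base_url, "test"), timeout=timeout) as resp:
        return first_title(resp.read().decode("utf-8"))


def warmup_loop(ping: Callable[[], object], interval: float, stop: threading.Event) -> None:
    """Ping immediately, then every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            ping()
        except (OSError, ValueError) as exc:  # URLError/HTTPError are OSErrors; bad JSON is ValueError
            log.warning("SearXNG warmup failed: %s", exc)
        stop.wait(interval)  # returns early as soon as stop is set
```

Started as, for example, `threading.Thread(target=warmup_loop, args=(lambda: probe(url), 600, stop), daemon=True).start()`, the first ping runs immediately and later ones follow `HALFAX_WEB_WARMUP_INTERVAL`.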
| Var | Default | Purpose |
|---|---|---|
| `HALFAX_WEB_SEARCH_BACKEND` | `searxng` | `searxng` or `none` |
| `HALFAX_WEB_SEARXNG_URL` | (unset) | e.g. `http://100.87.21.129:8888` (Cygnus) |
| `HALFAX_WEB_TIMEOUT` | `15` | Total request timeout (s) |
| `HALFAX_WEB_CONNECT_TIMEOUT` | `5` | Connect timeout (s) |
| `HALFAX_WEB_MAX_BYTES` | `2000000` | Per-response size cap (bytes) |
| `HALFAX_WEB_MAX_REDIRECTS` | `5` | Per-fetch redirect ceiling |
| `HALFAX_WEB_MAX_CONCURRENT` | `4` | Parallel-fetch semaphore |
| `HALFAX_WEB_CACHE_DIR` | `~/.halfax-web-cache` | Disk cache root |
| `HALFAX_WEB_CACHE_TTL` | `600` | Cache entry TTL (s); `0` disables |
| `HALFAX_WEB_CACHE_MAX_BYTES` | `100000000` | LRU quota (bytes) |
| `HALFAX_WEB_ALLOW_DOMAINS` | (unset) | Comma list. When set, only these domains are allowed. |
| `HALFAX_WEB_DENY_DOMAINS` | (unset) | Comma list. Always rejected; overrides allow. |
| `HALFAX_WEB_USER_AGENT` | `Halfax-AI-Agent/<v> (+local)` | UA header |
| `HALFAX_WEB_WARMUP_INTERVAL` | `600` | SearXNG keepalive interval (s) |
| `HALFAX_WEB_LOG_LEVEL` | `INFO` | Stderr log verbosity |
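One way to read the table into a single typed object at startup is sketched below. `WebConfig`, `from_env`, `_domains`, and `domain_allowed` are hypothetical names. Two behaviors are assumptions rather than documented facts: a listed domain also matches its subdomains, and an unset `HALFAX_WEB_USER_AGENT` becomes `None` so the caller keeps its own `Halfax-AI-Agent/<v>` default.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional


def _domains(raw: Optional[str]) -> FrozenSet[str]:
    """Parse a comma list like 'example.com, docs.python.org' into lowercase names."""
    return frozenset(d.strip().lower().rstrip(".") for d in (raw or "").split(",") if d.strip())


@dataclass(frozen=True)
class WebConfig:
    search_backend: str
    searxng_url: Optional[str]
    timeout: float
    connect_timeout: float
    max_bytes: int
    max_redirects: int
    max_concurrent: int
    cache_dir: str
    cache_ttl: int
    cache_max_bytes: int
    allow_domains: FrozenSet[str]
    deny_domains: FrozenSet[str]
    user_agent: Optional[str]  # None -> caller keeps its built-in default UA
    warmup_interval: int
    log_level: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "WebConfig":
        def get(name: str, default: Optional[str]) -> Optional[str]:
            return env.get("HALFAX_WEB_" + name, default)

        backend = get("SEARCH_BACKEND", "searxng")
        if backend not in ("searxng", "none"):
            raise ValueError(f"HALFAX_WEB_SEARCH_BACKEND must be searxng or none, got {backend!r}")
        return cls(
            search_backend=backend,
            searxng_url=get("SEARXNG_URL", None) or None,
            timeout=float(get("TIMEOUT", "15")),
            connect_timeout=float(get("CONNECT_TIMEOUT", "5")),
            max_bytes=int(get("MAX_BYTES", "2000000")),
            max_redirects=int(get("MAX_REDIRECTS", "5")),
            max_concurrent=int(get("MAX_CONCURRENT", "4")),
            cache_dir=os.path.expanduser(get("CACHE_DIR", "~/.halfax-web-cache")),
            cache_ttl=int(get("CACHE_TTL", "600")),
            cache_max_bytes=int(get("CACHE_MAX_BYTES", "100000000")),
            allow_domains=_domains(get("ALLOW_DOMAINS", None)),
            deny_domains=_domains(get("DENY_DOMAINS", None)),
            user_agent=get("USER_AGENT", None) or None,
            warmup_interval=int(get("WARMUP_INTERVAL", "600")),
            log_level=get("LOG_LEVEL", "INFO").upper(),
        )

    def domain_allowed(self, host: str) -> bool:
        """Deny list wins; a non-empty allow list admits only listed domains (and subdomains)."""
        host = host.lower().rstrip(".")

        def listed(domains: FrozenSet[str]) -> bool:
            return any(host == d or host.endswith("." + d) for d in domains)

        if listed(self.deny_domains):
            return False
        return listed(self.allow_domains) if self.allow_domains else True
```

Checking the deny list first is what makes "Overrides allow" hold even when a denied host is a subdomain of an allowed one.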
- No JavaScript rendering. SPA-only sites that don't ship server-rendered HTML produce empty or garbage output; `web_extract` (Readability) is the best fallback for those.
- No request-body POST. `web_get`/`web_head` are for `GET`/`HEAD` only. Use `bash` + `curl` for POST/PUT to public APIs.
- Cert validation is via the original hostname. We DNS-validate before the fetch but let httpx connect normally; a hostile DNS server could rebind between our resolve and httpx's, though that requires the attacker to control DNS for a target the user already chose to query. A future pass can swap in a custom transport that uses the validated IP directly with proper SNI.