src/gaia/agents/tools/web_search_tools.py (NEW — WebSearchToolsMixin)
├── search_web() — DuckDuckGo search (free, default)
├── search_web_premium() — Perplexity search (optional)
├── fetch_page() — HTTP GET + content extraction
└── download_file() — Download with size limits + path validation
src/gaia/web/client.py (from PR #495)
├── Rate limiting per domain
├── SSRF prevention (blocked schemes, ports, private IPs)
├── Content extraction (BeautifulSoup, boilerplate removal)
└── Table extraction (HTML → JSON)
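The "SSRF prevention (blocked schemes, ports, private IPs)" entry above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from PR #495: the names `check_url`, `ALLOWED_SCHEMES`, and `BLOCKED_PORTS`, and the specific port list, are assumptions made for the example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical policy values; the real lists in src/gaia/web/client.py may differ.
ALLOWED_SCHEMES = {"http", "https"}
BLOCKED_PORTS = {22, 23, 25, 3306, 5432, 6379}


def _is_internal(ip_text: str) -> bool:
    """True for private, loopback, link-local, and other non-public addresses."""
    # Drop an IPv6 zone id such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def check_url(url: str, resolve: bool = True) -> tuple[bool, str]:
    """Return (allowed, reason) for a URL before any request is made."""
    try:
        parts = urlsplit(url)
        host, port = parts.hostname, parts.port
    except ValueError:
        return False, "malformed URL"

    if parts.scheme not in ALLOWED_SCHEMES:
        return False, "blocked scheme"
    if not host:
        return False, "missing host"
    if port in BLOCKED_PORTS:
        return False, "blocked port"

    try:
        # Host is already an IP literal: check it directly.
        ipaddress.ip_address(host)
        addresses = [host]
    except ValueError:
        if not resolve:
            return True, "ok (hostname not resolved)"
        try:
            infos = socket.getaddrinfo(host, port or 80, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            return False, "host does not resolve"
        addresses = [info[4][0] for info in infos]

    # Reject if any resolved address points at an internal network.
    if any(_is_internal(addr) for addr in addresses):
        return False, "internal address"
    return True, "ok"
```

Checking every resolved address, rather than only the first, avoids a hostname that maps to both a public and a private IP slipping through. A production client would also need to pin the resolved address for the actual request, which this sketch does not attempt.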
Summary
Lightweight web search capability via HTTP requests — no browser needed. Two tiers: free (DuckDuckGo) and premium (Perplexity). This is the fast, cheap option for research, daily briefs, and information gathering.
Why Two Options for Web Access
GAIA needs two distinct web capabilities:
Tools
Tier 1: DuckDuckGo (free, no API key)
- search_web(query, num_results): Search the web and return titles, snippets, and URLs
- fetch_page(url, mode): Fetch a page and extract readable text, raw HTML, links, or tables
Tier 2: Perplexity (premium, requires API key)
- search_web_premium(query): Higher-quality search via the Perplexity sonar model
- If PERPLEXITY_API_KEY is set, uses Perplexity; otherwise falls back to DuckDuckGo
Existing Code
- browser_tools.py with fetch_page and search_web (DuckDuckGo): already implemented
- src/gaia/web/client.py in PR #495 (Enhance ChatAgent with file navigation, web browsing, scratchpad tools, and write security guardrails) has an HTTP client with rate limiting, SSRF prevention, and content extraction
Architecture
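The tier selection described under Tools (Perplexity when PERPLEXITY_API_KEY is set, DuckDuckGo otherwise) could be sketched as below. The function name `pick_search_backend` and the idea of passing the two backends in as callables are assumptions for illustration; only the environment variable name comes from this issue.

```python
import os
from typing import Callable, List, Mapping, Optional, Tuple

# A search backend takes a query string and returns a list of results.
SearchFn = Callable[[str], List[dict]]


def pick_search_backend(
    duckduckgo: SearchFn,
    perplexity: SearchFn,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, SearchFn]:
    """Choose the premium tier only when an API key is configured."""
    env = os.environ if env is None else env
    # An empty or missing key means the premium tier is unavailable.
    if env.get("PERPLEXITY_API_KEY"):
        return "perplexity", perplexity
    return "duckduckgo", duckduckgo
```

Passing `env` explicitly keeps the selection testable without touching the real process environment, for example `pick_search_backend(ddg, pplx, env={})` always selects the free tier.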
Use Cases Enabled
Dependencies
Acceptance Criteria
- search_web() returns structured results from DuckDuckGo
- fetch_page() extracts readable content from any URL
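One way to make the first criterion checkable is to pin down what "structured results" means. The sketch below assumes each result carries the title, snippet, and URL mentioned under Tools; the `SearchResult` type and the `find_problems` helper are hypothetical names, not part of the existing code.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical shape of one search_web() result."""
    title: str
    snippet: str
    url: str


def find_problems(results: List[SearchResult], num_results: int) -> List[str]:
    """Return human-readable problems; an empty list means the results pass."""
    problems = []
    if len(results) > num_results:
        problems.append(f"expected at most {num_results} results, got {len(results)}")
    for index, result in enumerate(results):
        if not result.title.strip():
            problems.append(f"result {index}: empty title")
        parts = urlsplit(result.url)
        # Results should link to absolute http(s) URLs that fetch_page() can use.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"result {index}: not an absolute http(s) URL")
    return problems
```

A test for the acceptance criterion could then call search_web() and assert that `find_problems(...)` returns an empty list.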