Web search tool: DuckDuckGo + Perplexity for research and daily briefs (lightweight) #669

@kovtcharov

Summary

Lightweight web search capability via HTTP requests — no browser needed. Two tiers: free (DuckDuckGo) and premium (Perplexity). This is the fast, cheap option for research, daily briefs, and information gathering.

Why Two Options for Web Access

GAIA needs two distinct web capabilities:

| | Web Search Tool (this issue) | Playwright Computer Use (#458) |
|---|---|---|
| What | HTTP-based search + content extraction | Full browser automation |
| Use for | Research, news, daily briefs, fact-checking | Gmail, Calendar, web apps, form filling |
| Requires | Nothing (DuckDuckGo) or API key (Perplexity) | Chromium install (~150MB) |
| Speed | Fast (~1-2s per query) | Slow (~5-15s per page interaction) |
| Auth | No login capability | Can log into web apps |
| Cost | Free (DDG) or ~$0.005/query (Perplexity) | Free (local Chromium) |

Tools

Tier 1: DuckDuckGo (free, no API key)

  • search_web(query, num_results) — Search the web, return titles + snippets + URLs
  • fetch_page(url, mode) — Fetch a page and extract: readable text, raw HTML, links, or tables
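
To make "titles + snippets + URLs" concrete, here is a minimal sketch of how raw backend hits might be normalized into one result shape. `SearchResult` and `normalize_results` are hypothetical names, and the raw key names (`href`, `body`, `url`, `snippet`) are assumptions about what a search backend returns, not something this issue specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    """One search hit in the shape search_web() is described as returning."""
    title: str
    snippet: str
    url: str


def normalize_results(raw_hits, num_results=5):
    """Convert backend-specific hit dicts into uniform SearchResult records.

    Assumed raw keys: "title", plus "href" or "url", plus "body" or "snippet".
    """
    results = []
    for hit in raw_hits:
        if len(results) >= num_results:
            break  # honor the caller's num_results cap
        url = hit.get("href") or hit.get("url")
        if not url:
            continue  # a hit without a URL cannot be fetched later
        results.append(
            SearchResult(
                title=(hit.get("title") or "").strip(),
                snippet=(hit.get("body") or hit.get("snippet") or "").strip(),
                url=url,
            )
        )
    return results
```

Keeping the result type independent of the backend would let both tiers return the same structure to the agent.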

Tier 2: Perplexity (premium, requires API key)

  • search_web_premium(query) — Higher quality search via Perplexity sonar model
  • Auto-selects: if PERPLEXITY_API_KEY is set, uses Perplexity; otherwise falls back to DuckDuckGo
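
The auto-selection rule above can be sketched in a few lines. `select_search_backend` is a hypothetical helper name; treating a blank or whitespace-only key as "not set" is an assumption, since the issue only says "if PERPLEXITY_API_KEY is set".

```python
import os


def select_search_backend(env=None):
    """Return "perplexity" when an API key is configured, else "duckduckgo".

    `env` defaults to os.environ; passing a dict makes the rule easy to test.
    """
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only value counts as unset.
    api_key = (env.get("PERPLEXITY_API_KEY") or "").strip()
    return "perplexity" if api_key else "duckduckgo"
```

For example, `select_search_backend({"PERPLEXITY_API_KEY": "pplx-123"})` returns `"perplexity"`, and `select_search_backend({})` returns `"duckduckgo"`.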

Existing Code

Architecture

src/gaia/agents/tools/web_search_tools.py  (NEW — WebSearchToolsMixin)
├── search_web() — DuckDuckGo search (free, default)
├── search_web_premium() — Perplexity search (optional)
├── fetch_page() — HTTP GET + content extraction
└── download_file() — Download with size limits + path validation
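
The "size limits + path validation" part of download_file() could look roughly like the sketch below. The helper names, the 10 MB cap, and the single allowed download directory are all assumptions for illustration; the issue does not state concrete limits.

```python
from pathlib import Path

MAX_DOWNLOAD_BYTES = 10 * 1024 * 1024  # assumed cap; the issue gives no number


def resolve_download_path(base_dir, filename):
    """Return a path strictly inside base_dir, or raise ValueError."""
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    # Rejects "../" traversal, absolute paths, and the directory itself.
    if base not in target.parents:
        raise ValueError(f"{filename!r} escapes the download directory")
    return target


def write_with_limit(chunks, dest, max_bytes=MAX_DOWNLOAD_BYTES):
    """Write an iterable of byte chunks to dest, aborting past max_bytes."""
    written = 0
    try:
        with open(dest, "wb") as fh:
            for chunk in chunks:
                written += len(chunk)
                if written > max_bytes:
                    raise ValueError("download exceeds size limit")
                fh.write(chunk)
    except Exception:
        Path(dest).unlink(missing_ok=True)  # do not leave a partial file
        raise
    return written
```

Counting bytes while streaming, rather than trusting a Content-Length header, keeps the limit effective even when the server misreports the size.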

src/gaia/web/client.py  (from PR #495)
├── Rate limiting per domain
├── SSRF prevention (blocked schemes, ports, private IPs)
├── Content extraction (BeautifulSoup, boilerplate removal)
└── Table extraction (HTML → JSON)
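
As an illustration of the SSRF checks listed above (blocked schemes, ports, private IPs), here is a minimal standard-library sketch. It is not the code from PR #495: `is_url_allowed`, the allowed scheme and port sets, and the convention that the caller passes in the already-resolved IPs are assumptions.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumed allow-list
ALLOWED_PORTS = {80, 443}            # assumed allow-list


def is_url_allowed(url, resolved_ips=()):
    """Return True only for http(s) URLs on allowed ports with public IPs.

    `resolved_ips` holds the addresses the hostname resolved to; the caller
    is expected to do the DNS lookup so this check stays offline.
    """
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    if port is not None and port not in ALLOWED_PORTS:
        return False

    candidates = list(resolved_ips)
    try:
        # If the host is an IP literal, check it directly.
        candidates.append(str(ipaddress.ip_address(parts.hostname)))
    except ValueError:
        pass  # host is a DNS name; rely on resolved_ips
    if not candidates:
        return False  # unresolved names are rejected to stay conservative

    for ip_text in candidates:
        try:
            ip = ipaddress.ip_address(ip_text)
        except ValueError:
            return False
        if not ip.is_global:  # loopback, private, link-local, reserved
            return False
    return True
```

Checking the resolved addresses rather than the hostname string matters because a public-looking name can resolve to a private address.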

Use Cases Enabled

  • Daily briefs — "What's in the news about AI today?"
  • Research — "Find the latest benchmarks for Qwen3-8B"
  • Fact-checking — "Is this claim accurate?"
  • Price comparison — "What's the cheapest flight to Denver next week?"
  • Documentation lookup — "How do I configure Home Assistant automations?"

Dependencies

Acceptance Criteria

  • search_web() returns structured results from DuckDuckGo
  • fetch_page() extracts readable content from any URL that passes the SSRF checks
  • Perplexity auto-selected when API key is present
  • SSRF prevention blocks private IPs and dangerous protocols
  • Per-domain rate limiting prevents abuse
  • No browser or Chromium install required
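
One way the per-domain rate limiting criterion could be met is a small limiter that spaces out requests to the same host. This is a sketch only: `DomainRateLimiter`, the one-second default interval, and the injectable clock are assumptions, not the behavior of the client in src/gaia/web/client.py.

```python
import time
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Allow at most one request per `min_interval` seconds for each host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so tests do not need to sleep
        self._next_allowed = {}  # host -> earliest time of the next request

    def wait_time(self, url):
        """Reserve a slot for url and return how long the caller should sleep."""
        host = urlsplit(url).hostname or ""
        now = self.clock()
        ready_at = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = ready_at + self.min_interval
        return ready_at - now
```

A caller would run `time.sleep(limiter.wait_time(url))` before each request; requests to different hosts do not delay one another.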
