feat(agents): NewsAgent — RSS feeds, web search, and LLM-powered news digests #862

@kovtcharov-amd

Summary

Users already run news summarizers through external tools via Lemonade. GAIA should provide this natively: subscribe to RSS feeds, search the web for topic news, extract article content, and produce LLM-summarized digests — all running locally on AMD hardware. The NewsAgent plugs into the Daily Briefs pipeline (#663) as the "news" content source and works standalone via gaia news.

Features

| Capability | Tools | Reuses |
|---|---|---|
| Feed management: add/remove/list RSS/Atom feeds | add_feed, remove_feed, list_feeds, refresh_feeds | New; feedparser library |
| OPML import/export | CLI gaia news import/export | New |
| Web search for news | search_news(query, max_results) | Brave Search MCP or Perplexity (ExternalToolsMixin pattern) |
| Article extraction | fetch_article(url) | Fetch MCP or httpx + trafilatura |
| Summarization | summarize_articles(articles, style) | SummarizeAgent pipeline |
| Digest compilation | compile_digest(topics, style, format) | New DigestCompiler class |
| Topic tracking: interest profiles, dedup, trending | add_topic, remove_topic, list_topics | New |
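
To make the OPML import/export row concrete, here is a minimal round-trip sketch using only the standard library. The function names `export_opml` and `import_opml` and the `{"title", "url"}` dict shape are assumptions for illustration, not part of the proposed API. Parsing untrusted OPML with `xml.etree` may warrant a hardened parser in a real implementation.

```python
import xml.etree.ElementTree as ET


def export_opml(feeds: list[dict[str, str]], title: str = "GAIA news feeds") -> str:
    """Serialize feeds (dicts with 'title' and 'url' keys) to an OPML 2.0 string."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for feed in feeds:
        # Each subscription becomes one <outline> with an xmlUrl attribute.
        ET.SubElement(
            body,
            "outline",
            type="rss",
            text=feed["title"],
            title=feed["title"],
            xmlUrl=feed["url"],
        )
    return ET.tostring(opml, encoding="unicode")


def import_opml(opml_text: str) -> list[dict[str, str]]:
    """Collect every <outline> that carries an xmlUrl, including nested folders."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folder outlines have no xmlUrl and are skipped
            feeds.append({"title": outline.get("text") or url, "url": url})
    return feeds
```

Folder outlines are flattened on import here; whether folders should be preserved (for example as topics) is left open by the issue.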

Digest Styles

  • headlines — one-liner per article
  • brief — 2-3 sentence summaries
  • deep_dive — full analysis of a single article
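
One way the three styles could be represented is sketched below, assuming a hypothetical `DigestStyle` enum, `Article` dataclass, and `render` helper (none of these names come from the issue). The summaries are assumed to have been produced earlier by the summarization step.

```python
from dataclasses import dataclass
from enum import Enum


class DigestStyle(str, Enum):
    HEADLINES = "headlines"
    BRIEF = "brief"
    DEEP_DIVE = "deep_dive"


@dataclass
class Article:
    title: str
    url: str
    summary: str = ""  # filled in upstream for brief and deep_dive


def render(articles: list[Article], style: DigestStyle) -> str:
    """Format already-summarized articles according to the digest style."""
    if style is DigestStyle.HEADLINES:
        # One line per article, no summary text.
        return "\n".join(f"- {a.title} ({a.url})" for a in articles)
    if style is DigestStyle.BRIEF:
        # Title, short summary, and link per article.
        return "\n\n".join(f"{a.title}\n{a.summary}\n{a.url}" for a in articles)
    # deep_dive covers a single article.
    if len(articles) != 1:
        raise ValueError("deep_dive expects exactly one article")
    a = articles[0]
    return f"{a.title}\n{a.url}\n\n{a.summary}"
```

Subclassing `str` lets a CLI argument be parsed directly with `DigestStyle("brief")`, and an unknown style raises `ValueError` at that point.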

Storage

  • ~/.gaia/news/feeds.yaml — feed subscriptions (YAML for human editability)
  • ~/.gaia/news/topics.yaml — interest profiles with keywords
  • ~/.gaia/news/digests/YYYY-MM-DD.md — generated digests (30-day retention)
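
The 30-day retention on dated digest files could be enforced with a small pruning pass like the sketch below. The function name `prune_digests` and its parameters are hypothetical; the only details taken from the issue are the `YYYY-MM-DD.md` naming and the 30-day default.

```python
from datetime import date, timedelta
from pathlib import Path


def prune_digests(digest_dir: Path, today: date, retention_days: int = 30) -> list[Path]:
    """Delete YYYY-MM-DD.md digests older than retention_days; return removed paths."""
    cutoff = today - timedelta(days=retention_days)
    removed = []
    for path in sorted(digest_dir.glob("*.md")):
        try:
            digest_date = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not a dated digest; leave it alone
        if digest_date < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Passing `today` explicitly keeps the function deterministic in unit tests; a caller would typically pass `date.today()`.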

Non-Goals

  • Social media monitoring (Twitter/X, Reddit, Mastodon) — RSS covers most use cases for v1
  • Real-time streaming ticker — digests are batch operations
  • Paywall bypass — extract freely available content only
  • Original content creation — the agent summarizes and curates, it does not write articles
  • Manga/image generation — SDAgent territory

Implementation Approach

  • src/gaia/agents/news/agent.py, tools.py, feed_manager.py, content_extractor.py, digest_compiler.py
  • NewsAgent(MCPAgent) — inherits MCPAgent for Fetch MCP and Brave Search MCP access
  • FeedManager handles CRUD on feeds.yaml, RSS/Atom parsing via feedparser, OPML import/export, ETag/Last-Modified caching
  • ContentExtractor wraps Fetch MCP (primary) with httpx + trafilatura fallback for HTML-to-text
  • DigestCompiler orchestrates: feed refresh → extraction → deduplication → topic assignment → summarization (via SummarizeAgent) → output formatting
  • New dependencies: feedparser, trafilatura in setup.py extras [news]
  • Register news in KNOWN_TOOLS in registry.py
  • CLI: gaia news with subcommands: digest, feeds, topics, import, search
  • Docs: docs/guides/news.mdx
  • Expose get_news_brief(topics, max_articles) for Daily Briefs integration (#663)
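
The DigestCompiler stage order described above can be sketched as a single function with the collaborators injected as callables. Everything here is an illustrative assumption: `Item`, `run_pipeline`, URL-only deduplication, and keyword substring matching stand in for whatever FeedManager, ContentExtractor, and SummarizeAgent actually expose.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Item:
    title: str
    url: str
    text: str = ""
    topics: list[str] = field(default_factory=list)
    summary: str = ""


def run_pipeline(
    refresh: Callable[[], Iterable[Item]],  # stand-in for FeedManager refresh
    extract: Callable[[str], str],          # stand-in for ContentExtractor
    summarize: Callable[[str], str],        # stand-in for SummarizeAgent
    topics: dict[str, list[str]],           # topic name -> keywords
) -> str:
    items = list(refresh())                 # 1. feed refresh
    for item in items:                      # 2. extraction
        item.text = extract(item.url)

    seen: set[str] = set()                  # 3. deduplication (by URL only here)
    unique = []
    for item in items:
        if item.url not in seen:
            seen.add(item.url)
            unique.append(item)

    for item in unique:                     # 4. topic assignment by keyword match
        haystack = f"{item.title} {item.text}".lower()
        item.topics = [
            name for name, keywords in topics.items()
            if any(kw.lower() in haystack for kw in keywords)
        ]
    matched = [item for item in unique if item.topics]

    for item in matched:                    # 5. summarization
        item.summary = summarize(item.text)

    lines = []                              # 6. output formatting, grouped by topic
    for name in topics:
        group = [item for item in matched if name in item.topics]
        if group:
            lines.append(name)
            lines.extend(f"  - {item.title}: {item.summary}" for item in group)
    return "\n".join(lines)
```

Injecting the stages keeps the orchestration testable without network access or a running LLM. A real implementation might deduplicate before extraction to avoid redundant fetches.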

Dependencies

| Dependency | Issue / status | Blocking? |
|---|---|---|
| SummarizeAgent | Shipped (v0.15) | No |
| Fetch MCP | Available | No (fallback to httpx + trafilatura) |
| Brave Search / Perplexity | Available | No (RSS works without web search) |
| Autonomy Engine (scheduled digests) | #634 | No (manual gaia news digest works without it) |
| Daily Briefs integration | #663 | No (NewsAgent works standalone first) |
| Messaging adapters (push to Discord) | #635 | No (digest delivery to messaging is a follow-up) |
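
The "not blocking because a fallback exists" rows rely on a try-in-order pattern, which could look roughly like this. The helper name `extract_with_fallback` is hypothetical. In practice the first callable would wrap the Fetch MCP tool and the second an httpx download followed by trafilatura extraction, but both are left as plain callables here so the sketch stays dependency-free.

```python
from typing import Callable, Optional, Sequence

Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(url: str, extractors: Sequence[Extractor]) -> Optional[str]:
    """Try each extractor in order; return the first non-empty text, else None."""
    for extractor in extractors:
        try:
            text = extractor(url)
        except Exception:
            continue  # treat any failure as "unavailable" and fall through
        if text and text.strip():
            return text
    return None
```

Swallowing every exception is a simplification; a real implementation would probably log the failure and distinguish timeouts from 404s, as the test plan implies.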

Security Considerations

  • Outbound HTTP only; respect robots.txt and send a descriptive User-Agent header (GAIA/version)
  • HTML → plain text conversion before LLM prompt (no raw HTML injection)
  • Feed URLs restricted to HTTP/HTTPS schemes only
  • API keys from env vars (existing pattern), never persisted by agent
  • Digest retention: 30-day default, configurable, old digests auto-pruned
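
Two of these checks, the scheme restriction and the robots.txt rule, can be sketched with the standard library alone. The helper names and the `GAIA/0.0` placeholder version are assumptions; the issue only specifies the `GAIA/version` form. The robots.txt body is passed in as text so the check can be exercised offline.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

ALLOWED_SCHEMES = {"http", "https"}
USER_AGENT = "GAIA/0.0"  # placeholder version, not a real release number


def is_allowed_feed_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host."""
    parts = urlsplit(url.strip())
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def may_fetch(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A scheme check alone does not stop requests to loopback or private addresses, so an implementation that accepts arbitrary user-supplied feed URLs may want an additional host check.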

Test Plan

  • Unit: FeedManager CRUD, OPML import/export, malformed YAML handling
  • Unit: ContentExtractor from HTML fixtures, timeout/404 handling
  • Unit: Deduplication fingerprinting, topic-to-article matching
  • Unit: DigestCompiler grouping and formatted output
  • Integration: add real RSS feed → refresh → extract → produce digest
  • CLI: gaia news feeds add <url>, gaia news digest, gaia news topics add <name>
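
As an illustration of the deduplication-fingerprinting unit tests, the sketch below pairs a possible fingerprint function with two pytest-style tests. The fingerprint design (normalized title plus a canonical URL with `utm_` parameters stripped) is an assumption; the issue does not specify how fingerprints are computed.

```python
import hashlib
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumed list of tracking-parameter prefixes


def canonical_url(url: str) -> str:
    """Lowercase the host, drop tracking params, the fragment, and a trailing slash."""
    parts = urlsplit(url)
    query = [
        (key, value) for key, value in parse_qsl(parts.query)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"), urlencode(query), "")
    )


def fingerprint(title: str, url: str) -> str:
    """Stable hash of the normalized title and canonical URL."""
    normalized_title = re.sub(r"\W+", " ", title.lower()).strip()
    payload = f"{normalized_title}|{canonical_url(url)}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def test_tracking_params_and_case_do_not_change_fingerprint():
    a = fingerprint("AMD Launches New GPU!", "https://Example.com/story/?utm_source=rss&id=7")
    b = fingerprint("amd launches new gpu", "https://example.com/story?id=7")
    assert a == b


def test_distinct_articles_differ():
    assert fingerprint("A", "https://example.com/1") != fingerprint("B", "https://example.com/2")
```

This design will not merge the same story syndicated under different URLs; catching those would need a title-only or content-based fingerprint.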

Metadata

Assignees

No one assigned

Labels

  • agent
  • domain:multimodal - Voice (ASR/TTS), Vision (VLM), Image gen (SD), CUA
  • enhancement - New feature or request
  • track:consumer-app - Consumer product track, mobile-first: voice + messaging + memory + skills
