Summary
Decouple Watcher from the Archiver SDK at runtime by storing all pipeline-critical state (URL, extraction specs, change history) locally. Archiver becomes a client of Watcher's API (control plane) rather than a runtime dependency Watcher calls each pipeline cycle. Non-Archiver clients can define WatchedItems with a URL and source specs directly. Closes #184 as resolved-by-architecture.
Design doc
docs/plans/2026-06-07-watcher-standalone-service-design.md
Scope
Phase A — Pipeline decoupling (ships first)
- effective_url + source_specs + archiver_info_source_id added to WatchedItem; info_item_id → nullable
- health_status / last_checked_at / last_changed_at moved from Watch → WatchedItem
- Watch drops target_info_source_id, info_item_id, effective_url
- New change_revisions table (Watcher-local change history keyed by watched_item_id)
- pending_source_revisions → pending_archiver_sync (rekeyed, Archiver-backed items only)
- last_known_revisions table dropped (replaced by change_revisions query)
- fetch_info_item_bindings / InfoItemBindings / InfoSourceProto deleted
- Pipeline reads URL + specs from WatchedItem; no Archiver SDK call at runtime
- Drain worker rekeyed to change_revision_id; _resolve_sub_aspect_watch deleted
- New GET /api/v1/watched-items/{id}/revisions endpoint
- POST /watched-items accepts {url, source_specs} directly; info_item_id optional
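As a rough illustration of how a change_revisions query could stand in for the dropped last_known_revisions table, here is a minimal sketch using SQLite. Only the table name and the watched_item_id key come from this PR; the other columns (content_hash, detected_at), the index, and the helper names record_revision and last_known_revision are hypothetical, and the real schema may differ.

```python
import sqlite3

# Hypothetical schema: only the table name and watched_item_id come from the
# PR description; content_hash, detected_at, and the index are assumptions.
SCHEMA = """
CREATE TABLE change_revisions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    watched_item_id INTEGER NOT NULL,
    content_hash TEXT NOT NULL,
    detected_at TEXT NOT NULL
);
CREATE INDEX ix_change_revisions_item
    ON change_revisions (watched_item_id, id);
"""


def record_revision(conn, watched_item_id, content_hash, detected_at):
    """Append one revision row and return its id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO change_revisions (watched_item_id, content_hash, detected_at) "
        "VALUES (?, ?, ?)",
        (watched_item_id, content_hash, detected_at),
    )
    return cur.lastrowid


def last_known_revision(conn, watched_item_id):
    """Return the newest (id, content_hash, detected_at) row, or None.

    This is the lookup a separate last_known_revisions table would
    otherwise have answered.
    """
    return conn.execute(
        "SELECT id, content_hash, detected_at FROM change_revisions "
        "WHERE watched_item_id = ? ORDER BY id DESC LIMIT 1",
        (watched_item_id,),
    ).fetchone()
```

Deriving the latest revision from the history table keeps a single source of truth, at the cost of an indexed lookup per item.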
Phase B — Control plane inversion
- Auth mechanism for Archiver to call Watcher's API as a trusted caller
- Dashboard stripped of InfoItem-coupled routes (picker, binding tree)
- POST /watched-items becomes Archiver's primary WatchedItem creation surface
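To make the POST /watched-items request shape concrete, the sketch below parses a body with url and source_specs required and info_item_id optional, as described in Phase A. The WatchedItemCreate class, the parse_create_request helper, the assumption that source_specs is a non-empty list of objects, and the string type for info_item_id are all hypothetical; the design doc is the authority on the actual schema.

```python
from dataclasses import dataclass
from typing import Any, Optional
from urllib.parse import urlparse


@dataclass
class WatchedItemCreate:
    """Hypothetical request model for POST /watched-items."""
    url: str
    source_specs: list[dict[str, Any]]   # assumed shape: list of spec objects
    info_item_id: Optional[str] = None   # optional; set only for Archiver-backed items


def parse_create_request(body: dict[str, Any]) -> WatchedItemCreate:
    """Validate a JSON body and build the model (illustrative only)."""
    url = body.get("url")
    if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must be an http(s) URL")
    specs = body.get("source_specs")
    if not isinstance(specs, list) or not specs:
        raise ValueError("source_specs must be a non-empty list")
    return WatchedItemCreate(
        url=url,
        source_specs=specs,
        info_item_id=body.get("info_item_id"),
    )
```

Under these assumptions, a non-Archiver client would send only url and source_specs, while Archiver would also pass info_item_id.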
Out of scope
- Phase C (Redis state sync) — Archiver will PATCH Watcher directly on URL/spec changes
- Selector variance detection beyond zero-content fallback
- Per-Watch spec overrides