Phase 2c follow-up: InfoSpec schema extension for fetch_config knobs #142

@gregoryfoster

Context

Phase 2c dropped the `watches.fetch_config` JSONB column. The InfoSpec v1 JSON Schema doesn't yet model the knobs it replaced: `headers`, `ignore_patterns`, `viewport_width`, `viewport_height`, browser fetcher hints, etc. Watcher currently either fails fast on operator attempts to set them or silently ignores them.

Listed under "Open follow-ups (deferred)" in the wrap-up of docs/plans/2026-05-04-watcher-phase2c-cutover-plan.md.

What to do

  • Audit the pre-Phase-2c `fetch_config` shape (see legacy migration files for the exact JSON) and inventory the fields that need to come back: at minimum `headers`, `ignore_patterns`, `viewport_width`, `viewport_height`, `render` (for the future Playwright fetcher).
  • Extend the InfoSpec v1 JSON Schema's `target.fetch` block to accept these. Keep field-level defaults at the consumer (per design doc decision §5: "DEFAULT_FETCH_RENDER = False" etc.).
  • Wire each knob through Watcher's fetcher / extractor:
    • `headers` → `HttpFetcher` outbound headers
    • `ignore_patterns` → already partially wired via `HtmlExtractor` ignore selectors; verify post-Phase-2c
    • `viewport_width` / `viewport_height` → screenshot capture (`src/core/screenshot.py`)
    • `render` → reserved for #3 (Add Playwright headless browser fetcher); document it, but don't implement consumption
  • Validation: existing single-URL specs without any `target.fetch` block must remain valid (all fields optional).
  • Tests: round-trip each new knob; verify the consumer applies it.
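
A minimal sketch of how the steps above could fit together, offered as a starting point rather than the intended implementation. It assumes the spec is a plain mapping with a `target.fetch` object. The names `TARGET_FETCH_SCHEMA`, `FetchOptions`, and `resolve_fetch_options`, the `target.url` key in the usage example, and the viewport default values are all hypothetical. Only `DEFAULT_FETCH_RENDER = False` comes from the design doc quote. The schema fragment declares no `required` list and no `default` keywords, so specs without a `target.fetch` block stay valid and defaults live at the consumer.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping

# Consumer-side defaults. DEFAULT_FETCH_RENDER is quoted in the issue;
# the viewport values are placeholders assumed for this sketch.
DEFAULT_FETCH_RENDER = False
DEFAULT_VIEWPORT_WIDTH = 1280
DEFAULT_VIEWPORT_HEIGHT = 800

# Hypothetical JSON Schema fragment for `target.fetch`, written as a dict.
# No "required" and no "default" keywords: every knob is optional.
TARGET_FETCH_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "headers": {"type": "object", "additionalProperties": {"type": "string"}},
        "ignore_patterns": {"type": "array", "items": {"type": "string"}},
        "viewport_width": {"type": "integer", "minimum": 1},
        "viewport_height": {"type": "integer", "minimum": 1},
        "render": {"type": "boolean"},  # reserved; not consumed yet
    },
}


@dataclass(frozen=True)
class FetchOptions:
    """Resolved fetch knobs after consumer defaults are applied."""
    headers: dict = field(default_factory=dict)
    ignore_patterns: tuple = ()
    viewport_width: int = DEFAULT_VIEWPORT_WIDTH
    viewport_height: int = DEFAULT_VIEWPORT_HEIGHT
    render: bool = DEFAULT_FETCH_RENDER


def resolve_fetch_options(spec: Mapping[str, Any]) -> FetchOptions:
    """Read the optional `target.fetch` block and fill in defaults."""
    fetch = (spec.get("target") or {}).get("fetch") or {}
    # Fail fast on unknown keys instead of silently ignoring them.
    unknown = set(fetch) - set(TARGET_FETCH_SCHEMA["properties"])
    if unknown:
        raise ValueError(f"unknown target.fetch keys: {sorted(unknown)}")
    return FetchOptions(
        headers=dict(fetch.get("headers", {})),
        ignore_patterns=tuple(fetch.get("ignore_patterns", ())),
        viewport_width=fetch.get("viewport_width", DEFAULT_VIEWPORT_WIDTH),
        viewport_height=fetch.get("viewport_height", DEFAULT_VIEWPORT_HEIGHT),
        render=fetch.get("render", DEFAULT_FETCH_RENDER),
    )


# A spec with no `target.fetch` block resolves to pure defaults.
legacy = resolve_fetch_options({"target": {"url": "https://example.com"}})

# A spec that sets only some knobs keeps defaults for the rest.
custom = resolve_fetch_options(
    {"target": {"fetch": {"headers": {"User-Agent": "watcher"}, "viewport_width": 1024}}}
)
```

The sketch only checks for unknown keys. Full type and range validation would come from whichever JSON Schema validator the project already uses.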
