Agent ingest so I can delegate new items. by darron · Pull Request #10 · darron/ff-workers

darron · 2026-05-05T00:37:09Z

This pull request introduces comprehensive support for agent-assisted story ingestion, including new bearer-token protected APIs, supporting documentation, and staging/testing workflows. It adds a new approval flow where remote agents propose story ingestions, the Worker analyzes and proposes actions, and agents approve or reject before any database writes. The changes also include updates to documentation, changelogs, and new Taskfile targets to facilitate staging operations and smoke testing.

Agent-assisted story ingestion and API support:

Added agent-assisted story ingestion endpoints under /admin/api/ingest/proposals, including proposal creation, approval, rejection, and record search, protected by bearer tokens.
Implemented a Worker proposal and agent approval flow, including deduplication, audit checks, and support for new incident creation when no match exists.
Added event-date-aware matching to improve candidate record selection and reduce false matches across years.
Created a new story_ingest_proposals audit table and enforced duplicate URL checks, confidence thresholds, and review fallback for disagreements.
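
Event-date-aware matching could look roughly like the sketch below. It is not the Worker's actual matcher. The `CandidateRecord` shape, the token-overlap (Jaccard) name score, and both penalty weights are hypothetical. They are chosen only to show how a year mismatch can push an otherwise similar record down the ranking.

```typescript
// Hypothetical candidate shape; the real records schema is not shown in this PR.
interface CandidateRecord {
  id: string;
  name: string;
  eventDate: string | null; // ISO "YYYY-MM-DD" when known
}

// Lowercase and split on non-alphanumerics to get a crude token set.
function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Jaccard similarity of two token sets, in [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Assumed weights: an unknown date gets a mild penalty, a year mismatch a strong one.
const UNKNOWN_DATE_FACTOR = 0.8;
const YEAR_MISMATCH_FACTOR = 0.2;

function scoreCandidate(storyName: string, storyDate: string | null, candidate: CandidateRecord): number {
  const nameScore = jaccard(tokens(storyName), tokens(candidate.name));
  if (!storyDate || !candidate.eventDate) return nameScore * UNKNOWN_DATE_FACTOR;
  const sameYear = storyDate.slice(0, 4) === candidate.eventDate.slice(0, 4);
  return sameYear ? nameScore : nameScore * YEAR_MISMATCH_FACTOR;
}

// Highest score first; each candidate is scored once.
function rankCandidates(storyName: string, storyDate: string | null, candidates: CandidateRecord[]): CandidateRecord[] {
  return candidates
    .map((candidate) => ({ candidate, score: scoreCandidate(storyName, storyDate, candidate) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.candidate);
}
```

With these assumed weights, a record from a different year keeps only 20% of its name-similarity score. A same-year record with a comparable name therefore wins, even when both records share the same title.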

Documentation and onboarding:

Added an AGENTS.md file with detailed instructions for coding agents, covering repo orientation, safety, the ingestion flow, database migrations, mapping, testing, and changelog practices.
Updated README.md and docs/ADMIN_SETUP.md to document the agent ingestion APIs and their usage, and to link to the new ingestion documentation.

Staging, testing, and operational tooling:

Added a Taskfile.yml with staging-only tasks for deploy, migration, secret rotation, ingest proposal creation/approval/rejection, and record deletion, enabling easier smoke testing and operational workflows.
Documented and implemented procedures for safe secret handling, staging token rotation, and testing requirements before handoff.

Changelog and historical anchors:

Updated docs/CHANGELOG.md with detailed entries for agent story ingestion, ingestion hardening, and previous major features.

Security and ingestion hardening:

Enforced bounded public-source fetching, canonical URL persistence, batch size limits, and locked approvals to prevent unauthorized metadata changes.
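
Canonical URL persistence depends on mapping common URL variants to one stored form. The PR does not spell out its normalization rules, so the sketch below assumes a plausible set: drop fragments and credentials, strip a leading `www.`, remove tracking parameters such as `utm_*`, sort the remaining query parameters, and trim trailing slashes. `canonicalizeStoryUrl` and `TRACKING_PARAMS` are hypothetical names. The code relies only on the standard WHATWG `URL` and `URLSearchParams` APIs.

```typescript
// Assumed list of tracking parameters to drop; not taken from the PR.
const TRACKING_PARAMS = /^(utm_[a-z]+|fbclid|gclid)$/i;

// Hypothetical helper: map common variants of a story URL to one stored form.
function canonicalizeStoryUrl(raw: string): string {
  const url = new URL(raw.trim()); // throws on unparseable input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`unsupported scheme: ${url.protocol}`);
  }
  url.hash = "";
  url.username = "";
  url.password = "";
  // URL already lowercases the host and drops default ports; also fold "www.".
  url.hostname = url.hostname.replace(/^www\./, "");
  // Drop tracking parameters, then sort by key so parameter order does not matter.
  const kept = Array.from(url.searchParams.entries())
    .filter(([key]) => !TRACKING_PARAMS.test(key))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  url.search = new URLSearchParams(kept).toString();
  // Treat "/path/" and "/path" as the same story; keep the bare root "/".
  if (url.pathname.length > 1) {
    url.pathname = url.pathname.replace(/\/+$/, "");
  }
  return url.toString();
}
```

Under these rules, `https://www.Example.com/news/story/?utm_source=x&b=2&a=1#top` and `https://example.com/news/story?b=2&a=1` both become `https://example.com/news/story?a=1&b=2`. A uniqueness check on the canonical form then catches the repeat.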

These changes collectively establish a robust, auditable, and secure workflow for agent-assisted ingestion and lay the groundwork for future enhancements.

Add a bearer-token protected ingestion API under /admin/api/ingest for remote agents to submit source URLs, review Worker-generated proposals, and approve or reject story attachments. Store each proposal in the new story_ingest_proposals audit table, including extracted facts, confidence scores, decisions, duplicate detection, and applied story IDs.

The ingest flow now gates writes through Worker and agent agreement, confidence thresholds, URL dedupe, and final validation before mutating news_stories or creating records. It also adds read-only structured record search so agents can verify uncertain matches without using generic story CRUD endpoints.

Share public URL safety checks between admin story CRUD and ingestion, add scoped ingest bearer-token auth, wire the new routes into the admin API, and queue record summary refreshes after applied ingest writes. Add Taskfile staging helpers, remote-agent skill instructions, setup docs, security notes, changelog entries, and focused node:test coverage for extraction, event-date matching, guardrails, and search ranking.

Behavior changes:

- /admin/api/ingest/* can authenticate with INGEST_API_TOKEN or INGEST_API_TOKENS instead of an admin browser session.
- Existing admin story URL validation now uses the shared URL safety helper.
- New schema-dependent ingest routes return 412 until migration 0004_ingest_proposals.sql is applied.

Risks:

- Source extraction and AI fact matching remain probabilistic, so low confidence, duplicate, and disagreement cases intentionally fall back to needs_review.
- Create-record approval can add canonical records, so launch thresholds and staging smoke tests should be reviewed before broad production use.

Follow-ups:

- Add a reviewed record_patch proposal flow for later canonical metadata updates discovered through newly attached stories.
- Wire notification or review tooling for pending needs_review proposals.
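
The write gate in this commit can be summarized as a small decision function. This is a sketch under stated assumptions: `decideProposal`, the field names, the decision labels, and the 0.8 threshold are hypothetical, not the PR's code. It encodes only the stated rules. An agent rejection never writes, and duplicate, low-confidence, or disagreeing proposals fall back to needs_review.

```typescript
// Hypothetical labels; "apply" still has to pass final validation before any write.
type Decision = "apply" | "rejected" | "needs_review";

// Hypothetical shapes; the PR's real proposal schema is not shown.
interface WorkerProposal {
  recordId: string | null; // null means "create a new record"
  confidence: number; // 0..1, from the Worker's extraction and matching
}

interface AgentVerdict {
  approve: boolean;
  recordId: string | null; // the record the agent believes the story belongs to
}

const MIN_CONFIDENCE = 0.8; // assumed threshold, not the PR's launch value

function decideProposal(isDuplicateUrl: boolean, worker: WorkerProposal, agent: AgentVerdict): Decision {
  if (!agent.approve) return "rejected"; // agent said no: never write
  if (isDuplicateUrl) return "needs_review"; // URL already stored or pending
  if (worker.confidence < MIN_CONFIDENCE) return "needs_review";
  if (worker.recordId !== agent.recordId) return "needs_review"; // Worker and agent disagree
  return "apply";
}
```

Keeping the gate as a pure function of its inputs makes each fallback case easy to cover with focused node:test cases.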

Add canonical story URL persistence with D1 uniqueness checks for saved stories and active ingest proposals, plus migrations to backfill existing URLs and normalize common variants. This moves duplicate protection into the database so concurrent or repeated agent submissions cannot silently create duplicate story rows.

Route ingest and summary source fetches through shared public-fetch safety: normalized public HTTP(S) URLs, manual redirect validation, DNS checks, timeouts, and response-size caps. Disable third-party reader fallbacks by default, unless explicitly configured, to avoid disclosing source URLs to third parties.

Tighten the ingest approval path by reducing agent batches to 5 URLs, adding a KV-backed per-minute rate limit, rejecting bearer-token record field overrides for create-record approvals, requiring valid Canadian province codes, and parsing model JSON only when it is strict object JSON or a complete fenced object. Also add Sentry capture points with default PII collection disabled, add noopener attributes to admin story links, make staging token rotation atomic with 0600 permissions, and update the ingestion, security, and setup docs and agent instructions.

Behavior changes:

- Duplicate story URLs now return conflicts or duplicate proposal state based on canonical URL matching.
- Agents can no longer correct create-record fields during approval.
- Third-party extraction fallbacks are off unless enabled by env vars.
- Ingest batch size is now 5 and may return 429 when the KV limiter is configured.

Risks:

- Existing environments must apply migrations 0005 and 0006 before using schema-dependent ingest paths.
- DNS-over-HTTPS checks add latency and may fail closed for sources with broken DNS responses.
- Disabling fallback readers may reduce extraction success for difficult publisher pages until a daemon or explicit fallback is configured.

Follow-ups:

- Add the reviewed record_patch proposal flow for later metadata updates.
- Add an edge/WAF rate-limit rule for /admin/api/ingest/* before broad remote-agent use.
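
The rule that model JSON is parsed only when it is strict object JSON or a complete fenced object can be illustrated with a parser that accepts exactly two shapes. The first is a bare JSON object. The second is a response consisting of one fenced block that contains an object. Everything else is rejected, including leading prose, arrays, and truncated JSON. `parseModelObject` is a hypothetical name, and the exact fence grammar (an optional lowercase `json` tag) is an assumption. The code relies only on `JSON.parse`.

```typescript
// One complete fenced block, optionally tagged "json", spanning the whole response.
// \u0060 is a backtick; it is escaped here to avoid nesting a literal fence in this example.
const WHOLE_FENCE = /^\u0060{3}(?:json)?[ \t]*\r?\n([\s\S]*?)\r?\n?\u0060{3}$/;

function parseModelObject(text: string): Record<string, unknown> | null {
  const trimmed = text.trim();
  const fenced = WHOLE_FENCE.exec(trimmed);
  const body = fenced ? fenced[1].trim() : trimmed;
  // Cheap structural check before parsing: the body must look like a single object.
  if (!body.startsWith("{") || !body.endsWith("}")) return null;
  try {
    const value: unknown = JSON.parse(body);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Record<string, unknown>;
    }
  } catch {
    // Malformed or truncated JSON: fall through and reject.
  }
  return null;
}
```

Returning `null` instead of attempting to salvage partial output means a malformed model response never produces facts. Under the documented flow, such a proposal cannot reach an approved write.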

Return a 409 when admin story updates hit the canonical URL uniqueness constraint, and skip duplicate story URLs during bulk record updates instead of failing the whole record save. This keeps the admin API compatible with the new canonical URL dedupe behavior while preserving successful non-duplicate story edits in the same request.

Add a pre-public release review for the agent-assisted ingestion work, covering security, correctness, operational risks, and recommended follow-ups before the repo is published.

Behavior change: duplicate story URL updates now surface as a client conflict or are ignored in bulk record saves, rather than bubbling up as generic server errors.

Risk: silently skipping duplicates in the bulk path may hide operator input mistakes; a future admin UI should make skipped stories visible.
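
The split between the single-story and bulk paths can be sketched with an in-memory stand-in for the canonical URL unique index. `StoryStore`, `UniqueConstraintError`, `saveStory`, and `saveStoriesBulk` are all hypothetical. In the real Worker, the constraint lives in D1. The bulk function also returns the skipped URLs, which is one way to address the risk noted for this commit by giving a future admin UI something to display.

```typescript
// Raised by the stand-in store when a canonical URL is already taken.
class UniqueConstraintError extends Error {
  readonly canonicalUrl: string;
  constructor(canonicalUrl: string) {
    super(`duplicate canonical URL: ${canonicalUrl}`);
    this.name = "UniqueConstraintError";
    this.canonicalUrl = canonicalUrl;
  }
}

// Hypothetical in-memory stand-in for news_stories with a UNIQUE canonical URL.
class StoryStore {
  private readonly ids = new Map<string, number>();
  private nextId = 1;

  insert(canonicalUrl: string): number {
    if (this.ids.has(canonicalUrl)) throw new UniqueConstraintError(canonicalUrl);
    const id = this.nextId++;
    this.ids.set(canonicalUrl, id);
    return id;
  }
}

// Single-story save: a uniqueness violation becomes a client-facing 409.
function saveStory(store: StoryStore, canonicalUrl: string): { status: 200 | 409; id?: number } {
  try {
    return { status: 200, id: store.insert(canonicalUrl) };
  } catch (err) {
    if (err instanceof UniqueConstraintError) return { status: 409 };
    throw err; // any other failure is still a server error
  }
}

// Bulk record save: skip duplicates, keep the rest, and report what was skipped.
function saveStoriesBulk(store: StoryStore, canonicalUrls: string[]): { saved: number[]; skipped: string[] } {
  const saved: number[] = [];
  const skipped: string[] = [];
  for (const url of canonicalUrls) {
    try {
      saved.push(store.insert(url));
    } catch (err) {
      if (!(err instanceof UniqueConstraintError)) throw err;
      skipped.push(url);
    }
  }
  return { saved, skipped };
}
```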

darron added 3 commits May 4, 2026 14:02

darron self-assigned this May 5, 2026

darron added the enhancement (New feature or request) label May 5, 2026

darron merged commit dd485ff into main May 5, 2026

darron deleted the agent-ingest branch May 5, 2026 00:37

darron merged 3 commits into main from agent-ingest

darron commented May 5, 2026

1 participant
