Agent ingest so I can delegate new items. #10
Merged
Add a bearer-token protected ingestion API under /admin/api/ingest for remote agents to submit source URLs, review Worker-generated proposals, and approve or reject story attachments. Store each proposal in the new story_ingest_proposals audit table, including extracted facts, confidence scores, decisions, duplicate detection, and applied story IDs.

The ingest flow now gates writes through Worker and agent agreement, confidence thresholds, URL dedupe, and final validation before mutating news_stories or creating records. It also adds read-only structured record search so agents can verify uncertain matches without using generic story CRUD endpoints.

Share public URL safety checks between admin story CRUD and ingestion, add scoped ingest bearer-token auth, wire the new routes into the admin API, and queue record summary refreshes after applied ingest writes.

Add Taskfile staging helpers, remote-agent skill instructions, setup docs, security notes, changelog entries, and focused node:test coverage for extraction, event-date matching, guardrails, and search ranking.

Behavior changes:
- /admin/api/ingest/* can authenticate with INGEST_API_TOKEN or INGEST_API_TOKENS instead of an admin browser session.
- Existing admin story URL validation now uses the shared URL safety helper.
- New schema-dependent ingest routes return 412 until migration 0004_ingest_proposals.sql is applied.

Risks:
- Source extraction and AI fact matching remain probabilistic, so low confidence, duplicate, and disagreement cases intentionally fall back to needs_review.
- Create-record approval can add canonical records, so launch thresholds and staging smoke tests should be reviewed before broad production use.

Follow-ups:
- Add a reviewed record_patch proposal flow for later canonical metadata updates discovered through newly attached stories.
- Wire notification or review tooling for pending needs_review proposals.
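The write gate described above (Worker and agent agreement, confidence thresholds, URL dedupe) can be sketched as a small pure function. This is only an illustration: the type names, field names, action names, and threshold values below are assumptions and do not come from the PR. Only the needs_review fallback for disagreement, duplicate, and low-confidence cases is taken from the description.

```typescript
// Hypothetical shapes: field and action names are illustrative, not the PR's schema.
type Action = "attach_story" | "create_record" | "reject";
type Decision = "apply" | "needs_review" | "rejected";

interface ProposalInput {
  workerAction: Action;    // what the Worker proposed
  agentAction: Action;     // what the remote agent chose
  confidence: number;      // 0..1 score from fact matching
  isDuplicateUrl: boolean; // result of URL dedupe
}

// Assumed thresholds. Creating a canonical record is treated as riskier
// than attaching a story to an existing record.
const THRESHOLDS: Record<Action, number> = {
  attach_story: 0.8,
  create_record: 0.9,
  reject: 0,
};

function decide(p: ProposalInput): Decision {
  // Disagreement between Worker and agent never writes.
  if (p.workerAction !== p.agentAction) return "needs_review";
  // Both sides agree the proposal should be dropped.
  if (p.agentAction === "reject") return "rejected";
  // Duplicate URLs are held for review instead of being written again.
  if (p.isDuplicateUrl) return "needs_review";
  // Low confidence falls back to review.
  if (p.confidence < THRESHOLDS[p.agentAction]) return "needs_review";
  return "apply";
}
```

Keeping the gate free of I/O makes it straightforward to cover with node:test-style unit tests, and any case that fails a check degrades to review rather than to a write.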
Add canonical story URL persistence with D1 uniqueness checks for saved stories and active ingest proposals, plus migrations to backfill existing URLs and normalize common variants. This moves duplicate protection into the database so concurrent or repeated agent submissions cannot silently create duplicate story rows.

Route ingest and summary source fetches through shared public-fetch safety: normalized public HTTP(S) URLs, manual redirect validation, DNS checks, timeouts, and response-size caps. Disable third-party reader fallbacks by default to avoid disclosing source URLs unless explicitly configured.

Tighten the ingest approval path by reducing agent batches to 5 URLs, adding a KV-backed per-minute rate limit, rejecting bearer-token record field overrides for create-record approvals, requiring valid Canadian province codes, and parsing model JSON only when it is strict object JSON or a complete fenced object.

Also add Sentry capture points with default PII disabled, add noopener attributes for admin story links, make staging token rotation atomic with 0600 permissions, and update ingestion/security/setup docs and agent instructions.

Behavior changes:
- Duplicate story URLs now return conflicts or duplicate proposal state based on canonical URL matching.
- Agents can no longer correct create-record fields during approval.
- Third-party extraction fallbacks are off unless enabled by env vars.
- Ingest batch size is now 5 and may return 429 when the KV limiter is configured.

Risks:
- Existing environments must apply migrations 0005 and 0006 before using schema-dependent ingest paths.
- DNS-over-HTTPS checks add latency and may fail closed for sources with broken DNS responses.
- Disabling fallback readers may reduce extraction success for difficult publisher pages until a daemon or explicit fallback is configured.

Follow-ups:
- Add the reviewed record_patch proposal flow for later metadata updates.
- Add an edge/WAF rate-limit rule for /admin/api/ingest/* before broad remote-agent use.
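The rule "parse model JSON only when it is strict object JSON or a complete fenced object" could look roughly like the sketch below. The function names (unwrapFence, parseModelObject) are hypothetical, and the exact acceptance rules (for example, allowing only an empty or json fence tag) are assumptions about what "complete fenced object" means, not the PR's implementation.

```typescript
// Three backticks, built indirectly so this example can sit inside a fenced block.
const FENCE = "`".repeat(3);

// Returns the fence body, the text itself when it is not fenced,
// or null when a fence is opened but not cleanly closed.
function unwrapFence(text: string): string | null {
  if (!text.startsWith(FENCE)) return text;
  const firstNewline = text.indexOf("\n");
  if (firstNewline === -1 || !text.endsWith(FENCE)) return null; // incomplete fence
  const tag = text.slice(FENCE.length, firstNewline).trim();
  if (tag !== "" && tag !== "json") return null; // assumed: only plain or json fences
  const body = text.slice(firstNewline + 1, text.length - FENCE.length);
  if (body.includes(FENCE)) return null; // more than one fenced block
  return body;
}

// Accepts only a top-level JSON object; arrays, scalars, and JSON embedded
// in surrounding prose are rejected rather than salvaged.
function parseModelObject(raw: string): Record<string, unknown> | null {
  const unwrapped = unwrapFence(raw.trim());
  if (unwrapped === null) return null;
  const candidate = unwrapped.trim();
  if (!candidate.startsWith("{") || !candidate.endsWith("}")) return null;
  try {
    const value: unknown = JSON.parse(candidate);
    if (value === null || typeof value !== "object" || Array.isArray(value)) {
      return null;
    }
    return value as Record<string, unknown>;
  } catch {
    return null; // not valid JSON
  }
}
```

Rejecting anything that is not exactly one object means a chatty or truncated model response produces no proposal data instead of a partially extracted one.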
Return a 409 when admin story updates hit the canonical URL uniqueness constraint, and skip duplicate story URLs during bulk record updates instead of failing the whole record save. This keeps the admin API compatible with the new canonical URL dedupe behavior while preserving successful non-duplicate story edits in the same request.

Add a pre-public release review for the agent-assisted ingestion work, covering security, correctness, operational risks, and recommended follow-ups before the repo is published.

Behavior change: duplicate story URL updates now surface as a client conflict or are ignored in bulk record saves, rather than bubbling up as generic server errors.

Risk: silently skipping duplicates in the bulk path may hide operator input mistakes; a future admin UI should make skipped stories visible.
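The difference between the single-story path (409 on conflict) and the bulk path (skip duplicates, keep the rest) can be illustrated with an in-memory stand-in for the database constraint. StoryStore, saveStory, and saveStories are hypothetical names, and the sketch models a save rather than the PR's actual update code. Returning the skipped URLs is also an assumption, added because the stated risk is that skipped stories are otherwise invisible.

```typescript
type InsertResult =
  | { ok: true; id: number }
  | { ok: false; reason: "duplicate_url" };

// In-memory stand-in for a uniqueness constraint on canonical story URLs.
class StoryStore {
  private readonly idsByUrl = new Map<string, number>();
  private nextId = 1;

  insert(canonicalUrl: string): InsertResult {
    if (this.idsByUrl.has(canonicalUrl)) {
      return { ok: false, reason: "duplicate_url" };
    }
    const id = this.nextId++;
    this.idsByUrl.set(canonicalUrl, id);
    return { ok: true, id };
  }
}

// Single-story path: a duplicate becomes a client conflict, not a server error.
function saveStory(store: StoryStore, url: string): { status: number; id?: number } {
  const result = store.insert(url);
  return result.ok ? { status: 200, id: result.id } : { status: 409 };
}

// Bulk path: keep the non-duplicates and report which URLs were skipped.
function saveStories(
  store: StoryStore,
  urls: string[],
): { saved: number[]; skipped: string[] } {
  const saved: number[] = [];
  const skipped: string[] = [];
  for (const url of urls) {
    const result = store.insert(url);
    if (result.ok) {
      saved.push(result.id);
    } else {
      skipped.push(url);
    }
  }
  return { saved, skipped };
}
```

Surfacing the skipped list in the response would give a future admin UI something to display without changing the skip-and-continue behavior.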
This pull request introduces comprehensive support for agent-assisted story ingestion, including new bearer-token protected APIs, supporting documentation, and staging/testing workflows. It adds a new approval flow where remote agents propose story ingestions, the Worker analyzes and proposes actions, and agents approve or reject before any database writes. The changes also include updates to documentation, changelogs, and new Taskfile targets to facilitate staging operations and smoke testing.
Agent-assisted story ingestion and API support:
- /admin/api/ingest/proposals, including proposal creation, approval, rejection, and record search, protected by bearer tokens.
- story_ingest_proposals audit table and enforced duplicate URL checks, confidence thresholds, and review fallback for disagreements.

Documentation and onboarding:
- AGENTS.md file with detailed instructions for coding agents, repo orientation, safety, ingestion flow, database migrations, mapping, testing, and changelog practices.
- README.md and docs/ADMIN_SETUP.md to document agent ingestion APIs, usage, and links to new ingestion documentation.

Staging, testing, and operational tooling:
- Taskfile.yml with staging-only tasks for deploy, migration, secret rotation, ingest proposal creation/approval/rejection, and record deletion, enabling easier smoke testing and operational workflows.

Changelog and historical anchors:
- docs/CHANGELOG.md with detailed entries for agent story ingestion, ingestion hardening, and previous major features.

Security and ingestion hardening:
These changes collectively establish a robust, auditable, and secure workflow for agent-assisted ingestion and lay the groundwork for future enhancements.
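From the agent side, the propose step of this flow might be prepared as in the sketch below. The endpoint path, the bearer-token scheme, and the batch limit of 5 come from the PR text. The helper name prepareProposalRequest and the JSON body shape ({ urls: [...] }) are assumptions, not the documented contract.

```typescript
const MAX_BATCH = 5; // batch limit stated in the PR

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the request without sending it, so the batch
// limit can be enforced client-side before any network call.
function prepareProposalRequest(
  baseUrl: string,
  token: string,
  urls: string[],
): PreparedRequest {
  if (urls.length === 0 || urls.length > MAX_BATCH) {
    throw new Error(`expected 1-${MAX_BATCH} URLs, got ${urls.length}`);
  }
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/admin/api/ingest/proposals`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ urls }), // assumed body shape
  };
}
```

A caller would pass the result to fetch(req.url, req) and should be ready for the 412 (migration not applied) and 429 (rate limited) responses mentioned in the commit messages.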