
Agent ingest so I can delegate new items. #10

darron merged 3 commits into main from agent-ingest on May 5, 2026

@darron darron commented May 5, 2026

This pull request adds agent-assisted story ingestion: bearer-token protected APIs, supporting documentation, and staging and testing workflows. In the new approval flow, a remote agent submits source URLs, the Worker analyzes each one and proposes an action, and the agent approves or rejects the proposal before anything is written to the database. The changes also update the documentation and changelog and add Taskfile targets for staging operations and smoke testing.

Agent-assisted story ingestion and API support:

  • Added agent-assisted story ingestion endpoints under /admin/api/ingest/proposals, including proposal creation, approval, rejection, and record search, protected by bearer tokens.
  • Implemented a Worker proposal and agent approval flow, including deduplication, audit checks, and support for new incident creation when no match exists.
  • Added event-date-aware matching to improve candidate record selection and reduce false matches across years.
  • Created a new story_ingest_proposals audit table and enforced duplicate URL checks, confidence thresholds, and review fallback for disagreements.
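
The event-date-aware matching mentioned above could work roughly as follows. This is a minimal sketch, not the code in this PR: the `CandidateRecord` shape, the `nameScore` field, the linear decay, and the 30-day window are all assumptions made for illustration.

```typescript
// Hypothetical candidate shape; the real record schema is not shown in this PR.
interface CandidateRecord {
  id: string;
  eventDate: string; // ISO date, e.g. "2025-03-14"
  nameScore: number; // 0..1 text similarity, assumed to be computed elsewhere
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns 1 for same-day events and decays linearly to 0 at windowDays apart.
// Unparseable dates score 0 so they never produce a match.
function dateProximityScore(a: string, b: string, windowDays = 30): number {
  const diffDays = Math.abs(Date.parse(a) - Date.parse(b)) / DAY_MS;
  if (Number.isNaN(diffDays)) return 0;
  return Math.max(0, 1 - diffDays / windowDays);
}

// Weight name similarity by date proximity, so an identically named event
// from a different year drops out of the candidate list entirely.
function rankCandidates(
  storyEventDate: string,
  candidates: CandidateRecord[],
): CandidateRecord[] {
  return candidates
    .map((c) => ({
      c,
      score: c.nameScore * dateProximityScore(storyEventDate, c.eventDate),
    }))
    .filter((x) => x.score > 0)
    .sort((x, y) => y.score - x.score)
    .map((x) => x.c);
}
```

With this weighting, a perfect name match dated one year earlier is excluded, while a weaker name match two days away is kept.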

Documentation and onboarding:

  • Added an AGENTS.md file with detailed instructions for coding agents, repo orientation, safety, ingestion flow, database migrations, mapping, testing, and changelog practices.
  • Updated README.md and docs/ADMIN_SETUP.md to document agent ingestion APIs, usage, and links to new ingestion documentation.

Staging, testing, and operational tooling:

  • Added a Taskfile.yml with staging-only tasks for deploy, migration, secret rotation, ingest proposal creation/approval/rejection, and record deletion, enabling easier smoke testing and operational workflows.
  • Documented and implemented procedures for safe secret handling, staging token rotation, and testing requirements before handoff.

Changelog and historical anchors:

  • Updated docs/CHANGELOG.md with detailed entries for agent story ingestion, ingestion hardening, and previous major features.

Security and ingestion hardening:

  • Enforced bounded public-source fetching, canonical URL persistence, batch size limits, and locked approvals to prevent unauthorized metadata changes.

Together, these changes gate every agent-proposed write behind Worker analysis, agent approval, and an audit row in story_ingest_proposals.

darron added 3 commits May 4, 2026 14:02
Add a bearer-token protected ingestion API under /admin/api/ingest for
remote agents to submit source URLs, review Worker-generated proposals,
and approve or reject story attachments. Store each proposal in the new
story_ingest_proposals audit table, including extracted facts, confidence
scores, decisions, duplicate detection, and applied story IDs.

The ingest flow now gates writes through Worker and agent agreement,
confidence thresholds, URL dedupe, and final validation before mutating
news_stories or creating records. It also adds read-only structured record
search so agents can verify uncertain matches without using generic story
CRUD endpoints.
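
The write gate described above can be sketched as a single pure function. Only the `needs_review` state is named in this PR; the `Decision` and `Outcome` types, the field names, and the 0.8 threshold are hypothetical and chosen for illustration.

```typescript
// Hypothetical decision vocabulary; the PR only names "needs_review".
type Decision = "attach" | "create_record" | "reject";
type Outcome = "apply" | "needs_review" | "rejected";

interface GateInput {
  workerDecision: Decision;
  agentDecision: Decision;
  confidence: number; // 0..1, assumed to come from the Worker analysis
  urlAlreadyKnown: boolean; // assumed result of the URL dedupe check
}

const MIN_CONFIDENCE = 0.8; // assumed threshold, not taken from the PR

// Duplicates, disagreement, and low confidence all fall back to review;
// only full agreement above the threshold reaches the database.
function gate(input: GateInput): Outcome {
  if (input.urlAlreadyKnown) return "needs_review";
  if (input.agentDecision === "reject") return "rejected";
  if (input.workerDecision !== input.agentDecision) return "needs_review";
  if (input.confidence < MIN_CONFIDENCE) return "needs_review";
  return "apply";
}
```

Keeping the gate free of I/O makes each fallback case easy to cover with a focused unit test.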

Share public URL safety checks between admin story CRUD and ingestion,
add scoped ingest bearer-token auth, wire the new routes into the admin
API, and queue record summary refreshes after applied ingest writes. Add
Taskfile staging helpers, remote-agent skill instructions, setup docs,
security notes, changelog entries, and focused node:test coverage for
extraction, event-date matching, guardrails, and search ranking.

Behavior changes:
- /admin/api/ingest/* can authenticate with INGEST_API_TOKEN or
  INGEST_API_TOKENS instead of an admin browser session.
- Existing admin story URL validation now uses the shared URL safety
  helper.
- New schema-dependent ingest routes return 412 until migration
  0004_ingest_proposals.sql is applied.
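
As an illustration of the token check, here is a minimal sketch. The environment variable names come from the commit message above; treating INGEST_API_TOKENS as a comma-separated list, and the helper names `configuredTokens`, `sameString`, and `isAuthorized`, are assumptions.

```typescript
// Variable names are from the commit message; the comma-separated format of
// INGEST_API_TOKENS is an assumption.
interface IngestEnv {
  INGEST_API_TOKEN?: string;
  INGEST_API_TOKENS?: string;
}

function configuredTokens(env: IngestEnv): string[] {
  const many = (env.INGEST_API_TOKENS ?? "").split(",");
  return [env.INGEST_API_TOKEN ?? "", ...many]
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
}

// Compares every character instead of returning at the first mismatch.
// This reduces, but cannot fully rule out, timing differences in JavaScript.
function sameString(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

function isAuthorized(authorizationHeader: string | null, env: IngestEnv): boolean {
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader ?? "");
  if (!match) return false;
  const presented = match[1] ?? "";
  let ok = false;
  for (const token of configuredTokens(env)) {
    if (sameString(presented, token)) ok = true; // no early exit
  }
  return ok;
}
```

With no tokens configured, the function rejects every request, so a missing secret fails closed.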

Risks:
- Source extraction and AI fact matching remain probabilistic, so low
  confidence, duplicate, and disagreement cases intentionally fall back to
  needs_review.
- Create-record approval can add canonical records, so launch thresholds
  and staging smoke tests should be reviewed before broad production use.

Follow-ups:
- Add a reviewed record_patch proposal flow for later canonical metadata
  updates discovered through newly attached stories.
- Wire notification or review tooling for pending needs_review proposals.

Add canonical story URL persistence with D1 uniqueness checks for saved
stories and active ingest proposals, plus migrations to backfill existing
URLs and normalize common variants. This moves duplicate protection into
the database so concurrent or repeated agent submissions cannot silently
create duplicate story rows.
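
To make the idea of a canonical URL concrete, here is a minimal sketch. The PR does not list which variants its migrations normalize, so every rule below (dropping the fragment, a leading "www.", utm_* and other tracking parameters, and a trailing slash, plus sorting query keys) is an assumption, as is the function name `canonicalizeUrl`.

```typescript
// Hypothetical normalization rules; the PR's actual canonical form is not shown.
const TRACKING_PARAMS = new Set(["fbclid", "gclid"]);

function canonicalizeUrl(raw: string): string {
  const url = new URL(raw); // also lowercases the scheme and host
  url.hash = "";
  url.hostname = url.hostname.replace(/^www\./, "");

  // Keep only non-tracking parameters, in a stable key order.
  const kept: [string, string][] = [];
  url.searchParams.forEach((value, key) => {
    if (!key.startsWith("utm_") && !TRACKING_PARAMS.has(key)) {
      kept.push([key, value]);
    }
  });
  kept.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  url.search = new URLSearchParams(kept).toString();

  // Treat "/news/story/" and "/news/story" as the same page.
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}
```

Two submissions that differ only in these variants then produce the same string, which is the value a database uniqueness constraint would cover.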

Route ingest and summary source fetches through shared public-fetch safety:
normalized public HTTP(S) URLs, manual redirect validation, DNS checks,
timeouts, and response-size caps. Disable third-party reader fallbacks by
default to avoid disclosing source URLs unless explicitly configured.
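
The URL portion of these checks could look like the sketch below. It covers only the syntactic test (scheme, credentials, obviously private hosts); the DNS checks, timeouts, and response-size caps described above are not shown, and the function names are hypothetical.

```typescript
// Matches dotted-quad literals in loopback, private, and link-local ranges.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// Syntactic check only: a public-looking hostname can still resolve to a
// private address, which is why the real flow also performs DNS checks.
function isPublicHttpUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (url.username !== "" || url.password !== "") return false;
  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // IPv6 literals rejected outright here
  return !isPrivateIPv4(host);
}
```

For manual redirect validation, one option is to fetch with `redirect: "manual"` and run the same check on each `Location` target before following it.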

Tighten the ingest approval path by reducing agent batches to 5 URLs,
adding a KV-backed per-minute rate limit, rejecting bearer-token record
field overrides for create-record approvals, requiring valid Canadian
province codes, and parsing model JSON only when it is strict object JSON
or a complete fenced object.
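
The strict model-JSON rule can be sketched as follows. This is one possible reading of "strict object JSON or a complete fenced object", not the PR's parser; the name `parseModelJson` is hypothetical. The fence marker is built with `repeat` only to keep literal backtick runs out of the example.

```typescript
const FENCE = "`".repeat(3);

// Matches a reply that is exactly one fenced block, optionally tagged "json".
const FENCED_BLOCK = new RegExp(
  "^" + FENCE + "(?:json)?\\s*\\n([\\s\\S]*?)\\n" + FENCE + "$",
);

// Accepts a bare JSON object or a single complete fenced object. Prose around
// the JSON, arrays, and truncated output all return null.
function parseModelJson(text: string): Record<string, unknown> | null {
  const trimmed = text.trim();
  const fenced = FENCED_BLOCK.exec(trimmed);
  const candidate = (fenced ? (fenced[1] ?? "") : trimmed).trim();
  if (!candidate.startsWith("{") || !candidate.endsWith("}")) return null;
  try {
    const value: unknown = JSON.parse(candidate);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Record<string, unknown>;
    }
    return null;
  } catch {
    return null;
  }
}
```

Refusing to extract JSON from surrounding prose means a chatty or truncated model reply yields no parsed object rather than a partial one.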

Also add Sentry capture points with default PII disabled, add noopener
attributes for admin story links, make staging token rotation atomic with
0600 permissions, and update ingestion/security/setup docs and agent
instructions.

Behavior changes:
- Duplicate story URLs now return conflicts or duplicate proposal state
  based on canonical URL matching.
- Agents can no longer correct create-record fields during approval.
- Third-party extraction fallbacks are off unless enabled by env vars.
- Ingest batch size is now 5 and may return 429 when the KV limiter is
  configured.
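
A per-minute limiter of this kind can be sketched with a fixed-window counter. The limit of 10, the key format, and the synchronous `CounterStore` interface are assumptions; a real KV namespace is asynchronous and eventually consistent, so a read-then-write counter like this one is approximate under concurrency.

```typescript
const LIMIT_PER_MINUTE = 10; // assumed; the PR does not state the limit

// Synchronous stand-in for a key-value store, used to keep the sketch testable.
interface CounterStore {
  get(key: string): number | undefined;
  set(key: string, value: number): unknown;
}

// The key changes every minute, so old counters can simply be left to expire.
function windowKey(tokenId: string, nowMs: number): string {
  return "ingest-rl:" + tokenId + ":" + Math.floor(nowMs / 60000);
}

// Returns the HTTP status the request should receive.
function checkRateLimit(store: CounterStore, tokenId: string, nowMs: number): 200 | 429 {
  const key = windowKey(tokenId, nowMs);
  const used = store.get(key) ?? 0;
  if (used >= LIMIT_PER_MINUTE) return 429;
  store.set(key, used + 1);
  return 200;
}
```

A plain `Map<string, number>` satisfies `CounterStore`, which is enough for unit tests.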

Risks:
- Existing environments must apply migrations 0005 and 0006 before using
  schema-dependent ingest paths.
- DNS-over-HTTPS checks add latency and may fail closed for sources with
  broken DNS responses.
- Disabling fallback readers may reduce extraction success for difficult
  publisher pages until a daemon or explicit fallback is configured.

Follow-ups:
- Add the reviewed record_patch proposal flow for later metadata updates.
- Add an edge/WAF rate-limit rule for /admin/api/ingest/* before broad
  remote-agent use.

Return a 409 when admin story updates hit the canonical URL uniqueness
constraint, and skip duplicate story URLs during bulk record updates instead of
failing the whole record save. This keeps the admin API compatible with the new
canonical URL dedupe behavior while preserving successful non-duplicate story
edits in the same request.

Add a pre-public release review for the agent-assisted ingestion work, covering
security, correctness, operational risks, and recommended follow-ups before the
repo is published.

Behavior change: duplicate story URL updates now surface as a client conflict
or are ignored in bulk record saves, rather than bubbling up as generic server
errors. Risk: silently skipping duplicates in the bulk path may hide operator
input mistakes; a future admin UI should make skipped stories visible.
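
One way to address the visibility risk noted above is to have the bulk path return the skipped stories instead of dropping them. This is a hypothetical sketch: `StoryInput`, `partitionDuplicates`, and the assumption that `existingUrls` holds canonical URLs already owned by other stories are all illustrative.

```typescript
// Hypothetical input shape for a bulk record save.
interface StoryInput {
  id: string;
  canonicalUrl: string;
}

interface Partition {
  toSave: StoryInput[];
  skipped: StoryInput[];
}

// Splits a bulk update into stories that can be saved and stories whose
// canonical URL is already taken, including repeats within the same request.
function partitionDuplicates(
  stories: StoryInput[],
  existingUrls: ReadonlySet<string>,
): Partition {
  const seen = new Set(existingUrls); // copy, so the caller's set is untouched
  const result: Partition = { toSave: [], skipped: [] };
  for (const story of stories) {
    if (seen.has(story.canonicalUrl)) {
      result.skipped.push(story);
    } else {
      seen.add(story.canonicalUrl);
      result.toSave.push(story);
    }
  }
  return result;
}
```

Returning `skipped` alongside `toSave` lets the API response or a later admin UI list exactly which stories were ignored.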
@darron darron self-assigned this May 5, 2026
@darron darron added the enhancement New feature or request label May 5, 2026
@darron darron merged commit dd485ff into main May 5, 2026
@darron darron deleted the agent-ingest branch May 5, 2026 00:37