feat(aao): define directory inverse-lookup endpoint spec (#4823) #4828

Merged
bokelley merged 1 commit into main from bokelley/4823-aao-directory-inverse-lookup-spec on May 20, 2026

@bokelley

Summary

Spec for GET /v1/agents/{agent_url}/publishers — the AAO directory's inverse-lookup endpoint. Returns the set of publishers whose adagents.json authorizes a given agent_url, with provenance, per-publisher property counts, and lifecycle status.
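
As a rough illustration of the endpoint shape, the sketch below builds a request URL by percent-encoding the agent URL into a single path segment. The host `directory.example.com` and the helper name `buildPublishersUrl` are hypothetical; the PR does not name the directory host or any client helper.

```typescript
// Hypothetical base URL: the PR does not name the directory host.
const DIRECTORY_BASE = "https://directory.example.com";

// Percent-encode the agent URL so it occupies exactly one path segment.
// buildPublishersUrl is an illustrative helper, not part of any SDK.
function buildPublishersUrl(agentUrl: string, base: string = DIRECTORY_BASE): string {
  return `${base}/v1/agents/${encodeURIComponent(agentUrl)}/publishers`;
}

const exampleRequestUrl = buildPublishersUrl("https://agent.example.com/mcp");
console.log(exampleRequestUrl);
```

Encoding with `encodeURIComponent` keeps the `/` and `:` characters of the agent URL from being read as path separators.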

Scope of this PR is spec-only. Server implementation is non-trivial (new discovery_method / manager_domain columns, per-publisher selector resolution to compute counts, signing_keys_pinned lookup) and tracked separately — see Follow-ups below.

Files

  • static/schemas/source/aao/agent-publishers.json — response envelope schema. New static/schemas/source/aao/ subdirectory.
  • docs/aao/directory-api.mdx — full API reference: endpoint shape, query params, response, field reference, HTTP semantics, pagination, relationship to existing SDK primitives.
  • docs.json — adds docs/aao/directory-api to the "Using AAO" nav group.
  • .changeset/4823-aao-directory-inverse-lookup-endpoint.md.

Design decisions (from issue triage)

Status enum scoped to authorized + revoked for v1

Triage Decision 1. The richer unbound / pending states presuppose crawler state the directory doesn't have today — and pending in particular requires snapshot semantics around publishers whose files we've fetched but don't list this agent (a separate indexing pass). Ship what's queryable; leave room to add states later.

revoked is cheap to surface: when a publisher's adagents.json newly lists this publisher_domain in revoked_publisher_domains[], the directory emits one tombstone row on the next sync, then drops it.

Counts kept, scoped per-publisher, renamed

Triage Decision 2. The "12/12 reads as full auth but really means 12-of-6800-network" critique is real, but the directory's whole value-add is that it ran the resolution once so operators don't all hammer 6,800 publishers in parallel. Fix is naming + scoping, not dropping counts.

properties_authorized — properties under THIS publisher_domain the agent's selector resolves to
properties_total     — properties under THIS publisher_domain in the publisher's file

Both scoped to the row's publisher, never network-wide. recipeswithessentialoils.com returns { properties_authorized: 1, properties_total: 1 }, and the operator sees 6,800 such rows summing to the network total — correct mental model, no ambiguity.
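
To make the scoping concrete, here is a minimal sketch of how an operator might sum per-publisher rows into a network total. The `PublisherCounts` type and `sumCounts` helper are illustrative names, and the second domain in the usage example is invented.

```typescript
// Only the count-related fields of a row; names follow the PR description.
interface PublisherCounts {
  publisher_domain: string;
  properties_authorized: number;
  properties_total: number;
}

// Sum the per-publisher counts. The directory never reports network-wide
// totals itself, so the operator derives them from the rows.
function sumCounts(rows: PublisherCounts[]): { authorized: number; total: number } {
  return rows.reduce(
    (acc, row) => ({
      authorized: acc.authorized + row.properties_authorized,
      total: acc.total + row.properties_total,
    }),
    { authorized: 0, total: 0 },
  );
}

const exampleRows: PublisherCounts[] = [
  { publisher_domain: "recipeswithessentialoils.com", properties_authorized: 1, properties_total: 1 },
  { publisher_domain: "publisher-b.example", properties_authorized: 2, properties_total: 5 },
];
console.log(sumCounts(exampleRows)); // { authorized: 3, total: 6 }
```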

discovery_method distinguishes four trust profiles

Per the protocol expert: direct, authoritative_location, adagents_authoritative, ads_txt_managerdomain. The managerdomain path is the weakest (the safety rule is the only positive cross-check); the directory does that bilateral check once, for everyone — main value-add over a per-operator ads.txt crawl.

signing_keys_pinned: bool per row

Cheap to surface, useful at sync time: tells the operator whether this publisher pins their JWKS, which is the signal that the agent's published keys must match. No new crawler state required.

No auth in v1

Publishers are public; the inverse map is public. Rate-limiting keyed on agent_url + IP. Identity-bound limits via RFC 9421 request signing arrive in a separate RFC if needed.

Dependency on #4825

properties_total on managed-network-shape parent files depends on #4825's inline-resolution rule (PR #4827). Strict federation at managed-network scale requires N HTTP fetches per directory refresh per publisher — the same scale problem operators have, moved one layer up. With inline resolution endorsed, the directory computes per-publisher counts from the parent file's inline properties[] filtered by matching publisher_domain.
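
The inline-resolution counting described above could look roughly like the following. This is a sketch under assumptions: the `InlineProperty` shape (a `property_id` plus a `publisher_domain` on each inline property) and the `isAuthorized` predicate standing in for selector resolution are both hypothetical, not taken from the schema.

```typescript
// Assumed shape of an entry in the parent file's inline properties[].
interface InlineProperty {
  property_id: string;
  publisher_domain: string;
}

// Count properties for one publisher from the manager's inline list.
// isAuthorized is a stand-in for resolving the agent's selector.
function countsForPublisher(
  properties: InlineProperty[],
  publisherDomain: string,
  isAuthorized: (property: InlineProperty) => boolean,
): { properties_authorized: number; properties_total: number } {
  const own = properties.filter((p) => p.publisher_domain === publisherDomain);
  return {
    properties_authorized: own.filter(isAuthorized).length,
    properties_total: own.length,
  };
}
```

Because everything is read from the already-fetched parent file, no per-child HTTP fetch is needed to produce the counts.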

The docs page links to #4825 as a soft dependency; the schema is independent (works with either resolution path).

Follow-ups

  • Server implementation in server/src/routes/registry-api.ts, backed by server/src/db/federated-index-db.ts. The existing getDomainsForAgent returns (agent_url, publisher_domain, authorized_for, property_ids, source, discovered_at, last_validated) — needs extension to surface discovery_method (vs the existing source evidence enum), manager_domain, per-publisher property counts (via expandPublisherPropertiesToIdentifiers), and signing_keys_pinned. Will file as a separate issue.
  • SDK companion: fetch_agent_authorizations_from_directory in adcp-client-python (#746) and TS/Go/Java mirrors.
  • ?include=properties for inline property detail — out of scope for v1; add later if operators ask.

Resolves #4823.

Test plan

🤖 Generated with Claude Code

GET /v1/agents/{agent_url}/publishers returns the set of publishers whose
adagents.json authorizes a given agent. Solves "what publishers have
authorized my agent?" at the directory layer, instead of forcing every
operator to either maintain the publisher list manually or crawl the open
web themselves.

This changeset defines the spec only — server implementation tracked
separately. The endpoint shape, response envelope, discovery_method enum,
per-publisher scoped counts, lifecycle status, and HTTP semantics are
documented.

- New schema at static/schemas/source/aao/agent-publishers.json defines
  the response envelope (agent_url, directory_indexed_at, publishers[],
  next_cursor) and the PublisherEntry shape with discovery_method,
  manager_domain, properties_authorized, properties_total,
  signing_keys_pinned, status, last_verified_at.
- New docs page at docs/aao/directory-api.mdx walks through endpoint
  shape, fields, HTTP semantics, pagination, and the recommended workflow
  for chaining against verify_agent_authorization.
- Lifecycle status enum scoped to authorized + revoked for v1.
  unbound/pending deferred — directory does not have crawler state to
  emit them honestly.
- Counts are per-publisher scoped, never network-wide — avoids the "12/12
  full auth vs 12-of-6800-network" misread.
- properties_total on managed-network-shape parent files depends on
  adcp#4825 inline resolution rule.
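
For orientation, the envelope and entry fields listed above might be typed as follows, together with a cursor loop. This is a sketch, not the published schema: `publisher_domain` as the row key, the nullable `directory_indexed_at`, and the `collectAllPublishers` helper are assumptions, and the page fetcher is synchronous only for brevity (a real client would await an HTTP call).

```typescript
type DiscoveryMethod =
  | "direct"
  | "authoritative_location"
  | "adagents_authoritative"
  | "ads_txt_managerdomain";
type LifecycleStatus = "authorized" | "revoked";

interface PublisherEntry {
  publisher_domain: string; // assumed row key
  discovery_method: DiscoveryMethod;
  manager_domain?: string | null;
  properties_authorized: number;
  properties_total: number;
  signing_keys_pinned?: boolean;
  status: LifecycleStatus;
  last_verified_at: string;
}

interface AgentPublishersPage {
  agent_url: string;
  directory_indexed_at: string | null; // nullability is an assumption
  publishers: PublisherEntry[];
  next_cursor?: string | null;
}

// Follow next_cursor until it is null or absent. maxPages guards against
// a server that never terminates pagination.
function collectAllPublishers(
  fetchPage: (cursor: string | null) => AgentPublishersPage,
  maxPages: number = 1000,
): PublisherEntry[] {
  const all: PublisherEntry[] = [];
  let cursor: string | null = null;
  for (let i = 0; i < maxPages; i++) {
    const page: AgentPublishersPage = fetchPage(cursor);
    all.push(...page.publishers);
    cursor = page.next_cursor ?? null;
    if (cursor === null) return all;
  }
  throw new Error("pagination did not terminate");
}
```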

Resolves #4823.

@aao-release-bot aao-release-bot Bot left a comment

LGTM. Follow-ups noted below. Right shape for an inverse-lookup: discovery, not authorization — the publisher's adagents.json stays the trust root and the directory tells you which trust roots to hit.

Things I checked

  • Schema validity (static/schemas/source/aao/agent-publishers.json). Draft-07, additionalProperties: false at both envelope and PublisherEntry, next_cursor correctly typed ["string", "null"], properties_authorized/properties_total integer with minimum: 0, conditional if/then enforces manager_domain on the three non-direct discovery methods.
  • Schema-vs-docs parity. Required sets, enum values (4 discovery methods, 2 statuses), and optional-ness flags (signing_keys_pinned, next_cursor) all match docs/aao/directory-api.mdx:80-116 field reference tables.
  • Wire dependencies resolve against existing AdCP spec. signing_keys[] exists at static/schemas/source/core/authorized-agent-base.json:19. revoked_publisher_domains[] exists at static/schemas/source/adagents.json:93-126 with matching tombstone semantics. managerdomain safety rule anchor #safety-rules-for-this-fallback resolves at docs/governance/property/adagents.mdx:237. No hidden cross-PR dependency.
  • Soft dep on #4825 (inline resolution) is disclosed in prose (directory-api.mdx:160-162, changeset). Schema works either way; only properties_total count semantics differ.
  • Changeset categorization. Empty frontmatter is consistent with sibling AAO endpoint changesets (aao-hosted-adagents-json-route.md, aao-measurement-vendor-discovery.md). Spec-only directory addition, no SDK wire surface today.
  • No oneOf added — no audit-walker regression. Confirmed static/schemas/source/aao/agent-publishers.json has zero oneOf.

Follow-ups (non-blocking — file as issues)

  • Schema accepts {discovery_method: "direct", manager_domain: "evil.com"}. agent-publishers.json:93-109 has only the non-direct if/then; no else clause restricts manager_domain to null on the direct case. Field description and docs both say "Null or absent when direct." Add a second if/then with then: { properties: { manager_domain: { type: "null" } } }, or convert to a oneOf over the four discovery_method values. Small validator drift today, easy to close before any SDK starts consuming this.
  • Dead anchor. directory-api.mdx:107 and :162 link to /docs/governance/property/adagents#resolution-paths. No such heading in adagents.mdx. Either add ### Resolution paths to adagents.mdx as part of #4825 landing, or repoint to #authorization-patterns / the existing fan-out section.
  • discovery_method naming overlap. authoritative_location is both an enum value here and a field name in static/schemas/source/adagents.json:15. adagents_authoritative reads as near-synonym to authoritative_location for skim-readers. You flagged this in your own test plan. Worth considering authoritative_location_pointer / manager_inline / similar — non-blocking; happy to leave to a follow-up RFC if you'd rather not churn the enum before the server lands.
  • since is under-specified for incremental sync. directory-api.mdx:29 says "filter by last_verified_at ≥ since" but documents no sort order, no relationship to cursor (do they compose? does pagination preserve since?). Tighten before clients write resume logic.
  • URL canonicalization. directory-api.mdx:23 and the schema description compress the canonicalization rule in prose. There's already docs/reference/url-canonicalization.mdx — link to it rather than re-paraphrasing, otherwise server/SDK drift as that doc evolves.
  • Enum extensibility policy. status (2 values) and discovery_method (4 values) are closed enums. Brand-new endpoint, no cost today. Before adding unbound/pending/etc., decide whether enum extension is additive-only (clients tolerate unknowns) or requires a minor bump. Wire it into the changeset bar before the second wave.
  • Nav placement. docs.json puts this under "Using AAO" with the org/user how-tos. It reads like an API reference. Reasonable to graduate to its own subgroup (mirroring the Signals "Reference" subgroup at docs.json:1106-1114) once a second endpoint lands.
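
Until the schema gap on `manager_domain` is closed, a consumer could enforce both directions of the rule itself. The sketch below is one possible client-side check; the function name `managerDomainError` is hypothetical and the error strings are illustrative.

```typescript
type DiscoveryMethod =
  | "direct"
  | "authoritative_location"
  | "adagents_authoritative"
  | "ads_txt_managerdomain";

// Returns null when the pair is consistent, otherwise a description of the
// violation. Covers the direct case that the schema's if/then misses.
function managerDomainError(
  method: DiscoveryMethod,
  managerDomain: string | null | undefined,
): string | null {
  if (method === "direct") {
    return managerDomain == null
      ? null
      : "manager_domain must be null or absent when discovery_method is direct";
  }
  return typeof managerDomain === "string" && managerDomain.length > 0
    ? null
    : `manager_domain is required when discovery_method is ${method}`;
}

console.log(managerDomainError("direct", "evil.com"));
```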

Minor nits (non-blocking)

  1. 404-vs-200-empty distinction (directory-api.mdx:126). Sound but commits the directory to "have I ever indexed this agent_url" state. Worth a sentence acknowledging the storage commitment, or relax to "directory MAY return 404 to signal never-indexed" so non-stateful implementations stay conforming.
  2. status query param is comma-separated and not modeled in any request schema. Fine for now; flag for the server-implementation PR to add request validation.

Safe to merge. Schema is additive, no implementation, no consumer to break — the if/then gap and the dead anchor are both one-line fixes that can chase in the server-implementation PR.

@bokelley bokelley merged commit 1340943 into main May 20, 2026
19 checks passed
@bokelley bokelley deleted the bokelley/4823-aao-directory-inverse-lookup-spec branch May 20, 2026 12:40
bokelley added a commit that referenced this pull request May 20, 2026
…4836) (#4838)

* feat(aao): implement GET /v1/agents/{agent_url}/publishers endpoint (#4836)

Server-side implementation of the AAO directory inverse-lookup endpoint
specified in #4828. Returns the publishers whose adagents.json
authorizes a given agent_url, with provenance, per-publisher property
counts, signing-key pin status, and lifecycle state.

Single SQL query joins authorization edges to the publishers overlay
for discovery_method + manager_domain + the cached adagents_json blob;
all derivations (signing_keys_pinned, status: authorized/revoked,
property counts) run in SQL via JSONB ops and correlated subqueries.

- New FederatedIndexDatabase.getPublishersForAgentDetail with cursor,
  since, and includeRevoked options
- New router.get("/v1/agents/:encodedUrl/publishers") with ETag /
  If-None-Match, 200 vs 404 disambiguation, opaque base64url cursors
- Per-publisher count scoping (never network-wide) avoids the
  "12/12 vs 12-of-6800" misread on managed-network parent files
- Integration tests cover: empty, direct discovery, managerdomain,
  signing_keys_pinned true/false, revoked status, JSONB-side
  canonicalization, cursor pagination, since filter

The adagents_authoritative discovery method is deferred because the crawler
doesn't emit it yet. Out of scope for v1.

Resolves #4836.

* fix(aao): address code review on #4838 — SQL semantics, no-fallback violations, OpenAPI registration, HTTP tests

Findings from code-reviewer-deep + ad-tech-protocol-expert on #4838:

- SQL: replace UNION + DISTINCT ON (publisher_domain, source) with
  UNION ALL + src_priority pattern matching getDomainsForAgent /
  getAgentsForDomain. Tiebreaker becomes (src_priority, authz_last_validated DESC)
  so legacy wins on collision and the surviving timestamp is deterministic.
- SQL: extract JSONB walks into an `enriched` CTE so signing_keys_pinned
  and is_revoked are computed once per row, not three times (SELECT,
  status CASE, WHERE NOT EXISTS).
- Route: decodeURIComponent wrapped — malformed percent-encoding now
  returns 400 instead of letting the outer catch 500.
- Route: drop `discovery_method ?? 'direct'` fallback. Only emit 'direct'
  when manager_domain IS NULL; otherwise skip the row (don't invent the
  strongest trust profile for ambiguous rows).
- Route: drop `last_verified_at ?? new Date()` fallback. Skip rows with
  no freshness anchor instead of lying.
- Route: drop `directory_indexed_at ?? new Date()` fallback. Return null
  on empty pages — schema and docs page updated to allow + document.
- Route: Cache-Control set before the 304 short-circuit so caches get
  freshness signal on 304 too.
- Route: limit cap consolidated to single source (route only); DB layer
  passes through.
- OpenAPI: register the route per openapi-coverage.test.ts (CI failure).
- Tests: new HTTP-level integration test covering 404, 200+empty,
  malformed percent-encoding, invalid status / since / cursor, ETag/304,
  cursor pagination, revoked filtering.
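
Two of the route fixes above can be sketched as pure functions: wrapping `decodeURIComponent` so malformed input maps to 400, and setting `Cache-Control` before the 304 short-circuit. The names `decodeAgentUrl` and `respondWithEtag` and the `max-age` value are assumptions, and the exact-match ETag comparison is a simplification of full `If-None-Match` handling.

```typescript
type DecodeResult = { ok: true; agentUrl: string } | { ok: false; status: 400 };

// decodeURIComponent throws URIError on malformed percent-encoding;
// translate that into a 400 instead of letting it surface as a 500.
function decodeAgentUrl(encoded: string): DecodeResult {
  try {
    return { ok: true, agentUrl: decodeURIComponent(encoded) };
  } catch {
    return { ok: false, status: 400 };
  }
}

interface ConditionalResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string | null;
}

// Build the headers first so a 304 carries the same freshness signal as a 200.
function respondWithEtag(
  ifNoneMatch: string | undefined,
  etag: string,
  body: string,
): ConditionalResponse {
  const headers: Record<string, string> = {
    ETag: etag,
    "Cache-Control": "public, max-age=60", // illustrative value
  };
  if (ifNoneMatch === etag) {
    return { status: 304, headers, body: null };
  }
  return { status: 200, headers, body };
}
```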

* chore(openapi): regenerate static/openapi/registry.yaml after registerPath addition

Picks up the new /api/v1/agents/{encodedUrl}/publishers route added in
the prior commit. CI's "OpenAPI freshness" check requires the static
yaml to match the Zod-source-of-truth.
bokelley added a commit that referenced this pull request May 20, 2026
…r-child rows (#4840)

* feat(crawler): fan publisher_properties[].publisher_domains[] into per-child rows

The directory endpoint (#4838) returns one cafemedia.com row with zero
counts for any agent listed in a publisher_properties selector, because
the crawler only writes the manager-host edge to
agent_publisher_authorizations. The 6,800 represented publishers stay
invisible. This PR closes the gap.

- Migration 486 widens publishers.discovery_method CHECK to include
  'adagents_authoritative' (the spec value from #4828 for inline-
  resolution-discovered publishers).
- DiscoveryMethod TS union widened to four values across adagents-manager
  and federated-index-db.
- PublisherDatabase.recordChildPublisherFromManager upserts the child
  publishers row with source_type='community', discovery_method=
  'adagents_authoritative', manager_domain=<host>, no blob. If the child
  already has its own adagents_json cached (was independently crawled),
  the upsert preserves the direct row — direct crawl wins over
  manager-file attribution.
- CrawlerService.fanOutPublisherPropertiesAuthorizations walks each
  authorized_agents[] entry with authorization_type='publisher_properties',
  resolves the singular publisher_domain + compact publisher_domains[]
  forms, and writes per-child rows in both publishers and
  agent_publisher_authorizations. Called from both crawler loops so it
  fires regardless of which discovery path landed the manager file.
  Idempotent.
- Integration tests cover: child row provenance, no self-attribution,
  direct-wins-over-attribution upsert behavior, domain canonicalization,
  singular vs compact selector forms, non-publisher_properties skip, and
  idempotency.
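
The fan-out described above can be approximated as follows. This is a simplified sketch, not the crawler's code: the `url` field on an authorized agent, the `ChildEdge` shape, and the trim-and-lowercase canonicalization are assumptions, and selector validation is omitted here.

```typescript
interface Selector {
  publisher_domain?: string;
  publisher_domains?: string[];
}

interface AuthorizedAgent {
  url: string; // assumed field name for the agent URL
  authorization_type: string;
  publisher_properties?: Selector[];
}

interface ChildEdge {
  agent_url: string;
  publisher_domain: string;
  manager_domain: string;
  discovery_method: "adagents_authoritative";
}

// Expand each publisher_properties selector into one edge per child domain.
// managerDomain is expected to be canonical (lowercase) already.
function fanOut(managerDomain: string, agents: AuthorizedAgent[]): ChildEdge[] {
  const seen = new Set<string>();
  const edges: ChildEdge[] = [];
  for (const agent of agents) {
    if (agent.authorization_type !== "publisher_properties") continue;
    for (const selector of agent.publisher_properties ?? []) {
      const domains =
        selector.publisher_domain !== undefined
          ? [selector.publisher_domain]
          : (selector.publisher_domains ?? []);
      for (const raw of domains) {
        const domain = raw.trim().toLowerCase(); // simplified canonicalization
        if (domain === managerDomain) continue; // no self-attribution
        const key = `${agent.url}|${domain}`;
        if (seen.has(key)) continue; // repeated domains collapse to one edge
        seen.add(key);
        edges.push({
          agent_url: agent.url,
          publisher_domain: domain,
          manager_domain: managerDomain,
          discovery_method: "adagents_authoritative",
        });
      }
    }
  }
  return edges;
}
```

In a real store, idempotency across runs would come from upserting on the (agent, publisher) key; the `Set` here only deduplicates within a single call.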

After this lands, the directory endpoint returns the 6,800 expected
rows for interchange.io against cafemedia.com.

* fix(aao): expert-review fixes on #4840 — manager-side revocation, XOR enforcement, by_id guard, transactional migration

Findings from code-reviewer-deep + ad-tech-protocol-expert on PR #4840:

- **Manager-side revocation gap** (critical, protocol-expert). The directory
  endpoint's is_revoked CTE walked the CHILD's adagents_json blob for
  revoked_publisher_domains[]. Fan-out children have adagents_json IS NULL,
  so manager-side revocation of a managed-network child was silently
  ignored. Adds a LEFT JOIN to the manager's publishers row and ORs the
  child-side check with the manager-side check. Spec rule "Revocation
  under inline resolution" is now enforced for the cafemedia case.

- **by_id + publisher_domains[] guard** (important, protocol-expert).
  Schema rejects this combo (property IDs are publisher-scoped — fanning
  a fixed ID set across N publishers silently cross-authorizes whichever
  inventory shares an ID). Hand-rolled validator doesn't enforce it; the
  fan-out now refuses defensively. Singular publisher_domain on by_id
  stays honored (schema-conformant).

- **XOR enforcement on selectors** (important, code-reviewer). Schema
  requires exactly one of publisher_domain / publisher_domains[]. Both-
  populated or neither-populated selectors are now skipped in the fan-out
  helper, mirroring the catalog projection's refuse-both invariant.

- **Migration 486 transaction** (important, code-reviewer). DROP and ADD
  CONSTRAINT now wrapped in BEGIN/COMMIT so they share one lock window.
  Closes the writer race in the brief gap between the two ALTERs.

- **Migration 486 doc comment** (nit, code-reviewer). The "backfilled to
  'direct' for previously-validated rows" sentence was migration 470's
  job, not 486's. Replaced with the trust profile description.
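
The XOR rule and the by_id guard can be expressed as one small resolver that returns `null` for any selector the fan-out should skip. This is a sketch: the `selection_type` field name and the `selectorDomains` helper are assumptions, since the commit message does not show the selector's exact shape.

```typescript
interface Selector {
  selection_type: string; // assumed field name; "by_id" is the guarded value
  publisher_domain?: string;
  publisher_domains?: string[];
}

// Returns the domains to fan out to, or null when the selector is refused.
function selectorDomains(selector: Selector): string[] | null {
  const singular = selector.publisher_domain;
  const compact = selector.publisher_domains;
  // XOR: both populated is refused.
  if (singular !== undefined && compact !== undefined) return null;
  // Singular form is honored for every selection type, including by_id.
  if (singular !== undefined) return [singular];
  // XOR: neither populated is refused.
  if (compact === undefined || compact.length === 0) return null;
  // by_id with a domain list would cross-authorize publisher-scoped IDs.
  if (selector.selection_type === "by_id") return null;
  return [...compact];
}
```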

Tests added:
  - manager-side revocation via the directory endpoint (the SQL fix)
  - by_id + publisher_domains[] refusal (no fan-out for malformed selector)
  - by_id + singular publisher_domain (schema-conformant, fan-out works)
  - XOR refusal: both publisher_domain and publisher_domains[] populated
  - XOR refusal: neither populated

Deferred to follow-up (code-reviewer noted, agreed):
  - Bulk INSERT … VALUES batching for 6,800-child fan-outs (perf, not
    correctness)
  - Pure-function extraction of fanOutPublisherPropertiesAuthorizations
    (test ergonomics; current prototype-binding works)
  - upsertAdagentsCache COALESCE-defense for null discoveryMethod
    (defensive vs intentional-overwrite trade-off; current call sites
    are correct)
bokelley added a commit that referenced this pull request May 20, 2026
PR #4828 left the encoding of multi-value status= ambiguous between
comma-separated (?status=authorized,revoked) and repeated-key
(?status=authorized&status=revoked). Two interpretations silently
mis-filter against each other.

Repeated-key wins: URLSearchParams.append, OpenAPI explode:true default,
and adcp-client#1892 (shipped) already produce it. Comma-separated input
is rejected at the directory with 400 rather than silently coerced.
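
A minimal sketch of the repeated-key convention, using the standard `URLSearchParams` API on both sides. The `parseStatusParam` helper and its error strings are hypothetical, and rejecting unknown values with 400 is an assumption about the server's validation.

```typescript
type LifecycleStatus = "authorized" | "revoked";
const KNOWN_STATUSES: readonly string[] = ["authorized", "revoked"];

type StatusParseResult =
  | { ok: true; statuses: LifecycleStatus[] }
  | { ok: false; httpStatus: 400; error: string };

// Accept only repeated keys; a comma inside a value is rejected, not split.
function parseStatusParam(query: URLSearchParams): StatusParseResult {
  const statuses: LifecycleStatus[] = [];
  for (const value of query.getAll("status")) {
    if (value.includes(",")) {
      return { ok: false, httpStatus: 400, error: "comma-separated status is not supported" };
    }
    if (!KNOWN_STATUSES.includes(value)) {
      return { ok: false, httpStatus: 400, error: `unknown status: ${value}` };
    }
    statuses.push(value as LifecycleStatus);
  }
  return { ok: true, statuses };
}

// Client side: append once per value to get the repeated-key form.
const statusQuery = new URLSearchParams();
statusQuery.append("status", "authorized");
statusQuery.append("status", "revoked");
console.log(statusQuery.toString()); // status=authorized&status=revoked
```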

Resolves #4855.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

feat(aao): inverse-lookup endpoint — given agent_url, return authorizing publishers

1 participant