feat(aao): define directory inverse-lookup endpoint spec (#4823) #4828
Merged
GET /v1/agents/{agent_url}/publishers returns the set of publishers whose
adagents.json authorizes a given agent. Solves "what publishers have
authorized my agent?" at the directory layer, instead of forcing every
operator to either maintain the publisher list manually or crawl the open
web themselves.
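A minimal client-side sketch of building the request URL, in TypeScript. The directory host and the `limit`/`since`/`cursor` parameter spellings are assumptions for illustration. The point it shows is that `agent_url` is itself a URL, so it has to be percent-encoded into a single path segment.

```typescript
// Hypothetical directory host; not specified in this changeset.
const DIRECTORY_BASE = "https://directory.example";

// Assumed query parameter names, for illustration only.
interface PublishersQuery {
  limit?: number;
  since?: string; // ISO 8601 timestamp compared against last_verified_at
  cursor?: string; // opaque next_cursor value from a previous page
}

// Builds GET /v1/agents/{agent_url}/publishers. encodeURIComponent escapes
// ":" and "/" so the agent URL occupies exactly one path segment.
function buildPublishersUrl(
  agentUrl: string,
  query: PublishersQuery = {},
  base: string = DIRECTORY_BASE,
): string {
  const url = new URL(`/v1/agents/${encodeURIComponent(agentUrl)}/publishers`, base);
  if (query.limit !== undefined) url.searchParams.set("limit", String(query.limit));
  if (query.since !== undefined) url.searchParams.set("since", query.since);
  if (query.cursor !== undefined) url.searchParams.set("cursor", query.cursor);
  return url.toString();
}
```

For example, `buildPublishersUrl("https://sales.example/agent", { limit: 50 })` yields `https://directory.example/v1/agents/https%3A%2F%2Fsales.example%2Fagent/publishers?limit=50`.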
This changeset defines the spec only; the server implementation is tracked
separately. It documents the endpoint shape, response envelope, discovery_method
enum, per-publisher scoped counts, lifecycle status, and HTTP semantics.
- New schema at static/schemas/source/aao/agent-publishers.json defines
the response envelope (agent_url, directory_indexed_at, publishers[],
next_cursor) and the PublisherEntry shape with discovery_method,
manager_domain, properties_authorized, properties_total,
signing_keys_pinned, status, last_verified_at.
- New docs page at docs/aao/directory-api.mdx walks through endpoint
shape, fields, HTTP semantics, pagination, and the recommended workflow
for chaining against verify_agent_authorization.
- Lifecycle status enum scoped to authorized + revoked for v1.
unbound/pending are deferred because the directory does not yet have the
crawler state to emit them honestly.
- Counts are per-publisher scoped, never network-wide — avoids the "12/12
full auth vs 12-of-6800-network" misread.
- properties_total on managed-network-shape parent files depends on
adcp#4825 inline resolution rule.
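As a reading aid, here is a hand-written TypeScript mirror of the envelope and `PublisherEntry` fields listed above, plus a loop that follows `next_cursor`. It is a sketch, not generated from the schema. `publisher_domain` is an assumed name for the row's identifying field (the list above does not spell it out), and treating `next_cursor: null` as "no more pages" is an assumption.

```typescript
type DiscoveryMethod =
  | "direct"
  | "authoritative_location"
  | "adagents_authoritative"
  | "ads_txt_managerdomain";

interface PublisherEntry {
  publisher_domain: string; // assumed field name
  discovery_method: DiscoveryMethod;
  manager_domain?: string | null; // null or absent when discovery_method is "direct"
  properties_authorized: number; // scoped to this publisher only
  properties_total: number; // scoped to this publisher only
  signing_keys_pinned?: boolean;
  status: "authorized" | "revoked";
  last_verified_at: string; // ISO 8601 timestamp
}

interface AgentPublishersResponse {
  agent_url: string;
  directory_indexed_at: string | null; // nullable for empty pages (see #4838)
  publishers: PublisherEntry[];
  next_cursor: string | null; // null when there are no more pages
}

// Follows next_cursor until the directory reports no further pages.
// fetchPage is injected so the loop is independent of the HTTP client.
async function fetchAllPublishers(
  fetchPage: (cursor?: string) => Promise<AgentPublishersResponse>,
): Promise<PublisherEntry[]> {
  const all: PublisherEntry[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: AgentPublishersResponse = await fetchPage(cursor);
    all.push(...page.publishers);
    cursor = page.next_cursor ?? undefined;
  } while (cursor !== undefined);
  return all;
}
```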
Resolves #4823.
LGTM. Follow-ups noted below. Right shape for an inverse-lookup: discovery, not authorization — the publisher's adagents.json stays the trust root and the directory tells you which trust roots to hit.
Things I checked
- Schema validity (`static/schemas/source/aao/agent-publishers.json`). Draft-07, `additionalProperties: false` at both envelope and `PublisherEntry`, `next_cursor` correctly typed `["string", "null"]`, `properties_authorized`/`properties_total` `integer` with `minimum: 0`, conditional `if`/`then` enforces `manager_domain` on the three non-`direct` discovery methods.
- Schema-vs-docs parity. Required sets, enum values (4 discovery methods, 2 statuses), and optional-ness flags (`signing_keys_pinned`, `next_cursor`) all match the `docs/aao/directory-api.mdx:80-116` field reference tables.
- Wire dependencies resolve against the existing AdCP spec. `signing_keys[]` exists at `static/schemas/source/core/authorized-agent-base.json:19`. `revoked_publisher_domains[]` exists at `static/schemas/source/adagents.json:93-126` with matching tombstone semantics. The `managerdomain` safety rule anchor `#safety-rules-for-this-fallback` resolves at `docs/governance/property/adagents.mdx:237`. No hidden cross-PR dependency.
- Soft dep on #4825 (inline resolution) is disclosed in prose (`directory-api.mdx:160-162`, changeset). Schema works either way; only `properties_total` count semantics differ.
- Changeset categorization. Empty frontmatter is consistent with sibling AAO endpoint changesets (`aao-hosted-adagents-json-route.md`, `aao-measurement-vendor-discovery.md`). Spec-only directory addition, no SDK wire surface today.
- No `oneOf` added, so no audit-walker regression. Confirmed `static/schemas/source/aao/agent-publishers.json` has zero `oneOf`.
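The conditional checked above (`manager_domain` required on the three non-`direct` methods), together with the direct-case rule the field description states ("Null or absent when `direct`"), can be sketched as a runtime check. This is a hand-written TypeScript illustration, not derived from the schema file, and treating a `null` `manager_domain` on a non-direct row as a violation is an assumption about the `if`/`then` body.

```typescript
type DiscoveryMethod =
  | "direct"
  | "authoritative_location"
  | "adagents_authoritative"
  | "ads_txt_managerdomain";

interface ProvenanceFields {
  discovery_method: DiscoveryMethod;
  manager_domain?: string | null;
}

// Returns a list of violations; an empty list means the row is consistent.
function checkProvenance(entry: ProvenanceFields): string[] {
  const errors: string[] = [];
  const hasManager = typeof entry.manager_domain === "string";
  if (entry.discovery_method === "direct") {
    // Documented as "Null or absent when direct"; an if/then on the
    // non-direct branch alone does not enforce this side.
    if (hasManager) errors.push("manager_domain must be null or absent for direct");
  } else if (!hasManager) {
    // Mirrors the existing if/then: non-direct methods must name the manager.
    errors.push(`manager_domain is required for ${entry.discovery_method}`);
  }
  return errors;
}
```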
Follow-ups (non-blocking — file as issues)
- Schema accepts `{discovery_method: "direct", manager_domain: "evil.com"}`. `agent-publishers.json:93-109` has only the non-`direct` `if`/`then`; no `else` clause restricts `manager_domain` to `null` on the `direct` case. Field description and docs both say "Null or absent when `direct`." Add a second `if`/`then` with `then: { properties: { manager_domain: { type: "null" } } }`, or convert to a `oneOf` over the four `discovery_method` values. Small validator drift today, easy to close before any SDK starts consuming this.
- Dead anchor. `directory-api.mdx:107` and `:162` link to `/docs/governance/property/adagents#resolution-paths`. There is no such heading in `adagents.mdx`. Either add `### Resolution paths` to adagents.mdx as part of #4825 landing, or repoint to `#authorization-patterns` / the existing fan-out section.
- `discovery_method` naming overlap. `authoritative_location` is both an enum value here and a field name in `static/schemas/source/adagents.json:15`. `adagents_authoritative` reads as a near-synonym of `authoritative_location` for skim-readers. You flagged this in your own test plan. Worth considering `authoritative_location_pointer` / `manager_inline` / similar. Non-blocking; happy to leave to a follow-up RFC if you'd rather not churn the enum before the server lands.
- `since` is under-specified for incremental sync. `directory-api.mdx:29` says "filter by `last_verified_at ≥ since`" but documents no sort order and no relationship to `cursor` (do they compose? does pagination preserve `since`?). Tighten before clients write resume logic.
- URL canonicalization. `directory-api.mdx:23` and the schema description compress the canonicalization rule in prose. There's already `docs/reference/url-canonicalization.mdx`; link to it rather than re-paraphrasing, otherwise server and SDKs will drift as that doc evolves.
- Enum extensibility policy. `status` (2 values) and `discovery_method` (4 values) are closed enums. Brand-new endpoint, no cost today. Before adding `unbound`/`pending`/etc., decide whether enum extension is additive-only (clients tolerate unknowns) or requires a `minor` bump. Wire it into the changeset bar before the second wave.
- Nav placement. `docs.json` puts this under "Using AAO" with the org/user how-tos. It reads like an API reference. Reasonable to graduate it to its own subgroup (mirroring the Signals "Reference" subgroup at `docs.json:1106-1114`) once a second endpoint lands.
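If enum extension is declared additive-only, clients have to tolerate values they were not built against. A minimal client-side sketch for `status` in TypeScript; `pending` is used only as a hypothetical future value, and the `known`/`unknown` wrapper is an illustrative design, not part of the spec.

```typescript
// Values this client was built against.
const KNOWN_STATUSES = ["authorized", "revoked"] as const;
type KnownStatus = (typeof KNOWN_STATUSES)[number];

type ParsedStatus =
  | { kind: "known"; value: KnownStatus }
  | { kind: "unknown"; raw: string };

// Under an additive-only policy, an unrecognized value is surfaced as "unknown"
// instead of failing validation for the whole page.
function parseStatus(raw: string): ParsedStatus {
  const known = KNOWN_STATUSES.find((s) => s === raw);
  return known !== undefined ? { kind: "known", value: known } : { kind: "unknown", raw };
}
```

The alternative, closed enums plus a `minor` bump per new value, lets clients reject unknowns strictly at the cost of coordinated upgrades.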
Minor nits (non-blocking)
- 404-vs-200-empty distinction (`directory-api.mdx:126`). Sound, but it commits the directory to "have I ever indexed this agent_url" state. Worth a sentence acknowledging the storage commitment, or relax to "directory MAY return 404 to signal never-indexed" so non-stateful implementations stay conforming.
- `status` query param is comma-separated and not modeled in any request schema. Fine for now; flag for the server-implementation PR to add request validation.
Safe to merge. Schema is additive, no implementation, no consumer to break — the if/then gap and the dead anchor are both one-line fixes that can chase in the server-implementation PR.
This was referenced May 20, 2026
bokelley added a commit that referenced this pull request on May 20, 2026:
…4836) (#4838)

* feat(aao): implement GET /v1/agents/{agent_url}/publishers endpoint (#4836)

Server-side implementation of the AAO directory inverse-lookup endpoint specified in #4828. Returns the publishers whose adagents.json authorizes a given agent_url, with provenance, per-publisher property counts, signing-key pin status, and lifecycle state.

A single SQL query joins authorization edges to the publishers overlay for discovery_method + manager_domain + the cached adagents_json blob; all derivations (signing_keys_pinned, status: authorized/revoked, property counts) run in SQL via JSONB ops and correlated subqueries.

- New FederatedIndexDatabase.getPublishersForAgentDetail with cursor, since, and includeRevoked options
- New router.get("/v1/agents/:encodedUrl/publishers") with ETag / If-None-Match, 200 vs 404 disambiguation, opaque base64url cursors
- Per-publisher count scoping (never network-wide) avoids the "12/12 vs 12-of-6800" misread on managed-network parent files
- Integration tests cover: empty, direct discovery, managerdomain, signing_keys_pinned true/false, revoked status, JSONB-side canonicalization, cursor pagination, since filter

The adagents_authoritative discovery method is deferred because the crawler doesn't emit it yet; out of scope for v1.

Resolves #4836.

* fix(aao): address code review on #4838 — SQL semantics, no-fallback violations, OpenAPI registration, HTTP tests

Findings from code-reviewer-deep + ad-tech-protocol-expert on #4838:

- SQL: replace UNION + DISTINCT ON (publisher_domain, source) with UNION ALL + src_priority pattern matching getDomainsForAgent / getAgentsForDomain. Tiebreaker becomes (src_priority, authz_last_validated DESC) so legacy wins on collision and the surviving timestamp is deterministic.
- SQL: extract JSONB walks into an `enriched` CTE so signing_keys_pinned and is_revoked are computed once per row, not three times (SELECT, status CASE, WHERE NOT EXISTS).
- Route: decodeURIComponent is now wrapped, so malformed percent-encoding returns 400 instead of falling through to the outer catch as a 500.
- Route: drop `discovery_method ?? 'direct'` fallback. Only emit 'direct' when manager_domain IS NULL; otherwise skip the row (don't invent the strongest trust profile for ambiguous rows).
- Route: drop `last_verified_at ?? new Date()` fallback. Skip rows with no freshness anchor instead of lying.
- Route: drop `directory_indexed_at ?? new Date()` fallback. Return null on empty pages; schema and docs page updated to allow and document this.
- Route: Cache-Control is set before the 304 short-circuit so caches get a freshness signal on 304 too.
- Route: limit cap consolidated to a single source (route only); the DB layer passes it through.
- OpenAPI: register the route per openapi-coverage.test.ts (CI failure).
- Tests: new HTTP-level integration test covering 404, 200+empty, malformed percent-encoding, invalid status / since / cursor, ETag/304, cursor pagination, revoked filtering.

* chore(openapi): regenerate static/openapi/registry.yaml after registerPath addition

Picks up the new /api/v1/agents/{encodedUrl}/publishers route added in the prior commit. CI's "OpenAPI freshness" check requires the static yaml to match the Zod source of truth.
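The route behaviors listed in these commits (400 on malformed percent-encoding, 404 only for never-indexed agents, ETag / If-None-Match with Cache-Control set before the 304 short-circuit) can be sketched as a framework-free TypeScript function. This is not the code in `registry-api.ts`: `everIndexed`, `loadPage`, and the `max-age` value are hypothetical, and the ETag comparison is simplified to exact string equality (no `W/` prefixes, lists, or `*`).

```typescript
import { createHash } from "node:crypto";

interface RouteResult {
  status: number;
  headers: Record<string, string>;
  body?: string;
}

// Hypothetical freshness window; the PR does not state the real value.
const CACHE_CONTROL = "public, max-age=300";

// decodeURIComponent throws URIError on malformed input such as "%zz".
function decodeAgentUrl(encoded: string): string | null {
  try {
    return decodeURIComponent(encoded);
  } catch {
    return null;
  }
}

function respond(
  encodedAgentUrl: string,
  everIndexed: (agentUrl: string) => boolean,
  loadPage: (agentUrl: string) => object,
  ifNoneMatch?: string,
): RouteResult {
  const agentUrl = decodeAgentUrl(encodedAgentUrl);
  if (agentUrl === null) return { status: 400, headers: {}, body: "malformed agent_url encoding" };
  // 404 means "never indexed"; an indexed agent with no publishers gets 200 + empty list.
  if (!everIndexed(agentUrl)) return { status: 404, headers: {} };

  const body = JSON.stringify(loadPage(agentUrl));
  const etag = `"${createHash("sha256").update(body).digest("hex")}"`;
  // Cache-Control is attached before the 304 check so 304 responses carry it too.
  const headers: Record<string, string> = { ETag: etag, "Cache-Control": CACHE_CONTROL };
  if (ifNoneMatch === etag) return { status: 304, headers };
  return { status: 200, headers, body };
}
```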
bokelley added a commit that referenced this pull request on May 20, 2026:
…r-child rows (#4840)

* feat(crawler): fan publisher_properties[].publisher_domains[] into per-child rows

The directory endpoint (#4838) returns one cafemedia.com row with zero counts for any agent listed in a publisher_properties selector, because the crawler only writes the manager-host edge to agent_publisher_authorizations. The 6,800 represented publishers stay invisible. This PR closes the gap.

- Migration 486 widens the publishers.discovery_method CHECK to include 'adagents_authoritative' (the spec value from #4828 for inline-resolution-discovered publishers).
- DiscoveryMethod TS union widened to four values across adagents-manager and federated-index-db.
- PublisherDatabase.recordChildPublisherFromManager upserts the child publishers row with source_type='community', discovery_method='adagents_authoritative', manager_domain=<host>, no blob. If the child already has its own adagents_json cached (was independently crawled), the upsert preserves the direct row: direct crawl wins over manager-file attribution.
- CrawlerService.fanOutPublisherPropertiesAuthorizations walks each authorized_agents[] entry with authorization_type='publisher_properties', resolves the singular publisher_domain and compact publisher_domains[] forms, and writes per-child rows in both publishers and agent_publisher_authorizations. It is called from both crawler loops so it fires regardless of which discovery path landed the manager file. Idempotent.
- Integration tests cover: child row provenance, no self-attribution, direct-wins-over-attribution upsert behavior, domain canonicalization, singular vs compact selector forms, non-publisher_properties skip, and idempotency.

After this lands, the directory endpoint returns the 6,800 expected rows for interchange.io against cafemedia.com.

* fix(aao): expert-review fixes on #4840 — manager-side revocation, XOR enforcement, by_id guard, transactional migration

Findings from code-reviewer-deep + ad-tech-protocol-expert on PR #4840:

- **Manager-side revocation gap** (critical, protocol-expert). The directory endpoint's is_revoked CTE walked the CHILD's adagents_json blob for revoked_publisher_domains[]. Fan-out children have adagents_json IS NULL, so manager-side revocation of a managed-network child was silently ignored. Adds a LEFT JOIN to the manager's publishers row and ORs the child-side check with the manager-side check. The spec rule "Revocation under inline resolution" is now enforced for the cafemedia case.
- **by_id + publisher_domains[] guard** (important, protocol-expert). The schema rejects this combo (property IDs are publisher-scoped, so fanning a fixed ID set across N publishers silently cross-authorizes whichever inventory shares an ID). The hand-rolled validator doesn't enforce it; the fan-out now refuses defensively. Singular publisher_domain on by_id stays honored (schema-conformant).
- **XOR enforcement on selectors** (important, code-reviewer). The schema requires exactly one of publisher_domain / publisher_domains[]. Both-populated or neither-populated selectors are now skipped in the fan-out helper, mirroring the catalog projection's refuse-both invariant.
- **Migration 486 transaction** (important, code-reviewer). DROP and ADD CONSTRAINT are now wrapped in BEGIN/COMMIT so they share one lock window. Closes the writer race in the brief gap between the two ALTERs.
- **Migration 486 doc comment** (nit, code-reviewer). The "backfilled to 'direct' for previously-validated rows" sentence was migration 470's job, not 486's. Replaced with the trust profile description.

Tests added:
- manager-side revocation via the directory endpoint (the SQL fix)
- by_id + publisher_domains[] refusal (no fan-out for malformed selector)
- by_id + singular publisher_domain (schema-conformant, fan-out works)
- XOR refusal: both publisher_domain and publisher_domains[] populated
- XOR refusal: neither populated

Deferred to follow-up (code-reviewer noted, agreed):
- Bulk INSERT … VALUES batching for 6,800-child fan-outs (perf, not correctness)
- Pure-function extraction of fanOutPublisherPropertiesAuthorizations (test ergonomics; current prototype-binding works)
- upsertAdagentsCache COALESCE-defense for null discoveryMethod (defensive vs intentional-overwrite trade-off; current call sites are correct)
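The fan-out rules in these two commits (XOR on selector forms, refusing `by_id` with `publisher_domains[]`, no self-attribution, idempotent writes, and manager-side revocation ORed with the child-side check) can be sketched in memory in TypeScript. This is not `fanOutPublisherPropertiesAuthorizations`: the selector and agent shapes (`selection_type`, `url`) are assumed, and `canonicalDomain` is a stand-in for the real canonicalization rule.

```typescript
// Assumed shapes; field names beyond those in the commit messages are illustrative.
interface PublisherPropertySelector {
  publisher_domain?: string;
  publisher_domains?: string[];
  selection_type: string; // e.g. "by_id"; assumed field name
}

interface AuthorizedAgent {
  url: string; // assumed field name
  authorization_type: string;
  publisher_properties?: PublisherPropertySelector[];
}

interface ChildRow {
  agent_url: string;
  publisher_domain: string;
  discovery_method: "adagents_authoritative";
  manager_domain: string;
}

// Stand-in canonicalization: trim, lowercase, drop a trailing dot.
const canonicalDomain = (d: string): string => d.trim().toLowerCase().replace(/\.$/, "");

function fanOutChildren(managerDomain: string, agents: AuthorizedAgent[]): ChildRow[] {
  const manager = canonicalDomain(managerDomain);
  const rows = new Map<string, ChildRow>(); // keyed upsert keeps the fan-out idempotent
  for (const agent of agents) {
    if (agent.authorization_type !== "publisher_properties") continue;
    for (const sel of agent.publisher_properties ?? []) {
      const hasSingular = sel.publisher_domain !== undefined;
      const hasCompact = sel.publisher_domains !== undefined;
      if (hasSingular === hasCompact) continue; // XOR: exactly one form must be present
      if (hasCompact && sel.selection_type === "by_id") continue; // IDs are publisher-scoped
      const domains = sel.publisher_domain !== undefined ? [sel.publisher_domain] : (sel.publisher_domains ?? []);
      for (const raw of domains) {
        const domain = canonicalDomain(raw);
        if (domain === manager) continue; // no self-attribution
        rows.set(`${agent.url}|${domain}`, {
          agent_url: agent.url,
          publisher_domain: domain,
          discovery_method: "adagents_authoritative",
          manager_domain: manager,
        });
      }
    }
  }
  return Array.from(rows.values());
}

// A fan-out child has no adagents_json of its own, so the manager's list must be checked too.
function isRevoked(childDomain: string, childRevoked: string[] | null, managerRevoked: string[] | null): boolean {
  const d = canonicalDomain(childDomain);
  const hit = (list: string[] | null) => (list ?? []).some((x) => canonicalDomain(x) === d);
  return hit(childRevoked) || hit(managerRevoked);
}
```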
This was referenced May 20, 2026
bokelley added a commit that referenced this pull request on May 20, 2026:
PR #4828 left the encoding of multi-value status= ambiguous between comma-separated (?status=authorized,revoked) and repeated-key (?status=authorized&status=revoked). The two interpretations silently mis-filter against each other.

Repeated-key wins: URLSearchParams.append, the OpenAPI explode: true default, and adcp-client#1892 (shipped) already produce it. Comma-separated input is rejected at the directory with 400 rather than silently coerced.

Resolves #4855.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
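The repeated-key rule can be sketched on both sides with the standard `URLSearchParams` API in TypeScript. `parseStatusFilter` and its result shape are hypothetical, not the directory's actual validator; it returns an error for comma-separated input instead of splitting it, which a route would map to 400.

```typescript
type StatusFilter = "authorized" | "revoked";
const ALLOWED: readonly string[] = ["authorized", "revoked"];

type ParseResult = { ok: true; value: StatusFilter[] } | { ok: false; error: string };

// Accepts only repeated-key encoding (?status=authorized&status=revoked).
// An empty result means no status filter was given; the default is server-defined.
function parseStatusFilter(params: URLSearchParams): ParseResult {
  const values = params.getAll("status");
  for (const v of values) {
    if (v.includes(",")) return { ok: false, error: "comma-separated status is not supported; repeat the key" };
    if (!ALLOWED.includes(v)) return { ok: false, error: `unknown status: ${v}` };
  }
  // Every value was checked against ALLOWED above, so the cast is safe.
  return { ok: true, value: values as StatusFilter[] };
}
```

On the client side, `params.append("status", "authorized"); params.append("status", "revoked")` serializes to `status=authorized&status=revoked`, which this parser accepts.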
Summary
Spec for `GET /v1/agents/{agent_url}/publishers`, the AAO directory's inverse-lookup endpoint. Returns the set of publishers whose `adagents.json` authorizes a given `agent_url`, with provenance, per-publisher property counts, and lifecycle status.

Scope of this PR is spec-only. Server implementation is non-trivial (new `discovery_method`/`manager_domain` columns, per-publisher selector resolution to compute counts, `signing_keys_pinned` lookup) and tracked separately; see Follow-ups below.

Files
- `static/schemas/source/aao/agent-publishers.json`: response envelope schema. New `static/schemas/source/aao/` subdirectory.
- `docs/aao/directory-api.mdx`: full API reference covering endpoint shape, query params, response, field reference, HTTP semantics, pagination, and relationship to existing SDK primitives.
- `docs.json`: adds `docs/aao/directory-api` to the "Using AAO" nav group.
- `.changeset/4823-aao-directory-inverse-lookup-endpoint.md`.

Design decisions (from issue triage)
Status enum scoped to `authorized` + `revoked` for v1
Triage Decision 1. The richer `unbound`/`pending` states presuppose crawler state the directory doesn't have today, and `pending` in particular requires snapshot semantics around publishers whose files we've fetched but don't list this agent (a separate indexing pass). Ship what's queryable; leave room to add states later. `revoked` is cheap to surface: when a publisher's `adagents.json` newly lists this `publisher_domain` in `revoked_publisher_domains[]`, the directory emits one tombstone row on the next sync, then drops it.
Counts kept, scoped per-publisher, renamed
Triage Decision 2. The "12/12 reads as full auth but really means 12-of-6800-network" critique is real, but the directory's whole value-add is that it ran the resolution once so operators don't all hammer 6,800 publishers in parallel. The fix is naming and scoping, not dropping counts.
Both `properties_authorized` and `properties_total` are scoped to the row's publisher, never network-wide. `recipeswithessentialoils.com` returns `{ properties_authorized: 1, properties_total: 1 }`, and the operator sees 6,800 such rows summing to the network total: correct mental model, no ambiguity.
`discovery_method` distinguishes four trust profiles
Per the protocol expert: `direct`, `authoritative_location`, `adagents_authoritative`, `ads_txt_managerdomain`. The `managerdomain` path is the weakest (the safety rule is the only positive cross-check); the directory does that bilateral check once, for everyone, which is its main value-add over a per-operator `ads.txt` crawl.
`signing_keys_pinned: bool` per row
Cheap to surface, useful at sync time: tells the operator whether this publisher pins their JWKS, which is the signal that the agent's published keys must match. No new crawler state required.
No auth in v1
Publishers are public; the inverse map is public. Rate-limiting is keyed on `agent_url` + IP. Identity-bound limits via RFC 9421 request signing arrive in a separate RFC if needed.
Dependency on #4825
`properties_total` on managed-network-shape parent files depends on #4825's inline-resolution rule (PR #4827). Strict federation at managed-network scale requires N HTTP fetches per directory refresh per publisher: the same scale problem operators have, moved one layer up. With inline resolution endorsed, the directory computes per-publisher counts from the parent file's inline `properties[]` filtered by matching `publisher_domain`.
The docs page links to #4825 as a soft dependency; the schema is independent (works with either resolution path).
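Under inline resolution, each row's counts come from the parent file's inline `properties[]` filtered to that row's publisher. A minimal TypeScript sketch; the `InlineProperty` shape (including `property_id`) and the `isAuthorized` predicate, which stands in for the agent's selector match, are assumptions, and the case-insensitive comparison is a simplification of the real canonicalization rule.

```typescript
// Assumed minimal shape of an inline property entry in a managed-network parent file.
interface InlineProperty {
  property_id: string; // illustrative field name
  publisher_domain: string;
}

interface PublisherCounts {
  properties_authorized: number;
  properties_total: number;
}

// Computes one row's counts from the parent file's inline properties[],
// filtered to this publisher so the numbers never become network-wide.
function countsForPublisher(
  publisherDomain: string,
  properties: InlineProperty[],
  isAuthorized: (p: InlineProperty) => boolean,
): PublisherCounts {
  const target = publisherDomain.toLowerCase();
  const mine = properties.filter((p) => p.publisher_domain.toLowerCase() === target);
  return {
    properties_authorized: mine.filter(isAuthorized).length,
    properties_total: mine.length,
  };
}
```

A publisher with one inline property that the agent is authorized for gets `{ properties_authorized: 1, properties_total: 1 }`, matching the `recipeswithessentialoils.com` example.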
Follow-ups
- `server/src/routes/registry-api.ts`, backed by `server/src/db/federated-index-db.ts`. The existing `getDomainsForAgent` returns `(agent_url, publisher_domain, authorized_for, property_ids, source, discovered_at, last_validated)` and needs extension to surface `discovery_method` (vs the existing `source` evidence enum), `manager_domain`, per-publisher property counts (via `expandPublisherPropertiesToIdentifiers`), and `signing_keys_pinned`. Will file as a separate issue.
- `fetch_agent_authorizations_from_directory` in adcp-client-python (#746) and TS/Go/Java mirrors.
- `?include=properties` for inline property detail: out of scope for v1; add later if operators ask.

Resolves #4823.
Test plan
- `npm run build:schemas` clean; schema bundles to `dist/schemas/latest/aao/agent-publishers.json`
- … `vitest run`, typecheck, dynamic-imports) passes
- `discovery_method` enum semantics against existing crawler's actual discovery paths to ensure naming aligns
- `properties_total` on managed-network files

🤖 Generated with Claude Code