Skip to content

feat(adagents): publisher_domains compact form, revoked_publisher_domains, streaming fetch caps (closes #729)#753

Merged
bokelley merged 2 commits into
mainfrom
bokelley/issue-729-adagents-scope
May 20, 2026
Merged

feat(adagents): publisher_domains compact form, revoked_publisher_domains, streaming fetch caps (closes #729)#753
bokelley merged 2 commits into
mainfrom
bokelley/issue-729-adagents-scope

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

Summary

Implements the full scope of issue #729 (adcp#4504 — publisher_domains[] compact form on publisher_properties selectors) plus the scope-expansion item from the issue thread (top-level revoked_publisher_domains[] on adagents.json with a Reason enum), and rewrites the outbound fetch path with two-tier streaming size caps and an If-None-Match / If-Modified-Since cache.

Built in two commits:

  1. 5160ab0b — feature implementation (schemas, generated types, validator, resolver, fetch rewrite, 24 new tests + 122 ported).
  2. fcb967c1 — review fixups after running the diff through three expert subagents (protocol, security, code review).

What's in scope

Schema (cached, hand-patched for 3.0):

  • publisher-property-selector.json gains optional publisher_domains[] (XOR with publisher_domain, by_id selectors excluded).
  • adagents.json gains top-level revoked_publisher_domains[] with Reason enum.

The cached schemas will be overwritten on the next make regenerate-schemas once upstream 3.0.10+ ships and ADCP_VERSION is bumped — the runtime / test changes are independent of the cached schema text.

Runtime validation (validation/legacy.py):

  • validate_publisher_properties_item enforces both XORs (selector and publisher_domain).
  • validate_revoked_publisher_domain_entry enforces shape + Reason enum.
  • validate_adagents plumbing fixed (was keyed off agents instead of authorized_agents — pre-existing typo that made the new XOR validator dead at the integration level until this PR).

Resolver (adagents.py):

  • _fanout_publisher_properties expands compact entries one-per-domain, preserving order across mixed compact+expanded arrays.
  • filter_revoked_selectors + _get_revoked_publisher_domains apply revocation precedence to both publisher_properties[] selectors and top-level properties[].

Fetch path (adagents.py):

  • Rewritten onto httpx.AsyncClient.stream with two-tier caps (MAX_POINTER_BYTES=5MiB first hop, MAX_AUTHORITATIVE_BYTES=20MiB second hop).
  • Content-Length pre-check + per-chunk running-total guard.
  • follow_redirects=False so HTTP 30x cannot bypass the SSRF gate that protects authoritative_location (the only sanctioned cross-host delegation path).
  • _validate_publisher_domain now calls the same SSRF gate as _validate_redirect_url (factored into _check_safe_host), so first-hop IP literals and internal hostnames (localhost, .local, .internal, metadata.google.internal, RFC 1918, link-local, multicast, reserved) are rejected.
  • New AdagentsCacheEntry / AdagentsFetchResult types + fetch_adagents_with_cache send If-None-Match / If-Modified-Since; 304 → cache-lifetime refresh. Cache validators truncated above 256 bytes before persisting.

Expert review — closed in this PR

Severity Finding Resolution
Security · Critical follow_redirects=True bypasses SSRF gate on HTTP 30x Set False; only authoritative_location follows.
Security · Critical First-hop publisher domain has no IP-literal gate _check_safe_host called from _validate_publisher_domain.
Security · High Etag/Last-Modified replayed unbounded 256-byte cap on persisted validators.
Security · Medium except Exception + unbounded body in error except json.JSONDecodeError, message truncated to 200 chars.
Protocol · Blocker validate_adagents walked agents not authorized_agents Fixed; integration validator now actually runs.
Code · Must-fix Dead "304/" token in redirect-hop error Removed.
Code · Must-fix Dead is_redirect ternary in not_modified return Removed; only first-hop 304 is reachable.
Code · Must-fix Misleading "original retained" comment on _fanout Comment fixed.
Code · Should-fix _REVOCATION_REASONS drift risk vs generated enum Drift-detection unit test added.
Code · Should-fix Streaming-cap test passed even if running-total guard deleted Multi-chunk test added.
Code · Should-fix No mixed-compact-and-expanded test Added.
Code · Should-fix No last_modified-only cache test Added.

Deferred follow-ups (tracked, not blocking)

  • DNS pinning to harden against rebinding (security H1) — needs a custom httpx transport.
  • Pydantic model_validator for selector XOR — datamodel-codegen can't emit allOf[not[required[both]]] + anyOf[required[either]], so the typed Pydantic surface is laxer than the JSON Schema. The runtime validator covers the dict-parsing path; typed consumers will need an explicit gate.
  • Revocation filtering on the inline_properties branchfilter_revoked_selectors covers publisher_properties[] and the top-level properties[]; the inline path is currently unfiltered. Worth confirming spec intent before adding.
  • AdagentsCacheEntry.body vs AdagentsFetchResult.data field-name inconsistency — defer to a follow-up rename to avoid bouncing the public API in this PR.

Test plan

  • ruff check src/ tests/test_adagents.py — passes
  • mypy src/adcp/ — 807 source files, no issues
  • pytest tests/ — 4,821 passed, 34 skipped, 1 xfailed (no failures, no flakes)
  • 146 adagents tests pass: 122 ported onto the streaming mock helper, 24 new across four classes (compact form, revoked domains, fetch-with-cache, size caps)
  • CI green across Python 3.10–3.13

Notes for reviewers

  • per-authorized_agents[] last_updated is intentionally omitted — the merged upstream spec PR does not carry that field, confirmed against the latest upstream schema. The original issue thread discussed it; the spec community removed it.
  • The new public exports in adcp.__init__.py: AdagentsCacheEntry, AdagentsFetchResult, fetch_adagents_with_cache, filter_revoked_selectors. Public-API snapshot updated.

🤖 Generated with Claude Code

bokelley and others added 2 commits May 20, 2026 07:41
…ains, streaming fetch caps (#729)

Implements full scope of issue #729 plus the scope-expansion comment:

- publisher-property-selector schema gains optional publisher_domains[] compact
  form (XOR with publisher_domain; by_id selectors excluded from compact form).
- adagents.json gains top-level revoked_publisher_domains[] with Reason enum.
- Resolver: _fanout_publisher_properties expands compact entries one-per-domain;
  filter_revoked_selectors applies revocation precedence over both
  publisher_properties[] selectors and top-level properties[].
- Fetch path rewritten onto httpx.AsyncClient.stream with two-tier size caps
  (MAX_POINTER_BYTES=5MiB first hop, MAX_AUTHORITATIVE_BYTES=20MiB second hop)
  and Content-Length pre-check. New fetch_adagents_with_cache sends
  If-None-Match / If-Modified-Since; 304 treated as cache-lifetime refresh.
- Runtime validation: two XORs enforced on publisher_properties[] entries;
  by_id rejects publisher_domains; validate_revoked_publisher_domain_entry
  plumbed into validate_adagents.
- 24 new tests across four classes; 122 existing adagents tests ported onto
  the new streaming mock helper.

Cached schemas under schemas/cache/3.0/ are hand-patched and will be
overwritten by the next \`make regenerate-schemas\` once upstream 3.0.10+
ships and ADCP_VERSION is bumped — that's fine; the runtime/test changes
are independent of the cached schema text.

Per-authorized_agents[] last_updated (also discussed in the spec PR thread)
is intentionally not included — the merged upstream spec PR does not carry
that field, confirmed against the latest schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three reviewer passes (protocol, security, code) on 5160ab0 surfaced one
must-fix per area; this commit closes them.

Security:
- `_stream_capped` now passes `follow_redirects=False`. HTTP 30x cannot be
  used to slip past the SSRF gate that protects `authoritative_location`
  (which is the only sanctioned cross-host delegation path).
- `_validate_publisher_domain` calls the same SSRF gate (`_check_safe_host`)
  used for redirect targets, so IP literals and internal hostnames are
  rejected on the first hop too. `is_multicast` added to the IP class set.
- Cache validators (ETag / Last-Modified) are truncated above 256 bytes
  before being persisted, so a hostile origin cannot inflate every
  subsequent conditional request.
- The JSON-decode error path catches `json.JSONDecodeError` (not bare
  `Exception`) and truncates the upstream-derived message to 200 chars
  to bound log injection.

Protocol:
- `validate_adagents` keyed off `agents`; the schema field is
  `authorized_agents`. The new `validate_publisher_properties_item` XOR
  enforcer was therefore dead at the integration level. Now lit up.

Code review:
- Drop dead "304/" token from the redirect-hop error message.
- Drop dead ternary in the 304-on-first-hop return — `is_redirect` can
  never be true there.
- Fix misleading comment on `_fanout_publisher_properties` (the original
  compact entry is *not* retained — the docstring claimed otherwise).
- Add drift-detection test that asserts `_REVOCATION_REASONS` matches
  the generated `Reason` enum.

Tests:
- Multi-chunk streaming-cap test (exercises the running-total guard
  without relying on a single oversized chunk).
- Mixed compact + expanded entries in the same `publisher_properties[]`
  array, with order preservation.
- Cache entry carrying only `Last-Modified` (no ETag) — verify the
  conditional headers and the 304 cache-hit path.
- Three existing test fixtures updated to be schema-compliant
  (`authorization_type` + `property_ids`) so the now-live integration
  validator no longer rejects them as pre-discriminator legacy.

Deferred follow-ups (not blocking this PR):
- DNS pinning to harden against rebinding attacks (security H1).
- Pydantic `model_validator` for selector XOR — datamodel-codegen can't
  emit `allOf[not[required[both]]]`; the runtime validator covers the
  dict path, but typed Pydantic consumers still need an explicit gate.
- Revocation filtering on the `inline_properties` branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley bokelley merged commit 352d1bb into main May 20, 2026
16 checks passed
@bokelley bokelley deleted the bokelley/issue-729-adagents-scope branch May 20, 2026 17:12
bokelley added a commit that referenced this pull request May 20, 2026
#754)

Mirrors the adagents.json streaming-fetch posture (`_stream_capped` uses
`follow_redirects=False`): ads.txt is a publisher-controlled URL and a
transparent 30x by httpx would bypass `_check_safe_host` on the
resolved Location, re-opening the same SSRF surface the main path
already closed.

Surfaced as a defense-in-depth follow-up by the round-2 security
review on PR #753 — was pre-existing behavior, not regressed by that
PR, but worth closing.

ads.txt fetch is best-effort by construction (any non-200 returns an
empty MANAGERDOMAIN list); publishers who 30x their ads.txt now fall
through to "no managerdomain found", and the SDK surfaces the
publisher's original 404 — the same outcome as if the publisher had
no ads.txt at all. Acceptable for a fallback path.

Adds `test_ads_txt_30x_is_not_followed` covering the new behavior.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bokelley added a commit that referenced this pull request May 20, 2026
… models (#756)

Closes the documented Pydantic-XOR gap surfaced by the round-1 protocol
review on #729 / PR #753.

datamodel-codegen cannot translate the publisher-property-selector
JSON Schema's `allOf[not[required[both]]] + anyOf[required[either]]`
construct into Pydantic field constraints, so direct instantiation of
`PublisherPropertySelector1` / `…3` silently accepts payloads the schema
rejects (notably `{selection_type: "all"}` with neither
`publisher_domain` nor `publisher_domains` set).

This change lets Pydantic consumers close the gap with a single call:

    selector = PublisherPropertySelector1.model_validate(payload)
    validate_publisher_properties_item(selector)  # raises if XOR fails

The helper now accepts either a dict (existing wire-form path) or any
object with a `.model_dump()` method (i.e. a Pydantic model). For
anything else it raises a clear error.

Tests cover both the model and the dict input forms plus the
type-error path. Auto-enforcement at parse time (a `model_validator`
attached post-class to the generated selectors) would require either
hand-patching generated_poc/ — forbidden by the codebase's strict
layering rule — or hooking Pydantic core-schema internals, which is
fragile. A separate follow-up issue tracks that work.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant