feat(registry): manager fan-out re-validation endpoint #4251

Merged
bokelley merged 3 commits into main from claude/issue-4200-finish
May 8, 2026

@bokelley bokelley commented May 8, 2026

Closes #4200 item 5. New `POST /api/registry/manager-revalidation-request` short-circuits the 60-minute organic crawl cycle so ops can propagate a manager-side `adagents.json` rotation immediately.

What lands

  • The endpoint is a thin wrapper around `enqueueManagerRevalidation` (landed in feat(crawler): queue-backed manager revalidation fan-out #4210). The worker tick (`processManagerRevalidationQueue`) drains the queue at a bounded rate, so a Raptive-scale rotation propagates within ~10 hours of this request, instead of converging organically at the worst case of 60-minute crawl × N publisher rotations.
  • Body: `{ manager_domain }`. Lower-cased and trimmed.
  • Returns `202` with `publishers_enqueued` (rows touched in the queue, zero when nobody delegates).
  • Rate-limited via the shared `validateAndRateLimitCrawl` machinery. Key is namespaced (`manager:` prefix) so a manager-recrawl request doesn't bypass an in-window publisher recrawl on the same domain or vice-versa. Hourly per-member limit is shared with the other crawl endpoints.
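
The request handling described above can be sketched roughly as follows. This is a minimal illustration, not the repository's handler: `createHandler`, the synchronous `enqueue` stand-in for `enqueueManagerRevalidation`, the `429` status, and the error strings are assumptions. The 5-minute window comes from the test notes in this PR, and the hourly per-member limit is omitted.

```typescript
// Hypothetical sketch. Only manager_domain, publishers_enqueued, the 202/400
// statuses and the "manager:" key prefix come from the PR description.
type Result =
  | { status: 202; body: { publishers_enqueued: number } }
  | { status: 400 | 429; body: { error: string } };

const WINDOW_MS = 5 * 60 * 1000; // per-domain window (5 minutes, per the test notes)

function createHandler(
  enqueue: (managerDomain: string) => number, // stand-in for enqueueManagerRevalidation
  now: () => number = Date.now,
) {
  // Limit state lives in the closure, one entry per namespaced key.
  const lastRequestAt = new Map<string, number>();

  return function handle(body: { manager_domain?: unknown }): Result {
    const raw = body.manager_domain;
    if (typeof raw !== "string" || raw.trim() === "") {
      return { status: 400, body: { error: "manager_domain is required" } };
    }

    // Normalize: trim, then lower-case.
    const managerDomain = raw.trim().toLowerCase();

    // Namespaced key, so manager and publisher requests on the same domain
    // are tracked under different keys.
    const key = `manager:${managerDomain}`;
    const previous = lastRequestAt.get(key);
    if (previous !== undefined && now() - previous < WINDOW_MS) {
      return { status: 429, body: { error: "rate limited" } }; // status code is an assumption
    }
    lastRequestAt.set(key, now());

    // Zero is a valid count: nobody delegates to this manager.
    return { status: 202, body: { publishers_enqueued: enqueue(managerDomain) } };
  };
}
```

Injecting `now` and `enqueue` keeps the sketch testable without a clock or a database; the real endpoint presumably awaits a queue write instead.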

Tests

Integration coverage in `manager-revalidation-endpoint.test.ts`: enqueue happy path with multiple delegating publishers, zero-count when none delegate, 400 on missing `manager_domain`, lower-case + trim normalization, per-domain rate-limit window.

#4200 source-enum extension: closing as won't-fix

The other follow-up tracked under #4200 was extending the per-agent `source` enum to `adagents_json_via_manager`. This is deliberately not shipping:

  • `publishers.discovery_method` (landed in feat(registry): persist managerdomain discovery provenance on publisher rows #4204) already lets consumers discriminate via a join — `agent_publisher_authorizations ⨝ publishers ON publisher_domain`.
  • Minting a separate per-agent value would be denormalization (duplicate signal at two layers).
  • Worse, every existing reader filtering on `source='adagents_json'` (e.g. `federated-index.ts:131`) would silently exclude managerdomain-discovered rows. That's a behavior change disguised as an additive enum extension.

The publisher-side discovery_method is the right discriminator. Closing that thread on the issue.
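
To make the reasoning above concrete, here is a small in-memory sketch of the join-based discrimination and of the hazard a new enum value would introduce. The row shapes, the `agent_url` field, and the `discovery_method` values `"direct"` and `"managerdomain"` are assumptions for illustration; only `source`, `publisher_domain`, `discovery_method` and the two `source` strings come from the text.

```typescript
// Hypothetical row shapes standing in for the two tables.
interface Authorization { agent_url: string; publisher_domain: string; source: string }
interface Publisher { publisher_domain: string; discovery_method: string }

// What an existing reader does today: filter on the per-agent source value.
function existingReader(auths: Authorization[]): Authorization[] {
  return auths.filter((a) => a.source === "adagents_json");
}

// Discriminate through the publisher row instead of a new per-agent enum value.
function discoveredViaManager(auths: Authorization[], publishers: Publisher[]): Authorization[] {
  const methodByDomain = new Map(
    publishers.map((p) => [p.publisher_domain, p.discovery_method] as const),
  );
  return existingReader(auths).filter(
    (a) => methodByDomain.get(a.publisher_domain) === "managerdomain", // assumed value
  );
}

const publishers: Publisher[] = [
  { publisher_domain: "pub-a.example", discovery_method: "direct" },
  { publisher_domain: "pub-b.example", discovery_method: "managerdomain" },
];
const auths: Authorization[] = [
  { agent_url: "https://agent.example", publisher_domain: "pub-a.example", source: "adagents_json" },
  { agent_url: "https://agent.example", publisher_domain: "pub-b.example", source: "adagents_json" },
];

// The hazard: re-tagging the manager-discovered row hides it from existing readers.
const retagged = auths.map((a) =>
  a.publisher_domain === "pub-b.example" ? { ...a, source: "adagents_json_via_manager" } : a,
);
```

With the data above, `existingReader(auths)` keeps both rows, while `existingReader(retagged)` drops the `pub-b.example` row without any error.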

Refs #4200, #4173, #4204, #4210.

bokelley added 3 commits May 8, 2026 15:08
Closes #4200 item 5.

POST /api/registry/manager-revalidation-request short-circuits the
60-minute organic crawl cycle: when a manager rotates its adagents.json,
ops can hit this endpoint and have every delegating publisher enqueued
immediately rather than waiting for a routine sweep to detect drift.

Thin wrapper around enqueueManagerRevalidation (#4210). The crawler's
worker tick (processManagerRevalidationQueue) drains the queue at a
bounded rate, so a Raptive-scale rotation propagates within ~10 hours
of this request.

- Body: { manager_domain }. Lower-cased and trimmed.
- Returns 202 with publishers_enqueued (rows touched in the queue).
- Rate-limited via the shared validateAndRateLimitCrawl machinery.
  Key is namespaced ("manager:" prefix) so a manager request doesn't
  bypass an in-window publisher recrawl on the same domain or
  vice-versa. Hourly per-member limit is shared with other crawl
  endpoints.

Per-agent source enum extension to 'adagents_json_via_manager' is
NOT shipping in this PR. Re-evaluating: publishers.discovery_method
(landed in #4204) already lets consumers join through and discriminate
direct vs. managerdomain-discovered authorizations. A separate
per-agent enum value would be denormalization and would silently
exclude managerdomain rows from every existing reader filtering on
source='adagents_json'. Closing that follow-up as won't-fix on the
issue.

.example.com subdomains don't resolve in CI; mock validateCrawlDomain
to a pass-through to exercise the handler logic. Mirrors the pattern
used in registry-publisher-brand-json-hydration.test.ts.
…mit state

The crawl-rate-limit Map lives in createRegistryApiRouter's closure;
sharing one app across cases meant the second test on the same manager
domain hit the 5-minute window from the first test's request. Rebuild
in beforeEach so each test gets a fresh limit state.
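
The state leak described in this commit can be reproduced with a small stand-in. `createLimiter` is a hypothetical substitute for the router factory; the only detail carried over from the commit message is that the limit Map lives in the factory's closure and that the window is 5 minutes.

```typescript
// Hypothetical stand-in for a factory that keeps rate-limit state in its closure.
const WINDOW_MS = 5 * 60 * 1000;

function createLimiter() {
  const lastSeen = new Map<string, number>(); // one Map per created instance
  return {
    allow(key: string, nowMs: number): boolean {
      const previous = lastSeen.get(key);
      if (previous !== undefined && nowMs - previous < WINDOW_MS) return false;
      lastSeen.set(key, nowMs);
      return true;
    },
  };
}

// Shared instance across cases: the second case inherits the first one's window.
const shared = createLimiter();
const sharedFirst = shared.allow("manager:mgr.example.com", 0);
const sharedSecond = shared.allow("manager:mgr.example.com", 1_000);

// Fresh instance per case, which is what rebuilding in beforeEach provides.
const freshSecond = createLimiter().allow("manager:mgr.example.com", 1_000);
```

Here `sharedSecond` is rejected because it falls inside the window opened by `sharedFirst`, while `freshSecond` is allowed because its Map starts empty.
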
@bokelley bokelley merged commit 7bf68b6 into main May 8, 2026
13 checks passed
@bokelley bokelley deleted the claude/issue-4200-finish branch May 8, 2026 19:23
bokelley added a commit that referenced this pull request May 9, 2026
…n scope gate (#4283)

Extends the explicit-publisher-scoping gate from #4173 to accept property-level publisher_domain declarations in addition to the existing per-agent paths. Per-agent (publisher_properties[].publisher_domain, collections[].publisher_domain) unchanged; new path requires a properties[] entry with publisher_domain matching the source AND an authorized_agents[] entry that reaches that property via property_ids or property_tags.
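
A rough sketch of the two acceptance paths, under stated assumptions: the manifest types below are simplified, the `property_id` and `tags` field names on properties are guesses at how `property_ids` and `property_tags` resolve, the function name is hypothetical, and domains are compared by exact string match. It is not the gate's actual implementation.

```typescript
// Hypothetical, simplified manifest shapes.
interface Property { property_id?: string; tags?: string[]; publisher_domain?: string }
interface AuthorizedAgent {
  url: string;
  property_ids?: string[];
  property_tags?: string[];
  publisher_properties?: { publisher_domain: string }[];
  collections?: { publisher_domain: string }[];
}
interface Manifest { properties?: Property[]; authorized_agents?: AuthorizedAgent[] }

function declaresPublisher(manifest: Manifest, sourceDomain: string): boolean {
  const agents = manifest.authorized_agents ?? [];

  // Existing per-agent paths: the agent names the publisher directly.
  const perAgent = agents.some(
    (a) =>
      (a.publisher_properties ?? []).some((p) => p.publisher_domain === sourceDomain) ||
      (a.collections ?? []).some((c) => c.publisher_domain === sourceDomain),
  );
  if (perAgent) return true;

  // New property-level path: a property declares the publisher AND some agent
  // reaches that property by id or by tag. Either half alone is not enough.
  const scoped = (manifest.properties ?? []).filter((p) => p.publisher_domain === sourceDomain);
  return scoped.some((p) =>
    agents.some(
      (a) =>
        (p.property_id !== undefined && (a.property_ids ?? []).includes(p.property_id)) ||
        (a.property_tags ?? []).some((t) => (p.tags ?? []).includes(t)),
    ),
  );
}
```

Requiring both halves is what keeps the gate failing closed: a property-level declaration that no agent references, or an agent reference to a property scoped to a different publisher, yields `false`.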

Surfaced via real-world probe of homestratosphere.com / mediavine.com after #4251 merged. Mediavine's actual production manifest scopes via property-level publisher_domain + tag-based agent references, which the original gate rejected. The cross-publisher commitment is still expressly declared — just routed through the property layer.

Cross-publisher confusion attacks still fail closed. Sent design question to @patmmccann on #4173 about canonical shape; will iterate if his read is stricter.

Refs #4173, #4200, #4251.

Development

Successfully merging this pull request may close these issues.

AAO crawler/API: persist managerdomain discovery provenance and reverse index
