
docs(governance): audit-trail internal-vs-shareable views + Addie anonymous knowledge tools#3175

Merged
bokelley merged 2 commits into main from bokelley/gov-wg-response on Apr 25, 2026

@bokelley
Contributor

Four related changes from a governance WG question on Slack about how to design audit logging.

What's in this PR

1. New doc: docs/governance/campaign/audit-trail.mdx

Explains the internal-vs-shareable view split for get_plan_audit_logs with:

  • Field-by-field tagging table (which fields stay buyer-side, which are seller-shareable, which are regulator-shareable)
  • A 4-field minimum compliance attestation pattern
  • Four worked examples generated by scripts/gen-governance-audit-examples.ts:
    • Clean buy (full audit response, $150K committed against a $500K plan)
    • Security-shaped denial (seller_compliance finding when an unauthorized seller tries to commit)
    • Coaching-shaped denial (Annex III data_subject_contestation missing — denial doubles as actionable guidance)
    • Mode comparison (same payload, three different outcomes under enforce/advisory/audit)

All 6 schema-tagged JSON blocks validate against the canonical schemas via tests/json-schema-validation.test.cjs.
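The internal-vs-shareable split can be read as a projection: each audit field carries the list of non-buyer audiences allowed to see it, and anything untagged stays buyer-side. This is a minimal TypeScript sketch. It assumes a hypothetical `SHAREABLE_WITH` map and `projectForAudience` helper. The tags encode only what this PR states (governance_context is seller-visible; budget aggregates, drift metrics, and channel allocation are buyer-side), so they are not a copy of the doc's table.

```typescript
// Audiences for an audit view; "buyer" is the internal view and sees everything.
type Audience = "buyer" | "seller" | "regulator";

// Hypothetical tag map: non-buyer audiences allowed to see each top-level field.
// Encodes only what the PR states; regulator tags are omitted, not implied.
const SHAREABLE_WITH = new Map<string, ReadonlyArray<Audience>>([
  ["governance_context", ["seller"]], // seller-visible correlation token
  ["budget", []], // buyer-side aggregates
  ["drift_metrics", []], // buyer-side
  ["channel_allocation", []], // buyer-side
]);

function projectForAudience(
  entry: Record<string, unknown>,
  audience: Audience,
): Record<string, unknown> {
  if (audience === "buyer") return { ...entry };
  const view: Record<string, unknown> = {};
  for (const [field, value] of Object.entries(entry)) {
    // Fail closed: fields missing from the map stay buyer-side.
    const allowed = SHAREABLE_WITH.get(field) ?? [];
    if (allowed.includes(audience)) view[field] = value;
  }
  return view;
}
```

Because untagged fields fail closed, a field added to the schema later cannot leak to a seller just because nobody updated the tag map.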

2. Anonymous web-chat gets search_docs/get_doc/search_repos/search_resources/get_recent_news

Previously: Addie's system prompt told her to call search_docs to ground answers, but the web-chat anonymous path didn't register the tool. Result: she'd call a non-existent tool, get "Unknown tool", and fall through to in-prompt speculation. The MCP chat path already exposed these read-only tools to anonymous callers; this aligns the web path.

Anonymous tool count: 7 → 13. The full knowledge-search surface (Slack history, bookmarking) still requires authentication.
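Selecting tools by caller auth state keeps the web and MCP paths from drifting apart, because both can ask the same selector. This is a minimal sketch. It assumes a hypothetical `ToolDef` shape with a `requiresAuth` flag. The five knowledge-tool names and bookmark_resource come from this PR; `directory_lookup` and `search_slack_history` are placeholder names.

```typescript
interface ToolDef {
  name: string;
  requiresAuth: boolean; // hypothetical flag; the real registry may encode this differently
}

const TOOLS: ToolDef[] = [
  { name: "directory_lookup", requiresAuth: false }, // placeholder for the existing directory tools
  { name: "search_docs", requiresAuth: false },
  { name: "get_doc", requiresAuth: false },
  { name: "search_repos", requiresAuth: false },
  { name: "search_resources", requiresAuth: false },
  { name: "get_recent_news", requiresAuth: false },
  { name: "search_slack_history", requiresAuth: true }, // placeholder name
  { name: "bookmark_resource", requiresAuth: true },
];

// One selector for both chat paths, so web and MCP expose the same set.
function toolsFor(authenticated: boolean): string[] {
  return TOOLS.filter((t) => authenticated || !t.requiresAuth).map((t) => t.name);
}

// A tool the prompt mentions but the caller's registry lacks fails as "Unknown tool".
function dispatch(name: string, authenticated: boolean): string {
  return toolsFor(authenticated).includes(name) ? `ran ${name}` : `Unknown tool: ${name}`;
}
```

`dispatch` reproduces the original bug in miniature: before the fix, `search_docs` was effectively missing from the anonymous web list, so the call fell into the "Unknown tool" branch.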

Verified the fix end-to-end: production Addie answering this WG question produced speculation about Policy Registry versioning ("are policies versioned by timestamp, version number, or content hash?"); local Addie with the fix produced a grounded answer naming policy_id, version, effective_date, enforcement — fields she retrieved via search_docs.

3. Constraints rule: "Tool Unavailable Is Not 'No Result'"

Adds a section to server/src/addie/rules/constraints.md distinguishing three outcomes:

  1. Tool returned results → cite and answer
  2. Tool returned empty → "I searched and didn't find that in the spec" (per existing No Speculative Answers rule)
  3. Tool was unavailable → state the limitation, offer the path forward, don't retry more than once

Generalizes beyond search_docs to every tool. Caps the failure mode where Addie tries the same broken tool three times in a row.
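The three outcomes map onto a discriminated union, with the retry cap as an explicit parameter. This is a minimal sketch. `ToolOutcome`, `responseShape`, and `callWithCap` are hypothetical names, and the real rule is prompt text in constraints.md rather than code. The default of zero retries follows the later revision ("Do not retry. One failure is the signal."). Passing 1 reproduces the "don't retry more than once" wording above.

```typescript
type ToolOutcome =
  | { kind: "results"; items: string[] }
  | { kind: "empty" }
  | { kind: "unavailable"; tool: string };

function responseShape(outcome: ToolOutcome): string {
  switch (outcome.kind) {
    case "results":
      return `Answer, citing: ${outcome.items.join(", ")}`;
    case "empty":
      return "I searched and didn't find that in the spec.";
    case "unavailable":
      // State the limitation once; never present this as "no result".
      return `I couldn't reach ${outcome.tool}, so this answer isn't grounded in the docs.`;
  }
}

// Retries only on "unavailable"; "empty" is a real answer and is never retried.
function callWithCap(run: () => ToolOutcome, maxRetries = 0): ToolOutcome {
  let outcome = run();
  for (let i = 0; i < maxRetries && outcome.kind === "unavailable"; i++) {
    outcome = run();
  }
  return outcome;
}
```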

4. Skill update: skills/adcp-governance/SKILL.md

Adds the campaign-governance task surface (sync_plans, check_governance, report_plan_outcome, get_plan_audit_logs) and three operator-facing invariants:

  • Inline policies can only ADD restrictions over registry policies — they cannot relax enforcement levels
  • effective_date enables informational-before-enforcement (the "minimal restrictions initially" pattern)
  • governance_context is the seller-visible correlation token; plan-level data (budget aggregates, drift metrics, channel allocation) is buyer-side
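The "inline can only add restrictions" invariant amounts to taking the maximum over an ordered scale. effective_date can be modeled as capping a not-yet-effective policy at informational. This is a minimal sketch. It assumes a hypothetical three-level `LEVELS` scale, which is not the schema's actual enforcement enum, and treats effective_date as an ISO date string.

```typescript
// Hypothetical enforcement scale, least to most strict (not the schema's enum).
const LEVELS = ["informational", "advisory", "enforced"] as const;
type Level = (typeof LEVELS)[number];

interface PolicyRef {
  policy_id: string;
  enforcement: Level;
  effective_date?: string; // ISO date; assumed informational before this date
}

function effectiveLevel(p: PolicyRef, now: Date): Level {
  if (p.effective_date !== undefined && now.getTime() < Date.parse(p.effective_date)) {
    return "informational";
  }
  return p.enforcement;
}

function stricter(a: Level, b: Level): Level {
  return LEVELS.indexOf(a) >= LEVELS.indexOf(b) ? a : b;
}

// Inline policies can only tighten: the result is never below the registry level.
function mergeEnforcement(registry: PolicyRef, inline: PolicyRef | undefined, now: Date): Level {
  const base = effectiveLevel(registry, now);
  return inline === undefined ? base : stricter(base, effectiveLevel(inline, now));
}
```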

Issues filed and resolved during this work

  • #3139, #3140, #3156 (→ #3160 merged), #3162 (→ #3163 merged), #3169 merged

Test plan

  • npm run test:docs-nav passes
  • npm run test:json-schema passes (255 schema-tagged blocks validate)
  • npm run typecheck passes
  • Targeted vitest run on server/tests/unit/training-agent.test.ts — all governance audit-log tests pass
  • node tests/json-schema-validation.test.cjs --file docs/governance/campaign/audit-trail.mdx — 6/6 schema-tagged blocks valid
  • Manual: Addie before/after captured in .context/addie-before.json and .context/addie-after-real-fix.json (anonymous chat went from speculation to grounded retrieval)

🤖 Generated with Claude Code

bokelley and others added 2 commits April 25, 2026 11:47
…nymous knowledge tools

Four related changes from a governance WG question on how to design
audit logging:

1. New doc page docs/governance/campaign/audit-trail.mdx — explains the
   internal-vs-shareable view split for get_plan_audit_logs with a
   field-by-field tagging table and four worked examples generated by
   scripts/gen-governance-audit-examples.ts: clean buy, security-shaped
   denial (seller_compliance), coaching-shaped denial (Annex III prerequisite),
   and the enforce/advisory/audit mode comparison. All schema-tagged JSON
   blocks validate against the canonical schemas via tests/json-schema-validation.

2. Wire anonymous web-chat callers to receive search_docs, get_doc,
   search_repos, search_resources, and get_recent_news. Anonymous web chat
   previously had directory tools only; the system prompt told Addie to
   call search_docs and the tool wasn't registered, producing speculative
   answers instead of grounded ones. The MCP chat path already exposed the
   same set; this aligns the web path. Anonymous tool count: 7 → 13.

3. Constraints rule "Tool Unavailable Is Not 'No Result'" — distinguishes
   tool-returned-empty (say "I didn't find it") from tool-unavailable
   (say "I couldn't reach docs search; sign in for grounded answers"),
   and caps retries at one. Generalizes beyond search_docs to every tool.

4. skills/adcp-governance/SKILL.md gains the campaign-governance task
   surface (sync_plans, check_governance, report_plan_outcome,
   get_plan_audit_logs) and three operator-facing invariants:
   inline policies cannot relax registry policies, effective_date enables
   informational-before-enforcement, governance_context is the
   seller-visible correlation token while plan-level data is buyer-side.

Filed and resolved upstream from this work: #3139, #3140, #3156 (→ #3160
merged), #3162 (→ #3163 merged), #3169 merged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rom anonymous knowledge tools

Security fixes from expert review of PR #3175:

1. **Private working-group docs leaking via search_docs**
   server/src/db/working-group-db.ts: getIndexedDocumentsWithContent now
   filters wg.is_private = false. Without this, exposing search_docs to
   anonymous web-chat (the headline change in this PR) widens the surface
   from "anyone with an MCP client" to "anyone on the public web" for
   committee minutes, brand-confidential briefs, and draft policy entries
   indexed from private working groups. Authenticated WG members still
   access their content via the WG-specific pages — that path doesn't
   ride this index.

2. **Prompt-injection pipeline via user-bookmarked URLs**
   bookmark_resource (Slack-authenticated) queues arbitrary URLs that get
   fetched, summarized into addie_notes, and surfaced via search_resources/
   get_recent_news. Anonymous Addie inherits that surface and the notes
   land in its prompt as "Addie's Take." Two-layer fix:
   - DB layer: searchCuratedResources / getRecentNews accept
     excludeUserSubmitted to drop source_type='web_search' (and 'community'
     for news) from results.
   - Handler layer: createKnowledgeToolHandlers gains an `anonymous` option;
     when true, passes excludeUserSubmitted through and strips addie_notes
     from formatted output. Both addie-chat.ts (web) and chat-tool.ts (MCP)
     pass anonymous: true on the global registration. Authenticated callers
     get the full handler via per-request override (claude-client.ts:594
     "last wins" merge).
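Both DB-layer fixes are predicates applied before anything reaches the model. The real change lives in SQL inside getIndexedDocumentsWithContent, searchCuratedResources, and getRecentNews. The in-memory TypeScript sketch below only mirrors those predicates over hypothetical row shapes (`wg_is_private`, `source_type`) so the behavior is easy to check.

```typescript
// Hypothetical row shapes; real columns live in the DB schema.
interface IndexedDoc {
  id: string;
  wg_is_private: boolean;
}
interface ResourceRow {
  url: string;
  source_type: string;
}

// Mirrors the SQL predicate `wg.is_private = false`.
function publicDocsOnly(rows: IndexedDoc[]): IndexedDoc[] {
  return rows.filter((r) => !r.wg_is_private);
}

// Mirrors excludeUserSubmitted: drop 'web_search' rows, plus 'community' for news.
function excludeUserSubmitted(rows: ResourceRow[], table: "resources" | "news"): ResourceRow[] {
  const blocked = table === "news" ? ["web_search", "community"] : ["web_search"];
  return rows.filter((r) => !blocked.includes(r.source_type));
}
```

Filtering in the query, as the PR does, means private or user-submitted rows never leave the database. The in-memory version is only for illustration.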

Doc / prompt-engineering nits:

3. audit-trail.mdx: add entries[].mode and entries[].purchase_type rows
   to the field-tagging table (mode is the whole point of #3156/#3160);
   note that budget.utilization_pct is a one-step inverse problem (just
   as leaky as raw amounts); soften "on request" wording for drift_metrics
   to match the §103 "protocol does not define a regulator API" caveat.

4. constraints.md: tighten "Tool Unavailable Is Not 'No Result'" rule —
   drop fragile back-reference, replace upsell script with a behavioral
   shape (don't pitch; one line is enough), make the no-retry rule
   unambiguous ("Do not retry. One failure is the signal.").

5. SKILL.md: rename "Three invariants to lead with" to "Three invariants
   for audit and disclosure decisions" (the skill loads into orchestrator
   context, not a chat persona); demote effective_date to a doc-link note
   and promote plan_hash as the third load-bearing invariant for
   counterparty disclosure.

6. addie-chat.ts: anonymousTools log line now uses
   claudeClient.getRegisteredTools().length as source of truth instead
   of a hand-rolled sum that would silently drift.

7. gen-governance-audit-examples.ts: FIXME annotation referencing
   FRAMEWORK_MIGRATION.md so the in-flight server.server._requestHandlers
   migration sweep catches this script.
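The drift that the log-line fix guards against is easy to reproduce: a sum over hand-maintained lists misses any tool registered elsewhere. This is a minimal sketch using a hypothetical `ToolClient` stand-in whose getRegisteredTools() mirrors the method name in this PR. The list names and the extra tool are illustrative.

```typescript
// Stand-in for the chat client; only the method name mirrors the PR.
class ToolClient {
  private readonly tools = new Map<string, () => string>();
  register(name: string, handler: () => string): void {
    this.tools.set(name, handler); // same name again overwrites ("last wins")
  }
  getRegisteredTools(): string[] {
    return Array.from(this.tools.keys());
  }
}

const directoryTools = ["directory_lookup"]; // illustrative lists
const knowledgeTools = ["search_docs", "get_doc"];

const client = new ToolClient();
for (const name of [...directoryTools, ...knowledgeTools]) client.register(name, () => name);
client.register("get_recent_news", () => "news"); // registered outside the lists

const handRolled = directoryTools.length + knowledgeTools.length; // 3: silently stale
const fromRegistry = client.getRegisteredTools().length; // 4: what callers actually get
console.log(`anonymousTools=${fromRegistry}`);
```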

Verified end-to-end: anonymous local Addie answering the original WG
question still produces grounded retrieval (6 tool calls, 0 errors,
cites the audit-trail doc) — the security scoping doesn't break the
legitimate use case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Pushed e0d1cfa97 addressing all expert-review feedback. Summary:

Must-Fix (security)

  • Private working-group docs were leaking through search_docs/get_doc to anonymous callers: getIndexedDocumentsWithContent now filters wg.is_private = false. The PR didn't introduce the bug, but exposing search_docs anonymously widened the surface; this closes that.
  • bookmark_resource (Slack-authenticated) → addie_notes (LLM summary of arbitrary URL content) → anonymous search_resources/get_recent_news was a prompt-injection pipeline. Two-layer fix: DB methods take excludeUserSubmitted to drop source_type='web_search'/'community'; handler factory takes anonymous to also strip addie_notes from formatted output. Both addie-chat.ts (web) and chat-tool.ts (MCP) pass anonymous: true on the global registration. Authenticated callers get the full handler via per-request override (last-wins merge in claude-client.ts:594).
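The handler-layer half of the fix can be sketched as a factory option plus a last-wins merge. createKnowledgeToolHandlers and its `anonymous` option are named in this PR. The `fetchResources` callback, the `Resource` shape, and the `mergeHandlers` helper are hypothetical stand-ins for the DB call and for the merge at claude-client.ts:594.

```typescript
interface Resource {
  title: string;
  addie_notes?: string; // LLM summary of fetched URL content
}
type Handler = (query: string) => string;

function createKnowledgeToolHandlers(
  fetchResources: (query: string, excludeUserSubmitted: boolean) => Resource[],
  opts: { anonymous?: boolean } = {},
): Record<string, Handler> {
  const anonymous = opts.anonymous ?? false;
  const format = (r: Resource): string =>
    anonymous || r.addie_notes === undefined ? r.title : `${r.title}\nAddie's Take: ${r.addie_notes}`;
  return {
    // Anonymous callers: drop user-submitted rows AND strip addie_notes.
    search_resources: (q) => fetchResources(q, anonymous).map(format).join("\n"),
  };
}

// Later layers override earlier ones, so the per-request authenticated set wins.
function mergeHandlers(...layers: Array<Record<string, Handler>>): Record<string, Handler> {
  return Object.assign({}, ...layers);
}
```

Registering the anonymous variant globally makes the safe behavior the default; authenticated callers must opt in to the full handler per request.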

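Must-Fix (doc gap)

  • audit-trail.mdx: added entries[].mode and entries[].purchase_type rows to the field-tagging table (mode is the whole point of #3156/#3160).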

Should-Fix

  • budget.utilization_pct flagged as a one-step inverse problem in §1 (just as leaky as raw amounts).
  • Regulator "on request" wording for drift_metrics softened to match the existing §103 caveat that the protocol does not define a regulator API.
  • Constraint rule "Tool Unavailable Is Not 'No Result'": dropped the fragile back-reference, replaced the upsell script with a behavioral shape (don't pitch; one line is enough), made the no-retry rule unambiguous.
  • SKILL.md: renamed "Three invariants to lead with" → "for audit and disclosure decisions" (the skill loads into orchestrator context, not chat persona); demoted effective_date to a doc-link note; promoted plan_hash as the third load-bearing invariant.
  • anonymousTools log line now uses claudeClient.getRegisteredTools().length as source of truth.
  • Added FIXME annotation in gen-governance-audit-examples.ts referencing FRAMEWORK_MIGRATION.md so the in-flight migration sweep catches the script.
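The inverse problem behind the utilization_pct note takes a single division. This is a minimal sketch. It assumes utilization_pct is a 0 to 100 percentage and that the seller's own commitment is the only spend against the plan, as in the clean-buy example ($150K committed against a $500K plan). `inferPlanTotal` is a hypothetical helper.

```typescript
// Assumes utilization_pct is a 0-100 percentage of the plan total and that the
// seller's commitment is the only spend against the plan.
function inferPlanTotal(sellerCommitted: number, utilizationPct: number): number {
  if (utilizationPct <= 0) throw new RangeError("utilization_pct must be positive");
  return (sellerCommitted * 100) / utilizationPct;
}

const planTotal = inferPlanTotal(150000, 30); // 500000
```

At 30% utilization, the seller recovers the buyer's $500K plan total from its own $150K commitment. That is why the percentage is tagged as just as leaky as the raw amounts.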

Nits skipped (per reviewer note)

  • .context/ is gitignored at the conductor level (system docs note this).
  • The "Sign in at agenticadvertising.org" example phrasing was already softened to a behavioral shape; no infrastructure leak per security review.

Changeset bump — keeping --empty. The doc is purely descriptive (restates existing schema invariants and spec; cites specifications.mdx and policy-entry.json); no normative claims, no protocol-spec changes. Per .agents/playbook.md: "Only use patch/minor/major for changes to the published AdCP protocol spec."

Verified end-to-end: anonymous local Addie answering the original WG question produces grounded retrieval (6 tool calls, 0 errors, cites the audit-trail doc) — the security scoping doesn't break the legitimate use case.

CI re-running.
