
refactor: move governance metadata from typed core schema to agentv-compliance skill #1172

@christso

Description


Objective

Replace the typed governance metadata schema in packages/core/ (introduced in #1165) with an AI-facing skill that teaches agents how to author OWASP / MITRE ATLAS / EU AI Act / ISO 42001 governance blocks correctly, plus a reference GitHub Action that loads the same skill to provide CI enforcement. Core retains only a generic metadata: Record<string, unknown> pass-through that already exists in EvalMetadata / EvalTest.

The skill is the single source of truth for vocabulary and validation rules — used interactively for AI authoring and non-interactively (via the Action) for CI lint. No SDK package is needed: a Claude invocation with the skill loaded subsumes both use cases.

Three phases — ideally one PR per phase, all land in this issue:

  1. Phase 1 — additive: create the skill under plugins/agentv-dev/skills/agentv-compliance/ with vocabulary, examples, and AI-authoring guidance. No core changes.
  2. Phase 2 — subtractive: slim core. Once Phase 1 is merged, remove GovernanceMetadataSchema, KNOWN_GOVERNANCE_FIELDS, EU_AI_ACT_RISK_TIERS, validateGovernance, and isWellFormedControlId from packages/core/. Make the suite-↔-case metadata merge in yaml-parser.ts generic (no governance special case). Existing red-team suites from #1166 (feat(examples): OWASP LLM Top 10 / MITRE ATLAS-aligned red-team eval pack) and #1168 (feat(examples): scenario-based red-team suites for coding and customer-facing agent archetypes) continue to run unchanged because core still passes metadata through.
  3. Phase 3 — additive: reference CI Action. Ship a small GitHub Action template under examples/governance/compliance-lint/ that invokes Claude with the agentv-compliance skill loaded against changed *.eval.yaml files and reports pass/fail per governance block.

Why

PR #1165 added typed fields (owasp_llm_top_10_2025, owasp_agentic_top_10_2025, mitre_atlas, controls, risk_tier, owner) plus a hardcoded vocabulary lint that emits soft warnings only. Investigation found:

  • Core does not consume any governance value. It parses the block, lints typo'd keys, and passes the values through to JSONL. No grader, scorer, scheduler, or aggregator branches on the field values — they're text labels destined for downstream tooling (jq pipelines, the .ai-register.yaml aggregator from #1167 (docs(examples): AI system register convention (.ai-register.yaml) + aggregator Action template), attestation reports).
  • AGENTS.md design-principle review: the typed schema fails 3 of 4 critical checks. AGENTS.md principle 7 (AI-First Design) explicitly prefers "Skills over rigid commands" — "Skills should cover most use cases; rigid commands trade off AI intelligence." Soft warnings on typo'd OWASP keys are exactly the kind of guidance a skill replaces more elegantly than runtime code. And because a skill can be loaded by a CI Action, the skill subsumes both authoring and enforcement — no separate runtime package is needed.
  • Promptfoo precedent. Promptfoo keeps a generic metadata: Record<string, any> in core (see its config reference). All framework presets (OWASP_LLM_TOP_10_MAPPING, MITRE ATLAS, NIST AI RMF, EU AI Act, ISO 42001, GDPR) are static data tables in src/redteam/constants/frameworks.ts, opted into via redteam.frameworks: [owasp:llm, mitre:atlas] — never typed onto the core test schema.
  • Release decoupling. EU AI Act delegated acts, OWASP versions, MITRE ATLAS revisions all churn on regulator/security cadence. With the skill model, content updates ship without touching the engine, and CI consumers automatically pick up new vocabulary on the next Action run.
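The promptfoo-style data-table pattern referenced above might look roughly like this — a hedged sketch only; the constant names here are illustrative and are not promptfoo's actual exports:

```typescript
// Static vocabulary tables: plain data, never typed onto the core test schema.
// Content updates (new OWASP versions, new tiers) touch only these tables.
export const EU_AI_ACT_RISK_TIERS = [
  "prohibited",
  "high_risk",
  "limited_risk",
  "minimal_risk",
] as const;

// Canonical OWASP LLM Top 10 IDs: "LLM01" through "LLM10".
export const OWASP_LLM_TOP_10_2025: string[] = Array.from(
  { length: 10 },
  (_, i) => `LLM${String(i + 1).padStart(2, "0")}`,
);
```

Because the tables are plain data, a skill (or an Action loading it) can consume them without any runtime coupling to the engine.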

Phase 1 — Create agentv-compliance skill

Location: plugins/agentv-dev/skills/agentv-compliance/

Skill structure (follow conventions in sibling skills under plugins/agentv-dev/skills/):

  • SKILL.md — top-level skill description with description: frontmatter that triggers when an AI agent is authoring or editing a *.eval.yaml file with governance: metadata. Also describe the dual mode: "interactively this skill helps you author governance blocks; non-interactively (e.g., from a GitHub Action) it lints them."
  • references/owasp-llm-top-10-2025.md — the 10 categories with brief descriptions + canonical IDs (LLM01..LLM10), official source link.
  • references/owasp-agentic-top-10-2025.md — the agentic categories (T1..Tn) with descriptions.
  • references/mitre-atlas.md — common AML.Txxxx techniques relevant to LLM/agent eval, with link to https://atlas.mitre.org/.
  • references/eu-ai-act-risk-tiers.md — the four risk tiers (prohibited / high_risk / limited_risk / minimal_risk) with one-line definitions and EU AI Act article references.
  • references/iso-42001-controls.md — common controls relevant to AI eval (a curated subset, not exhaustive — point to the standard for completeness).
  • references/governance-yaml-shape.md — the YAML shape of a governance block, with at least three example blocks copied from the suites landed in #1166 and #1168 (examples/red-team/suites/llm01-prompt-injection.eval.yaml, examples/red-team/archetypes/coding-agent/suites/destructive-git.eval.yaml). Document the <FRAMEWORK>-<VERSION>:<ID> control-id convention currently enforced in isWellFormedControlId.
  • references/lint-rules.md — explicit rules an AI applies when asked to lint a governance block: known-key allowlist, allowed values per key, control-id shape regex, expected interactions between fields. This is what Phase 3's Action prompts the skill to apply.
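As a hedged illustration of the shape references/governance-yaml-shape.md would document — the field values below are illustrative, not canonical vocabulary from the skill:

```yaml
# Illustrative governance block (values are examples, not a normative list)
governance:
  owasp_llm_top_10_2025: [LLM01]
  mitre_atlas: [AML.T0051]        # assumed ATLAS technique id for prompt injection
  risk_tier: limited_risk          # one of the four EU AI Act tiers
  controls:
    - ISO-42001:A.6.2.2            # hypothetical id matching <FRAMEWORK>-<VERSION>:<ID>
  owner: red-team
```

The lint rules in references/lint-rules.md would check exactly this surface: known keys, allowed values per key, and the control-id shape.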

Acceptance signals (Phase 1):

Non-goals (Phase 1):


Phase 2 — Slim core

Files to delete or substantially rewrite (in packages/core/src/evaluation/):

  • metadata.ts — delete GovernanceMetadataSchema and the typed governance field on whatever schema currently embeds it. Keep only a generic metadata: Record<string, unknown> (already present on EvalMetadata and EvalTest; verify and consolidate).
  • validation/eval-validator.ts — delete KNOWN_GOVERNANCE_FIELDS, EU_AI_ACT_RISK_TIERS, validateGovernance, isWellFormedControlId, and the lint call sites. Run bun run typecheck to surface dangling references.
  • yaml-parser.ts (lines 936–993, suite-↔-case metadata merge) — generalise: arrays concatenate-and-dedupe, scalars on the case override. Remove any governance-specific branches; the same rules apply to all keys under metadata.
  • orchestrator.ts — the metadata pass-through stays. Verify it still surfaces the merged metadata onto EvaluationResult exactly as today.
  • types.ts — metadata?: Record<string, unknown> on EvaluationResult stays unchanged.
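The generalised merge rule (arrays concatenate-and-dedupe, case scalars override) could be sketched like this — an assumption about the final implementation, not the actual yaml-parser.ts code:

```typescript
type Meta = Record<string, unknown>;

// Generic suite-↔-case metadata merge with no governance special case:
// when both sides hold arrays, concatenate and dedupe; otherwise the
// case-level value wins. The same rule applies to every metadata key.
function mergeMetadata(suite: Meta, testCase: Meta): Meta {
  const merged: Meta = { ...suite };
  for (const [key, caseValue] of Object.entries(testCase)) {
    const suiteValue = merged[key];
    if (Array.isArray(suiteValue) && Array.isArray(caseValue)) {
      merged[key] = [...new Set([...suiteValue, ...caseValue])];
    } else {
      merged[key] = caseValue; // case scalar (or mismatched type) overrides
    }
  }
  return merged;
}
```

Note the dedupe preserves first-seen order (suite entries first), which keeps the merged block stable for the byte-identical JSONL comparison in the acceptance signals.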

Acceptance signals (Phase 2):

  • All red-team suites in examples/red-team/ run unchanged: bun apps/cli/src/cli.ts eval examples/red-team/suites/llm01-prompt-injection.eval.yaml --target azure --test-id direct-ignore-previous --output .agentv/results/uat-phase2 --budget-usd 0.05.
  • JSONL output is byte-identical for the merged metadata block compared to a baseline captured from main before this PR. The only behavioural diff: a typo'd OWASP key no longer produces a soft warning at load time (instead, lint moves to Phase 3's Action).
  • bun run typecheck and bun run test are clean.
  • Pre-push hook passes (depends on #1170 (fix(test): raise pipeline-e2e timeout to 30s (#1169)) and the input.test.ts timeout follow-up landing first).

Non-goals (Phase 2):


Phase 3 — Reference CI Action (compliance-lint)

Goal: demonstrate that the same skill that powers AI authoring also powers CI enforcement. Users wanting hard CI gating drop the Action into their workflow; users who don't want gating ignore it.

Location: examples/governance/compliance-lint/

Deliverable:

  • compliance-lint.yml — a reusable GitHub Action workflow file. On pull_request events affecting **/*.eval.yaml, it:
    1. Checks out the repo + the agentv-compliance skill (from plugins/agentv-dev/skills/agentv-compliance/).
    2. For each changed *.eval.yaml, extracts the governance: block and passes it to a Claude invocation with the skill loaded.
    3. The Claude invocation returns a structured lint report (JSON: { pass: bool, violations: [{ rule, key, value, message, suggestion }] }).
    4. The Action posts violations as PR review comments and exits non-zero on pass: false.
  • README.md — documents how a downstream consumer adopts this in their own repo: copy the workflow, point it at their skill location (or use the upstream skill via the claude-plugins-official marketplace mechanism the repo already uses), set ANTHROPIC_API_KEY secret, optionally narrow trigger paths.
  • A script/ step (small Python or TypeScript) that does the diff extraction and Claude invocation. Use the Claude API skill conventions documented in the repo. Cache aggressively to keep CI cost low.
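The script's half of the contract could be sketched as follows — a hedged sketch of how the Action script might consume the structured lint report described above; the type names and the GitHub Actions `::error::` annotation format are assumptions, not the final design:

```typescript
// Shape of the lint report the Claude invocation returns (mirrors the
// JSON contract above: { pass, violations: [{ rule, key, value, message, suggestion }] }).
interface Violation {
  rule: string;
  key: string;
  value: unknown;
  message: string;
  suggestion: string;
}

interface LintReport {
  pass: boolean;
  violations: Violation[];
}

// Render the report for CI: one error annotation per violation, or a
// short pass line. The caller exits non-zero when report.pass is false.
function summarize(report: LintReport): string {
  if (report.pass) return "governance lint: pass";
  return report.violations
    .map((v) => `::error:: ${v.key}=${JSON.stringify(v.value)}: ${v.message} (${v.suggestion})`)
    .join("\n");
}
```

Keeping the report a plain JSON contract means the Action can post the same strings as PR review comments without re-parsing Claude's free-form output.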

Acceptance signals (Phase 3):

  • The Action runs against this repo's own examples/red-team/ suites in CI and reports pass: true for all of them.
  • A deliberately malformed governance block (e.g., risk_tier: super_high, owasp_llm_top_10_2025: [LLM99], malformed control id) produces a violation with a clear message and suggestion.
  • README walks a fresh consumer through adoption in under five minutes.
  • Per-run cost on a 10-file PR is documented (target: under 5 cents using a cheap Claude model).

Non-goals (Phase 3):

  • Do not make this Action mandatory on the agentv repo's own CI. It's a reference / opt-in. The core test gate stays as-is.
  • Do not build a full SDK around the Action. The script is intentionally minimal; users who want richer reporting can fork.
  • Do not support languages other than YAML governance blocks in v1.

Design latitude

  • Skill folder name (agentv-compliance proposed) is open. Mirror naming conventions in sibling skills.
  • Phase 3 can use Claude SDK, the claude CLI, or direct Anthropic API — pick whichever has the lightest setup in GitHub Actions and best caching behaviour.
  • One PR per phase preferred, but bundling Phases 1+2 in one PR is acceptable if the diff stays focused. Phase 3 should ideally land in its own PR so the Action can be iterated on without blocking the core slim.
  • Reference files in the skill can be split or merged differently than the breakdown above as long as the AI-authoring and lint-rule contracts are clear.

Related
