
refactor: move governance metadata from typed core schema to agentv-compliance skill #1172

@christso

Description


Objective

Replace the typed governance metadata schema in packages/core/ (introduced in #1165) with an AI-facing skill that teaches agents how to author OWASP / MITRE ATLAS / EU AI Act / ISO 42001 governance blocks correctly, plus a reference GitHub Action that loads the same skill to provide CI enforcement. Core retains only a generic metadata: Record<string, unknown> pass-through that already exists in EvalMetadata / EvalTest.

The skill is the single source of truth for vocabulary and validation rules — used interactively for AI authoring and non-interactively (via the Action) for CI lint. No SDK package is needed: a Claude invocation with the skill loaded subsumes both use cases.

Three phases — ideally one PR per phase, all land in this issue:

  1. Phase 1 — additive: create the skill under plugins/agentv-dev/skills/agentv-compliance/ with vocabulary, examples, and AI-authoring guidance. No core changes.
  2. Phase 2 — subtractive: slim core. Once Phase 1 is merged, remove GovernanceMetadataSchema, KNOWN_GOVERNANCE_FIELDS, EU_AI_ACT_RISK_TIERS, validateGovernance, and isWellFormedControlId from packages/core/. Make the suite-↔-case metadata merge in yaml-parser.ts generic (no governance special case). Existing red-team suites from #1166 (feat(examples): OWASP LLM Top 10 / MITRE ATLAS-aligned red-team eval pack) and #1168 (feat(examples): scenario-based red-team suites for coding and customer-facing agent archetypes) continue to run unchanged because core still passes metadata through.
  3. Phase 3 — additive: reference CI Action. Ship a small GitHub Action template under examples/governance/compliance-lint/ that invokes Claude with the agentv-compliance skill loaded against changed *.eval.yaml files and reports pass/fail per governance block.

Why

PR #1165 added typed fields (owasp_llm_top_10_2025, owasp_agentic_top_10_2025, mitre_atlas, controls, risk_tier, owner) plus a hardcoded vocabulary lint that emits soft warnings only. Investigation found:

  • Core does not consume any governance value. It parses the block, lints typo'd keys, and passes the values through to JSONL. No grader, scorer, scheduler, or aggregator branches on the field values — they're text labels destined for downstream tooling (jq pipelines, the .ai-register.yaml aggregator from #1167 (docs(examples): AI system register convention (.ai-register.yaml) + aggregator Action template), attestation reports).
  • AGENTS.md design-principle review: the typed schema fails 3 of 4 critical checks. AGENTS.md principle 7 (AI-First Design) explicitly prefers "Skills over rigid commands" — "Skills should cover most use cases; rigid commands trade off AI intelligence." Soft warnings on typo'd OWASP keys are exactly the kind of guidance a skill replaces more elegantly than runtime code. And because a skill can be loaded by a CI Action, the skill subsumes both authoring and enforcement — no separate runtime package is needed.
  • Promptfoo precedent. Promptfoo keeps a generic metadata: Record<string, any> in core (see its config reference). All framework presets (OWASP_LLM_TOP_10_MAPPING, MITRE ATLAS, NIST AI RMF, EU AI Act, ISO 42001, GDPR) are static data tables in src/redteam/constants/frameworks.ts, opted into via redteam.frameworks: [owasp:llm, mitre:atlas] — never typed onto the core test schema.
  • Release decoupling. EU AI Act delegated acts, OWASP versions, MITRE ATLAS revisions all churn on regulator/security cadence. With the skill model, content updates ship without touching the engine, and CI consumers automatically pick up new vocabulary on the next Action run.
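The promptfoo-style data-table pattern referenced above might look roughly like this — a hedged sketch only; the constant names here are illustrative and are not promptfoo's actual exports:

```typescript
// Static vocabulary tables: plain data, never typed onto the core test schema.
// Content updates (new OWASP versions, new tiers) touch only these tables.
export const EU_AI_ACT_RISK_TIERS = [
  "prohibited",
  "high_risk",
  "limited_risk",
  "minimal_risk",
] as const;

// Canonical OWASP LLM Top 10 IDs: "LLM01" through "LLM10".
export const OWASP_LLM_TOP_10_2025: string[] = Array.from(
  { length: 10 },
  (_, i) => `LLM${String(i + 1).padStart(2, "0")}`,
);
```

Because the tables are plain data, a skill (or an Action loading it) can consume them without any runtime coupling to the engine.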

Phase 1 — Create agentv-compliance skill

Location: plugins/agentv-dev/skills/agentv-compliance/

Skill structure (follow conventions in sibling skills under plugins/agentv-dev/skills/):

  • SKILL.md — top-level skill description with description: frontmatter that triggers when an AI agent is authoring or editing a *.eval.yaml file with governance: metadata. Also describe the dual mode: "interactively this skill helps you author governance blocks; non-interactively (e.g., from a GitHub Action) it lints them."
  • references/owasp-llm-top-10-2025.md — the 10 categories with brief descriptions + canonical IDs (LLM01..LLM10), official source link.
  • references/owasp-agentic-top-10-2025.md — the agentic categories (T1..Tn) with descriptions.
  • references/mitre-atlas.md — common AML.Txxxx techniques relevant to LLM/agent eval, with link to https://atlas.mitre.org/.
  • references/eu-ai-act-risk-tiers.md — the four risk tiers (prohibited / high_risk / limited_risk / minimal_risk) with one-line definitions and EU AI Act article references.
  • references/iso-42001-controls.md — common controls relevant to AI eval (a curated subset, not exhaustive — point to the standard for completeness).
  • references/governance-yaml-shape.md — the YAML shape of a governance block, with at least three example blocks copied from the suites landed in #1166 and #1168 (examples/red-team/suites/llm01-prompt-injection.eval.yaml, examples/red-team/archetypes/coding-agent/suites/destructive-git.eval.yaml). Document the <FRAMEWORK>-<VERSION>:<ID> control-id convention currently enforced in isWellFormedControlId.
  • references/lint-rules.md — explicit rules an AI applies when asked to lint a governance block: known-key allowlist, allowed values per key, control-id shape regex, expected interactions between fields. This is what Phase 3's Action prompts the skill to apply.
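As a hedged illustration of the shape references/governance-yaml-shape.md would document — the field values below are illustrative, not canonical vocabulary from the skill:

```yaml
# Illustrative governance block (values are examples, not a normative list)
governance:
  owasp_llm_top_10_2025: [LLM01]
  mitre_atlas: [AML.T0051]        # assumed ATLAS technique id for prompt injection
  risk_tier: limited_risk          # one of the four EU AI Act tiers
  controls:
    - ISO-42001:A.6.2.2            # hypothetical id matching <FRAMEWORK>-<VERSION>:<ID>
  owner: red-team
```

The lint rules in references/lint-rules.md would check exactly this surface: known keys, allowed values per key, and the control-id shape.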

Acceptance signals (Phase 1):

Non-goals (Phase 1):


Phase 2 — Slim core

Files to delete or substantially rewrite (in packages/core/src/evaluation/):

  • metadata.ts — delete GovernanceMetadataSchema and the typed governance field on whatever schema currently embeds it. Keep only a generic metadata: Record<string, unknown> (already present on EvalMetadata and EvalTest; verify and consolidate).
  • validation/eval-validator.ts — delete KNOWN_GOVERNANCE_FIELDS, EU_AI_ACT_RISK_TIERS, validateGovernance, isWellFormedControlId, and the lint call sites. Run bun run typecheck to surface dangling references.
  • yaml-parser.ts (lines 936–993, suite-↔-case metadata merge) — generalise: arrays concatenate-and-dedupe, scalars on the case override. Remove any governance-specific branches; the same rules apply to all keys under metadata.
  • orchestrator.ts — the metadata pass-through stays. Verify it still surfaces the merged metadata onto EvaluationResult exactly as today.
  • types.ts — metadata?: Record<string, unknown> on EvaluationResult stays unchanged.
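The generalised merge rule (arrays concatenate-and-dedupe, case scalars override) could be sketched like this — an assumption about the final implementation, not the actual yaml-parser.ts code:

```typescript
type Meta = Record<string, unknown>;

// Generic suite-↔-case metadata merge with no governance special case:
// when both sides hold arrays, concatenate and dedupe; otherwise the
// case-level value wins. The same rule applies to every metadata key.
function mergeMetadata(suite: Meta, testCase: Meta): Meta {
  const merged: Meta = { ...suite };
  for (const [key, caseValue] of Object.entries(testCase)) {
    const suiteValue = merged[key];
    if (Array.isArray(suiteValue) && Array.isArray(caseValue)) {
      merged[key] = [...new Set([...suiteValue, ...caseValue])];
    } else {
      merged[key] = caseValue; // case scalar (or mismatched type) overrides
    }
  }
  return merged;
}
```

Note the dedupe preserves first-seen order (suite entries first), which keeps the merged block stable for the byte-identical JSONL comparison in the acceptance signals.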

Acceptance signals (Phase 2):

  • All red-team suites in examples/red-team/ run unchanged: bun apps/cli/src/cli.ts eval examples/red-team/suites/llm01-prompt-injection.eval.yaml --target azure --test-id direct-ignore-previous --output .agentv/results/uat-phase2 --budget-usd 0.05.
  • JSONL output is byte-identical for the merged metadata block compared to a baseline captured from main before this PR. The only behavioural diff: a typo'd OWASP key no longer produces a soft warning at load time (instead, lint moves to Phase 3's Action).
  • bun run typecheck and bun run test are clean.
  • Pre-push hook passes (depends on #1170 (fix(test): raise pipeline-e2e timeout to 30s (#1169)) and the input.test.ts timeout follow-up landing first).

Non-goals (Phase 2):


Phase 3 — Reference CI Action (compliance-lint)

Goal: demonstrate that the same skill that powers AI authoring also powers CI enforcement. Users wanting hard CI gating drop the Action into their workflow; users who don't want gating ignore it.

Location: examples/governance/compliance-lint/

Deliverable:

  • compliance-lint.yml — a reusable GitHub Action workflow file. On pull_request events affecting **/*.eval.yaml, it:
    1. Checks out the repo + the agentv-compliance skill (from plugins/agentv-dev/skills/agentv-compliance/).
    2. For each changed *.eval.yaml, extracts the governance: block and passes it to a Claude invocation with the skill loaded.
    3. The Claude invocation returns a structured lint report (JSON: { pass: bool, violations: [{ rule, key, value, message, suggestion }] }).
    4. The Action posts violations as PR review comments and exits non-zero on pass: false.
  • README.md — documents how a downstream consumer adopts this in their own repo: copy the workflow, point it at their skill location (or use the upstream skill via the claude-plugins-official marketplace mechanism the repo already uses), set ANTHROPIC_API_KEY secret, optionally narrow trigger paths.
  • A script/ step (small Python or TypeScript) that does the diff extraction and Claude invocation. Use the Claude API skill conventions documented in the repo. Cache aggressively to keep CI cost low.
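The script's half of the contract could be sketched as follows — a hedged sketch of how the Action script might consume the structured lint report described above; the type names and the GitHub Actions `::error::` annotation format are assumptions, not the final design:

```typescript
// Shape of the lint report the Claude invocation returns (mirrors the
// JSON contract above: { pass, violations: [{ rule, key, value, message, suggestion }] }).
interface Violation {
  rule: string;
  key: string;
  value: unknown;
  message: string;
  suggestion: string;
}

interface LintReport {
  pass: boolean;
  violations: Violation[];
}

// Render the report for CI: one error annotation per violation, or a
// short pass line. The caller exits non-zero when report.pass is false.
function summarize(report: LintReport): string {
  if (report.pass) return "governance lint: pass";
  return report.violations
    .map((v) => `::error:: ${v.key}=${JSON.stringify(v.value)}: ${v.message} (${v.suggestion})`)
    .join("\n");
}
```

Keeping the report a plain JSON contract means the Action can post the same strings as PR review comments without re-parsing Claude's free-form output.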

Acceptance signals (Phase 3):

  • The Action runs against this repo's own examples/red-team/ suites in CI and reports pass: true for all of them.
  • A deliberately malformed governance block (e.g., risk_tier: super_high, owasp_llm_top_10_2025: [LLM99], malformed control id) produces a violation with a clear message and suggestion.
  • README walks a fresh consumer through adoption in under five minutes.
  • Per-run cost on a 10-file PR is documented (target: under 5 cents using a cheap Claude model).

Non-goals (Phase 3):

  • Do not make this Action mandatory on the agentv repo's own CI. It's a reference / opt-in. The core test gate stays as-is.
  • Do not build a full SDK around the Action. The script is intentionally minimal; users who want richer reporting can fork.
  • Do not support languages other than YAML governance blocks in v1.

Design latitude

  • Skill folder name (agentv-compliance proposed) is open. Mirror naming conventions in sibling skills.
  • Phase 3 can use Claude SDK, the claude CLI, or direct Anthropic API — pick whichever has the lightest setup in GitHub Actions and best caching behaviour.
  • One PR per phase preferred, but bundling Phases 1+2 in one PR is acceptable if the diff stays focused. Phase 3 should ideally land in its own PR so the Action can be iterated on without blocking the core slim.
  • Reference files in the skill can be split or merged differently than the breakdown above as long as the AI-authoring and lint-rule contracts are clear.

Related
