Objective
Replace the typed governance metadata schema in packages/core/ (introduced in #1165) with an AI-facing skill that teaches agents how to author OWASP / MITRE ATLAS / EU AI Act / ISO 42001 governance blocks correctly, plus a reference GitHub Action that loads the same skill to provide CI enforcement. Core retains only a generic metadata: Record<string, unknown> pass-through that already exists in EvalMetadata / EvalTest.
The skill is the single source of truth for vocabulary and validation rules — used interactively for AI authoring and non-interactively (via the Action) for CI lint. No SDK package is needed: a Claude invocation with the skill loaded subsumes both use cases.
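For concreteness, a governance block under this model might look like the following — the field names are the ones PR #1165 introduced, but the specific values and control ids here are illustrative only, and exact placement should mirror the existing red-team suites:

```yaml
# Illustrative governance block — values and control ids are examples,
# not canon. Core carries this as opaque data; the skill, not the
# schema, knows which keys and values are valid.
governance:
  risk_tier: limited_risk
  owner: security-team
  owasp_llm_top_10_2025: [LLM01]
  mitre_atlas: [AML.T0051]
  controls:
    - ISO-42001:A.6.2.2   # <FRAMEWORK>-<VERSION>:<ID> convention
```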
Three phases — ideally one PR per phase, all land in this issue:
Phase 1 — additive: create the skill at plugins/agentv-dev/skills/agentv-compliance/ with vocabulary, examples, and AI-authoring guidance. No core changes.
Phase 2 — subtractive: slim core. Delete GovernanceMetadataSchema, KNOWN_GOVERNANCE_FIELDS, EU_AI_ACT_RISK_TIERS, validateGovernance, and isWellFormedControlId from packages/core/. Make the suite-↔-case metadata merge in yaml-parser.ts generic (no governance special case). Existing red-team suites in #1166 / #1168 continue to run unchanged because core still passes metadata through.
Phase 3 — additive: reference CI Action. Ship a small GitHub Action template under examples/governance/compliance-lint/ that invokes Claude with the agentv-compliance skill loaded against changed *.eval.yaml files and reports pass/fail per governance block.
Why
PR #1165 added typed fields (owasp_llm_top_10_2025, owasp_agentic_top_10_2025, mitre_atlas, controls, risk_tier, owner) plus a hardcoded vocabulary lint that emits soft warnings only. Investigation found:
Core does not consume any governance value. It parses, lints typo'd keys, and passes the block through to JSONL. No grader, scorer, scheduler, or aggregator branches on the field values — they're text labels destined for downstream tooling (jq pipelines, the .ai-register.yaml aggregator from #1167, attestation reports).
AGENTS.md design-principle test fails 3 of 4 critical checks:
#2 (Built-ins for primitives only) — "needed by majority of users"? Compliance tagging is enterprise/regulated-industry territory. Most users won't tag suites with EU AI Act articles. Fail.
#4 (Align with Industry Standards) — promptfoo (the named reference) keeps these as data, not typed core fields. Fail.
#5 (YAGNI) — six named typed fields + vocabulary lint is a "bigger X than asked for". Fail.
#7 (AI-First Design) explicitly prefers "Skills over rigid commands" — "Skills should cover most use cases; rigid commands trade off AI intelligence." Soft warnings on typo'd OWASP keys are exactly the kind of guidance a skill replaces more elegantly than runtime code. And because a skill can be loaded by a CI Action, the skill subsumes both authoring and enforcement — no separate runtime package is needed.
Promptfoo precedent. Generic metadata: Record<string, any> in core (config reference). All framework presets (OWASP_LLM_TOP_10_MAPPING, MITRE ATLAS, NIST AI RMF, EU AI Act, ISO 42001, GDPR) are static data tables in src/redteam/constants/frameworks.ts, opt-in via redteam.frameworks: [owasp:llm, mitre:atlas] — never typed onto the core test schema.
Release decoupling. EU AI Act delegated acts, OWASP versions, MITRE ATLAS revisions all churn on regulator/security cadence. With the skill model, content updates ship without touching the engine, and CI consumers automatically pick up new vocabulary on the next Action run.
Phase 1 — Create agentv-compliance skill
Location: plugins/agentv-dev/skills/agentv-compliance/
Skill structure (follow conventions in sibling skills under plugins/agentv-dev/skills/):
SKILL.md — top-level skill description with description: frontmatter that triggers when an AI agent is authoring or editing a *.eval.yaml file with governance: metadata. Also describe the dual mode: "interactively this skill helps you author governance blocks; non-interactively (e.g., from a GitHub Action) it lints them."
references/owasp-llm-top-10-2025.md — the 10 categories with brief descriptions + canonical IDs (LLM01..LLM10), official source link.
references/owasp-agentic-top-10-2025.md — the agentic categories (T1..Tn) with descriptions.
references/mitre-atlas.md — common AML.Txxxx techniques relevant to LLM/agent eval, with link to https://atlas.mitre.org/.
references/eu-ai-act-risk-tiers.md — the four risk tiers (prohibited / high_risk / limited_risk / minimal_risk) with one-line definitions and EU AI Act article references.
references/iso-42001-controls.md — common controls relevant to AI eval (a curated subset, not exhaustive — point to the standard for completeness).
references/governance-yaml-shape.md — the YAML shape of a governance block, with at least three example blocks copied from the suites landed in #1166 / #1168 (examples/red-team/suites/llm01-prompt-injection.eval.yaml, examples/red-team/archetypes/coding-agent/suites/destructive-git.eval.yaml). Document the <FRAMEWORK>-<VERSION>:<ID> control-id convention currently enforced in isWellFormedControlId.
references/lint-rules.md — explicit rules an AI applies when asked to lint a governance block: known-key allowlist, allowed values per key, control-id shape regex, expected interactions between fields. This is what Phase 3's Action prompts the skill to apply.
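As a sketch of the kind of rule lint-rules.md would spell out, the control-id shape check might read as follows. The regex is illustrative — the authoritative shape belongs in the skill, and core's current isWellFormedControlId is the starting point, not necessarily this exact pattern:

```typescript
// Illustrative check for the <FRAMEWORK>-<VERSION>:<ID> control-id
// convention (e.g. ISO-42001:A.6.2.2, OWASP-LLM-2025:LLM01).
// The exact regex shipped in references/lint-rules.md may differ.
const CONTROL_ID = /^[A-Z][A-Z0-9-]*-\d{4,}:[A-Za-z0-9.]+$/;

function isWellFormedControlId(id: string): boolean {
  return CONTROL_ID.test(id);
}
```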
Acceptance signals (Phase 1):
Skill loads cleanly when an AI agent edits a *.eval.yaml file with a governance: block (verify with the skill discovery mechanism the repo already uses).
Given a malformed governance block as input, the skill produces a structured lint report (pass/fail per rule + offending value + suggested fix). This rehearses the contract Phase 3 depends on.
All five reference files cite their official source (OWASP, MITRE, EU regulation text, ISO standard).
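The structured lint report named in the acceptance signal can be pinned down using the JSON shape Phase 3's Action already specifies ({ pass, violations: [{ rule, key, value, message, suggestion }] }); as TypeScript types, with an illustrative example value:

```typescript
// Lint report contract the skill emits and Phase 3's Action consumes.
// Field names follow the JSON shape proposed for the Action's output;
// the example rule name and values below are illustrative.
interface Violation {
  rule: string;        // which lint rule fired, e.g. "allowed-values"
  key: string;         // offending governance key
  value: unknown;      // offending value as found in the YAML
  message: string;     // human-readable explanation
  suggestion: string;  // proposed fix
}

interface LintReport {
  pass: boolean;
  violations: Violation[];
}

const example: LintReport = {
  pass: false,
  violations: [
    {
      rule: "allowed-values",
      key: "risk_tier",
      value: "super_high",
      message: "risk_tier is not one of the four EU AI Act tiers",
      suggestion: "high_risk",
    },
  ],
};
```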
Non-goals (Phase 1):
Do not ship runtime code in this phase. The skill is markdown-only.
Do not ship curated mapping tables ("OWASP category → suggested attack patterns"). That's a separate content milestone.
Do not add examples/ content for this skill — the existing #1166 / #1168 suites are the canonical examples.
Phase 2 — Slim core
Files to delete or substantially rewrite (in packages/core/src/evaluation/):
metadata.ts — delete GovernanceMetadataSchema and the typed governance field on whatever schema currently embeds it. Keep only a generic metadata: Record<string, unknown> (already present on EvalMetadata and EvalTest; verify and consolidate).
validation/eval-validator.ts — delete KNOWN_GOVERNANCE_FIELDS, EU_AI_ACT_RISK_TIERS, validateGovernance, isWellFormedControlId, and the lint call sites. Run bun run typecheck to surface dangling references.
yaml-parser.ts (lines 936–993, suite-↔-case metadata merge) — generalise: arrays concatenate-and-dedupe, scalars on the case override. Remove any governance-specific branches; the same rules apply to all keys under metadata.
orchestrator.ts — the metadata pass-through stays. Verify it still surfaces the merged metadata onto EvaluationResult exactly as today.
types.ts — metadata?: Record<string, unknown> on EvaluationResult stays unchanged.
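The generalised merge semantics for yaml-parser.ts — arrays concatenate-and-dedupe, everything else on the case overrides, applied uniformly to every key under metadata — can be sketched as follows. This illustrates the intended rules, not the shipped parser code:

```typescript
type Metadata = Record<string, unknown>;

// Sketch of the generic suite-↔-case metadata merge: arrays
// concatenate and dedupe (first-seen order), all other values on
// the case override the suite. No key gets special treatment.
function mergeMetadata(suite: Metadata, testCase: Metadata): Metadata {
  const merged: Metadata = { ...suite };
  for (const [key, caseValue] of Object.entries(testCase)) {
    const suiteValue = merged[key];
    if (Array.isArray(suiteValue) && Array.isArray(caseValue)) {
      // concatenate-and-dedupe, preserving first-seen order
      merged[key] = [...new Set([...suiteValue, ...caseValue])];
    } else {
      // scalars (and any other non-array values) on the case override
      merged[key] = caseValue;
    }
  }
  return merged;
}
```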
Acceptance signals (Phase 2):
All red-team suites in examples/red-team/ run unchanged: bun apps/cli/src/cli.ts eval examples/red-team/suites/llm01-prompt-injection.eval.yaml --target azure --test-id direct-ignore-previous --output .agentv/results/uat-phase2 --budget-usd 0.05.
JSONL output is byte-identical for the merged metadata block compared to a baseline captured from main before this PR. The only behavioural diff: a typo'd OWASP key no longer produces a soft warning at load time (instead, lint moves to Phase 3's Action).
bun run typecheck and bun run test are clean.
Non-goals (Phase 2):
Do not add a generic registerMetadataValidator(name, fn) hook in this phase. YAGNI — the skill + Action covers enforcement; nothing in the runtime needs a hook.
Do not touch examples/governance/ai-register/ (from #1167); it consumes the JSONL output, not the schema.
Phase 3 — Reference CI Action (compliance-lint)
Goal: demonstrate that the same skill that powers AI authoring also powers CI enforcement. Users wanting hard CI gating drop the Action into their workflow; users who don't want gating ignore it.
Location: examples/governance/compliance-lint/
Deliverables:
compliance-lint.yml — a reusable GitHub Action workflow file. On pull_request events affecting **/*.eval.yaml, it:
Checks out the repo + the agentv-compliance skill (from plugins/agentv-dev/skills/agentv-compliance/).
For each changed *.eval.yaml, extracts the governance: block and passes it to a Claude invocation with the skill loaded.
The Claude invocation returns a structured lint report (JSON: { pass: bool, violations: [{ rule, key, value, message, suggestion }] }).
The Action posts violations as PR review comments and exits non-zero on pass: false.
README.md — documents how a downstream consumer adopts this in their own repo: copy the workflow, point it at their skill location (or use the upstream skill via the claude-plugins-official marketplace mechanism the repo already uses), set ANTHROPIC_API_KEY secret, optionally narrow trigger paths.
A script/ step (small Python or TypeScript) that does the diff extraction and Claude invocation. Use the Claude API skill conventions documented in the repo. Cache aggressively to keep CI cost low.
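One way the script/ step could isolate the block to lint, sketched without a YAML dependency — a hypothetical helper, not the shipped script; the real step may simply parse the YAML properly:

```typescript
// Hypothetical helper: slice the top-level `governance:` block out of
// an eval YAML file by indentation, so only that block (not the whole
// suite) is sent to the Claude invocation. Illustrative sketch only.
function extractGovernanceBlock(yamlText: string): string | null {
  const lines = yamlText.split("\n");
  const start = lines.findIndex((l) => /^governance:\s*$/.test(l));
  if (start === -1) return null;
  const block: string[] = [lines[start]];
  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i];
    // stop at the next non-indented, non-blank line (a sibling top-level key)
    if (line.trim() !== "" && !/^\s/.test(line)) break;
    block.push(line);
  }
  return block.join("\n").trimEnd();
}
```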
Acceptance signals (Phase 3):
The Action runs against this repo's own examples/red-team/ suites in CI and reports pass: true for all of them.
A deliberately malformed governance block (e.g., risk_tier: super_high, owasp_llm_top_10_2025: [LLM99], malformed control id) produces a violation with a clear message and suggestion.
README walks a fresh consumer through adoption in under five minutes.
Per-run cost on a 10-file PR is documented (target: under 5 cents using a cheap Claude model).
Non-goals (Phase 3):
Do not make this Action mandatory on the agentv repo's own CI. It's a reference / opt-in. The core test gate stays as-is.
Do not build a full SDK around the Action. The script is intentionally minimal; users who want richer reporting can fork.
Do not support languages other than YAML governance blocks in v1.
Design latitude
Skill folder name (agentv-compliance proposed) is open. Mirror naming conventions in sibling skills.
Phase 3 can use Claude SDK, the claude CLI, or direct Anthropic API — pick whichever has the lightest setup in GitHub Actions and best caching behaviour.
One PR per phase preferred, but bundling Phases 1+2 in one PR is acceptable if the diff stays focused. Phase 3 should ideally land in its own PR so the Action can be iterated on without blocking the core slim.
Reference files in the skill can be split or merged differently than the breakdown above as long as the AI-authoring and lint-rule contracts are clear.
Related
The .ai-register.yaml aggregator from #1167 (canonical downstream consumer of governance.*).