feat(examples): OWASP LLM Top 10 / MITRE ATLAS-aligned red-team eval pack #1166
…pack

Adds an opt-in `examples/red-team/` pack with 60 adversarial cases spanning OWASP LLM Top 10 v2025 IDs LLM01/02/06/07/08/10, OWASP Top 10 for Agentic Applications T1/T6/T7, and MITRE ATLAS v5.4 techniques AML.T0044/T0050/T0051/T0070/T0075/T0076. Three reusable LLM-grader rubrics in `graders/` (refusal, pii-leak, tool-abuse) compose with existing primitives: no new core changes, no new dependencies.

Every case is tagged with `metadata.governance` per the schema in #1161, so JSONL artifacts can be aggregated by control. Suite-level `governance:` anchors share tags across cases via YAML aliases; the validator on main emits a soft "unknown field" warning here that goes away once #1161 merges.

README documents provenance + license per seed corpus (InjecAgent / AgentDojo / AgentHarm / NVIDIA Garak / promptfoo / OWASP / MITRE ATLAS).

Closes #1162
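The governance tagging described above is meant to let JSONL artifacts be aggregated by control. A minimal sketch of such an aggregation follows. It assumes each JSONL row exposes `metadata.governance`, and that a field such as `owasp_llm_top_10_2025` holds a string or an array of control IDs; the exact schema lives in #1161 and is not reproduced here, so `ResultRow`, `countByControl`, and the sample rows are hypothetical.

```typescript
// Hypothetical row shape: only `metadata.governance` is taken from the PR text.
interface ResultRow {
  metadata?: { governance?: Record<string, unknown> };
}

// Count JSONL rows per control ID found under metadata.governance[field].
// Assumption: the field holds a string or an array of strings.
function countByControl(jsonl: string, field: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const row = JSON.parse(line) as ResultRow;
    const raw = row.metadata?.governance?.[field];
    const ids: unknown[] = Array.isArray(raw) ? raw : raw === undefined ? [] : [raw];
    for (const id of ids) {
      const key = String(id);
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}

// Made-up sample rows for illustration; not output from the pack.
const sampleJsonl = [
  JSON.stringify({ metadata: { governance: { owasp_llm_top_10_2025: ["LLM01"] } } }),
  JSON.stringify({ metadata: { governance: { owasp_llm_top_10_2025: ["LLM01", "LLM06"] } } }),
  JSON.stringify({ metadata: {} }),
].join("\n");

const controlCounts = countByControl(sampleJsonl, "owasp_llm_top_10_2025");
controlCounts.forEach((n, id) => console.log(id, n));
```

Rows without governance metadata are skipped rather than counted under a placeholder, which keeps untagged cases from inflating any control.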
Force-pushed from 4be3b55 to cb708b5.
Deploying agentv with
| Latest commit: | cb708b5 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://0f683f4b.agentv.pages.dev |
| Branch Preview URL: | https://feat-1162-redteam-pack.agentv.pages.dev |
Manual UAT
Red (main, pack absent): Confirms
Green (rebased branch, gpt-5.4-mini via
(Note: ran with
Both JSONL rows carry
Side observation: the Azure deployment's content filter blocked both prompt-injection prompts at the model layer (
Rebased onto cd76bf8 (#1165 orchestrator change for governance metadata). Clean rebase, no conflicts.
The `yaml` package leaves the YAML 1.1 merge key (`<<: *anchor`) as a literal sibling key when parsing in YAML 1.2 mode (its default). Since PR #1165 surfaced suite-level governance into JSONL, this caused a literal `"<<"` key to leak into `metadata.governance` for any case that used merge syntax (e.g. red-team suites authored in #1166).

Funnel every YAML parse through a new `parseYamlValue` helper that sets `{ merge: true }`, so merge keys are unwrapped once at the parse boundary and downstream consumers (loaders, validators, JSONL artifacts) all benefit consistently. Promptfoo handles this via js-yaml, whose default schema already supports merge keys; we get equivalent behavior.

Regression test asserts `<<` is not retained as a key after parsing a document with `<<: *anchor`.
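To make the leak concrete, the sketch below shows what "unwrapping" a merge key means on an already-parsed object. It is an illustration only, not the fix in this commit, which sets `{ merge: true }` at parse time. `unwrapMergeKeys`, `YamlMap`, and the sample `owner` field are hypothetical names. Explicit keys override merged ones, and earlier merge sources override later ones, following YAML 1.1 merge-key semantics.

```typescript
type YamlMap = Record<string, unknown>;

function isMap(value: unknown): value is YamlMap {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively fold a literal "<<" key into its parent mapping.
function unwrapMergeKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(unwrapMergeKeys);
  if (!isMap(value)) return value;

  const result: YamlMap = {};
  const merge = value["<<"];
  const sources: unknown[] = Array.isArray(merge) ? merge : merge === undefined ? [] : [merge];
  // Earlier merge sources win over later ones.
  for (const source of sources) {
    const resolved = unwrapMergeKeys(source);
    if (!isMap(resolved)) continue;
    for (const key of Object.keys(resolved)) {
      if (!Object.prototype.hasOwnProperty.call(result, key)) result[key] = resolved[key];
    }
  }
  // Explicit keys on the mapping itself win over anything merged in.
  for (const key of Object.keys(value)) {
    if (key !== "<<") result[key] = unwrapMergeKeys(value[key]);
  }
  return result;
}

// What a leaked governance block might look like (made-up values).
const leakedGovernance = {
  "<<": { owasp_llm_top_10_2025: ["LLM01"], owner: "suite" },
  owner: "case",
};
const cleanedGovernance = unwrapMergeKeys(leakedGovernance) as YamlMap;
console.log(JSON.stringify(cleanedGovernance));
// {"owasp_llm_top_10_2025":["LLM01"],"owner":"case"}
```

Resolving merge keys at the parse boundary, as the commit does, avoids needing a post-processing pass like this in every consumer.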
Closes #1162
Summary
- `examples/red-team/` directory with 60 cases across 9 suites and 3 reusable LLM-grader rubrics (`refusal.md`, `pii-leak.md`, `tool-abuse.md`).
- No changes to `packages/core` or `apps/cli`. Composes existing primitives only (`llm-grader`, `contains` ± `negate`, `icontains-any`).
- Every case is tagged with `metadata.governance` per the schema in feat(core): optional governance metadata on EvalMetadata and EvalTest (OWASP / NIST / ATLAS / controls) #1161; suite-level `governance:` is shared across cases via YAML anchors so duplication is minimal.

Coverage (acceptance signal #6)
Cases per suite: llm01=10, llm02=6, llm06=8, llm07=6, llm08=5, llm10=5, agentic-memory=6, agentic-tool=8, atlas=6 → 60 total, inside the 60-100 acceptance band.
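The tally above can be checked mechanically. A small sketch, using only the per-suite counts and the 60-100 band stated in this section (the variable names are illustrative):

```typescript
// Per-suite case counts as listed in the coverage line.
const casesPerSuite: Record<string, number> = {
  llm01: 10,
  llm02: 6,
  llm06: 8,
  llm07: 6,
  llm08: 5,
  llm10: 5,
  "agentic-memory": 6,
  "agentic-tool": 8,
  atlas: 6,
};

let totalCases = 0;
for (const suite of Object.keys(casesPerSuite)) {
  totalCases += casesPerSuite[suite];
}

// Acceptance band from the PR: 60 to 100 cases inclusive.
const inAcceptanceBand = totalCases >= 60 && totalCases <= 100;
console.log(totalCases, inAcceptanceBand); // 60 true
```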
Manual test plan (green)
1. Inventory. Pack ships at least 8 suites + 3 graders + README.
2. Every case is tagged (yq → python equivalent; `yq` not present in this env):
6. Coverage across taxonomies: listed above.
7. License / provenance is auditable. README has the attribution paragraph; each seed corpus is named with its license (MIT for InjecAgent / AgentDojo / AgentHarm / promptfoo, Apache-2.0 for Garak, public for MITRE ATLAS, CC-BY-SA 4.0 for the OWASP catalogs).
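Step 2 of the plan (every case is tagged) amounts to checking that each case carries a non-empty `metadata.governance` block. The script used for that step is not shown here, so the following is only a sketch over already-parsed suites. `SuiteLike`, `CaseLike`, the `tests` and `id` field names, and the sample data are assumptions, not the agentv schema.

```typescript
// Hypothetical shapes for illustration; only `metadata.governance` comes from the PR.
interface CaseLike {
  id: string;
  metadata?: Record<string, unknown>;
}
interface SuiteLike {
  tests: CaseLike[];
}

// A case counts as tagged when metadata.governance is a non-empty object.
function isTagged(testCase: CaseLike): boolean {
  const governance = testCase.metadata?.["governance"];
  return typeof governance === "object" && governance !== null && Object.keys(governance).length > 0;
}

// Return the ids of cases that lack governance tags.
function untaggedCaseIds(suite: SuiteLike): string[] {
  return suite.tests.filter((t) => !isTagged(t)).map((t) => t.id);
}

// Made-up suite: one tagged case, two untagged ones.
const sampleSuite: SuiteLike = {
  tests: [
    { id: "case-a", metadata: { governance: { owasp_llm_top_10_2025: ["LLM01"] } } },
    { id: "case-b", metadata: { governance: {} } },
    { id: "case-c" },
  ],
};

console.log(untaggedCaseIds(sampleSuite)); // [ 'case-b', 'case-c' ]
```

An empty result for every suite would correspond to the "every case is tagged" condition.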
Validation. All 9 suites pass `agentv validate`. The single soft warning per suite (`[governance] Unknown field 'governance'. This field will be ignored.`) is the expected interaction with main before #1161 lands; the per-case `metadata.governance` blocks ride through unchanged because `EvalTest.metadata` is already `Record<string, unknown>`. Once #1161 merges, the warning goes away.

Dry-run smoke test: every suite parses end-to-end, runs all cases against a `--dry-run --target llm` target, and writes JSONL artifacts:

Live red/green differential (test plan steps 3–5: vulnerable target vs aligned target): not run in this PR. Each suite is wired to `execution.target: llm` so a reviewer can point it at any backend in `targets.yaml` and observe the differential. The cost of running 60 cases × 2 targets × frontier models is meaningful and the pack is opt-in; flagging this so reviewers can decide whether to run a sample suite themselves.

Pre-push hook (Build / Typecheck / Lint / Test / Validate eval YAML files): all `Passed`.

Quality-gate self-check
- `packages/core/` or `apps/cli/`
- `owasp_llm_top_10_2025` tag
- `agentv init` defaults

🤖 Generated with Claude Code