feat(examples): OWASP LLM Top 10 / MITRE ATLAS-aligned red-team eval pack #1166

Merged: christso merged 1 commit into main from feat/1162-redteam-pack on Apr 27, 2026

@christso
Collaborator

Closes #1162

Summary

  • New opt-in examples/red-team/ directory with 60 cases across 9 suites and 3 reusable LLM-grader rubrics (refusal.md, pii-leak.md, tool-abuse.md).
  • Zero changes under packages/core or apps/cli. Composes existing primitives only (llm-grader, contains ± negate, icontains-any).
  • Every case is tagged with metadata.governance per the schema in #1161 (feat(core): optional governance metadata on EvalMetadata and EvalTest (OWASP / NIST / ATLAS / controls)); suite-level governance: is shared across cases via YAML anchors so duplication is minimal.
  • README has a one-paragraph attribution block listing every seed corpus and its license (InjecAgent, AgentDojo, AgentHarm, NVIDIA Garak, promptfoo red-team plugins, OWASP, MITRE ATLAS).

Coverage (acceptance signal #6)

OWASP LLM Top 10 v2025: LLM01, LLM02, LLM06, LLM07, LLM08, LLM10
OWASP Agentic Top 10 v2025: T1, T6, T7
MITRE ATLAS v5.4:        AML.T0029, T0034, T0044, T0050, T0051, T0070, T0075, T0076

Cases per suite: llm01=10, llm02=6, llm06=8, llm07=6, llm08=5, llm10=5, agentic-memory=6, agentic-tool=8, atlas=6 → 60 total, inside the 60-100 acceptance band.
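
For reviewers who want to re-check the arithmetic, here is a minimal Python sketch that sums the per-suite counts quoted above and tests them against the 60-100 band. The dictionary keys are just the short suite labels from this description, and the function names are illustrative, not part of agentv.

```python
# Per-suite case counts exactly as listed in the PR description.
CASES_PER_SUITE = {
    "llm01": 10, "llm02": 6, "llm06": 8, "llm07": 6, "llm08": 5, "llm10": 5,
    "agentic-memory": 6, "agentic-tool": 8, "atlas": 6,
}

# Acceptance band quoted in the description (inclusive on both ends here,
# which is an assumption about how the band is meant to be read).
BAND = (60, 100)


def total_cases(counts):
    """Sum the case counts across all suites."""
    return sum(counts.values())


def within_band(total, band=BAND):
    """Return True when the total falls inside the inclusive band."""
    lo, hi = band
    return lo <= total <= hi


if __name__ == "__main__":
    total = total_cases(CASES_PER_SUITE)
    print(f"{len(CASES_PER_SUITE)} suites, {total} cases, in band: {within_band(total)}")
```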

Manual test plan (green)

1. Inventory. Pack ships at least 8 suites + 3 graders + README.

examples/red-team/graders/ {pii-leak.md, refusal.md, tool-abuse.md}
examples/red-team/suites/  {agentic-memory-poisoning, agentic-tool-misuse,
                            atlas-v5.4-agentic, llm01-prompt-injection,
                            llm02-insecure-output, llm06-excessive-agency,
                            llm07-system-prompt-leakage, llm08-vector-embedding,
                            llm10-unbounded-consumption}.eval.yaml
examples/red-team/README.md

2. Every case is tagged (checked with a Python equivalent of the yq query, since yq is not present in this env):

$ python3 -c "(loop over suites, fail if any case is missing OWASP tag)"
OK: all cases have at least one OWASP tag
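
The one-liner above is elided in this description. The sketch below shows roughly what such a check could look like; it is not the script that was actually run. It assumes each suite file has a top-level `tests` list whose entries carry an `id` and a `metadata.governance.owasp_llm_top_10_2025` list (the latter path is taken from the JSONL rows in this PR; `tests` and `id` are assumptions), and it relies on PyYAML, whose `safe_load` resolves `<<` merge keys.

```python
import glob
import sys

OWASP_KEY = "owasp_llm_top_10_2025"


def cases_missing_owasp(suite):
    """Return the ids of cases in one parsed suite that have no OWASP tag.

    `suite` is the already-parsed YAML document (a dict). The `tests` and
    `id` field names are assumptions about the suite layout.
    """
    missing = []
    for case in suite.get("tests") or []:
        governance = (case.get("metadata") or {}).get("governance") or {}
        if not governance.get(OWASP_KEY):
            missing.append(case.get("id", "<no id>"))
    return missing


if __name__ == "__main__":
    import yaml  # PyYAML (third-party); safe_load resolves `<<` merge keys

    failures = []
    for path in sorted(glob.glob("examples/red-team/suites/*.eval.yaml")):
        with open(path, encoding="utf-8") as fh:
            suite = yaml.safe_load(fh) or {}
        failures += [f"{path}: {test_id}" for test_id in cases_missing_owasp(suite)]

    if failures:
        sys.exit("missing OWASP tag:\n" + "\n".join(failures))
    print("OK: all cases have at least one OWASP tag")
```

Keeping the check as a pure function over parsed data makes it easy to unit-test without touching the filesystem.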

6. Coverage across taxonomies — listed above.

7. License / provenance is auditable. README has the attribution paragraph; each seed corpus is named with its license (MIT for InjecAgent / AgentDojo / AgentHarm / promptfoo, Apache-2.0 for Garak, public for MITRE ATLAS, CC-BY-SA 4.0 for the OWASP catalogs).

Validation. All 9 suites pass agentv validate. The single soft warning per suite ([governance] Unknown field 'governance'. This field will be ignored.) is the expected interaction with main before #1161 lands — the per-case metadata.governance blocks ride through unchanged because EvalTest.metadata is already Record<string, unknown>. Once #1161 merges, the warning goes away.

Dry-run smoke test: every suite parses end-to-end, runs all cases with --dry-run --target llm, and writes JSONL artifacts:

=== llm01-prompt-injection.eval.yaml ===     Total tests: 10
=== llm02-insecure-output.eval.yaml ===      Total tests: 6
=== llm06-excessive-agency.eval.yaml ===     Total tests: 8
=== llm07-system-prompt-leakage.eval.yaml === Total tests: 6
=== llm08-vector-embedding.eval.yaml ===     Total tests: 5
=== llm10-unbounded-consumption.eval.yaml === Total tests: 5
=== agentic-memory-poisoning.eval.yaml ===   Total tests: 6
=== agentic-tool-misuse.eval.yaml ===        Total tests: 8
=== atlas-v5.4-agentic.eval.yaml ===         Total tests: 6

Live red/green differential (test plan steps 3–5: vulnerable target vs aligned target) — not run in this PR. Each suite is wired to execution.target: llm so a reviewer can point it at any backend in targets.yaml and observe the differential. The cost of running 60 cases × 2 targets × frontier models is meaningful and the pack is opt-in; flagging this so reviewers can decide whether to run a sample suite themselves.

Pre-push hook (Build / Typecheck / Lint / Test / Validate eval YAML files): all Passed.

Quality-gate self-check

  • ❌ no diff under packages/core/ or apps/cli/
  • ❌ no new dependencies
  • ❌ no new grader type
  • ❌ no attacker LLM bundled at runtime
  • ✅ ≤80 cases (60)
  • ❌ no case missing an owasp_llm_top_10_2025 tag
  • ❌ no content from a corpus that disallows redistribution
  • ❌ no explicit harmful payloads (CSAM / weapon / self-harm)
  • ❌ not wired into agentv init defaults

🤖 Generated with Claude Code

@christso christso marked this pull request as ready for review April 27, 2026 07:15
…pack

Adds an opt-in `examples/red-team/` pack with 60 adversarial cases spanning
OWASP LLM Top 10 v2025 IDs LLM01/02/06/07/08/10, OWASP Top 10 for Agentic
Applications T1/T6/T7, and MITRE ATLAS v5.4 techniques AML.T0044/T0050/
T0051/T0070/T0075/T0076. Three reusable LLM-grader rubrics in `graders/`
(refusal, pii-leak, tool-abuse) compose with existing primitives — no new
core changes, no new dependencies.

Every case is tagged with `metadata.governance` per the schema in #1161, so
JSONL artifacts can be aggregated by control. Suite-level `governance:`
anchors share tags across cases via YAML aliases; the validator on main
emits a soft "unknown field" warning here that goes away once #1161 merges.

README documents provenance + license per seed corpus (InjecAgent / AgentDojo
/ AgentHarm / NVIDIA Garak / promptfoo / OWASP / MITRE ATLAS).

Closes #1162
@christso christso force-pushed the feat/1162-redteam-pack branch from 4be3b55 to cb708b5 April 27, 2026 10:00
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 27, 2026

Deploying agentv with Cloudflare Pages

Latest commit: cb708b5
Status: ✅  Deploy successful!
Preview URL: https://0f683f4b.agentv.pages.dev
Branch Preview URL: https://feat-1162-redteam-pack.agentv.pages.dev

@christso
Collaborator Author

Manual UAT

Red (main, pack absent):

$ git ls-tree origin/main -- examples/red-team/suites/llm01-prompt-injection.eval.yaml
(empty — file does not exist)

$ git ls-tree origin/main -- examples/
100644 blob eb16acea0ccbfc004a5b04a25af2e6ae323dadc4    examples/.gitignore
100644 blob 438b1ca8834fceff8e0b66472a3d5e5e0960cb56    examples/README.md
040000 tree 2f4166c05ebf1d155cade4a76a6ff6045f045254    examples/features
040000 tree 974a3c2753b54f64cd079e745eb671977200f56e    examples/showcase

Confirms examples/red-team/ does not exist on origin/main (only features/ and showcase/), so this suite cannot be run from main.

Green (rebased branch, gpt-5.4-mini via --target azure):

(Note: ran with --target azure rather than --target openai — the local .env has AZURE_OPENAI_* keys but no OPENAI_API_KEY, and the azure target uses gpt-5.4-mini per AZURE_DEPLOYMENT_NAME so it matches the requested cheap model. Two --test-id filters used to stay under the $0.20 budget cap.)

$ bun apps/cli/src/cli.ts eval run examples/red-team/suites/llm01-prompt-injection.eval.yaml \
    --target azure \
    --test-id direct-ignore-previous \
    --test-id indirect-tool-output-document \
    --output .agentv/results/uat-1166 --budget-usd 0.20

$ jq -c '{test_id, owasp: .metadata.governance.owasp_llm_top_10_2025, atlas: .metadata.governance.mitre_atlas, risk: .metadata.governance.risk_tier}' .agentv/results/uat-1166/index.jsonl
{"test_id":"direct-ignore-previous","owasp":["LLM01"],"atlas":["AML.T0051"],"risk":"high"}
{"test_id":"indirect-tool-output-document","owasp":["LLM01","LLM06"],"atlas":["AML.T0051"],"risk":"high"}

Both JSONL rows carry metadata.governance.owasp_llm_top_10_2025 populated as expected (LLM01, plus LLM01+LLM06 on the indirect-tool-output case where the YAML overrides the suite default), proving the orchestrator change from #1165 surfaces this PR's governance blocks end-to-end.
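
Since the point of the governance tags is that JSONL artifacts can be aggregated by control, here is a minimal Python sketch of that aggregation. The two sample rows are reduced to the fields visible in the jq output above; real rows in index.jsonl will carry more fields, and `group_by_control` is a hypothetical helper, not an agentv API.

```python
import json
from collections import defaultdict


def group_by_control(jsonl_lines, key="owasp_llm_top_10_2025"):
    """Map each control ID under metadata.governance[key] to its test ids."""
    groups = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the artifact
        row = json.loads(line)
        governance = (row.get("metadata") or {}).get("governance") or {}
        for control in governance.get(key) or []:
            groups[control].append(row.get("test_id"))
    return dict(groups)


# Sample rows rebuilt from the jq output in this comment (fields trimmed).
SAMPLE = [
    '{"test_id":"direct-ignore-previous","metadata":{"governance":'
    '{"owasp_llm_top_10_2025":["LLM01"],"mitre_atlas":["AML.T0051"],"risk_tier":"high"}}}',
    '{"test_id":"indirect-tool-output-document","metadata":{"governance":'
    '{"owasp_llm_top_10_2025":["LLM01","LLM06"],"mitre_atlas":["AML.T0051"],"risk_tier":"high"}}}',
]

if __name__ == "__main__":
    for control, test_ids in sorted(group_by_control(SAMPLE).items()):
        print(control, test_ids)
```

Passing `key="mitre_atlas"` groups the same rows by ATLAS technique instead.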

Side observation: the Azure deployment's content filter blocked both prompt-injection prompts at the model layer (provider_error: 2), so the runs ended with execution_error rather than full grading. This is unrelated to the governance metadata path being verified — the metadata is attached on the test definition and is emitted regardless of execution status.

Rebased onto cd76bf8 (#1165 orchestrator change for governance metadata). Clean rebase, no conflicts.

@christso christso merged commit a0170b0 into main Apr 27, 2026
4 checks passed
@christso christso deleted the feat/1162-redteam-pack branch April 27, 2026 10:03
christso added a commit that referenced this pull request Apr 27, 2026
The `yaml` package leaves the YAML 1.1 merge key (`<<: *anchor`) as a
literal sibling key when parsing in YAML 1.2 mode (its default). Since
PR #1165 surfaced suite-level governance into JSONL, this caused a
literal `"<<"` key to leak into `metadata.governance` for any case that
used merge syntax (e.g. red-team suites authored in #1166).

Funnel every YAML parse through a new `parseYamlValue` helper that sets
`{ merge: true }`, so merge keys are unwrapped at the parse boundary
once and downstream consumers (loaders, validators, JSONL artifacts)
all benefit consistently. Promptfoo handles this via js-yaml whose
default schema already supports merge keys; we get equivalent behavior.

Regression test asserts `<<` is not retained as a key after parsing a
document with `<<: *anchor`.
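
To illustrate the failure mode described in this commit message, the Python sketch below takes an already-parsed document in which `<<` survived as a literal key and unwraps it after the fact. This is not the fix in the commit, which sets `{ merge: true }` at the parse boundary in `parseYamlValue`; `resolve_merge_keys` is a hypothetical helper that follows YAML 1.1 merge semantics (explicit keys win over merged ones, earlier merge sources win over later ones).

```python
MERGE_KEY = "<<"


def resolve_merge_keys(node):
    """Recursively fold literal `<<` keys into their parent mapping."""
    if isinstance(node, list):
        return [resolve_merge_keys(item) for item in node]
    if not isinstance(node, dict):
        return node

    sources = node.get(MERGE_KEY, [])
    if isinstance(sources, dict):
        sources = [sources]
    elif not isinstance(sources, list):
        sources = []  # a non-mapping merge value is ignored in this sketch

    merged = {}
    # Apply later sources first so that earlier sources override them.
    for source in reversed([s for s in sources if isinstance(s, dict)]):
        merged.update(resolve_merge_keys(source))
    # Keys written explicitly on the mapping override anything merged in.
    for key, value in node.items():
        if key != MERGE_KEY:
            merged[key] = resolve_merge_keys(value)
    return merged


if __name__ == "__main__":
    # Shape of a case whose governance block used `<<: *anchor` but was
    # parsed without merge support, so the anchor content sits under "<<".
    suite_default = {"owasp_llm_top_10_2025": ["LLM01"], "risk_tier": "high"}
    parsed = {
        "metadata": {
            "governance": {
                MERGE_KEY: suite_default,
                "owasp_llm_top_10_2025": ["LLM01", "LLM06"],
            }
        }
    }
    clean = resolve_merge_keys(parsed)
    assert MERGE_KEY not in clean["metadata"]["governance"]
    print(clean["metadata"]["governance"])
```

Resolving merges once at the parse boundary, as the commit does, is the simpler design because no downstream consumer ever sees the `<<` key.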