feat(examples): OWASP LLM Top 10 / MITRE ATLAS-aligned red-team eval pack by christso · Pull Request #1166 · EntityProcess/agentv

christso · 2026-04-27T07:14:53Z

Summary

New opt-in examples/red-team/ directory with 60 cases across 9 suites and 3 reusable LLM-grader rubrics (refusal.md, pii-leak.md, tool-abuse.md).
Zero changes under packages/core or apps/cli. Composes existing primitives only (llm-grader, contains ± negate, icontains-any).
Every case is tagged with metadata.governance per the schema in #1161 (feat(core): optional governance metadata on EvalMetadata and EvalTest (OWASP / NIST / ATLAS / controls)). The suite-level governance: block is shared across cases via YAML anchors, so duplication is minimal.
README has a one-paragraph attribution block listing every seed corpus and its license (InjecAgent, AgentDojo, AgentHarm, NVIDIA Garak, promptfoo red-team plugins, OWASP, MITRE ATLAS).

Coverage (acceptance signal #6)

OWASP LLM Top 10 v2025: LLM01, LLM02, LLM06, LLM07, LLM08, LLM10
OWASP Agentic Top 10 v2025: T1, T6, T7
MITRE ATLAS v5.4:        AML.T0029, T0034, T0044, T0050, T0051, T0070, T0075, T0076

Cases per suite: llm01=10, llm02=6, llm06=8, llm07=6, llm08=5, llm10=5, agentic-memory=6, agentic-tool=8, atlas=6 → 60 total, inside the 60-100 acceptance band.

Manual test plan (green)

1. Inventory. Pack ships at least 8 suites + 3 graders + README.

examples/red-team/graders/ {pii-leak.md, refusal.md, tool-abuse.md}
examples/red-team/suites/  {agentic-memory-poisoning, agentic-tool-misuse,
                            atlas-v5.4-agentic, llm01-prompt-injection,
                            llm02-insecure-output, llm06-excessive-agency,
                            llm07-system-prompt-leakage, llm08-vector-embedding,
                            llm10-unbounded-consumption}.eval.yaml
examples/red-team/README.md

2. Every case is tagged (yq → python equivalent — yq not present in this env):

$ python3 -c "(loop over suites, fail if any case is missing OWASP tag)"
OK: all cases have at least one OWASP tag
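
The one-liner above is abbreviated in the PR, so its exact body is not known. A minimal sketch of such a check, assuming each suite has already been parsed into a dict with a `tests` list and that each case has an `id` field (both field names are assumptions, as is the `untagged_cases` helper), might look like this:

```python
from typing import Any

# Tag name taken from the jq query and quality gate in this PR.
OWASP_KEY = "owasp_llm_top_10_2025"


def untagged_cases(suite: dict[str, Any]) -> list[str]:
    """Return the ids of cases that carry no OWASP tag in metadata.governance."""
    missing = []
    # "tests" and "id" are assumed field names, not confirmed by the PR.
    for case in suite.get("tests", []):
        governance = (case.get("metadata") or {}).get("governance") or {}
        if not governance.get(OWASP_KEY):
            missing.append(case.get("id", "<no id>"))
    return missing


# Hypothetical parsed suite, used only to illustrate the check.
example_suite = {
    "tests": [
        {"id": "direct-ignore-previous",
         "metadata": {"governance": {OWASP_KEY: ["LLM01"]}}},
        {"id": "untagged-example", "metadata": {}},
    ]
}

print(untagged_cases(example_suite))  # ['untagged-example']
```

An empty list for every suite would correspond to the "OK" line above.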

6. Coverage across taxonomies — listed above.

7. License / provenance is auditable. README has the attribution paragraph; each seed corpus is named with its license (MIT for InjecAgent / AgentDojo / AgentHarm / promptfoo, Apache-2.0 for Garak, public for MITRE ATLAS, CC-BY-SA 4.0 for the OWASP catalogs).

Validation. All 9 suites pass agentv validate. The single soft warning per suite ([governance] Unknown field 'governance'. This field will be ignored.) is the expected interaction with main before #1161 lands — the per-case metadata.governance blocks ride through unchanged because EvalTest.metadata is already Record<string, unknown>. Once #1161 merges, the warning goes away.

Dry-run smoke test: every suite parses end-to-end, runs all cases with --dry-run --target llm, and writes JSONL artifacts:

=== llm01-prompt-injection.eval.yaml ===     Total tests: 10
=== llm02-insecure-output.eval.yaml ===      Total tests: 6
=== llm06-excessive-agency.eval.yaml ===     Total tests: 8
=== llm07-system-prompt-leakage.eval.yaml === Total tests: 6
=== llm08-vector-embedding.eval.yaml ===     Total tests: 5
=== llm10-unbounded-consumption.eval.yaml === Total tests: 5
=== agentic-memory-poisoning.eval.yaml ===   Total tests: 6
=== agentic-tool-misuse.eval.yaml ===        Total tests: 8
=== atlas-v5.4-agentic.eval.yaml ===         Total tests: 6
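
As a cross-check of the 60-case total, the summary lines above can be tallied mechanically. This is only a sketch: the regular expression matches the line format as it is pasted in this PR, and `tally` is a hypothetical helper, not part of agentv.

```python
import re

# Summary lines copied from the dry-run output in this PR.
DRY_RUN_OUTPUT = """\
=== llm01-prompt-injection.eval.yaml ===     Total tests: 10
=== llm02-insecure-output.eval.yaml ===      Total tests: 6
=== llm06-excessive-agency.eval.yaml ===     Total tests: 8
=== llm07-system-prompt-leakage.eval.yaml === Total tests: 6
=== llm08-vector-embedding.eval.yaml ===     Total tests: 5
=== llm10-unbounded-consumption.eval.yaml === Total tests: 5
=== agentic-memory-poisoning.eval.yaml ===   Total tests: 6
=== agentic-tool-misuse.eval.yaml ===        Total tests: 8
=== atlas-v5.4-agentic.eval.yaml ===         Total tests: 6
"""

LINE = re.compile(r"^=== (?P<suite>\S+) ===\s+Total tests: (?P<count>\d+)$")


def tally(output: str) -> dict[str, int]:
    """Map each suite file name to its reported test count."""
    counts = {}
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match["suite"]] = int(match["count"])
    return counts


counts = tally(DRY_RUN_OUTPUT)
print(len(counts), sum(counts.values()))  # 9 60
```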

Live red/green differential (test plan steps 3–5: vulnerable target vs aligned target) — not run in this PR. Each suite is wired to execution.target: llm so a reviewer can point it at any backend in targets.yaml and observe the differential. The cost of running 60 cases × 2 targets × frontier models is meaningful and the pack is opt-in; flagging this so reviewers can decide whether to run a sample suite themselves.

Pre-push hook (Build / Typecheck / Lint / Test / Validate eval YAML files): all passed.

Quality-gate self-check

❌ no diff under packages/core/ or apps/cli/
❌ no new dependencies
❌ no new grader type
❌ no attacker LLM bundled at runtime
✅ ≤80 cases (60)
❌ no case missing an owasp_llm_top_10_2025 tag
❌ no content from a corpus that disallows redistribution
❌ no explicit harmful payloads (CSAM / weapon / self-harm)
❌ not wired into agentv init defaults

🤖 Generated with Claude Code

…pack

Adds an opt-in `examples/red-team/` pack with 60 adversarial cases spanning OWASP LLM Top 10 v2025 IDs LLM01/02/06/07/08/10, OWASP Top 10 for Agentic Applications T1/T6/T7, and MITRE ATLAS v5.4 techniques AML.T0044/T0050/T0051/T0070/T0075/T0076.

Three reusable LLM-grader rubrics in `graders/` (refusal, pii-leak, tool-abuse) compose with existing primitives: no new core changes, no new dependencies.

Every case is tagged with `metadata.governance` per the schema in #1161, so JSONL artifacts can be aggregated by control. Suite-level `governance:` anchors share tags across cases via YAML aliases; the validator on main emits a soft "unknown field" warning here that goes away once #1161 merges.

README documents provenance + license per seed corpus (InjecAgent / AgentDojo / AgentHarm / NVIDIA Garak / promptfoo / OWASP / MITRE ATLAS).

Closes #1162

cloudflare-workers-and-pages · 2026-04-27T10:00:55Z

Deploying agentv with Cloudflare Pages

Latest commit:	`cb708b5`
Status:	✅ Deploy successful!
Preview URL:	https://0f683f4b.agentv.pages.dev
Branch Preview URL:	https://feat-1162-redteam-pack.agentv.pages.dev


christso · 2026-04-27T10:03:16Z

Manual UAT

Red (main, pack absent):

$ git ls-tree origin/main -- examples/red-team/suites/llm01-prompt-injection.eval.yaml
(empty — file does not exist)

$ git ls-tree origin/main -- examples/
100644 blob eb16acea0ccbfc004a5b04a25af2e6ae323dadc4    examples/.gitignore
100644 blob 438b1ca8834fceff8e0b66472a3d5e5e0960cb56    examples/README.md
040000 tree 2f4166c05ebf1d155cade4a76a6ff6045f045254    examples/features
040000 tree 974a3c2753b54f64cd079e745eb671977200f56e    examples/showcase

Confirms examples/red-team/ does not exist on origin/main (only features/ and showcase/), so this suite cannot be run from main.

Green (rebased branch, gpt-5.4-mini via --target azure):

(Note: ran with --target azure rather than --target openai — the local .env has AZURE_OPENAI_* keys but no OPENAI_API_KEY, and the azure target uses gpt-5.4-mini per AZURE_DEPLOYMENT_NAME so it matches the requested cheap model. Two --test-id filters used to stay under the $0.20 budget cap.)

$ bun apps/cli/src/cli.ts eval run examples/red-team/suites/llm01-prompt-injection.eval.yaml \
    --target azure \
    --test-id direct-ignore-previous \
    --test-id indirect-tool-output-document \
    --output .agentv/results/uat-1166 --budget-usd 0.20

$ jq -c '{test_id, owasp: .metadata.governance.owasp_llm_top_10_2025, atlas: .metadata.governance.mitre_atlas, risk: .metadata.governance.risk_tier}' .agentv/results/uat-1166/index.jsonl
{"test_id":"direct-ignore-previous","owasp":["LLM01"],"atlas":["AML.T0051"],"risk":"high"}
{"test_id":"indirect-tool-output-document","owasp":["LLM01","LLM06"],"atlas":["AML.T0051"],"risk":"high"}

Both JSONL rows carry metadata.governance.owasp_llm_top_10_2025 populated as expected (LLM01, plus LLM01+LLM06 on the indirect-tool-output case where the YAML overrides the suite default), proving the orchestrator change from #1165 surfaces this PR's governance blocks end-to-end.
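
To illustrate what aggregating by control could look like, here is a small sketch that groups the two jq-projected rows above by OWASP ID. It works on the projected shape (`test_id`, `owasp`, `atlas`, `risk`) rather than on the raw index.jsonl rows, and `group_by_control` is a hypothetical helper, not an agentv API.

```python
import json
from collections import defaultdict

# The two rows printed by the jq projection above.
ROWS = """\
{"test_id":"direct-ignore-previous","owasp":["LLM01"],"atlas":["AML.T0051"],"risk":"high"}
{"test_id":"indirect-tool-output-document","owasp":["LLM01","LLM06"],"atlas":["AML.T0051"],"risk":"high"}
"""


def group_by_control(jsonl: str, field: str = "owasp") -> dict[str, list[str]]:
    """Group test ids by each control listed under `field`."""
    groups = defaultdict(list)
    for line in jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        for control in row.get(field, []):
            groups[control].append(row["test_id"])
    return dict(groups)


print(group_by_control(ROWS))
# {'LLM01': ['direct-ignore-previous', 'indirect-tool-output-document'], 'LLM06': ['indirect-tool-output-document']}
```

Passing `field="atlas"` groups the same rows by MITRE ATLAS technique instead.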

Side observation: the Azure deployment's content filter blocked both prompt-injection prompts at the model layer (provider_error: 2), so the runs ended with execution_error rather than full grading. This is unrelated to the governance metadata path being verified — the metadata is attached on the test definition and is emitted regardless of execution status.

Rebased onto cd76bf8 (#1165 orchestrator change for governance metadata). Clean rebase, no conflicts.

The `yaml` package leaves the YAML 1.1 merge key (`<<: *anchor`) as a literal sibling key when parsing in YAML 1.2 mode (its default). Since PR #1165 surfaced suite-level governance into JSONL, this caused a literal `"<<"` key to leak into `metadata.governance` for any case that used merge syntax (e.g. red-team suites authored in #1166).

Funnel every YAML parse through a new `parseYamlValue` helper that sets `{ merge: true }`, so merge keys are unwrapped at the parse boundary once and downstream consumers (loaders, validators, JSONL artifacts) all benefit consistently. Promptfoo handles this via js-yaml whose default schema already supports merge keys; we get equivalent behavior.

Regression test asserts `<<` is not retained as a key after parsing a document with `<<: *anchor`.
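
The leak described above can also be spotted after the fact in parsed artifacts. The sketch below is not the regression test from the fix, which lives in the TypeScript codebase. It is a standalone check on already-parsed data, with a hypothetical `merge_key_paths` helper and made-up sample rows whose shape is assumed from the description.

```python
from typing import Any, Iterator


def merge_key_paths(value: Any, path: str = "$") -> Iterator[str]:
    """Yield the path of every literal "<<" key found in nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if key == "<<":
                yield child_path
            yield from merge_key_paths(child, child_path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from merge_key_paths(child, f"{path}[{index}]")


# Hypothetical rows: one with an unresolved merge key, one correctly merged.
leaky = {"metadata": {"governance": {"<<": {"risk_tier": "high"},
                                     "owasp_llm_top_10_2025": ["LLM01"]}}}
clean = {"metadata": {"governance": {"risk_tier": "high",
                                     "owasp_llm_top_10_2025": ["LLM01"]}}}

print(list(merge_key_paths(leaky)))  # ['$.metadata.governance.<<']
print(list(merge_key_paths(clean)))  # []
```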

christso marked this pull request as ready for review April 27, 2026 07:15

christso mentioned this pull request Apr 27, 2026

test: pipeline-e2e flake at 5000ms default timeout #1169

Closed

christso force-pushed the feat/1162-redteam-pack branch from 4be3b55 to cb708b5 Compare April 27, 2026 10:00

christso merged commit a0170b0 into main Apr 27, 2026
4 checks passed

christso deleted the feat/1162-redteam-pack branch April 27, 2026 10:03

This was referenced Apr 27, 2026

refactor: move governance metadata from typed core schema to agentv-compliance skill #1172

Closed

fix(core): unwrap YAML merge keys (<<:) in eval loader #1174

Merged

christso mentioned this pull request Apr 27, 2026

fix(test): raise input.test.ts pipeline timeouts to 30s #1176

Merged

3 tasks

christso merged 1 commit into main from feat/1162-redteam-pack
