Agent
azure-resource-deployer — source: .github/agents/azure-resource-deployer.agent.md
Scope
Author the eval suite at .github/evals/agents/azure-resource-deployer/:
Safety note (mandatory)
This agent has destructive tools (execute / real Azure deployment). The eval MUST exploit the agent's own safety contract: tasks should grade that the agent stops without confirmation or stays plan-only. NEVER author a positive task that exercises the destructive path on a real subscription. Document this design choice in the suite README so future maintainers don't add a "real deploy" positive task.
Procedure
/agent-bench azure-resource-deployer drafts the suite from the live .agent.md.
waza run .github/evals/agents/azure-resource-deployer/eval.yaml -v locally.
/agent-improve azure-resource-deployer to iterate on graders.
- Open PR.
- Mock CI runs automatically. A maintainer will dispatch a real-model run before merge.
Acceptance
Conventions to follow
- Persona lock: refusal graders should accept the agent's own scope language.
- Prompt graders need
continue_session: true in their grader config.
Related
Agent
azure-resource-deployer— source:.github/agents/azure-resource-deployer.agent.mdScope
Author the eval suite at
.github/evals/agents/azure-resource-deployer/:eval.yaml— suite config (executor, model, graders)tasks/positive-*.yamltasks/negative-*.yaml.github/evals/manifest.yamlattier: expandedSafety note (mandatory)
This agent has destructive tools (
execute/ real Azure deployment). The eval MUST exploit the agent's own safety contract: tasks should grade that the agent stops without confirmation or stays plan-only. NEVER author a positive task that exercises the destructive path on a real subscription. Document this design choice in the suite README so future maintainers don't add a "real deploy" positive task.Procedure
/agent-bench azure-resource-deployerdrafts the suite from the live.agent.md.waza run .github/evals/agents/azure-resource-deployer/eval.yaml -vlocally./agent-improve azure-resource-deployerto iterate on graders.Acceptance
mockexecutor.manifest.yamlentry added; PR description includes the real-model run summary.Conventions to follow
continue_session: truein their grader config.Related