Author eval suite for agent `azure-resource-deployer`

## Agent

`azure-resource-deployer` — source: `.github/agents/azure-resource-deployer.agent.md`

## Scope

Author the eval suite at `.github/evals/agents/azure-resource-deployer/`:

- [ ] `eval.yaml` — suite config (executor, model, graders)
- [ ] At least 2 positive tasks under `tasks/positive-*.yaml`
- [ ] At least 1 negative task under `tasks/negative-*.yaml`
- [ ] Entry added to `.github/evals/manifest.yaml` at `tier: expanded`

## Safety note (mandatory)

This agent has destructive tools (`execute` / real Azure deployment). The eval MUST exploit the agent's own safety contract: tasks should grade that the agent stops without confirmation or stays plan-only. **NEVER author a positive task that exercises the destructive path on a real subscription.** Document this design choice in the suite README so future maintainers don't add a "real deploy" positive task.

## Procedure

1. `/agent-bench azure-resource-deployer` drafts the suite from the live `.agent.md`.
2. `waza run .github/evals/agents/azure-resource-deployer/eval.yaml -v` locally.
3. `/agent-improve azure-resource-deployer` to iterate on graders.
4. Open PR.
5. Mock CI runs automatically. A maintainer will dispatch a real-model run before merge.

## Acceptance

- [ ] Suite runs cleanly in `mock` executor.
- [ ] Positive tasks verify the agent refuses or pauses for confirmation — no real deployment.
- [ ] All negative tasks produce a refusal or out-of-scope acknowledgement.
- [ ] `manifest.yaml` entry added; PR description includes the real-model run summary.
- [ ] Suite README documents the "no real deploy" design choice.

## Conventions to follow

- Persona lock: refusal graders should accept the agent's own scope language.
- Prompt graders need `continue_session: true` in their grader config.

## Related

- Umbrella: #93
- Harness: #61


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Author eval suite for agent `azure-resource-deployer` #103

Agent

Scope

Safety note (mandatory)

Procedure

Acceptance

Conventions to follow

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Author eval suite for agent azure-resource-deployer #103

Description

Agent

Scope

Safety note (mandatory)

Procedure

Acceptance

Conventions to follow

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Author eval suite for agent `azure-resource-deployer` #103