Objective
Add a first-party team-eval example pack that shows how to evaluate coordinated multi-agent workflows in AgentV without requiring users to reverse-engineer the pattern from research notes.
Why this is needed
Current AgentV examples cover single-agent and single-test flows well, but team-of-agents scenarios keep recurring in frontier benchmarks and user requests:
coordinated specialist agents
judgeable intermediate artifacts
role adherence and division of labor
end-to-end scoring of the team result
Even before dependency-aware DAG execution lands, AgentV can already demonstrate useful team-eval patterns with existing primitives: multi-turn transcripts, composite evaluators, code graders, tool trajectory checks, and imported session data.
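As a sketch of how these primitives could compose, a composite evaluator might gate on hard constraints (e.g. a code grader's tool-use check) and blend soft signals into one team score. All names and weights below are hypothetical illustrations, not AgentV's actual API:

```python
from dataclasses import dataclass

# Hypothetical signal shapes; AgentV's real evaluator interfaces may differ.
@dataclass
class TeamSignals:
    outcome_quality: float      # rubric / judge score in [0, 1]
    tool_constraints_ok: bool   # code grader: no forbidden tool calls
    role_adherence: float       # fraction of turns where each agent stayed in role

def composite_team_score(s: TeamSignals,
                         w_outcome: float = 0.6,
                         w_roles: float = 0.4) -> float:
    """Combine per-signal scores into one team score.

    Hard constraints gate the score to zero; soft signals are a
    weighted average. Weights here are illustrative defaults.
    """
    if not s.tool_constraints_ok:
        return 0.0
    return w_outcome * s.outcome_quality + w_roles * s.role_adherence

# Example: strong outcome, minor role drift, no constraint violations.
score = composite_team_score(TeamSignals(0.9, True, 0.75))
# 0.6 * 0.9 + 0.4 * 0.75 = 0.84
```

The gate-then-blend shape keeps constraint violations from being averaged away by a high outcome score, which is the usual failure mode when everything is a weighted sum.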
Suggested example coverage
Two-role handoff example — planner -> implementer, scored on final output + role adherence
Team transcript import example — evaluate an existing multi-agent / multi-role transcript offline
Composite team score example — combine outcome quality, tool-use constraints, and collaboration-specific rubric signals
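For the handoff and transcript-import examples, a role-adherence check can run as a plain code grader over an imported session. The transcript schema below is a hypothetical stand-in for whatever shape AgentV's session import produces:

```python
from typing import Iterable

# Hypothetical per-role tool allowlists for a planner -> implementer handoff.
ALLOWED_TOOLS = {
    "planner": set(),                          # planner should not touch tools
    "implementer": {"write_file", "run_tests"},
}

def role_adherence(transcript: Iterable[dict]) -> float:
    """Fraction of tool-calling turns that respect each role's allowlist."""
    tool_turns = [t for t in transcript if t["type"] == "tool_call"]
    if not tool_turns:
        return 1.0
    ok = sum(1 for t in tool_turns
             if t["tool"] in ALLOWED_TOOLS.get(t["agent"], set()))
    return ok / len(tool_turns)

# Example transcript: the final planner turn violates the division of labor.
transcript = [
    {"agent": "planner", "type": "message", "content": "Plan: implement, then test."},
    {"agent": "implementer", "type": "tool_call", "tool": "write_file"},
    {"agent": "implementer", "type": "tool_call", "tool": "run_tests"},
    {"agent": "planner", "type": "tool_call", "tool": "write_file"},  # role violation
]
# role_adherence(transcript) -> 2/3
```

The same grader works offline against an imported multi-agent transcript or inline against a freshly generated run, and its output slots into the composite score as the role-adherence signal.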
Acceptance signals
Non-goals
Related