Improve agent-plugin-review skill to pass remaining 3 eval tests #779

## Summary

The `agent-plugin-review` skill passes 6/9 eval tests against pi-cli (mean score 0.722). Three tests fail consistently:

| Test | Score | Issue |
|---|---|---|
| detect-relative-file-paths | 0.500 | Partially detected: the skill mentions the leading `/`, but the agent doesn't consistently flag it |
| detect-repeated-inputs | 0.000 | Missed: the agent doesn't suggest a top-level `input` for repeated file references |
| detect-missing-hard-gates | 0.000 | Missed: the agent doesn't flag missing artifact existence checks between phases |

## Approach

Use the agentv-bench eval-driven iteration loop:
1. Analyze the failing test transcripts to understand what the agent does instead
2. Identify which SKILL.md instructions are unclear or missing
3. Make targeted edits to the skill
4. Re-run evals to verify improvement
5. Repeat until all 9 pass
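
The loop above could be driven by a small wrapper like the one below. This is only a sketch: `iterate_until_pass` is a hypothetical helper, and it assumes the wrapped command exits non-zero while tests are still failing, which this issue does not confirm for the eval CLI. If the CLI always exits 0, the pass/fail check would have to read the reported scores instead.

```bash
# Hypothetical helper: re-run a command until it succeeds or the round limit is hit.
# ASSUMPTION: the wrapped command exits non-zero while eval tests are still failing.
iterate_until_pass() {
  local max_rounds=$1
  shift
  local round
  for ((round = 1; round <= max_rounds; round++)); do
    echo "Round $round: $*"
    if "$@"; then
      echo "Command succeeded on round $round"
      return 0
    fi
    if ((round < max_rounds)); then
      # Pause so the failing transcripts can be read and SKILL.md edited by hand.
      read -r -p "Edit the skill, then press Enter to re-run... " _unused || true
    fi
  done
  return 1
}
```

Because the eval command in this issue chains two steps with `&&`, it would need to be passed through `bash -c '...'`, for example `iterate_until_pass 5 bash -c '<eval command>'`.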

## Possible improvements

- **Relative file paths**: Add an explicit checklist item for checking `type: file` values in eval YAML for the leading `/` the skill already mentions
- **Repeated inputs**: Add guidance on moving repeated file references into the top-level `input` field described in the [AgentV eval docs](https://agentv.dev/evaluation/eval-files/)
- **Hard gates**: Make `workflow-checklist.md` more prescriptive about what to look for (artifact existence checks at the start of each phase skill)
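
To make the first two checks concrete, here is a rough sketch of what the skill would be asking the agent to spot. The helper names are hypothetical, and the layout is an assumption: each file reference is taken to be a `type: file` line followed by a `value:` line, and a "relative" path is taken to mean one without a leading `/`. It is a line-based scan, not a YAML parser, so other layouts (for example `value:` before `type:`) are missed.

```bash
# Hypothetical helper: print every file value, one per line.
# ASSUMPTION: entries look like "- type: file" followed by "value: <path>".
list_file_values() {
  awk '
    /type:[[:space:]]*file/ { expect = 1; next }   # a file entry starts here
    /type:/                 { expect = 0 }          # any other entry type
    expect && /value:/ {
      v = $0
      sub(/^.*value:[[:space:]]*/, "", v)   # drop the key
      gsub(/"/, "", v)                      # drop double quotes
      sub(/[[:space:]]+$/, "", v)           # drop trailing whitespace
      print v
      expect = 0
    }
  ' "$1"
}

# File values without a leading "/" (candidates for detect-relative-file-paths).
relative_file_values() {
  list_file_values "$1" | grep -v '^/' || true
}

# File values referenced more than once (candidates for a top-level input).
repeated_file_values() {
  list_file_values "$1" | sort | uniq -d
}
```

Running `relative_file_values` and `repeated_file_values` on an eval file gives a quick baseline to compare against what the agent reports in the failing transcripts.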

## Eval command

```bash
bun run --filter @agentv/core build && bun apps/cli/src/cli.ts eval evals/agentic-engineering/agent-plugin-review.eval.yaml --target pi-cli
```

Note: rebuild the `@agentv/core` dist before running the eval if the core source was modified.

## Related

- PR #776 — baseline eval results (6/9 pass)
- PR #772 — original skill creation
