Description
Objective
Update AgentV's eval-related skill and agent descriptions to explicitly disambiguate from Anthropic's skill-creator when both are loaded in the same Claude session. Without this, generic requests like "run evals on my skill" or "optimize my skill" are ambiguous — both systems claim to handle them.
Architecture Boundary
external-first (frontmatter changes in plugins/agentv-dev/skills/ and plugins/agentv-dev/agents/)
The Problem
Skill-creator description:
Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit, or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.
AgentV eval-orchestrator description:
Run AgentV evaluations by orchestrating eval subcommands. Use this skill when asked to run evals, evaluate an agent, test prompt quality using agentv, or run Agent Skills evals.json files.
AgentV optimizer description:
Optimize agent prompts through evaluation-driven refinement.
Overlapping trigger keywords: "run evals", "optimize", "test", "benchmark", "skill performance". A user saying "run evals on my skill" or "optimize my agent" matches both systems. Claude has no signal for which to prefer.
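The overlap can be made concrete with a small check. This is a minimal sketch, not part of AgentV: the descriptions and phrase list are copied from the text above, while the helper names and the case-insensitive substring matching are assumptions that only roughly approximate how Claude weighs skill descriptions.

```python
# Descriptions as quoted in this issue.
SKILL_CREATOR = (
    "Create new skills, modify and improve existing skills, and measure skill "
    "performance. Use when users want to create a skill from scratch, edit, or "
    "optimize an existing skill, run evals to test a skill, benchmark skill "
    "performance with variance analysis, or optimize a skill's description for "
    "better triggering accuracy."
)
AGENTV_ORCHESTRATOR = (
    "Run AgentV evaluations by orchestrating eval subcommands. Use this skill "
    "when asked to run evals, evaluate an agent, test prompt quality using "
    "agentv, or run Agent Skills evals.json files."
)
AGENTV_OPTIMIZER = "Optimize agent prompts through evaluation-driven refinement."

# Trigger keywords listed in the paragraph above.
TRIGGER_PHRASES = ["run evals", "optimize", "test", "benchmark", "skill performance"]


def matched_phrases(description, phrases=TRIGGER_PHRASES):
    """Return the trigger phrases that occur in a description (substring match)."""
    text = description.lower()
    return {phrase for phrase in phrases if phrase in text}


def shared_phrases(a, b):
    """Return the trigger phrases that two descriptions have in common."""
    return matched_phrases(a) & matched_phrases(b)


if __name__ == "__main__":
    print(sorted(shared_phrases(SKILL_CREATOR, AGENTV_ORCHESTRATOR)))
    print(sorted(shared_phrases(SKILL_CREATOR, AGENTV_OPTIMIZER)))
```

Under this crude matching rule, the orchestrator shares "run evals" and "test" with skill-creator, and the optimizer shares "optimize".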
Disambiguation Strategy
The natural boundary is:
- Skill-creator = creating/improving Claude Code skills (SKILL.md files, .skill packages, skill description triggering)
- AgentV = evaluating AI agent output quality (EVAL.yaml, evals.json, agentv eval run, agentv compare)
AgentV's skill descriptions should make this boundary explicit by:
- Including "AgentV" in trigger phrases (e.g., "Use when running AgentV evaluations" not just "Use when running evals")
- Referencing AgentV-specific file types (EVAL.yaml, .eval.yaml, agentv CLI)
- Adding a DO NOT TRIGGER clause for skill-creator's domain ("Do not use for creating or modifying SKILL.md files, packaging skills, or optimizing skill trigger descriptions")
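As a rough illustration of the three points above, the sketch below checks a description string against each of them. The candidate wording is purely hypothetical (exact wording is left open under Design Latitude), and the function name, check names, and matching rules are assumptions made for this example rather than anything AgentV ships.

```python
# Hypothetical candidate wording for the eval-orchestrator description.
# Illustrative only; the final text is a design decision.
CANDIDATE_ORCHESTRATOR = (
    "Run AgentV evaluations by orchestrating agentv eval subcommands. Use when "
    "running AgentV evaluations defined in EVAL.yaml files or when evaluating "
    "agent output quality with the agentv CLI. Do not use for creating or "
    "modifying SKILL.md files, packaging skills, or optimizing skill trigger "
    "descriptions."
)

# Current optimizer description, as quoted in this issue.
CURRENT_OPTIMIZER = "Optimize agent prompts through evaluation-driven refinement."

# AgentV-specific markers named in this issue, lowercased for matching.
# "eval.yaml" also matches files ending in ".eval.yaml".
AGENTV_ARTIFACTS = ("eval.yaml", "agentv cli", "agentv eval")


def check_description(description):
    """Report which of the three disambiguation points a description satisfies."""
    text = description.lower()
    return {
        # 1. The description names AgentV explicitly.
        "names_agentv": "agentv" in text,
        # 2. It references at least one AgentV-specific artifact.
        "references_artifact": any(a in text for a in AGENTV_ARTIFACTS),
        # 3. It carries an explicit exclusion clause.
        "has_exclusion": "do not use for" in text,
    }


if __name__ == "__main__":
    print(check_description(CANDIDATE_ORCHESTRATOR))
    print(check_description(CURRENT_OPTIMIZER))
```

A check like this only confirms that the markers are present; whether Claude actually routes requests correctly still needs to be verified in a session with both plugins loaded.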
Design Latitude
- Choose exact wording for disambiguation
- May add a brief "When to use this vs skill-creator" note in descriptions or leave it implicit via trigger keywords
- Choose whether to add DO NOT TRIGGER patterns (explicit exclusion) or rely on positive trigger keywords being specific enough
- May update all eval-related skills or only the ones with highest conflict risk (eval-orchestrator, optimizer)
Skills and Agents to Update
High conflict risk (must update):
- plugins/agentv-dev/skills/agentv-eval-orchestrator/SKILL.md: "run evals" conflicts directly
- plugins/agentv-dev/skills/agentv-optimizer/SKILL.md: "optimize" conflicts directly
Medium conflict risk (should update):
- plugins/agentv-dev/skills/agentv-eval-builder/SKILL.md: "create eval files" could be confused with skill-creator's eval authoring
- plugins/agentv-dev/skills/agentv-trace-analyst/SKILL.md: "benchmark" keyword overlaps
Low conflict risk (consider):
- Agent descriptions (eval-judge.md, eval-candidate.md): only dispatched by skills, not triggered directly
Acceptance Signals
- A user with both agentv-dev and anthropics/skills (skill-creator) loaded can say "run evals on my skill" and Claude can distinguish which system to use based on context (presence of EVAL.yaml → AgentV, presence of SKILL.md evals → skill-creator)
- AgentV skill descriptions reference AgentV-specific artifacts (EVAL.yaml, agentv CLI commands) as trigger indicators
- No ambiguous trigger keyword matches remain for the high-conflict-risk skills
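The context rule in the first signal can be written down as an expected-outcome table for manual testing. The sketch below is an assumption-laden illustration, not a proposed router (a router skill is a non-goal): the function name and the "ambiguous" fallback are hypothetical, and "SKILL.md evals" is approximated here by the mere presence of a SKILL.md file.

```python
from pathlib import PurePosixPath


def expected_system(paths):
    """Return the system Claude is expected to prefer for a set of file paths.

    Encodes only the rule stated above: EVAL.yaml (or *.eval.yaml) points to
    AgentV, SKILL.md points to skill-creator. Anything else is "ambiguous".
    """
    names = [PurePosixPath(p).name for p in paths]
    has_agentv = any(n == "EVAL.yaml" or n.endswith(".eval.yaml") for n in names)
    has_skill = any(n == "SKILL.md" for n in names)

    if has_agentv and not has_skill:
        return "agentv"
    if has_skill and not has_agentv:
        return "skill-creator"
    # Both or neither present: file context alone does not settle it.
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical project layouts used as test fixtures.
    print(expected_system(["evals/EVAL.yaml"]))
    print(expected_system(["my-skill/SKILL.md"]))
    print(expected_system(["my-skill/SKILL.md", "evals/EVAL.yaml"]))
```

Each fixture layout can then be paired with the prompt "run evals on my skill" to confirm that the skill Claude picks matches the expected value.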
Non-Goals
- Changing skill-creator's frontmatter (that's Anthropic's repo)
- Adding a router skill that asks "which eval system?"
- Changing AgentV's skill names
- Modifying skill behavior (only descriptions/frontmatter)
Related
- Anthropic skill-creator SKILL.md — the competing trigger description