
fix: disambiguate agentv eval skill triggers from skill-creator #572

@christso

Objective

Update AgentV's eval-related skill and agent descriptions so that they explicitly disambiguate AgentV from Anthropic's skill-creator when both are loaded in the same Claude session. Without this, generic requests such as "run evals on my skill" or "optimize my skill" are ambiguous, because both systems claim to handle them.

Architecture Boundary

external-first (frontmatter changes in plugins/agentv-dev/skills/ and plugins/agentv-dev/agents/)

The Problem

Skill-creator description:

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit, or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.

AgentV eval-orchestrator description:

Run AgentV evaluations by orchestrating eval subcommands. Use this skill when asked to run evals, evaluate an agent, test prompt quality using agentv, or run Agent Skills evals.json files.

AgentV optimizer description:

Optimize agent prompts through evaluation-driven refinement.

Overlapping trigger keywords: "run evals", "optimize", "test", "benchmark", "skill performance". A user saying "run evals on my skill" or "optimize my agent" matches both systems. Claude has no signal for which to prefer.
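The overlap can be checked mechanically. The following is a minimal sketch, not part of AgentV: `matched_keywords` and `shared_triggers` are hypothetical helper names, and plain case-insensitive substring matching is only a rough proxy for how Claude actually selects a skill. Applied to the three descriptions quoted above, it reports which of the listed keywords each AgentV skill shares with skill-creator.

```python
# Hypothetical lint helper (not an AgentV API): flag trigger keywords that
# appear in both an AgentV skill description and skill-creator's description.
# Matching is case-insensitive substring search, a crude proxy for triggering.

OVERLAP_KEYWORDS = ["run evals", "optimize", "test", "benchmark", "skill performance"]

SKILL_CREATOR = (
    "Create new skills, modify and improve existing skills, and measure skill "
    "performance. Use when users want to create a skill from scratch, edit, or "
    "optimize an existing skill, run evals to test a skill, benchmark skill "
    "performance with variance analysis, or optimize a skill's description for "
    "better triggering accuracy."
)

AGENTV = {
    "agentv-eval-orchestrator": (
        "Run AgentV evaluations by orchestrating eval subcommands. Use this skill "
        "when asked to run evals, evaluate an agent, test prompt quality using "
        "agentv, or run Agent Skills evals.json files."
    ),
    "agentv-optimizer": "Optimize agent prompts through evaluation-driven refinement.",
}


def matched_keywords(description: str, keywords: list[str]) -> set[str]:
    """Return the keywords that occur (case-insensitively) in the description."""
    text = description.lower()
    return {kw for kw in keywords if kw in text}


def shared_triggers(agentv_desc: str, other_desc: str, keywords: list[str]) -> set[str]:
    """Keywords matching both descriptions, i.e. ambiguous triggers."""
    return matched_keywords(agentv_desc, keywords) & matched_keywords(other_desc, keywords)


if __name__ == "__main__":
    for name, desc in AGENTV.items():
        print(name, sorted(shared_triggers(desc, SKILL_CREATOR, OVERLAP_KEYWORDS)))
```

Running it prints `['run evals', 'test']` for the orchestrator and `['optimize']` for the optimizer. Substring matching is deliberately crude (for example, "test" would also match "latest"), so a clean result is a necessary check, not proof that Claude will choose correctly.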

Disambiguation Strategy

The natural boundary is:

  • Skill-creator = creating/improving Claude Code skills (SKILL.md files, .skill packages, skill description triggering)
  • AgentV = evaluating AI agent output quality (EVAL.yaml, evals.json, agentv eval run, agentv compare)

AgentV's skill descriptions should make this boundary explicit by:

  1. Including "AgentV" in trigger phrases (e.g., "Use when running AgentV evaluations" rather than just "Use when running evals")
  2. Referencing AgentV-specific file types (EVAL.yaml, .eval.yaml, agentv CLI)
  3. Adding a DO NOT TRIGGER clause for skill-creator's domain ("Do not use for creating or modifying SKILL.md files, packaging skills, or optimizing skill trigger descriptions")
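The three rules above can be expressed as a small check over a description string. This is a sketch under stated assumptions: `check_description` and the artifact marker list are hypothetical, and `PROPOSED` is example wording only, not the final text for any SKILL.md.

```python
# Hypothetical check for the three disambiguation rules; not an AgentV API.

# Rule 2 markers: AgentV-specific artifacts and CLI commands (assumed list).
AGENTV_ARTIFACTS = ("EVAL.yaml", ".eval.yaml", "agentv eval", "agentv compare", "agentv CLI")


def check_description(desc: str) -> list[str]:
    """Return the rules a skill description violates (empty list means it passes)."""
    problems = []
    # Rule 1: trigger phrasing names AgentV explicitly.
    if "AgentV" not in desc:
        problems.append("rule 1: does not mention AgentV")
    # Rule 2: references at least one AgentV-specific artifact or command.
    if not any(marker in desc for marker in AGENTV_ARTIFACTS):
        problems.append("rule 2: no AgentV-specific artifact or command")
    # Rule 3: explicit exclusion of skill-creator's domain.
    if "do not use for" not in desc.lower():
        problems.append("rule 3: no 'Do not use for ...' exclusion")
    return problems


# Example wording only (hypothetical), assembled from the strategy above.
PROPOSED = (
    "Run AgentV evaluations defined in EVAL.yaml or .eval.yaml files using the "
    "agentv CLI (agentv eval run, agentv compare). Use when running AgentV "
    "evaluations of AI agent output quality. Do not use for creating or modifying "
    "SKILL.md files, packaging skills, or optimizing skill trigger descriptions."
)

CURRENT_OPTIMIZER = "Optimize agent prompts through evaluation-driven refinement."

if __name__ == "__main__":
    print(check_description(PROPOSED))
    print(check_description(CURRENT_OPTIMIZER))
```

The proposed wording passes all three rules, while the current optimizer description fails all three. The current orchestrator description passes rule 1 but fails rules 2 and 3, since `evals.json` is intentionally not treated as AgentV-specific.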

Design Latitude

  • Choose exact wording for disambiguation
  • May add a brief "When to use this vs skill-creator" note in descriptions or leave it implicit via trigger keywords
  • Choose whether to add DO NOT TRIGGER patterns (explicit exclusion) or rely on positive trigger keywords being specific enough
  • May update all eval-related skills or only the ones with highest conflict risk (eval-orchestrator, optimizer)

Skills and Agents to Update

High conflict risk (must update):

  • plugins/agentv-dev/skills/agentv-eval-orchestrator/SKILL.md — "run evals" conflicts directly
  • plugins/agentv-dev/skills/agentv-optimizer/SKILL.md — "optimize" conflicts directly

Medium conflict risk (should update):

  • plugins/agentv-dev/skills/agentv-eval-builder/SKILL.md — "create eval files" could be confused with skill-creator's eval authoring
  • plugins/agentv-dev/skills/agentv-trace-analyst/SKILL.md — "benchmark" keyword overlaps

Low conflict risk (consider):

  • Agent descriptions (eval-judge.md, eval-candidate.md) — only dispatched by skills, not triggered directly

Acceptance Signals

  • A user with both agentv-dev and anthropics/skills (skill-creator) loaded can say "run evals on my skill" and Claude can distinguish which system to use based on context (presence of EVAL.yaml → AgentV, presence of SKILL.md evals → skill-creator)
  • AgentV skill descriptions reference AgentV-specific artifacts (EVAL.yaml, agentv CLI commands) as trigger indicators
  • No ambiguous trigger keyword matches remain for the high-conflict-risk skills
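The first acceptance signal can be pinned down as a set of test fixtures. The sketch below is hypothetical and is not a router (a router skill is listed under Non-Goals); it only encodes the expected outcome from which artifacts are present, assuming a SKILL.md marks skill-creator's domain and an EVAL.yaml or `*.eval.yaml` file marks AgentV's. The case where both are present is not specified above, so the sketch reports it as unclear.

```python
from pathlib import PurePosixPath


def expected_system(paths: list[str]) -> str:
    """Expected owner of "run evals on my skill", given files in the workspace.

    Hypothetical acceptance-test helper: it encodes the decision rule for
    fixtures and does not route any request.
    """
    names = [PurePosixPath(p).name for p in paths]
    has_agentv_eval = any(n == "EVAL.yaml" or n.endswith(".eval.yaml") for n in names)
    has_skill_md = "SKILL.md" in names
    if has_agentv_eval and not has_skill_md:
        return "agentv"
    if has_skill_md and not has_agentv_eval:
        return "skill-creator"
    # Neither or both present: the acceptance signal does not decide this case.
    return "unclear"


if __name__ == "__main__":
    print(expected_system(["evals/checkout/EVAL.yaml"]))
    print(expected_system(["my-skill/SKILL.md"]))
```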

Non-Goals

  • Changing skill-creator's frontmatter (that's Anthropic's repo)
  • Adding a router skill that asks "which eval system?"
  • Changing AgentV's skill names
  • Modifying skill behavior (only descriptions/frontmatter)
