Author eval suite for skill git-ape-onboarding #96

@arnaudlh

Skill

git-ape-onboarding — source: .github/skills/git-ape-onboarding/SKILL.md

Scope

Author the eval suite at .github/evals/git-ape-onboarding/:

  • eval.yaml — suite config (executor, model, graders)
  • At least 2 positive tasks under tasks/positive-*.yaml
  • At least 1 negative task under tasks/negative-*.yaml
  • Entry added to .github/evals/manifest.yaml at tier: expanded
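The layout above can be checked locally before opening a PR. The following is a minimal sketch, not part of the repo tooling: the function name `check_suite_layout` is hypothetical, and because the manifest schema is not described in this issue, the manifest check is only a plain substring search for the skill name (it does not verify `tier: expanded`).

```python
from pathlib import Path


def check_suite_layout(repo_root, skill="git-ape-onboarding"):
    """Return a list of layout problems; an empty list means the scope items exist.

    Hypothetical helper: it only checks file presence and counts, not file contents.
    """
    root = Path(repo_root)
    suite = root / ".github" / "evals" / skill
    tasks = suite / "tasks"
    problems = []

    # Suite config must exist.
    if not (suite / "eval.yaml").is_file():
        problems.append("missing eval.yaml")

    # Count positive and negative task files by their filename prefix.
    positives = sorted(tasks.glob("positive-*.yaml")) if tasks.is_dir() else []
    negatives = sorted(tasks.glob("negative-*.yaml")) if tasks.is_dir() else []
    if len(positives) < 2:
        problems.append(f"need at least 2 positive tasks, found {len(positives)}")
    if len(negatives) < 1:
        problems.append(f"need at least 1 negative task, found {len(negatives)}")

    # Assumption: the manifest mentions the skill by name somewhere in its text.
    manifest = root / ".github" / "evals" / "manifest.yaml"
    if not manifest.is_file():
        problems.append("missing manifest.yaml")
    elif skill not in manifest.read_text(encoding="utf-8"):
        problems.append(f"manifest.yaml does not mention {skill}")

    return problems
```

Calling `check_suite_layout(".")` from the repository root returns the outstanding items as plain strings, which can be pasted into the PR description as a checklist.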

Notes

This is the most complex skill in the set. I recommend deferring it until at least 3 simpler skill suites have landed and the authoring patterns are well established.

The skill is interactive and gates on az login + input collection. The eval should grade the gated step-1 reply (prereq table shown, missing prereqs flagged, inputs requested, plan preview for upcoming steps) rather than expecting full end-to-end execution in a single response.
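To make the four step-1 expectations concrete, here is a rough sketch of how a reply could be screened for them. It is a heuristic illustration only and not the grader format used by the eval suite: the `CHECKS` table, the function name `grade_step1_reply`, and every regex below are assumptions chosen for the example, and a real grader would likely need the skill's actual wording.

```python
import re

# Hypothetical heuristics, one per expectation named in the note above.
CHECKS = {
    # A line that starts and ends with "|" is treated as a table row.
    "prereq_table": lambda t: bool(re.search(r"^\s*\|.*\|\s*$", t, re.MULTILINE)),
    # Any wording that marks a prerequisite as absent.
    "missing_flagged": lambda t: bool(
        re.search(r"\b(missing|not installed|not logged in|not found)\b", t, re.IGNORECASE)
    ),
    # A question mark or an explicit request for input.
    "inputs_requested": lambda t: "?" in t
    or bool(re.search(r"\bplease (provide|confirm|enter)\b", t, re.IGNORECASE)),
    # Some mention of what happens after the gate.
    "plan_preview": lambda t: bool(re.search(r"\b(next steps?|plan|upcoming)\b", t, re.IGNORECASE)),
}


def grade_step1_reply(reply):
    """Return a dict mapping each expectation name to True/False for this reply."""
    return {name: check(reply) for name, check in CHECKS.items()}
```

Returning one boolean per expectation, rather than a single pass/fail, makes it easier to see which part of the gated reply regressed when a task fails.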

Procedure

  1. Run /skill-bench git-ape-onboarding to draft the suite from the live SKILL.md.
  2. Run waza run .github/evals/git-ape-onboarding/eval.yaml -v locally.
  3. Run /skill-improve git-ape-onboarding to iterate on graders.
  4. Open PR.
  5. Mock CI runs automatically. A maintainer will dispatch a real-model run before merge.

Acceptance

  • Suite runs cleanly in mock executor.
  • At least one positive task passes in a real-model run.
  • All negative tasks produce a refusal or out-of-scope acknowledgement.
  • manifest.yaml entry added; PR description includes the real-model run summary.

Conventions to follow

  • Persona lock: refusal graders should accept the agent's own scope language.
  • Don't add required_skills to a skill_invocation grader unless the skill genuinely invokes those sub-skills.
  • Prompt graders need continue_session: true in their grader config.
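The persona-lock convention, together with the acceptance criterion for negative tasks, suggests a refusal check that accepts scope language as well as explicit refusals. A minimal sketch under that reading follows; the function name `is_refusal_or_out_of_scope` and both phrase lists are hypothetical and would need to be replaced with the wording the skill actually uses.

```python
import re

# Hypothetical phrase lists; neither comes from the skill or the eval tooling.
REFUSAL = re.compile(
    r"\b(can(?:'|’)?t|cannot|unable to|won't) (help|assist|do)\b",
    re.IGNORECASE,
)
SCOPE = re.compile(
    r"\b(out of scope|outside (?:of )?(?:my|the) scope|only (?:handle|support|help with))\b",
    re.IGNORECASE,
)


def is_refusal_or_out_of_scope(reply):
    """True if the reply refuses outright or describes the request as out of scope."""
    return bool(REFUSAL.search(reply) or SCOPE.search(reply))
```

Keeping the two patterns separate means a reply such as "that is outside my scope" passes even though it contains no explicit refusal verb.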

Labels

  • AI-evals: All things related to agent and skills evaluation.
  • enhancement: New feature or request
  • good first issue: Good for newcomers
