test: add NeMo Relay skill eval datasets #225
Signed-off-by: asawarkar <asawarkar@nvidia.com>
Walkthrough
This PR adds evaluation test specifications for 14 NeMo Relay skills across the complete instrumentation, observability, optimization, and migration pipeline. Each `evals.json` file defines positive and negative test cases with expected behaviors, skill routing, and validation checklists. One metadata addition updates the migration skill's author field.
Changes
NeMo Relay Evaluation Test Suite
Estimated code review effort: 2 (Simple), ~10 minutes
Pre-merge checks: 5 passed
/ok to test 4c57d2f
@willkill07, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test 0513bc5
/merge
/nvskills-ci
Overview
This PR adds P0 eval datasets for all public NeMo Relay consumer skills so the skills can move through NVIDIA verified-skills onboarding and become available in the official NVIDIA skills catalog. The datasets give NVSkills/NVCARPS the required `evals/evals.json` inputs before benchmark, skill-card, and signature artifacts are generated.
Details
- Adds `evals/evals.json` for all 14 public `nemo-relay-*` consumer skills.
- Datasets were generated with `nv-base create-eval-dataset --full`, with positive routing coverage and one negative case per skill.
- Adds author metadata to `nemo-relay-migrate-from-flow` to match the other public skills.
Validated with:
- `nv-base create-eval-dataset skills/<skill> --force --full` for all 14 public NeMo Relay skills
- `jq empty skills/*/evals/evals.json`
- `nv-base validate skills --external --no-dedup --fail-fast`
Where should the reviewer start?
Start with `skills/nemo-relay-start/evals/evals.json` to see the eval shape, then spot-check sibling routing cases in `skills/nemo-relay-instrument-calls/evals/evals.json`, `skills/nemo-relay-setup-observability/evals/evals.json`, and `skills/nemo-relay-migrate-from-flow/evals/evals.json`. The only `SKILL.md` metadata change is in `skills/nemo-relay-migrate-from-flow/SKILL.md`.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Follow-up
After review, a NeMo Relay maintainer/admin should comment `/nvskills-ci` on this PR to generate the benchmark, skill-card, and signature artifacts required for NVIDIA/skills publication.
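For reviewers without `jq` or `nv-base` installed, the JSON-validity step listed under "Validated with" can be roughly approximated with the standard library. This is a minimal sketch, not part of the PR: the function name `check_eval_files` is hypothetical, it assumes the `skills/nemo-relay-*/evals/evals.json` layout described above, and it only checks that each file exists and parses as a single JSON document. It does not check the eval schema or replace `nv-base validate`.

```python
import json
from pathlib import Path


def check_eval_files(skills_root):
    """Return (valid, problems) for every nemo-relay-* skill directory.

    Hypothetical helper: assumes each skill keeps its dataset at
    <skills_root>/<skill>/evals/evals.json, as described in this PR.
    """
    valid, problems = [], []
    for skill_dir in sorted(Path(skills_root).glob("nemo-relay-*")):
        if not skill_dir.is_dir():
            continue
        eval_path = skill_dir / "evals" / "evals.json"
        if not eval_path.is_file():
            problems.append(f"{skill_dir.name}: missing evals/evals.json")
            continue
        try:
            # Parse only; the structure of the cases is not inspected here.
            json.loads(eval_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{skill_dir.name}: invalid JSON ({exc.msg})")
            continue
        valid.append(skill_dir.name)
    return valid, problems
```

Calling `check_eval_files("skills")` from the repository root should, if the PR is as described, list 14 skill names in the first return value and nothing in the second.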