test: regenerate NeMo Relay evals with NV-BASE #226
Signed-off-by: asawarkar <asawarkar@nvidia.com>
CodeRabbit: No actionable comments were generated in the recent review. Configuration: `.coderabbit.yaml`, review profile ASSERTIVE, Enterprise plan; 14 files selected for processing; path-based instructions for `**/*.{md,mdx,py,sh,yaml,yml,toml,json}` inferred from `.agents/skills/contribute-docs/SKILL.md`; 16 additional comments.
Walkthrough: This PR restructures evaluation test data across 14 NeMo Relay skill directories. Changes: Evaluation Fixture Schema Migration.
Estimated code review effort: 3 (Moderate), ~20 minutes.
Pre-merge checks: 5 of 5 passed.
/ok to test c03e8bc
/ok to test 8c5ee3e
/ok to test 3e2e31b
/nvskills-ci
Overview
Follow-up to #225. The original PR added eval datasets for the public NeMo Relay skills and was merged before the NV-BASE regeneration pass landed on the fork branch. This PR replaces those hand-seeded datasets with NV-BASE-generated datasets so the skills follow the verified-skills onboarding guide and are ready for the official NVIDIA skills catalog flow.
Details
- Regenerates `evals/evals.json` for all 14 public `nemo-relay-*` consumer skills using NV-BASE.
- Runs `nv-base create-eval-dataset --full --force` to produce 4 cases per skill: 3 positive routing/use cases plus 1 negative case.
- Converts the previous `{ "skill": ..., "cases": [...] }` wrapper shape to NV-BASE's top-level array format.

Validated with:
- `nv-base create-eval-dataset skills/<skill> --force --full` for all 14 public NeMo Relay skills
- `jq empty skills/*/evals/evals.json`
- `nv-base validate skills --external --no-dedup --fail-fast`

Where should the reviewer start?
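As a hedged sketch (not part of NV-BASE or this PR's tooling), a reviewer who wants a quick local spot-check could use a small Python helper. Like `jq empty`, it requires every matched `evals.json` to parse as JSON. It then adds two checks taken only from the stated changes: the top level is an array rather than the old `{ "skill": ..., "cases": [...] }` wrapper, and each skill has 4 cases. The function name `check_evals` and the `nemo-relay-*` glob are illustrative assumptions. Because the NV-BASE per-case schema isn't described here, the helper does not inspect case fields, so it cannot tell positive cases from negative ones.

```python
import json
from pathlib import Path

# Per the PR description: 3 positive routing/use cases + 1 negative case.
EXPECTED_CASES = 4


def check_evals(root: Path, pattern: str = "nemo-relay-*/evals/evals.json") -> list[str]:
    """Hypothetical spot-check: return a list of problems (empty list = all good).

    Only the top-level shape and case count are checked; individual case
    fields are deliberately not assumed.
    """
    problems: list[str] = []
    files = sorted(root.glob(pattern))
    if not files:
        problems.append(f"no files matching {pattern} under {root}")
    for path in files:
        # Equivalent in spirit to `jq empty`: the file must parse as JSON.
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path}: invalid JSON ({exc})")
            continue
        # Flag datasets still using the pre-NV-BASE wrapper object.
        if isinstance(data, dict) and "cases" in data:
            problems.append(f"{path}: still uses the old {{skill, cases}} wrapper")
        elif not isinstance(data, list):
            problems.append(f"{path}: top level is {type(data).__name__}, expected array")
        elif len(data) != EXPECTED_CASES:
            problems.append(f"{path}: {len(data)} cases, expected {EXPECTED_CASES}")
    return problems
```

For example, `check_evals(Path("skills"))` returns an empty list when every matched file passes. It complements `nv-base validate` rather than replacing it.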
Start with `skills/nemo-relay-start/evals/evals.json` to see the updated NV-BASE dataset shape, then spot-check `skills/nemo-relay-migrate-from-flow/evals/evals.json` for script-aware cases and `skills/nemo-relay-tune-adaptive-hints/evals/evals.json` for sibling-skill negative routing.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Follow-up
After review, a NeMo Relay maintainer/admin should comment `/nvskills-ci` on this PR to generate the benchmark, skill-card, and signature artifacts required for NVIDIA/skills publication.