Feat/skills ci automation #602
Merged
skills-eval.yml:
- Pre-checkout step removes root-owned Docker volumes (Milvus/MinIO/etcd)
using docker run alpine before git clean runs — permanent fix for the
EACCES checkout failure loop
- Replaced bash ci/run_skill_eval.sh with skills_eval_agent.py
(agentic approach — diffs PR, routes per-spec platform, posts PR comment)
.github/skill-eval/skills_eval_agent.py + AGENTS.md:
Ported from feat/skill-eval-ci — Claude agent SDK orchestrator that
handles diff detection, per-spec routing (cpu/gpu), Harbor execution,
and PR comment posting
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
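
The pre-checkout cleanup described above can be sketched as a small helper that builds the `docker run` command. This is only an illustration, not the workflow's actual step: the function name and the example volume path are hypothetical, and the only details taken from this PR are the use of an `alpine` container and the follow-up fix of removing `/target/*` rather than `/target`.

```python
import os
import shlex


def build_volume_cleanup_cmd(volume_dir: str, image: str = "alpine") -> list[str]:
    """Return a `docker run` argv that empties a root-owned directory.

    The directory is bind-mounted at /target and its contents are removed
    from inside the container, where the process runs as root.
    """
    host_path = os.path.abspath(volume_dir)  # bind mounts need an absolute path
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:/target",
        image,
        # The glob has to be expanded by a shell inside the container.
        # /target is the mount point itself and cannot be deleted,
        # so only its contents are removed.
        "sh", "-c", "rm -rf /target/*",
    ]


if __name__ == "__main__":
    # Hypothetical volume path; the real workflow's paths are not shown here.
    cmd = build_volume_cleanup_cmd("deploy/compose/volumes/milvus")
    print(shlex.join(cmd))
```

Because the files are owned by root, a plain `git clean` run by the runner user fails on them, while the container can delete them as root. Note that a `*` glob does not match dotfiles, so hidden entries directly under the mount would survive this particular command.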
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Previously skill-source/.agents/skills/ changes were invisible to the
automated pipeline — dorny/paths-filter only watched skills/** and
AGENTS.md Step 1 only diffed under skills/.
Changes:
- skills-eval.yml: add skill-source/** to paths filter so PRs touching
the monolithic rag-blueprint skill trigger the eval
- AGENTS.md Step 1: diff both skills/ and skill-source/.agents/skills/,
resolve SKILL_DIR to the correct root per location
- AGENTS.md generate.py call: uses SKILL_DIR so both decomposed and
monolithic skills get the right --skill-dir and --spec paths
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
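
A minimal sketch of the SKILL_DIR resolution described in this commit, assuming a layout of `skills/<name>/...` for decomposed skills and `skill-source/.agents/skills/<name>/...` for the monolithic one. The function name and the example file names are hypothetical; the real logic lives in AGENTS.md and is not reproduced here.

```python
from pathlib import PurePosixPath

# Assumed layout: skills/<name>/... for decomposed skills and
# skill-source/.agents/skills/<name>/... for the monolithic skill.
SKILL_ROOTS = (
    PurePosixPath("skill-source/.agents/skills"),
    PurePosixPath("skills"),
)


def resolve_skill_dirs(changed_files: list[str]) -> list[str]:
    """Map changed file paths to the skill directories that contain them."""
    found: set[str] = set()
    for name in changed_files:
        path = PurePosixPath(name)
        for root in SKILL_ROOTS:
            try:
                rel = path.relative_to(root)
            except ValueError:
                continue  # this file is not under this root
            if len(rel.parts) >= 2:  # need at least <name>/<file>
                found.add(str(root / rel.parts[0]))
            break
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical changed-file list, e.g. from `git diff --name-only`.
    print(resolve_skill_dirs([
        "skills/rag-eval/SKILL.md",
        "skill-source/.agents/skills/rag-blueprint/specs/h100.json",
        "README.md",
    ]))
```

Each returned directory can then be passed as the skill root to the generator, so decomposed and monolithic skills resolve their own spec paths.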
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
If rag-eval/h100.json + rag-perf/h100.json both need H100_x2, provision
ONE VM and run all H100 trials against it sequentially. Prevents spinning
up 2 separate VMs (saves 13+ min provisioning + halves cost).
Added fallback types for capacity failures:
dmz.h100x2,scaleway_H100x2,gpu-h100-sxm.1gpu-16vcpu-200gb
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
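
The one-VM-per-platform rule with capacity fallback might look roughly like the sketch below. It is not the agent's actual code: the function names are hypothetical, the platform is inferred from the spec file stem as an assumption, and provisioning is passed in as a callable so that no real Brev CLI call is implied. Only the fallback type names come from the commit message.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable, Optional

# Fallback instance types listed in the commit message, tried in order.
H100_FALLBACK_TYPES = [
    "dmz.h100x2",
    "scaleway_H100x2",
    "gpu-h100-sxm.1gpu-16vcpu-200gb",
]


def group_specs_by_platform(spec_paths: list[str]) -> dict[str, list[str]]:
    """Group spec files by platform, taken here from the file stem (an assumption)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for spec in spec_paths:
        groups[PurePosixPath(spec).stem].append(spec)
    return dict(groups)


def provision_with_fallback(
    instance_types: list[str],
    try_provision: Callable[[str], Optional[str]],
) -> str:
    """Try each instance type in order; return the first VM id obtained."""
    for instance_type in instance_types:
        vm_id = try_provision(instance_type)  # None signals a capacity failure
        if vm_id is not None:
            return vm_id
    raise RuntimeError("no capacity for any of: " + ", ".join(instance_types))


def run_specs(
    spec_paths: list[str],
    instance_types: dict[str, list[str]],
    try_provision: Callable[[str], Optional[str]],
    run_trial: Callable[[str, str], str],
) -> dict[str, str]:
    """Provision one VM per platform, then run that platform's specs sequentially."""
    results: dict[str, str] = {}
    for platform, specs in group_specs_by_platform(spec_paths).items():
        vm_id = provision_with_fallback(instance_types[platform], try_provision)
        for spec in specs:
            results[spec] = run_trial(vm_id, spec)
    return results
```

With `rag-eval/h100.json` and `rag-perf/h100.json` as input, both specs land in the same `h100` group, so provisioning happens once and both trials reuse the resulting VM.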
…thor action needed
Previously skill authors had to add deploy instructions to their h100 spec
env field. Now the agent handles this automatically:
Before running ANY H100 spec (rag-eval, rag-perf, etc.), agent checks if
RAG stack is up at localhost:8081 on the Brev VM. If not, deploys it using
rag-blueprint/h100.json automatically — regardless of which skills changed
in the PR.
Skill authors write their h100.json specs normally. The infrastructure
handles the RAG stack prerequisite. Same pattern as VSS's profile field
but handled at the agent level not spec level.
Also: one Brev VM per platform per run (not one per spec).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
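
The check-then-deploy behaviour described here can be sketched as follows. The names are hypothetical, and a plain TCP connect to port 8081 stands in for whatever readiness check the agent really performs (it may well be an HTTP health request). The deploy step is passed in as a callable because the actual deployment via rag-blueprint/h100.json is not shown in this PR text.

```python
import socket
from typing import Callable


def port_open(host: str = "localhost", port: int = 8081, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_rag_stack(is_up: Callable[[], bool], deploy: Callable[[], None]) -> bool:
    """Deploy the stack only if it is not already up.

    Returns True if a deploy was performed, False if the stack was already up.
    """
    if is_up():
        return False
    deploy()
    if not is_up():
        raise RuntimeError("RAG stack still unreachable after deploy")
    return True
```

Run on the VM before the first H100 trial, `ensure_rag_stack(port_open, deploy_fn)` makes the prerequisite idempotent: a second spec on the same VM finds the stack already up and skips the deploy.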
shubhadeepd approved these changes May 22, 2026
5 tasks
shubhadeepd added a commit that referenced this pull request May 25, 2026
* Supress milvus lite logs for library lite notebook (#598)
* ci: bump GitHub Actions to Node.js 24 runtimes (#597)
  Update checkout, upload-artifact, setup-helm, and Docker actions to
  versions that default to Node 24, resolving deprecation warnings on
  publish and CI workflows.
* Feat/skills ci automation (#602)
  * feat(ci): agentic eval + pre-checkout volume cleanup
    skills-eval.yml:
    - Pre-checkout step removes root-owned Docker volumes (Milvus/MinIO/etcd)
    using docker run alpine before git clean runs — permanent fix for the
    EACCES checkout failure loop
    - Replaced bash ci/run_skill_eval.sh with skills_eval_agent.py
    (agentic approach — diffs PR, routes per-spec platform, posts PR comment)
    .github/skill-eval/skills_eval_agent.py + AGENTS.md:
    Ported from feat/skill-eval-ci — Claude agent SDK orchestrator that
    handles diff detection, per-spec routing (cpu/gpu), Harbor execution,
    and PR comment posting
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
  * fix(ci): rm -rf /target/* not /target — can't delete mount point
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
  * feat(ci): automate skill-source/ evals in PR pipeline
    Previously skill-source/.agents/skills/ changes were invisible to the
    automated pipeline — dorny/paths-filter only watched skills/** and
    AGENTS.md Step 1 only diffed under skills/.
    Changes:
    - skills-eval.yml: add skill-source/** to paths filter so PRs touching
    the monolithic rag-blueprint skill trigger the eval
    - AGENTS.md Step 1: diff both skills/ and skill-source/.agents/skills/,
    resolve SKILL_DIR to the correct root per location
    - AGENTS.md generate.py call: uses SKILL_DIR so both decomposed and
    monolithic skills get the right --skill-dir and --spec paths
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
  * ci: use actions/checkout@v5 and upload-artifact@v5 (aligns with #597)
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
  * fix(eval): one Brev VM per platform per run — not one per spec
    If rag-eval/h100.json + rag-perf/h100.json both need H100_x2, provision
    ONE VM and run all H100 trials against it sequentially. Prevents spinning
    up 2 separate VMs (saves 13+ min provisioning + halves cost).
    Added fallback types for capacity failures:
    dmz.h100x2,scaleway_H100x2,gpu-h100-sxm.1gpu-16vcpu-200gb
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
  * feat(eval): auto-deploy RAG stack before any H100 trial — no skill author action needed
    Previously skill authors had to add deploy instructions to their h100 spec
    env field. Now the agent handles this automatically:
    Before running ANY H100 spec (rag-eval, rag-perf, etc.), agent checks if
    RAG stack is up at localhost:8081 on the Brev VM. If not, deploys it using
    rag-blueprint/h100.json automatically — regardless of which skills changed
    in the PR.
    Skill authors write their h100.json specs normally. The infrastructure
    handles the RAG stack prerequisite. Same pattern as VSS's profile field
    but handled at the agent level not spec level.
    Also: one Brev VM per platform per run (not one per spec).
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
  ---------
  Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
  Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix(ingestor): return backend-canonicalized collection name to align summary keys
  Signed-off-by: smasurekar <smasurekar@nvidia.com>
* fix(agentic-rag): disable JSON mode and harden response parser for malformed LLM output (#605)
  Signed-off-by: smasurekar <smasurekar@nvidia.com>
* copy pr bot additional trustees
* copy pr bot additional trustees
* Added a Limitations bullet noting that the per-response metrics block isn't populated for agentic requests
  Signed-off-by: smasurekar <smasurekar@nvidia.com>
* updated label
* use GitHub Secrets directly
* ci: fix skills-eval runner label, credentials, artifacts, and NV-BASE noise
  Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
* ci: add ANTHROPIC_BASE_URL and fix NV-BASE push trigger
  Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
* ci: add ANTHROPIC_MODEL and CLAUDE_CODE_DISABLE_THINKING for NVIDIA proxy
* updated agent code
* docs: document OpenShift support in v2.6.0 release notes (#600)
  Add OpenShift to the release summary and Highlights for the 2.6.0
  release, linking to the Helm on OpenShift deployment guide.
* docs: add agent skill routing table to README (#599)
  Signed-off-by: Niyati Singal <nsingal@nvidia.com>
---------
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Signed-off-by: smasurekar <smasurekar@nvidia.com>
Signed-off-by: Niyati Singal <nsingal@nvidia.com>
Co-authored-by: nv-pranjald <150428320+nv-pranjald@users.noreply.github.com>
Co-authored-by: vidushig-nv <vidushig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: smasurekar <smasurekar@nvidia.com>
Co-authored-by: niyatisingal <nsingal@nvidia.com>