Feat/skills ci automation by vidushig-nv · Pull Request #602 · NVIDIA-AI-Blueprints/rag

vidushig-nv · 2026-05-22T09:45:00Z

Description

Checklist

I am familiar with the Contributing Guidelines.
All commits are signed-off (git commit -s) and GPG signed (git commit -S).
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

skills-eval.yml:
- Pre-checkout step removes root-owned Docker volumes (Milvus/MinIO/etcd) using docker run alpine before git clean runs. Permanent fix for the EACCES checkout failure loop.
- Replaced bash ci/run_skill_eval.sh with skills_eval_agent.py (agentic approach: diffs PR, routes per-spec platform, posts PR comment)

.github/skill-eval/skills_eval_agent.py + AGENTS.md:
- Ported from feat/skill-eval-ci: Claude agent SDK orchestrator that handles diff detection, per-spec routing (cpu/gpu), Harbor execution, and PR comment posting

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
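The pre-checkout cleanup described above can be sketched as a small command builder. This is only an illustration, not the workflow's actual step: the function name, the host path, and the /target mount point inside the container are assumptions, and the real step lives in skills-eval.yml.

```python
import shlex


def build_volume_cleanup_cmd(host_dir: str, image: str = "alpine") -> list:
    """Return a docker argv that empties host_dir as root without removing it."""
    # The glob has to be expanded inside the container, so it is handed to `sh -c`.
    # `/target/*` is used instead of `/target` because the bind-mount point
    # itself cannot be deleted from inside the container.
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/target",
        image,
        "sh", "-c", "rm -rf /target/*",
    ]


if __name__ == "__main__":
    # Hypothetical host path; print the command instead of running it.
    print(shlex.join(build_volume_cleanup_cmd("/path/to/volumes")))
```

Running the container as root is what lets it delete files the runner user cannot touch. Note that a plain `*` glob skips dotfiles, so hidden entries at the top level of the volume would need separate handling.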

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

Previously skill-source/.agents/skills/ changes were invisible to the automated pipeline: dorny/paths-filter only watched skills/** and AGENTS.md Step 1 only diffed under skills/.

Changes:
- skills-eval.yml: add skill-source/** to paths filter so PRs touching the monolithic rag-blueprint skill trigger the eval
- AGENTS.md Step 1: diff both skills/ and skill-source/.agents/skills/, resolve SKILL_DIR to the correct root per location
- AGENTS.md generate.py call: uses SKILL_DIR so both decomposed and monolithic skills get the right --skill-dir and --spec paths

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
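The SKILL_DIR resolution described above might look roughly like the following. It is a minimal sketch, assuming each skill occupies one directory directly under either root; the function names are hypothetical and the real logic is expressed as agent instructions in AGENTS.md, not as this code.

```python
from pathlib import PurePosixPath
from typing import Optional

# The two roots named in the commit message; the longer one is checked first.
SKILL_ROOTS = ("skill-source/.agents/skills", "skills")


def resolve_skill_dir(changed_path: str) -> Optional[str]:
    """Map a changed file to '<root>/<skill-name>', or None if outside both roots."""
    parts = PurePosixPath(changed_path).parts
    for root in SKILL_ROOTS:
        root_parts = PurePosixPath(root).parts
        n = len(root_parts)
        # Require at least one file below the skill directory itself.
        if parts[:n] == root_parts and len(parts) > n + 1:
            return "/".join(parts[: n + 1])
    return None


def changed_skill_dirs(changed_paths: list) -> list:
    """Deduplicate skill directories while preserving first-seen order."""
    seen = {}
    for path in changed_paths:
        skill_dir = resolve_skill_dir(path)
        if skill_dir is not None:
            seen.setdefault(skill_dir, None)
    return list(seen)
```

Feeding the output of a PR diff (one path per line) through changed_skill_dirs would yield one SKILL_DIR per touched skill, whichever root it lives under.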

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

If rag-eval/h100.json + rag-perf/h100.json both need H100_x2, provision ONE VM and run all H100 trials against it sequentially. Prevents spinning up 2 separate VMs (saves 13+ min provisioning + halves cost).

Added fallback types for capacity failures: dmz.h100x2,scaleway_H100x2,gpu-h100-sxm.1gpu-16vcpu-200gb

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
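The "one VM per platform" rule and the capacity fallback can be illustrated with a short sketch. It assumes the platform is the spec file's stem (h100 for rag-eval/h100.json) and that provisioning is an injected callable returning None on a capacity failure; both helper names are hypothetical and no real Brev API is called here.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable, Optional

# Fallback instance types quoted from the commit message, tried in order.
H100_FALLBACK_TYPES = [
    "dmz.h100x2",
    "scaleway_H100x2",
    "gpu-h100-sxm.1gpu-16vcpu-200gb",
]


def group_specs_by_platform(spec_paths: list) -> dict:
    """Group specs such as 'rag-eval/h100.json' under their platform ('h100')."""
    groups = defaultdict(list)
    for spec in spec_paths:
        groups[PurePosixPath(spec).stem].append(spec)
    return dict(groups)


def provision_with_fallback(
    instance_types: list,
    try_provision: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the first VM id that try_provision yields, or None if all fail."""
    for instance_type in instance_types:
        # Hypothetical contract: None signals a capacity failure for this type.
        vm_id = try_provision(instance_type)
        if vm_id is not None:
            return vm_id
    return None
```

With this shape, the orchestrator would provision once per key of group_specs_by_platform and then run that platform's specs sequentially against the same VM.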

…thor action needed

Previously skill authors had to add deploy instructions to their h100 spec env field. Now the agent handles this automatically:

Before running ANY H100 spec (rag-eval, rag-perf, etc.), the agent checks if the RAG stack is up at localhost:8081 on the Brev VM. If not, it deploys the stack using rag-blueprint/h100.json automatically, regardless of which skills changed in the PR.

Skill authors write their h100.json specs normally. The infrastructure handles the RAG stack prerequisite. Same pattern as VSS's profile field, but handled at the agent level rather than the spec level.

Also: one Brev VM per platform per run (not one per spec).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
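The check-then-deploy behaviour described above could be sketched as follows. This is an assumption-laden illustration: the probe simply treats any HTTP answer from localhost:8081 as "up" (the commit message names no health route), and the deploy step is an injected callable standing in for whatever the agent does with rag-blueprint/h100.json. The code would have to run on the Brev VM itself, or behind a tunnel, for localhost to resolve to the right machine.

```python
import urllib.error
import urllib.request
from typing import Callable

# Address from the commit message; probing the bare root URL is an assumption.
RAG_URL = "http://localhost:8081"


def rag_stack_is_up(url: str = RAG_URL, timeout: float = 5.0) -> bool:
    """Treat any HTTP response as 'up'; refused or timed-out connections as 'down'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        return False


def ensure_rag_stack(is_up: Callable[[], bool], deploy: Callable[[], None]) -> bool:
    """Deploy only when the probe fails; return True if a deploy was triggered."""
    if is_up():
        return False
    deploy()
    return True
```

Keeping the probe and the deploy as parameters makes the prerequisite check independent of which skills changed, which matches the intent that skill authors do nothing extra in their h100.json specs.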

* Supress milvus lite logs for library lite notebook (#598)

* ci: bump GitHub Actions to Node.js 24 runtimes (#597)
  Update checkout, upload-artifact, setup-helm, and Docker actions to versions that default to Node 24, resolving deprecation warnings on publish and CI workflows.

* Feat/skills ci automation (#602)

  * feat(ci): agentic eval + pre-checkout volume cleanup
    skills-eval.yml:
    - Pre-checkout step removes root-owned Docker volumes (Milvus/MinIO/etcd) using docker run alpine before git clean runs. Permanent fix for the EACCES checkout failure loop.
    - Replaced bash ci/run_skill_eval.sh with skills_eval_agent.py (agentic approach: diffs PR, routes per-spec platform, posts PR comment)
    .github/skill-eval/skills_eval_agent.py + AGENTS.md:
    - Ported from feat/skill-eval-ci: Claude agent SDK orchestrator that handles diff detection, per-spec routing (cpu/gpu), Harbor execution, and PR comment posting
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

  * fix(ci): rm -rf /target/* not /target - can't delete mount point
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

  * feat(ci): automate skill-source/ evals in PR pipeline
    Previously skill-source/.agents/skills/ changes were invisible to the automated pipeline: dorny/paths-filter only watched skills/** and AGENTS.md Step 1 only diffed under skills/.
    Changes:
    - skills-eval.yml: add skill-source/** to paths filter so PRs touching the monolithic rag-blueprint skill trigger the eval
    - AGENTS.md Step 1: diff both skills/ and skill-source/.agents/skills/, resolve SKILL_DIR to the correct root per location
    - AGENTS.md generate.py call: uses SKILL_DIR so both decomposed and monolithic skills get the right --skill-dir and --spec paths
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

  * ci: use actions/checkout@v5 and upload-artifact@v5 (aligns with #597)
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

  * fix(eval): one Brev VM per platform per run - not one per spec
    If rag-eval/h100.json + rag-perf/h100.json both need H100_x2, provision ONE VM and run all H100 trials against it sequentially. Prevents spinning up 2 separate VMs (saves 13+ min provisioning + halves cost).
    Added fallback types for capacity failures: dmz.h100x2,scaleway_H100x2,gpu-h100-sxm.1gpu-16vcpu-200gb
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

  * feat(eval): auto-deploy RAG stack before any H100 trial - no skill author action needed
    Previously skill authors had to add deploy instructions to their h100 spec env field. Now the agent handles this automatically:
    Before running ANY H100 spec (rag-eval, rag-perf, etc.), agent checks if RAG stack is up at localhost:8081 on the Brev VM. If not, deploys it using rag-blueprint/h100.json automatically, regardless of which skills changed in the PR.
    Skill authors write their h100.json specs normally. The infrastructure handles the RAG stack prerequisite. Same pattern as VSS's profile field but handled at the agent level not spec level.
    Also: one Brev VM per platform per run (not one per spec).
    Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

  ---------
  Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
  Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ingestor): return backend-canonicalized collection name to align summary keys
  Signed-off-by: smasurekar <smasurekar@nvidia.com>

* fix(agentic-rag): disable JSON mode and harden response parser for malformed LLM output (#605)
  Signed-off-by: smasurekar <smasurekar@nvidia.com>

* copy pr bot additional trustees

* copy pr bot additional trustees

* Added a Limitations bullet noting that the per-response metrics block isn't populated for agentic requests
  Signed-off-by: smasurekar <smasurekar@nvidia.com>

* updated label

* use GitHub Secrets directly

* ci: fix skills-eval runner label, credentials, artifacts, and NV-BASE noise
  Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

* ci: add ANTHROPIC_BASE_URL and fix NV-BASE push trigger
  Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

* ci: add ANTHROPIC_MODEL and CLAUDE_CODE_DISABLE_THINKING for NVIDIA proxy

* updated agent code

* docs: document OpenShift support in v2.6.0 release notes (#600)
  Add OpenShift to the release summary and Highlights for the 2.6.0 release, linking to the Helm on OpenShift deployment guide.

* docs: add agent skill routing table to README (#599)
  Signed-off-by: Niyati Singal <nsingal@nvidia.com>

---------
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Signed-off-by: smasurekar <smasurekar@nvidia.com>
Signed-off-by: Niyati Singal <nsingal@nvidia.com>
Co-authored-by: nv-pranjald <150428320+nv-pranjald@users.noreply.github.com>
Co-authored-by: vidushig-nv <vidushig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: smasurekar <smasurekar@nvidia.com>
Co-authored-by: niyatisingal <nsingal@nvidia.com>

vidushig-nv and others added 6 commits May 22, 2026 14:17

fix(ci): rm -rf /target/* not /target — can't delete mount point

43081c1

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

ci: use actions/checkout@v5 and upload-artifact@v5 (aligns with #597)

a05d37c

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

vidushig-nv requested a review from shubhadeepd May 22, 2026 09:45

shubhadeepd approved these changes May 22, 2026

View reviewed changes

shubhadeepd merged commit 75125fa into develop May 22, 2026
6 checks passed

shubhadeepd deleted the feat/skills-ci-automation branch May 22, 2026 10:00

shubhadeepd mentioned this pull request May 25, 2026

Cherry-pick develop changes into release-v2.6.0 #613

Merged

5 tasks

shubhadeepd merged 6 commits into develop from feat/skills-ci-automation
