Feat/skills ci automation #602

Merged
shubhadeepd merged 6 commits into develop from feat/skills-ci-automation
May 22, 2026

@vidushig-nv
Contributor

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • All commits are signed-off (git commit -s) and GPG signed (git commit -S).
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

vidushig-nv and others added 6 commits May 22, 2026 14:17
skills-eval.yml:
  - Pre-checkout step removes root-owned Docker volumes (Milvus/MinIO/etcd)
    using docker run alpine before git clean runs — permanent fix for the
    EACCES checkout failure loop
  - Replaced bash ci/run_skill_eval.sh with skills_eval_agent.py
    (agentic approach — diffs PR, routes per-spec platform, posts PR comment)

.github/skill-eval/skills_eval_agent.py + AGENTS.md:
  Ported from feat/skill-eval-ci — Claude agent SDK orchestrator that
  handles diff detection, per-spec routing (cpu/gpu), Harbor execution,
  and PR comment posting

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
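The agent's per-spec routing (cpu/gpu) is only described at a high level in this commit. The following is a minimal sketch of one way such routing could work, assuming (hypothetically) that a spec's platform can be read from its file stem, so that `rag-eval/h100.json` goes to the GPU path. `GPU_SPEC_STEMS`, `route_spec`, and `route_specs` are illustrative names, not code from `skills_eval_agent.py`.

```python
from pathlib import PurePosixPath

# Hypothetical rule: spec file stems that need a GPU VM. The PR only says
# routing is per spec (cpu/gpu); deriving it from the stem is an assumption.
GPU_SPEC_STEMS = {"h100"}


def route_spec(spec_path: str) -> str:
    """Return "gpu" or "cpu" for a spec path such as "rag-eval/h100.json"."""
    stem = PurePosixPath(spec_path).stem.lower()
    return "gpu" if stem in GPU_SPEC_STEMS else "cpu"


def route_specs(spec_paths: list[str]) -> dict[str, list[str]]:
    """Bucket changed specs by platform, keeping their original order."""
    routes: dict[str, list[str]] = {"cpu": [], "gpu": []}
    for path in spec_paths:
        routes[route_spec(path)].append(path)
    return routes
```

Bucketing by platform first is what makes it possible to provision one VM per platform rather than one per spec.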
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
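The pre-checkout cleanup works by bind-mounting the affected directory into a throwaway `alpine` container. That container runs as root by default, so it can delete files the runner user cannot. Per the follow-up commit `fix(ci): rm -rf /target/* not /target`, only the contents of the mount are removed, because the mount point itself cannot be deleted from inside the container. A minimal Python sketch that builds such a command, assuming a hypothetical host directory and a hypothetical helper `build_volume_cleanup_cmd` (the workflow may well express this as a plain shell step instead):

```python
import shlex


def build_volume_cleanup_cmd(host_dir: str, image: str = "alpine") -> list[str]:
    """Build a `docker run` argv that empties a root-owned directory.

    host_dir is bind-mounted at /target in a throwaway container running as
    root. Only /target/* is removed: /target is the mount point and cannot be
    deleted from inside the container.
    """
    return [
        "docker", "run", "--rm",         # remove the container when it exits
        "-v", f"{host_dir}:/target",     # bind-mount the directory to clean
        image,
        "sh", "-c", "rm -rf /target/*",  # glob is expanded by sh in the container
    ]


def as_shell_line(cmd: list[str]) -> str:
    """Render the argv as a safely quoted, copy-pasteable shell line."""
    return shlex.join(cmd)
```

One caveat worth checking: in POSIX `sh` the glob `/target/*` does not match dotfiles, so hidden entries at the top level of the mount would survive this cleanup.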
Previously skill-source/.agents/skills/ changes were invisible to the
automated pipeline — dorny/paths-filter only watched skills/** and
AGENTS.md Step 1 only diffed under skills/.

Changes:
- skills-eval.yml: add skill-source/** to paths filter so PRs touching
  the monolithic rag-blueprint skill trigger the eval
- AGENTS.md Step 1: diff both skills/ and skill-source/.agents/skills/,
  resolve SKILL_DIR to the correct root per location
- AGENTS.md generate.py call: uses SKILL_DIR so both decomposed and
  monolithic skills get the right --skill-dir and --spec paths

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
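A rough Python sketch of the two detection changes, assuming the changed-file list comes from something like `git diff --name-only`. `triggers_eval` only approximates the dorny/paths-filter step (Python's `fnmatch` lets `*` cross `/`, which differs from the action's own glob semantics), and `resolve_skill_dirs` is a hypothetical helper that maps each changed skill to the root, i.e. the SKILL_DIR, it lives under:

```python
from fnmatch import fnmatchcase

# Patterns named in the commit message for the paths filter.
PATHS_FILTER = ("skills/**", "skill-source/**")

# Roots that AGENTS.md Step 1 now diffs for changed skills.
SKILL_ROOTS = ("skills", "skill-source/.agents/skills")


def triggers_eval(changed_files: list[str]) -> bool:
    """Rough stand-in for the paths filter: does any file match a pattern?

    fnmatch's "*" also matches "/", so this only approximates the action.
    """
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in PATHS_FILTER
    )


def resolve_skill_dirs(changed_files: list[str]) -> dict[str, str]:
    """Map each changed skill name to the root (SKILL_DIR) it lives under."""
    resolved: dict[str, str] = {}
    for path in changed_files:
        for root in SKILL_ROOTS:
            prefix = root + "/"
            if not path.startswith(prefix):
                continue
            skill, sep, _rest = path[len(prefix):].partition("/")
            if sep:  # require <root>/<skill>/<file>, not a file in the root
                resolved.setdefault(skill, root)
            break
    return resolved
```

Resolving the root per skill is what lets decomposed skills under `skills/` and the monolithic skill under `skill-source/.agents/skills/` share one code path.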
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
If rag-eval/h100.json and rag-perf/h100.json both need H100_x2, provision
ONE VM and run all H100 trials against it sequentially. This avoids
spinning up two separate VMs, saving 13+ min of provisioning and halving
the cost.

Added fallback types for capacity failures:
  dmz.h100x2,scaleway_H100x2,gpu-h100-sxm.1gpu-16vcpu-200gb

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
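The one-VM-per-platform rule and the fallback list can be sketched as follows: group specs by their instance requirement, provision once per group by walking the fallback types in order, then run that group's trials sequentially on the single VM. `provision` and `run_trial` are injected callables and `CapacityError` is a hypothetical exception; the sketch does not call any real Brev API.

```python
from collections import defaultdict
from typing import Callable

# Fallback instance types from the commit message, tried left to right.
H100_FALLBACKS = "dmz.h100x2,scaleway_H100x2,gpu-h100-sxm.1gpu-16vcpu-200gb"


class CapacityError(RuntimeError):
    """Hypothetical error a provisioner raises when a type has no capacity."""


def group_by_requirement(spec_reqs: dict[str, str]) -> dict[str, list[str]]:
    """Group specs so each distinct requirement (e.g. H100_x2) gets one VM."""
    groups: dict[str, list[str]] = defaultdict(list)
    for spec, requirement in spec_reqs.items():
        groups[requirement].append(spec)
    return dict(groups)


def provision_with_fallback(
    instance_types: str, provision: Callable[[str], str]
) -> str:
    """Try each comma-separated type in order; return the first VM handle."""
    errors: list[str] = []
    for itype in (t.strip() for t in instance_types.split(",")):
        if not itype:
            continue
        try:
            return provision(itype)
        except CapacityError as exc:
            errors.append(f"{itype}: {exc}")
    raise CapacityError("no capacity for any type: " + "; ".join(errors))


def run_group(
    specs: list[str],
    instance_types: str,
    provision: Callable[[str], str],
    run_trial: Callable[[str, str], str],
) -> list[str]:
    """Provision ONE VM for the group, then run its specs sequentially on it."""
    vm = provision_with_fallback(instance_types, provision)
    return [run_trial(vm, spec) for spec in specs]
```

Running the group sequentially on one VM trades some wall-clock parallelism for skipping a second provisioning cycle, which is the cost the commit message calls out.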
…thor action needed

Previously, skill authors had to add deploy instructions to the env field
of their h100 spec. Now the agent handles this automatically:

Before running ANY H100 spec (rag-eval, rag-perf, etc.), the agent checks
whether the RAG stack is up at localhost:8081 on the Brev VM. If it is
not, the agent deploys it automatically using rag-blueprint/h100.json,
regardless of which skills changed in the PR.

Skill authors write their h100.json specs normally, and the infrastructure
handles the RAG stack prerequisite. This follows the same pattern as VSS's
profile field, but is handled at the agent level rather than the spec level.

Also: one Brev VM per platform per run (not one per spec).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
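A minimal sketch of the auto-deploy check, assuming a health endpoint at `http://localhost:8081/v1/health` (only the host and port come from the commit message; the path is an assumption) and a hypothetical `deploy` callable that applies `rag-blueprint/h100.json` and blocks until the stack is ready. The check runs once per batch of H100 specs, matching the one-VM-per-platform design:

```python
import urllib.request
from typing import Callable

# Host and port come from the commit message; the /v1/health path is an
# assumption about what the RAG server exposes.
RAG_HEALTH_URL = "http://localhost:8081/v1/health"


def rag_stack_is_up(url: str = RAG_HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections, timeouts
        return False


def run_h100_specs(
    specs: list[str],
    run_trial: Callable[[str], str],
    deploy: Callable[[], None],
    is_up: Callable[[], bool] = rag_stack_is_up,
) -> list[str]:
    """Ensure the RAG stack is up once, then run every H100 spec in order.

    `deploy` is a hypothetical hook that applies rag-blueprint/h100.json and
    is assumed to block until the stack is ready.
    """
    if specs and not is_up():
        deploy()
        if not is_up():
            raise RuntimeError("RAG stack still down after deploy")
    return [run_trial(spec) for spec in specs]
```

Injecting `is_up` and `deploy` keeps the ordering logic testable without a VM. In practice the health check has to run on the Brev VM itself, since `localhost` there is not the CI runner.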
vidushig-nv requested a review from shubhadeepd May 22, 2026 09:45
shubhadeepd merged commit 75125fa into develop May 22, 2026
6 checks passed
shubhadeepd deleted the feat/skills-ci-automation branch May 22, 2026 10:00
shubhadeepd added a commit that referenced this pull request May 25, 2026
* Suppress milvus lite logs for library lite notebook (#598)

* ci: bump GitHub Actions to Node.js 24 runtimes (#597)

Update checkout, upload-artifact, setup-helm, and Docker actions to
versions that default to Node 24, resolving deprecation warnings on
publish and CI workflows.

* Feat/skills ci automation (#602)

* feat(ci): agentic eval + pre-checkout volume cleanup

* fix(ci): rm -rf /target/* not /target — can't delete mount point

* feat(ci): automate skill-source/ evals in PR pipeline

* ci: use actions/checkout@v5 and upload-artifact@v5 (aligns with #597)

* fix(eval): one Brev VM per platform per run — not one per spec

* feat(eval): auto-deploy RAG stack before any H100 trial — no skill author action needed

---------

Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ingestor): return backend-canonicalized collection name to align summary keys

Signed-off-by: smasurekar <smasurekar@nvidia.com>

* fix(agentic-rag): disable JSON mode and harden response parser for malformed LLM output (#605)

Signed-off-by: smasurekar <smasurekar@nvidia.com>

* copy pr bot additional trustees

* copy pr bot additional trustees

* Added a Limitations bullet noting that the per-response metrics block isn't populated for agentic requests

Signed-off-by: smasurekar <smasurekar@nvidia.com>

* updated label

* use GitHub Secrets directly

* ci: fix skills-eval runner label, credentials, artifacts, and NV-BASE noise

Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

* ci: add ANTHROPIC_BASE_URL and fix NV-BASE push trigger

Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

* ci: add ANTHROPIC_MODEL and CLAUDE_CODE_DISABLE_THINKING for NVIDIA proxy

* updated agent code

* docs: document OpenShift support in v2.6.0 release notes (#600)

Add OpenShift to the release summary and Highlights for the 2.6.0
release, linking to the Helm on OpenShift deployment guide.

* docs: add agent skill routing table to README (#599) 

Signed-off-by: Niyati Singal <nsingal@nvidia.com>

---------

Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Signed-off-by: smasurekar <smasurekar@nvidia.com>
Signed-off-by: Niyati Singal <nsingal@nvidia.com>
Co-authored-by: nv-pranjald <150428320+nv-pranjald@users.noreply.github.com>
Co-authored-by: vidushig-nv <vidushig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: smasurekar <smasurekar@nvidia.com>
Co-authored-by: niyatisingal <nsingal@nvidia.com>