Cherry-pick develop changes into release-v2.6.0 #613
Merged
Update checkout, upload-artifact, setup-helm, and Docker actions to versions that default to Node 24, resolving deprecation warnings on publish and CI workflows.
* feat(ci): agentic eval + pre-checkout volume cleanup
skills-eval.yml:
- Pre-checkout step removes root-owned Docker volumes (Milvus/MinIO/etcd)
using docker run alpine before git clean runs — permanent fix for the
EACCES checkout failure loop
- Replaced bash ci/run_skill_eval.sh with skills_eval_agent.py
(agentic approach — diffs PR, routes per-spec platform, posts PR comment)
.github/skill-eval/skills_eval_agent.py + AGENTS.md:
Ported from feat/skill-eval-ci — Claude agent SDK orchestrator that
handles diff detection, per-spec routing (cpu/gpu), Harbor execution,
and PR comment posting
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
* fix(ci): rm -rf /target/* not /target — can't delete mount point
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
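The difference between deleting /target and deleting /target/* can be sketched as a small helper that builds the cleanup command. This is an illustration only: the function name, the host path, and the exact docker flags are assumptions, not the contents of skills-eval.yml.

```python
from __future__ import annotations

import shlex


def build_volume_cleanup_cmd(host_dir: str, image: str = "alpine") -> list[str]:
    """Build a docker command that empties host_dir as root (hypothetical helper)."""
    # host_dir is bind-mounted at /target inside the container. Removing
    # /target itself fails because a mount point cannot be deleted from
    # inside the container, so only its contents are removed.
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/target",
        image,
        "sh", "-c", "rm -rf /target/*",
    ]


if __name__ == "__main__":
    # Placeholder path; the real volume directory is not given in this PR.
    print(shlex.join(build_volume_cleanup_cmd("/path/to/volumes")))
```

Note that the shell glob /target/* does not match dotfiles, so hidden entries at the top level of the volume directory would survive this command.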
* feat(ci): automate skill-source/ evals in PR pipeline
Previously skill-source/.agents/skills/ changes were invisible to the
automated pipeline — dorny/paths-filter only watched skills/** and
AGENTS.md Step 1 only diffed under skills/.
Changes:
- skills-eval.yml: add skill-source/** to paths filter so PRs touching
the monolithic rag-blueprint skill trigger the eval
- AGENTS.md Step 1: diff both skills/ and skill-source/.agents/skills/,
resolve SKILL_DIR to the correct root per location
- AGENTS.md generate.py call: uses SKILL_DIR so both decomposed and
monolithic skills get the right --skill-dir and --spec paths
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
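A minimal sketch of the SKILL_DIR resolution described above, assuming each skill lives in its own directory directly under one of the two roots. The function name and that layout are assumptions for illustration; the actual logic lives in AGENTS.md Step 1 and is not shown in this PR.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# The two skill roots named in the commit message.
SKILL_ROOTS = ("skill-source/.agents/skills", "skills")


def resolve_skill_dir(changed_path: str) -> str | None:
    """Map a changed file to its skill directory, or None if it is not in a skill."""
    parts = PurePosixPath(changed_path).parts
    for root in SKILL_ROOTS:
        root_parts = PurePosixPath(root).parts
        n = len(root_parts)
        # Require root/<skill>/<file...>, so files sitting directly in the
        # root (for example a README) are not treated as skills.
        if parts[:n] == root_parts and len(parts) > n + 1:
            return "/".join(parts[: n + 1])
    return None
```

With this sketch, a change to skills/rag-eval/h100.json resolves to skills/rag-eval, and a change under skill-source/.agents/skills/rag-blueprint/ resolves to that monolithic skill's directory.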
* ci: use actions/checkout@v5 and upload-artifact@v5 (aligns with #597)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
* fix(eval): one Brev VM per platform per run — not one per spec
If rag-eval/h100.json + rag-perf/h100.json both need H100_x2, provision
ONE VM and run all H100 trials against it sequentially. Prevents spinning
up 2 separate VMs (saves 13+ min provisioning + halves cost).
Added fallback types for capacity failures:
dmz.h100x2,scaleway_H100x2,gpu-h100-sxm.1gpu-16vcpu-200gb
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
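The one-VM-per-platform rule and the capacity fallback can be sketched as follows. The function names, the (spec path, platform) input shape, and the use of RuntimeError to signal a capacity failure are all assumptions for illustration; the provisioning call is passed in as a callable because the real Brev interface is not shown in this PR.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable

# Instance types listed in the commit message. Trying them in the order
# given is an assumption.
FALLBACK_TYPES = [
    "dmz.h100x2",
    "scaleway_H100x2",
    "gpu-h100-sxm.1gpu-16vcpu-200gb",
]


def group_specs_by_platform(specs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (spec_path, platform) pairs so each platform needs only one VM."""
    groups: dict[str, list[str]] = defaultdict(list)
    for spec_path, platform in specs:
        groups[platform].append(spec_path)
    return dict(groups)


def provision_first_available(
    instance_types: Iterable[str],
    provision: Callable[[str], str],
) -> str:
    """Try each instance type in turn; return the first VM that provisions."""
    for instance_type in instance_types:
        try:
            return provision(instance_type)
        except RuntimeError:
            # Hypothetical capacity failure: fall through to the next type.
            continue
    raise RuntimeError("no instance type had capacity")
```

Under this sketch, rag-eval/h100.json and rag-perf/h100.json land in the same H100_x2 group, so a single VM is provisioned and both trials run against it sequentially.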
* feat(eval): auto-deploy RAG stack before any H100 trial — no skill author action needed
Previously skill authors had to add deploy instructions to their h100
spec env field. Now the agent handles this automatically:
Before running ANY H100 spec (rag-eval, rag-perf, etc.), agent checks
if RAG stack is up at localhost:8081 on the Brev VM. If not, deploys
it using rag-blueprint/h100.json automatically — regardless of which
skills changed in the PR.
Skill authors write their h100.json specs normally. The infrastructure
handles the RAG stack prerequisite. Same pattern as VSS's profile field
but handled at the agent level not spec level.
Also: one Brev VM per platform per run (not one per spec).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
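The check-then-deploy step can be sketched as below. This is an assumption-laden illustration: the helper names are hypothetical, a plain TCP connect stands in for whatever health check the agent actually performs on the Brev VM, and the deploy step is passed in as a callable because the rag-blueprint/h100.json deployment itself is not shown here.

```python
import socket
from typing import Callable


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_rag_stack(is_up: Callable[[], bool], deploy: Callable[[], None]) -> bool:
    """Deploy the RAG stack only when it is not already running.

    Returns True if a deployment was triggered, False if the stack was up.
    """
    if is_up():
        return False
    deploy()
    return True


if __name__ == "__main__":
    # Port 8081 comes from the commit message; the deploy callable is a stub.
    triggered = ensure_rag_stack(
        is_up=lambda: is_port_open("localhost", 8081),
        deploy=lambda: print("deploying RAG stack (stub)"),
    )
    print("deployment triggered:", triggered)
```

Running the check once per VM, before the first H100 spec, fits the one-VM-per-platform rule: later specs on the same VM find the stack already up and skip deployment.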
---------
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…summary keys
Signed-off-by: smasurekar <smasurekar@nvidia.com>
…lformed LLM output (#605)
Signed-off-by: smasurekar <smasurekar@nvidia.com>
…owercase-fix-updates
fix(ingestor): return backend-canonicalized collection name to align summary keys
… isn't populated for agentic requests
Signed-off-by: smasurekar <smasurekar@nvidia.com>
…tic-metrics-note
Added a Limitations bullet noting that the per-response metrics block isn't populated for agentic requests
… noise
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Add OpenShift to the release summary and Highlights for the 2.6.0 release, linking to the Helm on OpenShift deployment guide.
Signed-off-by: Niyati Singal <nsingal@nvidia.com>
Summary
Cherry-picks commits from develop that are not yet on release-v2.6.0, bringing the release branch up to date with post-release fixes and improvements without merging all of develop.
… response_parser.py, disable JSON mode for malformed LLM output); document agentic metrics limitation in docs/agentic-rag.md.
17 files changed (+395 / −190).
Test plan
… ci-pipeline.yml, skills-eval.yml, skills-nv-base.yml)
… test_agentic_rag.py, ingestor main tests