Add GitHub Actions workflow to sync skills from product repos by sayalinvidia · Pull Request #8 · NVIDIA/skills

sayalinvidia · 2026-04-10T20:23:00Z

Implements the automated sync pipeline (Step 5 of onboarding) that sparse-checkouts the skills directory from each registered product repo and mirrors them into this catalog. Runs twice daily on a cron schedule and supports manual dispatch.

Registered repos: cuOpt, TensorRT-LLM, nemotron-voice-agent, NeMo Gym.
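
As a rough illustration of the sparse-checkout idea, the shell sketch below fetches a single directory of a repository with the plain git CLI. It is not the workflow's actual implementation (that uses actions/checkout); the function name `sparse_fetch` and its arguments are hypothetical, and it assumes git 2.25 or newer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sparse_fetch <repo_url> <ref> <subdir> <dest>   (hypothetical helper)
# Shallow-clone <repo_url> at branch <ref> into <dest>, materializing only
# <subdir> in the working tree (in cone mode, root-level files also appear).
sparse_fetch() {
  local repo="$1" ref="$2" subdir="$3" dest="$4"
  # --sparse starts with a minimal working tree; --depth 1 skips history.
  git clone --quiet --sparse --depth 1 --branch "$ref" "$repo" "$dest"
  # Restrict the working tree to the requested directory.
  git -C "$dest" sparse-checkout set "$subdir"
}
```

A call such as `sparse_fetch "$REPO_URL" main skills .tmp/product` would leave only the `skills` directory (plus any root-level files) under `.tmp/product`; the directory name is an assumption and would differ for repos that keep skills elsewhere.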

Implements the automated sync pipeline (Step 5 of onboarding) that sparse-checkouts the skills directory from each registered product repo and mirrors them into this catalog. Runs twice daily on a cron schedule and supports manual dispatch.

Registered repos: cuOpt, TensorRT-LLM, nemotron-voice-agent, NeMo Gym.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mosheabr

Good start on the sync workflow, Sayali — the sparse-checkout approach and idempotent commit logic are solid. A few things to address before this is ready to merge:

Critical

Cross-repo auth will fail for private repos — actions/checkout@v4 uses the default GITHUB_TOKEN, which only has access to NVIDIA/skills. If any of the product repos (NVIDIA/cuopt, NVIDIA/TensorRT-LLM, etc.) are private, the checkout steps will 403. You'll need a PAT or GitHub App token:
```
with:
  token: ${{ secrets.SKILLS_SYNC_PAT }}
```
Data loss risk if a checkout fails — Each product block does rm -rf skills/<product> before rsync. If the checkout step fails (repo moved, branch renamed, transient error), you've deleted the existing catalog copy with nothing to replace it. Fix: guard the rm -rf so it only runs when .tmp/<product>/skills/ actually exists and is non-empty, or move the delete into a conditional.
Missing NeMo Evaluator — The catalog currently lists 5 products (cuOpt, TensorRT-LLM, Nemotron Voice Agent, NeMo Gym, NeMo Evaluator). The workflow only syncs 4 — NeMo Evaluator needs a block added.
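
The guard suggested for the data loss issue can be sketched in shell as follows. This is a minimal illustration rather than the workflow's code: `sync_product` is a hypothetical helper, and it uses `cp -R` instead of rsync to keep the sketch dependency-free.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sync_product <src_dir> <dest_dir>   (hypothetical helper)
# Replace <dest_dir> with the contents of <src_dir>, but only when <src_dir>
# exists and is non-empty. Otherwise leave <dest_dir> untouched, so a failed
# checkout cannot wipe the existing catalog copy.
sync_product() {
  local src="$1" dest="$2"
  if [ -d "$src" ] && [ -n "$(ls -A "$src")" ]; then
    rm -rf "$dest"
    mkdir -p "$dest"
    # "src/." copies the directory contents, including dotfiles.
    cp -R "$src"/. "$dest"/
    echo "synced"
  else
    echo "skipped"
  fi
}
```

The destructive `rm -rf` sits inside the conditional, so it can only run once the replacement content is known to exist.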

Important

Direct push to main bypasses branch protection — Consider using peter-evans/create-pull-request@v6 to open a PR instead of pushing directly, so changes can be reviewed before landing.
No fault isolation — If one product checkout fails, the entire job fails and no other products get synced. Consider continue-on-error: true on each checkout step, or a matrix strategy per product.
No concurrency control — If a manual dispatch overlaps with a cron run, two pushes could race. Add:
```
concurrency:
  group: sync-skills
  cancel-in-progress: true
```

Minor

rm -rf + rsync --delete is redundant — rsync --delete already handles file removals from source. The rm -rf + mkdir -p before it is unnecessary.
Static commit message — "chore: sync skills from product repos" doesn't indicate which products changed. Would be helpful to include a summary.
No failure notification — If the cron sync silently fails, nobody knows. Consider adding a Slack or email notification step on failure.

Critical fixes:
- Use SKILLS_SYNC_PAT secret for all product repo checkouts (default GITHUB_TOKEN will 403 on private repos)
- Guard rm -rf behind existence + non-empty checks so a failed checkout preserves the existing catalog copy instead of deleting it
- Add missing products from upstream README: Model-Optimizer, Megatron-Core, Megatron-Bridge, NeMo Evaluator (Launcher + Evaluator synced into separate catalog directories to avoid conflicts)

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

rsync --delete already removes destination files not present in the source, so the rm -rf before each rsync was unnecessary; mkdir -p alone handles the first-ever run.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add continue-on-error: true to each checkout step so a single repo failure (transient 503, repo renamed, branch deleted) does not block the remaining products from syncing. The existing non-empty guard on each copy step already handles the case where a checkout produced nothing.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

If a manual workflow_dispatch overlaps with a scheduled cron run, two jobs could race and produce conflicting pushes. The concurrency group ensures only one sync runs at a time, cancelling the in-progress run if a new one is triggered.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace the direct commit-and-push to main with peter-evans/create-pull-request@v6. Changes now land on an automated/sync-skills branch and open a PR for review, respecting branch protection rules. The action handles idempotency: if no files changed, no PR is created. The branch is auto-deleted after merge.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each copy step now logs which products were synced. The PR title includes the product names (e.g. "chore: sync skills (cuOpt, TensorRT-LLM)") and the body lists them with the trigger source. Replaces the static "chore: sync skills from product repos" message.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
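
A minimal sketch of how a sync log could be turned into the PR title quoted in the commit message, assuming the log holds one `- <product>` line per synced product. The helper name `build_pr_title` and the log format are assumptions for illustration; the workflow's actual step may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build_pr_title <log_file>   (hypothetical helper)
# Reads one "- <product>" line per synced product and prints a PR title
# of the form "chore: sync skills (A, B)".
build_pr_title() {
  local log="$1" products
  # Strip the "- " list prefix, join lines with commas, then pad with spaces.
  products="$(sed -e 's/^- //' "$log" | paste -sd, - | sed -e 's/,/, /g')"
  echo "chore: sync skills (${products})"
}
```

Because the same log lines are already written as a list, they could also be pasted into the PR body unchanged.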

When the workflow fails, a GitHub issue is automatically created with a link to the failed run, the trigger type, and a sync-failure label. This ensures silent cron failures get noticed instead of drifting undetected.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The sync log was initialized with echo "" which wrote a blank line, causing a leading comma in the product list. Use truncate -s 0 to create a truly empty file instead.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
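
The leading-comma bug can be reproduced in a few lines of shell. This sketch assumes the product list is produced by joining the log's lines with commas; `join_products` is a hypothetical stand-in for whatever the workflow actually does.

```shell
#!/usr/bin/env bash
set -euo pipefail

# join_products <log_file>   (hypothetical helper)
# Join the lines of the log with ", ".
join_products() {
  paste -sd, "$1" | sed -e 's/,/, /g'
}

log="$(mktemp)"

# Buggy initialization: echo "" writes a single newline, i.e. one empty line,
# which becomes an empty first element when the lines are joined.
echo "" > "$log"
echo "cuOpt" >> "$log"
buggy="$(join_products "$log")"

# Fixed initialization: truncate -s 0 leaves a zero-byte file.
truncate -s 0 "$log"
echo "cuOpt" >> "$log"
fixed="$(join_products "$log")"

echo "buggy: '${buggy}'"   # buggy: ', cuOpt'
echo "fixed: '${fixed}'"   # fixed: 'cuOpt'
rm -f "$log"
```

The difference is a single byte in the log file, which is why the problem only showed up in the rendered product list.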

mosheabr

Great revision, Sayali. All 9 items from the first review are addressed, including PAT auth, data loss guards, fault isolation, concurrency, PR-based commits, dynamic commit messages, and failure notifications. This is solid.

One thing to add before merging: CUDA-Q was just merged into the catalog (#7). The sync workflow needs a block for it:

```
# -- CUDA-Q --
- name: Checkout CUDA-Q
  continue-on-error: true
  uses: actions/checkout@v4
  with:
    repository: NVIDIA/cuda-quantum
    ref: main
    path: .tmp/cuda-quantum
    token: ${{ secrets.SKILLS_SYNC_PAT }}
    sparse-checkout: |
      .claude/skills/

- name: Copy CUDA-Q skills into catalog
  run: |
    if [ -d ".tmp/cuda-quantum/.claude/skills" ] && [ -n "$(ls -A .tmp/cuda-quantum/.claude/skills)" ]; then
      mkdir -p skills/CUDA-Q
      rsync -a --delete .tmp/cuda-quantum/.claude/skills/ skills/CUDA-Q/
      echo "- CUDA-Q" >> /tmp/synced-products.txt
    else
      echo "⚠ CUDA-Q checkout empty or missing, skipping to preserve existing catalog"
    fi
```

Once that's added, this is ready to go.

CUDA-Q was merged into the catalog (NVIDIA#7). Add checkout + copy block for NVIDIA/cuda-quantum → skills/CUDA-Q.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mosheabr · 2026-04-10T22:38:04Z

CUDA-Q block looks good. All 10 products covered, all review items addressed. This is ready to merge whenever you mark it ready for review.

sayalinvidia · 2026-04-10T22:38:29Z

Thank you @mosheabr
Added CUDA-Q (NVIDIA/cuda-quantum) to the sync workflow as well!

sayalinvidia requested a review from mosheabr as a code owner April 10, 2026 20:23

sayalinvidia marked this pull request as draft April 10, 2026 20:27

mosheabr requested changes Apr 10, 2026

sayalinvidia and others added 7 commits April 10, 2026 14:20

mosheabr mentioned this pull request Apr 10, 2026

fix: update Megatron-Core skill count from 2 to 3 #9

Merged

mosheabr approved these changes Apr 10, 2026

Add CUDA-Q to skills sync workflow

c5db6b7

sayalinvidia marked this pull request as ready for review April 11, 2026 00:10

sayalinvidia merged commit b729ebb into NVIDIA:main Apr 11, 2026
2 checks passed

Add GitHub Actions workflow to sync skills from product repos #8
sayalinvidia merged 10 commits into NVIDIA:main from sayalinvidia:add-skills-sync-workflow
