
Add GitHub Actions workflow to sync skills from product repos #8

Merged
sayalinvidia merged 10 commits into NVIDIA:main from sayalinvidia:add-skills-sync-workflow
Apr 11, 2026

Conversation

@sayalinvidia
Collaborator

Implements the automated sync pipeline (Step 5 of onboarding) that sparse-checkouts the skills directory from each registered product repo and mirrors them into this catalog. Runs twice daily on a cron schedule and supports manual dispatch.

Registered repos: cuOpt, TensorRT-LLM, nemotron-voice-agent, NeMo Gym.

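
The triggers described above (twice-daily cron plus manual dispatch) could look like the following sketch; the specific cron times are an assumption, since the description only says "twice daily":

```yaml
# Hypothetical trigger block for the sync workflow.
# The 00:00 / 12:00 UTC times are illustrative -- the PR body
# only states "twice daily on a cron schedule".
on:
  schedule:
    - cron: "0 0 * * *"   # midnight UTC
    - cron: "0 12 * * *"  # noon UTC
  workflow_dispatch: {}   # allow manual runs from the Actions tab
```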

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sayalinvidia sayalinvidia requested a review from mosheabr as a code owner April 10, 2026 20:23
@sayalinvidia sayalinvidia marked this pull request as draft April 10, 2026 20:27
Collaborator

@mosheabr mosheabr left a comment

Good start on the sync workflow, Sayali — the sparse-checkout approach and idempotent commit logic are solid. A few things to address before this is ready to merge:

Critical

  1. Cross-repo auth will fail for private repos — actions/checkout@v4 uses the default GITHUB_TOKEN, which only has access to NVIDIA/skills. If any of the product repos (NVIDIA/cuopt, NVIDIA/TensorRT-LLM, etc.) are private, the checkout steps will fail with a 403. You'll need a PAT or GitHub App token:

    - uses: actions/checkout@v4
      with:
        token: ${{ secrets.SKILLS_SYNC_PAT }}
  2. Data loss risk if a checkout fails — Each product block does rm -rf skills/<product> before rsync. If the checkout step fails (repo moved, branch renamed, transient error), you've deleted the existing catalog copy with nothing to replace it. Fix: guard the rm -rf so it only runs when .tmp/<product>/skills/ actually exists and is non-empty, or move the delete into a conditional.

  3. Missing NeMo Evaluator — The catalog currently lists 5 products (cuOpt, TensorRT-LLM, Nemotron Voice Agent, NeMo Gym, NeMo Evaluator). The workflow only syncs 4 — NeMo Evaluator needs a block added.
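
A sketch of the guard from item 2, using cuOpt as the example (the `.tmp/cuopt/skills/` path and step name are illustrative, not the PR's exact code):

```yaml
# Hypothetical guarded copy step: only replace the catalog copy when
# the checkout actually produced a non-empty skills directory.
- name: Copy cuOpt skills into catalog
  run: |
    if [ -d ".tmp/cuopt/skills" ] && [ -n "$(ls -A .tmp/cuopt/skills)" ]; then
      mkdir -p skills/cuOpt
      rsync -a --delete .tmp/cuopt/skills/ skills/cuOpt/
    else
      echo "cuOpt checkout empty or missing -- keeping existing catalog copy"
    fi
```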

Important

  1. Direct push to main bypasses branch protection — Consider using peter-evans/create-pull-request@v6 to open a PR instead of pushing directly, so changes can be reviewed before landing.

  2. No fault isolation — If one product checkout fails, the entire job fails and no other products get synced. Consider continue-on-error: true on each checkout step, or a matrix strategy per product.

  3. No concurrency control — If a manual dispatch overlaps with a cron run, two pushes could race. Add:

    concurrency:
      group: sync-skills
      cancel-in-progress: true
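
For item 1 above, a minimal sketch of the suggested create-pull-request step; the branch name and titles are placeholders, not the final workflow:

```yaml
# Hypothetical final step: open a PR for review instead of pushing to main.
# create-pull-request is a no-op when the working tree has no changes.
- name: Open sync PR
  uses: peter-evans/create-pull-request@v6
  with:
    branch: automated/sync-skills
    title: "chore: sync skills from product repos"
    commit-message: "chore: sync skills from product repos"
    delete-branch: true
```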

Minor

  1. rm -rf + rsync --delete is redundant — rsync --delete already handles file removals from the source. The rm -rf + mkdir -p before it is unnecessary.

  2. Static commit message — "chore: sync skills from product repos" doesn't indicate which products changed. It would be helpful to include a summary.

  3. No failure notification — If the cron sync silently fails, nobody knows. Consider adding a Slack or email notification step on failure.
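
For item 3, one possible shape for an on-failure step; the use of actions/github-script and the `sync-failure` label are assumptions for illustration:

```yaml
# Hypothetical failure handler: file an issue linking the failed run
# so silent cron failures surface somewhere visible.
- name: Report sync failure
  if: failure()
  uses: actions/github-script@v7
  with:
    script: |
      await github.rest.issues.create({
        owner: context.repo.owner,
        repo: context.repo.repo,
        title: `Skills sync failed (run ${context.runId})`,
        body: `Run: ${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}\nTrigger: ${context.eventName}`,
        labels: ["sync-failure"],
      });
```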

sayalinvidia and others added 7 commits April 10, 2026 14:20
Critical fixes:
- Use SKILLS_SYNC_PAT secret for all product repo checkouts (default
  GITHUB_TOKEN will 403 on private repos)
- Guard rm -rf behind existence + non-empty checks so a failed
  checkout preserves the existing catalog copy instead of deleting it
- Add missing products from upstream README: Model-Optimizer,
  Megatron-Core, Megatron-Bridge, NeMo Evaluator (Launcher + Evaluator
  synced into separate catalog directories to avoid conflicts)

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rsync --delete already removes destination files not present in the
source. The rm -rf + mkdir -p before each rsync was unnecessary —
mkdir -p alone handles the first-ever run.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add continue-on-error: true to each checkout step so a single repo
failure (transient 503, repo renamed, branch deleted) does not block
the remaining products from syncing. The existing non-empty guard on
each copy step already handles the case where a checkout produced
nothing.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
If a manual workflow_dispatch overlaps with a scheduled cron run,
two jobs could race and produce conflicting pushes. The concurrency
group ensures only one sync runs at a time, cancelling the in-progress
run if a new one is triggered.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the direct commit-and-push to main with
peter-evans/create-pull-request@v6. Changes now land on an
automated/sync-skills branch and open a PR for review, respecting
branch protection rules.

The action handles idempotency — if no files changed, no PR is
created. The branch is auto-deleted after merge.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each copy step now logs which products were synced. The PR title
includes the product names (e.g. "chore: sync skills (cuOpt,
TensorRT-LLM)") and the body lists them with the trigger source.
Replaces the static "chore: sync skills from product repos" message.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the workflow fails, a GitHub issue is automatically created with
a link to the failed run, the trigger type, and a sync-failure label.
This ensures silent cron failures get noticed instead of drifting
undetected.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sync log was initialized with echo "" which wrote a blank line,
causing a leading comma in the product list. Use truncate -s 0 to
create a truly empty file instead.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
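
The difference the last commit describes can be checked directly in a shell (file path here is a temp file for illustration):

```shell
# echo "" still writes the trailing newline, so the "empty" log file
# is 1 byte -- the first appended product then follows a blank line,
# which is what produced the leading comma.
log=$(mktemp)
echo "" > "$log"
wc -c < "$log"    # 1 byte

# truncate -s 0 creates (or empties) the file with no bytes at all.
truncate -s 0 "$log"
wc -c < "$log"    # 0 bytes
```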
Collaborator

@mosheabr mosheabr left a comment

Great revision, Sayali. All 9 items from the first review are addressed: PAT auth, data-loss guards, the missing product blocks, PR-based commits, fault isolation, concurrency control, the redundant rm -rf cleanup, dynamic commit messages, and failure notifications. This is solid.

One thing to add before merging: CUDA-Q was just merged into the catalog (#7). The sync workflow needs a block for it:

# -- CUDA-Q --
- name: Checkout CUDA-Q
  continue-on-error: true
  uses: actions/checkout@v4
  with:
    repository: NVIDIA/cuda-quantum
    ref: main
    path: .tmp/cuda-quantum
    token: ${{ secrets.SKILLS_SYNC_PAT }}
    sparse-checkout: |
      .claude/skills/

- name: Copy CUDA-Q skills into catalog
  run: |
    if [ -d ".tmp/cuda-quantum/.claude/skills" ] && [ -n "$(ls -A .tmp/cuda-quantum/.claude/skills)" ]; then
      mkdir -p skills/CUDA-Q
      rsync -a --delete .tmp/cuda-quantum/.claude/skills/ skills/CUDA-Q/
      echo "- CUDA-Q" >> /tmp/synced-products.txt
    else
      echo "⚠ CUDA-Q checkout empty or missing — skipping to preserve existing catalog"
    fi

Once that's added, this is ready to go.

CUDA-Q was merged into the catalog (NVIDIA#7). Add checkout + copy block
for NVIDIA/cuda-quantum → skills/CUDA-Q.

Signed-off-by: Sayali Kandarkar <skandarkar@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mosheabr
Collaborator

CUDA-Q block looks good. All 10 products covered, all review items addressed. This is ready to merge whenever you mark it ready for review.

@sayalinvidia
Collaborator Author

Thank you @mosheabr
Added CUDA-Q (NVIDIA/cuda-quantum) to the sync workflow as well!

@sayalinvidia sayalinvidia marked this pull request as ready for review April 11, 2026 00:10
@sayalinvidia sayalinvidia merged commit b729ebb into NVIDIA:main Apr 11, 2026
2 checks passed