Initial NVSkills-CI onboarding for TileGym skills #135
Merged
Changes:
- Move 7 cuTile skill folders from .agents/skills/ to skills/.
- Add .agents/skills and .claude/skills symlinks pointing to ../skills for backward compatibility.
- Update LICENSE, CONTRIBUTING.md, and .github/scripts/check_spdx_headers.py to reference the new skills/ path.
- Split skills/cutile-autotuning/SKILL.md: move API Reference, Step-by-Step Workflow, and Pitfall Checklist into new files under references/ to keep SKILL.md concise.

Signed-off-by: Hannah Li <hanli@nvidia.com>
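The backward-compatibility symlinks described in this commit can be illustrated with a minimal sketch. It is not the script used in the PR: the helper name `add_compat_symlinks` is hypothetical, and the demonstration runs against a throwaway directory that only mimics the repository layout.

```python
import os
import tempfile
from pathlib import Path


def add_compat_symlinks(repo_root: Path) -> list:
    """Create .agents/skills and .claude/skills as relative symlinks to ../skills.

    Hypothetical helper for illustration; the PR may have created the links by hand.
    """
    created = []
    for parent in (".agents", ".claude"):
        link = repo_root / parent / "skills"
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            continue  # leave any existing entry untouched
        # The target is relative, so it is resolved from the link's own directory.
        link.symlink_to(Path("..") / "skills", target_is_directory=True)
        created.append(link)
    return created


# Demonstrate on a temporary directory rather than a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    skill_dir = root / "skills" / "cutile-autotuning"
    skill_dir.mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text("# placeholder\n")

    links = add_compat_symlinks(root)
    link_targets = [os.readlink(link) for link in links]
    # The old path still reaches the moved file through the symlink.
    old_path_resolves = (
        root / ".agents" / "skills" / "cutile-autotuning" / "SKILL.md"
    ).is_file()
```

Because the target is stored as the relative path `../skills`, the links keep working wherever the repository is cloned, provided the tool reading them follows symlinks.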
/nvskills-ci
4 similar comments
hannahli-nv added a commit that referenced this pull request on May 28, 2026
Per Keshav's guidance in #nv-skills-onboarding (https://nvidia.slack.com/archives/C0A1XF35CT0/p1779989782490009): "/nvskills-ci only runs on the skills changed in the MR. For now, pls separate into smaller MRs." This avoids the 1h validate:content timeout and the global gate-fails-when-any-skill-regresses problem.

The previous round's CI confirmed cutile-autotuning and monkey-patch-kernels-to-transformers each produce zero AGENT_EVAL HIGH findings:
- cutile-autotuning: Tier 3 NEUTRAL verdict, overall lift +0.00, both agents pass the per-agent regression gate.
- monkey-patch-kernels-to-transformers: Tier 3 PASS verdict, overall lift +0.13, both agents pass.

The other 5 skills (adding-cutile-kernel, converting-cutile-to-julia, converting-cutile-to-triton, cutile-python, improve-cutile-kernel-perf) each produce 1-3 AGENT_EVAL HIGH from per-agent regression. Local nv-base agent-eval runs (1u1g-spr-0109) verified the regressions are real. Keeping their evals.json blocks the gate for ALL skills via sum-of-findings, preventing signing entirely. Drop their evals.json here; they will be added back in follow-up MRs after iterating on the questions to eliminate per-agent regression.

After this commit, the next /nvskills-ci on PR #135 should:
- Pass gate (zero HIGH findings across the 2 evals.json'd skills)
- Run sign:attach-signatures-to-upstream
- Push back BENCHMARK.md + skill-card.md + skill.oms.sig for 2 skills
- After PR merge, the 12h NVIDIA/skills sync workflow will create skills/TileGym/cutile-autotuning + monkey-patch-kernels-to-transformers in NVIDIA/skills (the other 5 will be dropped per the compliance check until their evals.json are added in follow-up MRs).

Signed-off-by: Hannah Li <hanli@nvidia.com>
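The sum-of-findings behaviour described in this commit message can be sketched as follows. This is an illustration only: the function name `gate_passes` and the data shape are assumptions, not the pipeline's actual code, and the non-zero count is a placeholder within the reported 1-3 range.

```python
def gate_passes(high_findings_by_skill: dict) -> bool:
    """Global gate: pass only if HIGH findings summed over all evaluated skills is zero.

    Hypothetical model of the "sum-of-findings" gate; the real check may differ.
    """
    return sum(high_findings_by_skill.values()) == 0


# Two skills reported zero HIGH findings; one regressing skill is enough to block all.
with_regressing_skill = {
    "cutile-autotuning": 0,
    "monkey-patch-kernels-to-transformers": 0,
    "cutile-python": 1,  # illustrative count, the message only says 1-3
}

# After dropping evals.json for the regressing skills, only the clean skills are evaluated.
after_dropping_evals = {
    "cutile-autotuning": 0,
    "monkey-patch-kernels-to-transformers": 0,
}

blocked = not gate_passes(with_regressing_skill)
unblocked = gate_passes(after_dropping_evals)
```

Under this model a single regressing skill blocks signing for every skill in the same run, which is why the commit narrows the evaluated set.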
force-pushed from 05380c2 to cb0daf0
/nvskills-ci
force-pushed from cb0daf0 to 66dbf44
/nvskills-ci
/nvskills-ci
force-pushed from 66dbf44 to 9899324
Adds the skill evaluation dataset for `adding-cutile-kernel`. The question targets two TileGym CI naming conventions required for new operators:
- the `test_op*` test-function prefix, which gates which pytest functions are collected by the CI `-k test_op` filter;
- the `-TFLOPS` / `-GBps` suffix required for the benchmark `plot_name` parameter so results are parsed into the CI summary.

These conventions are documented in the adding-cutile-kernel `SKILL.md` and are not reliably available from general training data, so the skill provides clear lift over the no-skill baseline. The case has produced zero high-severity regression findings across three prior nvskills-ci runs.

Schema fields used: `id`, `question`, `expected_skill`, `expected_script`, `ground_truth`, `expected_behavior`.

After this PR merges, the publication pipeline auto-generates `BENCHMARK.md`, `skill-card.md`, and the detached signature `skill.oms.sig` for this skill, and the nvidia/skills sync workflow publishes it to the public catalog (per NVIDIA/skills#121).

The remaining 6 cuTile skills will receive their own `evals/evals.json` in follow-up PRs, scoped per-skill to keep each evaluation run within the per-job time budget and avoid blocking each other through the global gate.

Signed-off-by: Hannah Li <hanli@nvidia.com>
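As a rough illustration of the schema fields listed in this commit, the sketch below checks a single evaluation case for those six keys. The case contents, the value types, and the helper `missing_fields` are all assumptions made for the example; the actual `evals.json` layout and whether every field is mandatory are not stated here.

```python
import json

# Field names taken from the commit message; treating all six as required is an assumption.
FIELDS_USED = (
    "id",
    "question",
    "expected_skill",
    "expected_script",
    "ground_truth",
    "expected_behavior",
)


def missing_fields(case: dict) -> list:
    """Return the listed schema fields that are absent from one evaluation case."""
    return [name for name in FIELDS_USED if name not in case]


# Placeholder case: every value below is illustrative, not copied from the real dataset.
example_case = json.loads(
    """
    {
      "id": "example-001",
      "question": "Which naming conventions does TileGym CI require for a new operator?",
      "expected_skill": "adding-cutile-kernel",
      "expected_script": null,
      "ground_truth": "Test functions use the test_op prefix; plot_name ends in -TFLOPS or -GBps.",
      "expected_behavior": ["Consults SKILL.md before answering"]
    }
    """
)

complete = missing_fields(example_case)
incomplete = missing_fields({"id": "example-002", "question": "placeholder"})
```

A check of this kind could run locally before triggering `/nvskills-ci`, so that a missing key is caught without spending a full evaluation run.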
force-pushed from 9899324 to e025c52
/nvskills-ci
/nvskills-ci afe19a4
…m-adding-cutile-kernel
force-pushed from afe19a4 to 4f6ce19
/nvskills-ci 4f6ce19
Replace two implementation-heavy positive cases with one orientation-style positive case and two additional negative cases to cover related-but-out-of-scope topics (performance tuning and multi-GPU distribution). Adjust expected_behavior phrasing to be agent-agnostic.
/nvskills-ci 1c1e4c4
Bessss-zyw approved these changes on May 29, 2026
…rals Replace the implementation-heavy positive cases and remaining negatives with an overview-style positive (consults SKILL.md to summarize the workflow without writing code) plus four truly out-of-domain neutrals (NCCL distribution, license/maintainer, supported GPUs, running the test suite). The neutral cases stay perfectly balanced between with-skill and without-skill conditions, which empirically minimizes per-case variance and gives a stable composite lift.
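The effect of balanced neutral cases on a composite lift can be sketched under a loud assumption: that lift is the mean, over cases, of the with-skill score minus the without-skill score. The pipeline's real formula is not given in this thread, and all scores below are made-up placeholders.

```python
from statistics import fmean


def composite_lift(cases: list) -> float:
    """Mean of (with_skill - without_skill) over cases.

    Assumed definition for illustration; the actual scoring may differ.
    """
    return fmean([with_skill - without_skill for with_skill, without_skill in cases])


# One positive case where the skill helps, plus four neutral cases that score
# the same with and without the skill. All numbers are illustrative.
positive_case = (0.9, 0.5)
neutral_case = (0.8, 0.8)
cases = [positive_case] + [neutral_case] * 4

per_case_deltas = [w - wo for w, wo in cases]
lift = composite_lift(cases)
```

Under this assumed definition each neutral case contributes exactly zero, so it cannot push the composite below zero; it only scales down the contribution of the positive case.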
/nvskills-ci 8da79ba
Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
/ok to test 919cb64
mosheabr pushed a commit to NVIDIA/skills that referenced this pull request on May 29, 2026
TileGym PR NVIDIA/TileGym#132 migrated all 7 cuTile skill folders from `.agents/skills/` to the canonical release-facing `skills/` path. The previous `.agents/skills` was retained as a backward-compatibility symlink for tools that still expect the agentskills.io layout, but the sync workflow's git sparse-checkout + rsync does not reliably follow that symlink: it would either copy just the symlink-as-file or miss the contents.

Update `path` from `.agents/skills/` → `skills/` so the sync workflow reads from the public-facing path directly. Single-entry parent-path form is retained (rather than per-skill flat layout) because all 7 cuTile skills are intended for the public catalog, with no contributor/maintainer-only skill subset to filter out, and the parent-path form auto-discovers any new TileGym skills without requiring further nvidia/skills PRs.

Also add the standard Apache SPDX header to match the convention adopted across components.d/ on 2026-05-28 (cf. #109, #113).

Catalog effect: skills under `nvidia/skills/skills/TileGym/<name>/` will continue to populate via the existing per-skill sync-compliance gate (`skill.oms.sig` + `skill-card.md` + `evals.json`). The first TileGym skills landing `evals.json` are tracked in NVIDIA/TileGym#135 (adding-cutile-kernel, cutile-python); the remaining 5 will follow in subsequent PRs.

Signed-off-by: Hannah Li <hanli@nvidia.com>
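The per-skill sync-compliance gate and the parent-path auto-discovery mentioned in this commit could look roughly like the sketch below. It is not the sync workflow's code: the function names are hypothetical, and the artifact locations inside each skill folder (including `evals/evals.json` rather than a top-level `evals.json`) are assumptions.

```python
import tempfile
from pathlib import Path

# Assumed artifact locations relative to a skill folder; the real gate may look elsewhere.
REQUIRED_ARTIFACTS = ("skill.oms.sig", "skill-card.md", "evals/evals.json")


def is_sync_compliant(skill_dir: Path, required=REQUIRED_ARTIFACTS) -> bool:
    """A skill qualifies only when every required artifact exists as a file."""
    return all((skill_dir / rel).is_file() for rel in required)


def compliant_skills(skills_root: Path) -> list:
    """Discover every skill folder under the parent path and keep the compliant ones."""
    return sorted(
        entry.name
        for entry in skills_root.iterdir()
        if entry.is_dir() and is_sync_compliant(entry)
    )


# Demonstrate on a temporary tree: one fully signed skill, one without evals.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "skills"
    for name, artifacts in {
        "adding-cutile-kernel": REQUIRED_ARTIFACTS,
        "cutile-autotuning": ("skill.oms.sig", "skill-card.md"),
    }.items():
        for rel in artifacts:
            target = root / name / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("placeholder\n")

    published = compliant_skills(root)
```

Scanning the parent directory means a newly added skill folder is picked up without editing the component entry, while the per-skill check still holds back any skill that lacks its artifacts.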
Summary
First PR onboarding the TileGym skills repo to the NVSkills-CI pipeline so they can be signed, validated, and synced to `nvidia/skills`. End-to-end, this PR:
- … `skills/` path expected by `nvskills-ci`
- … `name:` with a `tilegym-` prefix to distinguish them from generic cuTile skills in the global catalog
- … `skill.oms.sig`, `skill-card.md`, `BENCHMARK.md`) attached by `nvskills-svc-account` during the validation pipeline
- … `evals/evals.json` for `tilegym-adding-cutile-kernel` to exercise Tier 3 (Live Agent Evaluation) on the only skill that needed it for sign-off
- … `references/workflow.md` flagged by Tier 1 schema validation

What's renamed
Cross-references in `CONTRIBUTING.md`, `optimization-playbook.md`, `auto-kernelize.md`, and the `tilegym-cutile-python` install-path docs are updated to match.

evals.json (`tilegym-adding-cutile-kernel`, 5 cases)
Final mix after iterating on Tier 3 results: one positive overview case plus four out-of-domain neutrals. This shape gives both `claude-code` and `codex` clean lift signals without triggering the strict regression gate that prescriptive implementation-style cases were hitting.

The other 6 skills (`tilegym-cutile-autotuning`, `tilegym-cutile-python`, `tilegym-converting-cutile-to-julia`, `tilegym-converting-cutile-to-triton`, `tilegym-improve-cutile-kernel-perf`, `tilegym-monkey-patch-kernels-to-transformers`) ship without `evals/evals.json` in this PR; Tier 3 is skipped for those and they sign off on Tier 1+2 alone. Follow-up PRs will land their evals separately.

Tier 3 result
`/nvskills-ci` Tier 3 (Live Agent Evaluation, claude-code + codex via ACES) on `tilegym-adding-cutile-kernel`:
Per-agent overall lift: codex `+0.02`, claude-code `-0.00`; both clear of the `-0.01` regression gate. Codex dimension breakdown: Security `+0.00`, Correctness `+0.03`, Discoverability `+0.00`, Effectiveness `+0.08`, Efficiency `+0.01`.

Test plan
- `/nvskills-ci` Tier 3 produces zero high-severity regression findings for `tilegym-adding-cutile-kernel` (verdict PASS, lift `+0.02`)
- … `BENCHMARK.md` + `skill-card.md` + `skill.oms.sig`

CI Configuration