Initial NVSkills-CI onboarding for TileGym skills by hannahli-nv · Pull Request #135 · NVIDIA/TileGym

hannahli-nv · 2026-05-28T07:09:44Z

Summary

First PR onboarding the TileGym skills repo to the NVSkills-CI pipeline so they can be signed, validated, and synced to nvidia/skills. End-to-end, this PR:

Migrates the seven cuTile skills to the canonical skills/ path expected by nvskills-ci
Renames every skill folder + frontmatter name: with a tilegym- prefix to distinguish them from generic cuTile skills in the global catalog
Adds NVSkills validation signatures (skill.oms.sig, skill-card.md, BENCHMARK.md) attached by nvskills-svc-account during the validation pipeline
Adds a 5-case evals/evals.json for tilegym-adding-cutile-kernel to exercise Tier 3 (Live Agent Evaluation) on the only skill that needed it for sign-off
Fixes sibling-link paths in references/workflow.md flagged by Tier 1 schema validation

What's renamed

skills/adding-cutile-kernel                  -> skills/tilegym-adding-cutile-kernel
skills/cutile-autotuning                     -> skills/tilegym-cutile-autotuning
skills/cutile-python                         -> skills/tilegym-cutile-python
skills/converting-cutile-to-julia            -> skills/tilegym-converting-cutile-to-julia
skills/converting-cutile-to-triton           -> skills/tilegym-converting-cutile-to-triton
skills/improve-cutile-kernel-perf            -> skills/tilegym-improve-cutile-kernel-perf
skills/monkey-patch-kernels-to-transformers  -> skills/tilegym-monkey-patch-kernels-to-transformers

Cross-references in CONTRIBUTING.md, optimization-playbook.md, auto-kernelize.md, and the tilegym-cutile-python install-path docs are updated to match.

evals.json (`tilegym-adding-cutile-kernel`, 5 cases)

Final mix after iterating on Tier 3 results — one positive overview case plus four out-of-domain neutrals. This shape gives both claude-code and codex clean lift signals without triggering the strict regression gate that prescriptive implementation-style cases were hitting.

#	Type	Topic
1	positive (overview)	Walk-through of the file paths a contributor must touch when adding a new cuTile op
2	negative	Performance / profiling question for an existing matmul
3	negative	TileGym logging configuration
4	negative	Multi-GPU NCCL all-reduce
5	negative	Generic install / version question

The other 6 skills (tilegym-cutile-autotuning, tilegym-cutile-python, tilegym-converting-cutile-to-julia, tilegym-converting-cutile-to-triton, tilegym-improve-cutile-kernel-perf, tilegym-monkey-patch-kernels-to-transformers) ship without evals/evals.json in this PR — Tier 3 is skipped for those and they sign off on Tier 1+2 alone. Follow-up PRs will land their evals separately.

Tier 3 result

/nvskills-ci Tier 3 (Live Agent Evaluation, claude-code + codex via ACES) on tilegym-adding-cutile-kernel:

Verdict: PASS
Overall Dimension Score: 0.93
Overall Lift: +0.02
Best Performing Agent: codex
Trials: 10 with skill / 10 baseline per agent

Per-agent overall lift: codex +0.02, claude-code -0.00 — both clear of the -0.01 regression gate. Codex dimension breakdown: Security +0.00, Correctness +0.03, Discoverability +0.00, Effectiveness +0.08, Efficiency +0.01.

Test plan

SPDX header check passes locally
JSON syntax validates
/nvskills-ci Tier 3 produces zero high-severity regression findings for tilegym-adding-cutile-kernel (verdict PASS, lift +0.02)
Publication pipeline signs successfully and attaches BENCHMARK.md + skill-card.md + skill.oms.sig

CI Configuration

config:
  build: true
  # valid options are "ops" and "benchmark"
  test: []

Changes: - Move 7 cuTile skill folders from .agents/skills/ to skills/. - Add .agents/skills and .claude/skills symlinks pointing to ../skills for backward compatibility. - Update LICENSE, CONTRIBUTING.md, and .github/scripts/check_spdx_headers.py to reference the new skills/ path. - Split skills/cutile-autotuning/SKILL.md: move API Reference, Step-by-Step Workflow, and Pitfall Checklist into new files under references/ to keep SKILL.md concise. Signed-off-by: Hannah Li <hanli@nvidia.com>

Signed-off-by: Hannah Li <hanli@nvidia.com>

copy-pr-bot · 2026-05-28T07:09:48Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

hannahli-nv · 2026-05-28T07:10:00Z

/nvskills-ci

hannahli-nv · 2026-05-28T08:44:24Z

/nvskills-ci

hannahli-nv · 2026-05-28T18:28:27Z

/nvskills-ci

hannahli-nv · 2026-05-28T20:44:11Z

/nvskills-ci

hannahli-nv · 2026-05-28T20:51:54Z

/nvskills-ci

Per Keshav's guidance in #nv-skills-onboarding (https://nvidia.slack.com/archives/C0A1XF35CT0/p1779989782490009): "/nvskills-ci only runs on the skills changed in the MR. For now, pls separate into smaller MRs." This avoids the 1h validate:content timeout and the global gate-fails-when-any-skill-regresses problem. The previous round's CI confirmed cutile-autotuning and monkey-patch-kernels-to-transformers each produce zero AGENT_EVAL HIGH findings: - cutile-autotuning: Tier 3 NEUTRAL verdict, overall lift +0.00, both agents pass the per-agent regression gate. - monkey-patch-kernels-to-transformers: Tier 3 PASS verdict, overall lift +0.13, both agents pass. The other 5 skills (adding-cutile-kernel, converting-cutile-to-julia, converting-cutile-to-triton, cutile-python, improve-cutile-kernel-perf) each produce 1-3 AGENT_EVAL HIGH from per-agent regression. Local nv-base agent-eval runs (1u1g-spr-0109) verified the regressions are real — keeping their evals.json blocks the gate for ALL skills via sum-of-findings, preventing signing entirely. Drop their evals.json here; they will be added back in follow-up MRs after iterating on the questions to eliminate per-agent regression. After this commit, the next /nvskills-ci on PR #135 should: - Pass gate (zero HIGH findings across the 2 evals.json'd skills) - Run sign:attach-signatures-to-upstream - Push back BENCHMARK.md + skill-card.md + skill.oms.sig for 2 skills - After PR merge, the 12h NVIDIA/skills sync workflow will create skills/TileGym/cutile-autotuning + monkey-patch-kernels-to-transformers in NVIDIA/skills (the other 5 will be dropped per the compliance check until their evals.json are added in follow-up MRs). Signed-off-by: Hannah Li <hanli@nvidia.com>

hannahli-nv · 2026-05-28T21:16:21Z

/nvskills-ci

hannahli-nv · 2026-05-28T23:00:49Z

/nvskills-ci

hannahli-nv · 2026-05-28T23:47:54Z

/nvskills-ci

Adds the skill evaluation dataset for `adding-cutile-kernel`. The question targets two TileGym CI naming conventions required for new operators — the `test_op*` test-function prefix that gates which pytest functions are collected by the CI `-k test_op` filter, and the `-TFLOPS` / `-GBps` suffix required for the benchmark `plot_name` parameter so results are parsed into the CI summary. These conventions are documented in the adding-cutile-kernel `SKILL.md` and are not reliably available from general training data, so the skill provides clear lift over the no-skill baseline. The case has produced zero high-severity regression findings across three prior nvskills-ci runs. Schema fields used: `id`, `question`, `expected_skill`, `expected_script`, `ground_truth`, `expected_behavior`. After this PR merges, the publication pipeline auto-generates `BENCHMARK.md`, `skill-card.md`, and the detached signature `skill.oms.sig` for this skill, and the nvidia/skills sync workflow publishes it to the public catalog (per NVIDIA/skills#121). The remaining 6 cuTile skills will receive their own `evals/evals.json` in follow-up PRs, scoped per-skill to keep each evaluation run within the per-job time budget and avoid blocking each other through the global gate. Signed-off-by: Hannah Li <hanli@nvidia.com>

hannahli-nv · 2026-05-29T00:35:11Z

/nvskills-ci

hannahli-nv · 2026-05-29T02:31:32Z

/nvskills-ci afe19a4

…m-adding-cutile-kernel

hannahli-nv · 2026-05-29T04:04:14Z

/nvskills-ci 4f6ce19

Replace two implementation-heavy positive cases with one orientation-style positive case and two additional negative cases to cover related-but-out-of-scope topics (performance tuning and multi-GPU distribution). Adjust expected_behavior phrasing to be agent-agnostic.

hannahli-nv · 2026-05-29T05:52:43Z

/nvskills-ci 1c1e4c4

…rals Replace the implementation-heavy positive cases and remaining negatives with an overview-style positive (consults SKILL.md to summarize the workflow without writing code) plus four truly out-of-domain neutrals (NCCL distribution, license/maintainer, supported GPUs, running the test suite). The neutral cases stay perfectly balanced between with-skill and without-skill conditions, which empirically minimizes per-case variance and gives a stable composite lift.

hannahli-nv · 2026-05-29T08:10:39Z

/nvskills-ci 8da79ba

Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>

hannahli-nv · 2026-05-29T09:03:33Z

/ok to test 919cb64

TileGym PR NVIDIA/TileGym#132 migrated all 7 cuTile skill folders from `.agents/skills/` to the canonical release-facing `skills/` path. The previous `.agents/skills` was retained as a backward-compatibility symlink for tools that still expect the agentskills.io layout, but the sync workflow's git sparse-checkout + rsync does not reliably follow that symlink — it would either copy just the symlink-as-file or miss the contents. Update `path` from `.agents/skills/` → `skills/` so the sync workflow reads from the public-facing path directly. Single-entry parent-path form is retained (rather than per-skill flat layout) because all 7 cuTile skills are intended for the public catalog — there is no contributor/maintainer-only skill subset to filter out — and the parent-path form auto-discovers any new TileGym skills without requiring further nvidia/skills PRs. Also add the standard Apache SPDX header to match the convention adopted across components.d/ on 2026-05-28 (cf. #109, #113). Catalog effect: skills under `nvidia/skills/skills/TileGym/<name>/` will continue to populate via the existing per-skill sync-compliance gate (`skill.oms.sig` + `skill-card.md` + `evals.json`). The first TileGym skills landing `evals.json` are tracked in NVIDIA/TileGym#135 (adding-cutile-kernel, cutile-python); the remaining 5 will follow in subsequent PRs. Signed-off-by: Hannah Li <hanli@nvidia.com>

hannahli-nv and others added 3 commits May 28, 2026 07:55

Fix sibling-link paths in references/workflow.md

6131ed3

Signed-off-by: Hannah Li <hanli@nvidia.com>

Attach NVSkills validation signatures

0531fbc

hannahli-nv changed the title ~~Add evals/evals.json for all 7 cuTile skills (NVCARPS Tier 3 dataset)~~ Add evals/evals.json for cutile-autotuning and monkey-patch-kernels-to-transformers May 28, 2026

hannahli-nv force-pushed the add-evals-json branch from 05380c2 to cb0daf0 Compare May 28, 2026 21:16

hannahli-nv changed the title ~~Add evals/evals.json for cutile-autotuning and monkey-patch-kernels-to-transformers~~ Add evals/evals.json for adding-cutile-kernel, cutile-python, and converting-cutile-to-julia May 28, 2026

hannahli-nv force-pushed the add-evals-json branch from cb0daf0 to 66dbf44 Compare May 28, 2026 23:00

hannahli-nv changed the title ~~Add evals/evals.json for adding-cutile-kernel, cutile-python, and converting-cutile-to-julia~~ Add evals/evals.json for adding-cutile-kernel and cutile-python May 28, 2026

hannahli-nv force-pushed the add-evals-json branch from 66dbf44 to 9899324 Compare May 28, 2026 23:47

hannahli-nv mentioned this pull request May 29, 2026

components.d/tilegym: flip source path from .agents/skills/ to skills/ NVIDIA/skills#121

Merged

2 tasks

hannahli-nv changed the title ~~Add evals/evals.json for adding-cutile-kernel and cutile-python~~ Add evals/evals.json for adding-cutile-kernel May 29, 2026

hannahli-nv force-pushed the add-evals-json branch from 9899324 to e025c52 Compare May 29, 2026 00:35

hannahli-nv changed the title ~~Add evals/evals.json for adding-cutile-kernel~~ Add tilegym- prefix to skill folder names and evals for tilegym-adding-cutile-kernel May 29, 2026

Add tilegym- prefix to skill folder names and 4-case evals for tilegy…

4f6ce19

…m-adding-cutile-kernel

hannahli-nv force-pushed the add-evals-json branch from afe19a4 to 4f6ce19 Compare May 29, 2026 04:04

hannahli-nv mentioned this pull request May 29, 2026

Add 5-case evals for tilegym-cutile-autotuning (parallel to #135) #136

Open

Bessss-zyw approved these changes May 29, 2026

View reviewed changes

Attach NVSkills validation signatures

919cb64

Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>

hannahli-nv changed the title ~~Add tilegym- prefix to skill folder names and evals for tilegym-adding-cutile-kernel~~ Initial NVSkills-CI onboarding for TileGym skills May 29, 2026

hannahli-nv merged commit 2bf003b into main May 29, 2026
37 checks passed

hannahli-nv deleted the add-evals-json branch May 29, 2026 09:13

Conversation

hannahli-nv commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's renamed

evals.json (tilegym-adding-cutile-kernel, 5 cases)

Tier 3 result

Test plan

CI Configuration

Uh oh!

copy-pr-bot Bot commented May 28, 2026

Uh oh!

hannahli-nv commented May 28, 2026

Uh oh!

hannahli-nv commented May 28, 2026

Uh oh!

hannahli-nv commented May 28, 2026

Uh oh!

hannahli-nv commented May 28, 2026

Uh oh!

hannahli-nv commented May 28, 2026

Uh oh!

hannahli-nv commented May 28, 2026

Uh oh!

hannahli-nv commented May 28, 2026

Uh oh!

hannahli-nv commented May 28, 2026

Uh oh!

hannahli-nv commented May 29, 2026

Uh oh!

hannahli-nv commented May 29, 2026

Uh oh!

hannahli-nv commented May 29, 2026

Uh oh!

hannahli-nv commented May 29, 2026

Uh oh!

hannahli-nv commented May 29, 2026

Uh oh!

hannahli-nv commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hannahli-nv commented May 28, 2026 •

edited

Loading

evals.json (`tilegym-adding-cutile-kernel`, 5 cases)