Skip to content

Initial NVSkills-CI onboarding for TileGym skills#135

Merged
hannahli-nv merged 8 commits into
mainfrom
add-evals-json
May 29, 2026
Merged

Initial NVSkills-CI onboarding for TileGym skills#135
hannahli-nv merged 8 commits into
mainfrom
add-evals-json

Conversation

@hannahli-nv
Copy link
Copy Markdown
Collaborator

@hannahli-nv hannahli-nv commented May 28, 2026

Summary

First PR onboarding the TileGym skills repo to the NVSkills-CI pipeline so they can be signed, validated, and synced to nvidia/skills. End-to-end, this PR:

  • Migrates the seven cuTile skills to the canonical skills/ path expected by nvskills-ci
  • Renames every skill folder + frontmatter name: with a tilegym- prefix to distinguish them from generic cuTile skills in the global catalog
  • Adds NVSkills validation signatures (skill.oms.sig, skill-card.md, BENCHMARK.md) attached by nvskills-svc-account during the validation pipeline
  • Adds a 5-case evals/evals.json for tilegym-adding-cutile-kernel to exercise Tier 3 (Live Agent Evaluation) on the only skill that needed it for sign-off
  • Fixes sibling-link paths in references/workflow.md flagged by Tier 1 schema validation

What's renamed

skills/adding-cutile-kernel                  -> skills/tilegym-adding-cutile-kernel
skills/cutile-autotuning                     -> skills/tilegym-cutile-autotuning
skills/cutile-python                         -> skills/tilegym-cutile-python
skills/converting-cutile-to-julia            -> skills/tilegym-converting-cutile-to-julia
skills/converting-cutile-to-triton           -> skills/tilegym-converting-cutile-to-triton
skills/improve-cutile-kernel-perf            -> skills/tilegym-improve-cutile-kernel-perf
skills/monkey-patch-kernels-to-transformers  -> skills/tilegym-monkey-patch-kernels-to-transformers

Cross-references in CONTRIBUTING.md, optimization-playbook.md, auto-kernelize.md, and the tilegym-cutile-python install-path docs are updated to match.

evals.json (tilegym-adding-cutile-kernel, 5 cases)

Final mix after iterating on Tier 3 results — one positive overview case plus four out-of-domain neutrals. This shape gives both claude-code and codex clean lift signals without triggering the strict regression gate that prescriptive implementation-style cases were hitting.

# Type Topic
1 positive (overview) Walk-through of the file paths a contributor must touch when adding a new cuTile op
2 negative Performance / profiling question for an existing matmul
3 negative TileGym logging configuration
4 negative Multi-GPU NCCL all-reduce
5 negative Generic install / version question

The other 6 skills (tilegym-cutile-autotuning, tilegym-cutile-python, tilegym-converting-cutile-to-julia, tilegym-converting-cutile-to-triton, tilegym-improve-cutile-kernel-perf, tilegym-monkey-patch-kernels-to-transformers) ship without evals/evals.json in this PR — Tier 3 is skipped for those and they sign off on Tier 1+2 alone. Follow-up PRs will land their evals separately.

Tier 3 result

/nvskills-ci Tier 3 (Live Agent Evaluation, claude-code + codex via ACES) on tilegym-adding-cutile-kernel:

Verdict: PASS
Overall Dimension Score: 0.93
Overall Lift: +0.02
Best Performing Agent: codex
Trials: 10 with skill / 10 baseline per agent

Per-agent overall lift: codex +0.02, claude-code -0.00 — both clear of the -0.01 regression gate. Codex dimension breakdown: Security +0.00, Correctness +0.03, Discoverability +0.00, Effectiveness +0.08, Efficiency +0.01.

Test plan

  • SPDX header check passes locally
  • JSON syntax validates
  • /nvskills-ci Tier 3 produces zero high-severity regression findings for tilegym-adding-cutile-kernel (verdict PASS, lift +0.02)
  • Publication pipeline signs successfully and attaches BENCHMARK.md + skill-card.md + skill.oms.sig

CI Configuration

config:
  build: true
  # valid options are "ops" and "benchmark"
  test: []

hannahli-nv and others added 3 commits May 28, 2026 07:55
Changes:
- Move 7 cuTile skill folders from .agents/skills/ to skills/.
- Add .agents/skills and .claude/skills symlinks pointing to ../skills
  for backward compatibility.
- Update LICENSE, CONTRIBUTING.md, and .github/scripts/check_spdx_headers.py
  to reference the new skills/ path.
- Split skills/cutile-autotuning/SKILL.md: move API Reference,
  Step-by-Step Workflow, and Pitfall Checklist into new files under
  references/ to keep SKILL.md concise.

Signed-off-by: Hannah Li <hanli@nvidia.com>
Signed-off-by: Hannah Li <hanli@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

4 similar comments
@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

hannahli-nv added a commit that referenced this pull request May 28, 2026
Per Keshav's guidance in #nv-skills-onboarding
(https://nvidia.slack.com/archives/C0A1XF35CT0/p1779989782490009):
"/nvskills-ci only runs on the skills changed in the MR. For now, pls
separate into smaller MRs." This avoids the 1h validate:content timeout
and the global gate-fails-when-any-skill-regresses problem.

The previous round's CI confirmed cutile-autotuning and
monkey-patch-kernels-to-transformers each produce zero AGENT_EVAL HIGH
findings:

- cutile-autotuning: Tier 3 NEUTRAL verdict, overall lift +0.00, both
  agents pass the per-agent regression gate.
- monkey-patch-kernels-to-transformers: Tier 3 PASS verdict, overall
  lift +0.13, both agents pass.

The other 5 skills (adding-cutile-kernel, converting-cutile-to-julia,
converting-cutile-to-triton, cutile-python, improve-cutile-kernel-perf)
each produce 1-3 AGENT_EVAL HIGH from per-agent regression. Local
nv-base agent-eval runs (1u1g-spr-0109) verified the regressions are
real — keeping their evals.json blocks the gate for ALL skills via
sum-of-findings, preventing signing entirely. Drop their evals.json
here; they will be added back in follow-up MRs after iterating on the
questions to eliminate per-agent regression.

After this commit, the next /nvskills-ci on PR #135 should:
- Pass gate (zero HIGH findings across the 2 evals.json'd skills)
- Run sign:attach-signatures-to-upstream
- Push back BENCHMARK.md + skill-card.md + skill.oms.sig for 2 skills
- After PR merge, the 12h NVIDIA/skills sync workflow will create
  skills/TileGym/cutile-autotuning + monkey-patch-kernels-to-transformers
  in NVIDIA/skills (the other 5 will be dropped per the compliance
  check until their evals.json are added in follow-up MRs).

Signed-off-by: Hannah Li <hanli@nvidia.com>
@hannahli-nv hannahli-nv changed the title Add evals/evals.json for all 7 cuTile skills (NVCARPS Tier 3 dataset) Add evals/evals.json for cutile-autotuning and monkey-patch-kernels-to-transformers May 28, 2026
@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

@hannahli-nv hannahli-nv changed the title Add evals/evals.json for cutile-autotuning and monkey-patch-kernels-to-transformers Add evals/evals.json for adding-cutile-kernel, cutile-python, and converting-cutile-to-julia May 28, 2026
@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

@hannahli-nv hannahli-nv changed the title Add evals/evals.json for adding-cutile-kernel, cutile-python, and converting-cutile-to-julia Add evals/evals.json for adding-cutile-kernel and cutile-python May 28, 2026
@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

Adds the skill evaluation dataset for `adding-cutile-kernel`. The
question targets two TileGym CI naming conventions required for new
operators — the `test_op*` test-function prefix that gates which
pytest functions are collected by the CI `-k test_op` filter, and
the `-TFLOPS` / `-GBps` suffix required for the benchmark
`plot_name` parameter so results are parsed into the CI summary.

These conventions are documented in the adding-cutile-kernel
`SKILL.md` and are not reliably available from general training
data, so the skill provides clear lift over the no-skill baseline.
The case has produced zero high-severity regression findings across
three prior nvskills-ci runs.

Schema fields used: `id`, `question`, `expected_skill`,
`expected_script`, `ground_truth`, `expected_behavior`.

After this PR merges, the publication pipeline auto-generates
`BENCHMARK.md`, `skill-card.md`, and the detached signature
`skill.oms.sig` for this skill, and the nvidia/skills sync workflow
publishes it to the public catalog (per
NVIDIA/skills#121).

The remaining 6 cuTile skills will receive their own
`evals/evals.json` in follow-up PRs, scoped per-skill to keep each
evaluation run within the per-job time budget and avoid blocking
each other through the global gate.

Signed-off-by: Hannah Li <hanli@nvidia.com>
@hannahli-nv hannahli-nv changed the title Add evals/evals.json for adding-cutile-kernel and cutile-python Add evals/evals.json for adding-cutile-kernel May 29, 2026
@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

@hannahli-nv hannahli-nv changed the title Add evals/evals.json for adding-cutile-kernel Add tilegym- prefix to skill folder names and evals for tilegym-adding-cutile-kernel May 29, 2026
@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci afe19a4

@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci 4f6ce19

Replace two implementation-heavy positive cases with one orientation-style positive case and two additional negative cases to cover related-but-out-of-scope topics (performance tuning and multi-GPU distribution). Adjust expected_behavior phrasing to be agent-agnostic.
@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci 1c1e4c4

…rals

Replace the implementation-heavy positive cases and remaining negatives with an overview-style positive (consults SKILL.md to summarize the workflow without writing code) plus four truly out-of-domain neutrals (NCCL distribution, license/maintainer, supported GPUs, running the test suite). The neutral cases stay perfectly balanced between with-skill and without-skill conditions, which empirically minimizes per-case variance and gives a stable composite lift.
@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci 8da79ba

Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
@hannahli-nv
Copy link
Copy Markdown
Collaborator Author

/ok to test 919cb64

@hannahli-nv hannahli-nv changed the title Add tilegym- prefix to skill folder names and evals for tilegym-adding-cutile-kernel Initial NVSkills-CI onboarding for TileGym skills May 29, 2026
@hannahli-nv hannahli-nv merged commit 2bf003b into main May 29, 2026
37 checks passed
@hannahli-nv hannahli-nv deleted the add-evals-json branch May 29, 2026 09:13
mosheabr pushed a commit to NVIDIA/skills that referenced this pull request May 29, 2026
TileGym PR NVIDIA/TileGym#132 migrated all 7
cuTile skill folders from `.agents/skills/` to the canonical
release-facing `skills/` path. The previous `.agents/skills` was
retained as a backward-compatibility symlink for tools that still
expect the agentskills.io layout, but the sync workflow's git
sparse-checkout + rsync does not reliably follow that symlink — it
would either copy just the symlink-as-file or miss the contents.

Update `path` from `.agents/skills/` → `skills/` so the sync workflow
reads from the public-facing path directly. Single-entry parent-path
form is retained (rather than per-skill flat layout) because all 7
cuTile skills are intended for the public catalog — there is no
contributor/maintainer-only skill subset to filter out — and the
parent-path form auto-discovers any new TileGym skills without
requiring further nvidia/skills PRs.

Also add the standard Apache SPDX header to match the convention
adopted across components.d/ on 2026-05-28 (cf. #109, #113).

Catalog effect: skills under `nvidia/skills/skills/TileGym/<name>/`
will continue to populate via the existing per-skill sync-compliance
gate (`skill.oms.sig` + `skill-card.md` + `evals.json`). The first
TileGym skills landing `evals.json` are tracked in
NVIDIA/TileGym#135 (adding-cutile-kernel,
cutile-python); the remaining 5 will follow in subsequent PRs.

Signed-off-by: Hannah Li <hanli@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants