Update TileGym Julia kernels to cuTile 0.2 #102

Merged: hannahli-nv merged 3 commits into NVIDIA:main from maleadt:cutile_0.2 on Apr 10, 2026
maleadt (Contributor) commented Apr 9, 2026

Description

Updates the TileGym Julia kernels to the cuTile 0.2 API, aligns them with both the cuTile.jl examples and the Python TileGym implementations, and updates the skills accordingly.

cuTile 0.2 API changes:

  • ct.full() (which doesn't exist in the Julia API) → zeros(), fill() — standard Julia constructors now work in kernels via overlays
  • Keyword-argument style for ct.load/ct.store, matching all cuTile.jl examples
  • while loops → native for loops where applicable
  • ct.num_tiles() and size(arr, dim) inside kernels instead of passing pre-computed values as arguments
  • Scalar-tile broadcasting (y .* alpha) instead of creating full tiles for scalar operations
  • ct.@compiler_options for kernel-level hints (occupancy, num_ctas) instead of launch kwargs
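
As a rough illustration of three of the items above (deriving the tile count from the array size, a native for loop over tiles, and applying a scalar to a tile directly), here is a plain-Python sketch. It does not use cuTile. `num_tiles` and `scale_tiled` are hypothetical stand-ins, and the ceiling-division tile count is an assumption about what a helper like `ct.num_tiles()` computes.

```python
from __future__ import annotations


def num_tiles(length: int, tile: int) -> int:
    """Ceiling division: how many tiles of width `tile` cover `length` elements."""
    return -(-length // tile)


def scale_tiled(y: list[float], alpha: float, tile: int) -> list[float]:
    """Scale `y` by `alpha` one tile at a time (hypothetical helper, not cuTile)."""
    out = [0.0] * len(y)
    # The tile count is derived from the array itself instead of being passed in.
    for t in range(num_tiles(len(y), tile)):  # plain for loop, no manual counter
        lo = t * tile
        hi = min(lo + tile, len(y))  # the last tile may be partial
        # The scalar multiplies each element of the tile directly; no tile
        # filled with `alpha` is built first.
        out[lo:hi] = [v * alpha for v in y[lo:hi]]
    return out
```

With `tile=2`, a five-element input is covered by three tiles, the last of which holds a single element.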

Alignment with Python TileGym:

  • Matmul: added 2D swizzle with 1D grid, matching swizzle_2d in the Python matmul
  • Softmax chunked: rewritten from ct.load/ct.store to ct.gather/ct.scatter with check_bounds=true and padding_value=-Inf, matching the Python chunked softmax
  • Softmax TMA/online: padding_mode=NegInf on loads, matching Python's padding_mode=NEG_INF / padding_value=-math.inf
  • Compiler options match Python decorators (@ct.kernel(occupancy=4) → ct.@compiler_options occupancy=4)
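
For readers unfamiliar with the 2D swizzle mentioned in the matmul item, the sketch below shows one common way to map a 1D block index onto a 2D tile grid using grouped ordering. It is plain Python and only an assumption about the general shape of `swizzle_2d`: the signature (taking the block index and tile counts explicitly) and the parameter names are hypothetical, and the actual TileGym code may differ.

```python
from __future__ import annotations


def swizzle_2d(bid: int, tiles_m: int, tiles_n: int, group_size_m: int) -> tuple[int, int]:
    """Map a 1D block index to (tile_row, tile_col) using grouped ordering."""
    blocks_per_group = group_size_m * tiles_n
    group_id = bid // blocks_per_group
    first_row = group_id * group_size_m
    # The last group may hold fewer rows than group_size_m.
    rows_in_group = min(tiles_m - first_row, group_size_m)
    local = bid % blocks_per_group  # position of this block inside its group
    tile_row = first_row + local % rows_in_group
    tile_col = local // rows_in_group
    return tile_row, tile_col


# Example: a 5 x 3 tile grid launched as a 1D grid of 15 blocks, groups of 2 rows.
order = [swizzle_2d(b, 5, 3, 2) for b in range(5 * 3)]
```

Consecutive block indices walk down a short column of tiles before moving to the next column, so blocks that are likely to run at the same time touch a small set of input tiles. The usual motivation for this ordering is better cache reuse compared with a plain row-major walk.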

Julia idioms:

  • Matmul layout changed from A(K,M), B(N,K), C(N,M) to standard A(M,K), B(K,N), C(M,N) — tests now verify A * B instead of B * A
  • Host functions accept CuArrays with output-first convention (matmul!(C, A, B)) instead of raw Int pointers with unsafe_wrap
  • Removed underscore prefixes on kernel function names
  • Tests simplified: no manual padding, no pointer arithmetic
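
The matmul layout change can be sanity-checked with the transpose identity (A * B)^T = B^T * A^T. The sketch below is plain Python on nested lists, not the TileGym test code. It assumes, for illustration only, that the old arrays held the transposes of the new ones; the helper names `matmul` and `transpose` are hypothetical.

```python
def matmul(X, Y):
    """Naive product of nested lists: (p x q) times (q x r) gives (p x r)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Swap rows and columns of a nested list."""
    return [list(col) for col in zip(*X)]


# New convention: A is (M, K), B is (K, N), C is (M, N), so C = A * B.
A = [[1, 2, 3],
     [4, 5, 6]]   # M = 2, K = 3
B = [[1, 0],
     [0, 1],
     [2, 2]]      # K = 3, N = 2
C = matmul(A, B)

# Old convention: A_old is (K, M), B_old is (N, K), C_old is (N, M).
# The only shape-compatible product is B_old * A_old.
A_old, B_old = transpose(A), transpose(B)
C_old = matmul(B_old, A_old)

# Both conventions describe the same computation, up to a transpose.
same = transpose(C) == C_old
```

Under that assumption the old tests, which checked B * A, and the new tests, which check A * B, verify the same product; the new layout simply matches the usual mathematical convention.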

CI Configuration

config:
  build: true
  # valid options are "ops", "benchmark", and "sanity"
  test: []

Checklist

  • Code formatted and imports sorted via repo specifications (./format.sh)
  • Documentation updated (if needed)
  • CI configuration reviewed

cc @0xtaruhi

copy-pr-bot (Bot) commented Apr 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

hannahli-nv (Collaborator) commented

Hi @maleadt, thank you so much for this contribution! Really appreciate you taking the time to update the Julia-related code in TileGym to the cuTile 0.2 API.

One small item before we can merge: since this is your first contribution to TileGym, we need you to submit a Contributor License Agreement (CLA). You can find it at LICENSES/CLA.md in the repo. Please fill it out and email it to TileGym@nvidia.com.

Sorry for the inconvenience. It's a standard licensing requirement we need to fulfill for all first-time contributors. Once that's on file, we'll get this merged.

Thanks again!

hannahli-nv (Collaborator) commented

/ok to test dd59577

hannahli-nv (Collaborator) commented

/ok to test 7d6adb7

maleadt (Contributor, Author) commented Apr 10, 2026

> One small item before we can merge: since this is your first contribution to TileGym, we need you to submit a Contributor License Agreement (CLA). You can find it at LICENSES/CLA.md in the repo. Please fill it out and email it to TileGym@nvidia.com.

Done.

hannahli-nv merged commit 8d25f72 into NVIDIA:main on Apr 10, 2026 (17 checks passed)