Update TileGym Julia kernels to cuTile 0.2 #102
Hi @maleadt, thank you so much for this contribution! We really appreciate you taking the time to update the Julia-related code in TileGym to the cuTile 0.2 API. One small item before we can merge: since this is your first contribution to TileGym, we need you to submit a Contributor License Agreement (CLA). You can find it at LICENSES/CLA.md in the repo. Please fill it out and email it to TileGym@nvidia.com. Sorry for the inconvenience; it's a standard licensing requirement we need to fulfill for all first-time contributors. Once that's on file, we'll get this merged. Thanks again!

/ok to test dd59577

/ok to test 7d6adb7

Done.

Description
Also align with both the cuTile.jl examples and the Python TileGym implementations, and update the skills accordingly.
cuTile 0.2 API changes:
- `ct.full()` (which doesn't exist in the Julia API) → `zeros()`, `fill()`: standard Julia constructors now work in kernels via overlays
- […] `ct.load`/`ct.store` style, matching all cuTile.jl examples
- `while` loops → native `for` loops where applicable
- `ct.num_tiles()` and `size(arr, dim)` inside kernels instead of passing pre-computed values as arguments
- […] `y .* alpha`) instead of creating full tiles for scalar operations
- `ct.@compiler_options` for kernel-level hints (occupancy, num_ctas) instead of launch kwargs

Alignment with Python TileGym:
- […] `swizzle_2d` in the Python matmul
- `ct.load`/`ct.store` to `ct.gather`/`ct.scatter` with `check_bounds=true` and `padding_value=-Inf`, matching the Python chunked softmax
- `padding_mode=NegInf` on loads, matching Python's `padding_mode=NEG_INF` / `padding_value=-math.inf`
- […] `@ct.kernel(occupancy=4)` → `ct.@compiler_options occupancy=4`)

Julia idioms:
- `A(K,M), B(N,K), C(N,M)` to standard `A(M,K), B(K,N), C(M,N)`; tests now verify `A * B` instead of `B * A`
- `CuArray`s with output-first convention (`matmul!(C, A, B)`) instead of raw `Int` pointers with `unsafe_wrap`

CI Configuration
Checklist
- […] `./format.sh`)

cc @0xtaruhi
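
For reviewers comparing the old and new matmul layouts, here is a minimal CPU-only sketch of how the two conventions might relate. It is not the PR's kernel: `matmul_reference!` is a hypothetical stand-in built on `LinearAlgebra.mul!`, plain `Array`s replace `CuArray`s, and it assumes the old `A(K,M)`, `B(N,K)`, `C(N,M)` arrays were simply the transposes of the new ones.

```julia
using LinearAlgebra

# Hypothetical CPU stand-in for an output-first `matmul!(C, A, B)`.
# This only illustrates the calling convention and the expected shapes.
function matmul_reference!(C::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)
    M, K = size(A)
    K2, N = size(B)
    @assert K == K2 "inner dimensions of A and B must match"
    @assert size(C) == (M, N) "C must be M x N"
    mul!(C, A, B)  # in-place C = A * B
    return C
end

M, K, N = 4, 3, 5
A = rand(Float32, M, K)   # new layout: A(M,K)
B = rand(Float32, K, N)   # new layout: B(K,N)
C = zeros(Float32, M, N)  # new layout: C(M,N)
matmul_reference!(C, A, B)

# Old layout (assumed to be the transposed arrays): A(K,M), B(N,K), C(N,M).
A_old = permutedims(A)
B_old = permutedims(B)
C_old = B_old * A_old     # what the old tests checked: B * A

@assert C ≈ A * B                 # new tests check A * B
@assert C_old ≈ permutedims(C)    # old result is the transpose of the new one
```

Under that assumption the two conventions compute the same product up to a transpose, so switching the tests from `B * A` to `A * B` follows from the layout change and does not imply a change in numerical behavior.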