cleanup: Remove dead mask variable and make bounds checking explicit in GELU kernel #99

Merged
hannahli-nv merged 1 commit into main from fix/gelu-oob-bounds-check
Apr 8, 2026
hannahli-nv (Collaborator) commented Apr 7, 2026

Summary

Clean up the GELU cuTile kernel (gelu_kernel_ct):

  • Remove unused mask variable — it was computed via ct.less(offsets, n_elements) but never passed to gather or scatter
  • Make bounds-checking behavior explicit by passing padding_value=0 to ct.gather and check_bounds=True to ct.scatter

Note: ct.gather and ct.scatter both default to check_bounds=True, so the kernel was already safe from out-of-bounds access. This change just removes dead code and makes the intent explicit, consistent with other cuTile kernels in the codebase (e.g., swiglu, softmax, silu_and_mul).
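The semantics described in the note can be modeled in plain Python. This is only a sketch of the behavior the summary describes (a gather that substitutes `padding_value` for out-of-range offsets, and a scatter that skips them); it is not the cuTile API and not the code from `gelu_kernel_ct`. The helpers `gather_checked`, `scatter_checked`, and `gelu_blocked` are hypothetical names, and the erf-based GELU is an assumption, since the PR does not say which GELU variant the kernel uses.

```python
import math


def gather_checked(src, offsets, padding_value=0.0):
    """Model of a bounds-checked gather: out-of-range offsets yield padding_value."""
    return [src[i] if 0 <= i < len(src) else padding_value for i in offsets]


def scatter_checked(dst, offsets, values):
    """Model of a bounds-checked scatter: out-of-range offsets are skipped."""
    for i, v in zip(offsets, values):
        if 0 <= i < len(dst):
            dst[i] = v


def gelu(x):
    """Exact (erf-based) GELU; assumed here, the real kernel may use another variant."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_blocked(x, block_size):
    """Process x one block at a time, the way a 1D tiled kernel would."""
    n = len(x)
    out = [0.0] * n
    num_blocks = (n + block_size - 1) // block_size  # ceiling division
    for bid in range(num_blocks):
        offsets = [bid * block_size + j for j in range(block_size)]
        # Lanes past the end of x read the padding value instead of memory.
        tile = gather_checked(x, offsets, padding_value=0.0)
        # Lanes past the end of out are dropped, so no explicit mask is needed.
        scatter_checked(out, offsets, [gelu(v) for v in tile])
    return out
```

In this model no separate mask is needed: the padded lanes compute `gelu(0.0)`, which is `0.0`, and the guarded scatter discards them anyway, which mirrors why the unused mask in the kernel was dead code.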

Closes #93

CI Configuration

config:
  build: true
  # valid options are "ops" and "benchmark"
  test: ["ops", "benchmark"]

copy-pr-bot (bot) commented Apr 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

hannahli-nv (Collaborator, Author) commented

/ok to test 4ce95c8

hannahli-nv changed the title from "fix: Add bounds checking to GELU kernel gather/scatter" to "cleanup: Remove dead mask variable and make bounds checking explicit in GELU kernel" on Apr 7, 2026
hannahli-nv requested a review from xjmxyt on April 7, 2026 07:24
hannahli-nv enabled auto-merge (squash) on April 8, 2026 02:51
The GELU cuTile kernel computed a bounds mask but never applied it,
causing out-of-bounds memory reads on gather and out-of-bounds writes
on scatter when n_elements is not a multiple of BLOCK_SIZE.

Fix by adding padding_value=0 to ct.gather (safe OOB reads) and
check_bounds=True to ct.scatter (skip OOB writes), consistent with
other cuTile kernels in the codebase.
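For reference, the lanes this commit message is concerned with are the ones in the last block whose offsets fall at or beyond n_elements. A small sketch of that arithmetic follows; `tail_block_lanes` is a hypothetical helper written for illustration, and the sizes in the usage comment are made-up example values, not figures from the PR.

```python
def tail_block_lanes(n_elements, block_size):
    """Return (number of blocks, number of lanes past the end of the data)."""
    num_blocks = -(-n_elements // block_size)  # ceiling division
    covered = num_blocks * block_size          # offsets 0 .. covered - 1 are generated
    return num_blocks, covered - n_elements


# Example: 1000 elements with a block size of 256 needs 4 blocks,
# and the last block has 24 lanes with offsets >= n_elements.
print(tail_block_lanes(1000, 256))  # → (4, 24)
```

When n_elements is an exact multiple of the block size the second value is 0, so the bounds handling only matters for the final, partially filled block.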
hannahli-nv force-pushed the fix/gelu-oob-bounds-check branch from 4ce95c8 to 14f1a00 on April 8, 2026 03:24

hannahli-nv (Collaborator, Author) commented

/ok to test 14f1a00

hannahli-nv merged commit 02bc7fb into main on Apr 8, 2026
11 checks passed
hannahli-nv deleted the fix/gelu-oob-bounds-check branch on April 8, 2026 03:25
Development

Successfully merging this pull request may close these issues.

Apply mask to gather in gelu

2 participants