Fix assertion failure with sub-8B kernel arguments#34
Conversation
In KernelArgHelper::computeKernargData(), the explicit_args_length was
computed as roundArgsLength(last_explicit_arg_offset + size), rounding up
to the next 8-byte boundary. When a kernel has pointer args followed by
32-bit int args (e.g. hgemm_kernel(half*, half*, half*, int, int, int)),
the last explicit arg in the instrumented kernel (dh_comms*) is placed at a
4-byte-aligned offset. The rounded explicit_args_length overshot the actual
hidden-args boundary, so kernarg_length - explicit_args_length became smaller
than clone_hidden_args_length, triggering:
assert(desc.clone_hidden_args_length <=
desc.kernarg_length - desc.explicit_args_length)
Fix: use the offset of the FIRST hidden argument from the kernel metadata
as explicit_args_length. This is the exact explicit/hidden boundary with no
rounding required. Fall back to roundArgsLength only for kernels that have
no hidden arguments (e.g. some Triton kernels), where rounding is harmless.
Also:
- Add sub8b_args_test.cpp regression test kernel (ptr + int args)
- Wire it into CMakeLists.txt and run_basic_tests.sh
Co-authored-by: rwvo <21990117+rwvo@users.noreply.github.com>
|
Closing this PR. The assertion failure described in #32 was already fixed in commit 705ece5, which added a The PR's approach of reading the first hidden argument's offset directly from metadata is theoretically more robust (it would handle a hypothetical ABI where hidden args aren't 8-byte aligned), but doesn't address a current bug. The regression test kernel is a good idea, but as written it only checks that omniprobe doesn't crash — it doesn't verify that the integer arguments are passed through correctly to the instrumented kernel. A proper test should read back results and/or print argument values to confirm correctness with and without instrumentation. |
When a kernel has pointer arguments followed by sub-8B arguments (e.g.
int), the injecteddh_comms*in the instrumented clone lands at a 4-byte-aligned offset, so the explicit args end at a non-8-byte-aligned boundary (e.g. 44 bytes). The oldroundArgsLength()call rounded that up to 48, overshooting the actual hidden-args start and causing:Reproducer:
Changes
src/utils.cc—KernelArgHelper::computeKernargData: use the offset of the first hidden argument from the kernel metadata asexplicit_args_length. This is the exact explicit/hidden boundary as encoded by the compiler — no rounding needed.roundArgsLengthis retained only for kernels with no hidden arguments (e.g. some Triton kernels), where it is both harmless and preserves existing behaviour.tests/test_kernels/sub8b_args_test.cpp: regression kernel with(const int*, const int*, int*, int M, int N, int K)arguments — same layout as the failing kernel above.tests/test_kernels/CMakeLists.txt,tests/run_basic_tests.sh: wire the new kernel into the build and basic test suite.Original prompt
📱 Kick off Copilot coding agent tasks wherever you are with GitHub Mobile, available on iOS and Android.