…HipPrintf
The device library (hipspv-spirv64.bc) is compiled as OpenCL C, producing
@llvm.used entries with `ptr` (addrspace 0). HIP kernel modules produce
@llvm.used entries with `ptr addrspace(4)`. LLVM's linker refuses to merge
these during -mlink-builtin-bitcode and reports "Appending variables with
different element types" for any HIP TU that also uses __attribute__((used))
(notably rocThrust's runtime_static_assert.h).
Fix part 1 (bitcode): drop `static + __attribute__((used))` from
_cl_print_str.cl and texture.cl. Without `used`, hipspv.bc no longer
emits an @llvm.used global, so the addrspace collision goes away.
Fix part 2 (llvm_passes/HipPrintf.cpp): the historical pull-in for
_cl_print_str was the @llvm.used global itself — once it is gone,
-mlink-builtin-bitcode (LinkOnlyNeeded) no longer pulls _cl_print_str
into the kernel module, because HipPrintf inserts the call only AFTER
that step runs. Define the body of _cl_print_str inline in HipPrintf's
getOrCreatePrintStringF() via IRBuilder so the kernel module is
self-contained for printf %s. Equivalent C:
    void _cl_print_str(__generic const char *S) {
      if (S == 0) return;
      unsigned Pos = 0;
      char C;
      while ((C = S[Pos]) != 0) { printf("%c", C); ++Pos; }
    }
This avoids any runtime device-library link path for printf %s, which
is important on backends whose OpenCL driver rejects clLinkProgram on
the equivalent SPIR-V (e.g. Mali-G52).
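The control flow of the equivalent C above can be checked on the host with a small stand-in. This is only a sketch: `print_str_model` is a hypothetical name, it collects characters into a `std::string` where the device code calls `printf("%c", C)`, and it says nothing about how the IR is actually built in `HipPrintf.cpp`.

```cpp
#include <string>

// Host-side model of the loop in _cl_print_str (hypothetical helper, not part
// of the PR). Each appended character stands in for one printf("%c", C) call.
static std::string print_str_model(const char *S) {
    std::string Out;
    if (S == nullptr) return Out;   // null pointer: emit nothing
    unsigned Pos = 0;
    char C;
    while ((C = S[Pos]) != 0) {     // stop at the NUL terminator
        Out.push_back(C);           // models printf("%c", C)
        ++Pos;
    }
    return Out;
}
```

A null pointer and an empty string both produce no output, and any other string is emitted one character at a time up to its terminator.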
Fixes rocThrust generate_const_iterators build failure and any HIP code
that uses `__attribute__((used)) __device__` variables. Adds reproducer
TestFixLlvmUsedAddrspace.hip.
/run-aurora-ci
Summary
The device library (`hipspv-spirv64.bc`) is compiled as OpenCL C, producing `@llvm.used` with `ptr` (addrspace 0). HIP kernel modules produce `@llvm.used` with `ptr addrspace(4)`. LLVM's linker refuses to merge these during `-mlink-builtin-bitcode` and reports "Appending variables with different element types" for any HIP TU that also uses `__attribute__((used))`, notably rocThrust's `runtime_static_assert.h`.

Fix
Two coordinated changes:
1. `bitcode/_cl_print_str.cl`, `bitcode/texture.cl`: drop `static + __attribute__((used))` from the 10 affected functions. Without `used`, `hipspv.bc` no longer emits an `@llvm.used` global, so the addrspace collision with HIP TUs disappears.
2. `llvm_passes/HipPrintf.cpp`: `getOrCreatePrintStringF()` now defines the body of `_cl_print_str` inline in the kernel module via `IRBuilder` instead of inserting only an external declaration.

Why (2) is required: the historical mechanism that pulled `_cl_print_str` into kernel modules was the `@llvm.used` global itself, not function-level liveness. Once `@llvm.used` is gone (from change 1), `-mlink-builtin-bitcode` (which uses `LinkOnlyNeeded`) no longer pulls `_cl_print_str` in, because `HipPrintf` only inserts the call to it after that step runs. Defining the body inside `HipPrintf` makes the kernel module self-contained for `printf %s` and avoids any runtime device-library link path. This is important on backends whose OpenCL driver rejects `clLinkProgram` on the equivalent SPIR-V (e.g. Mali-G52, observed `CL_INVALID_OPERATION`).

Impact
- Fixes the rocThrust `generate_const_iterators` build failure.
- Fixes any HIP code that uses `__attribute__((used)) __device__` variables.
- `printf %s`: kernel modules now carry their own `_cl_print_str` body.

Test plan
- `TestFixLlvmUsedAddrspace.hip`: FAIL before, PASS after.
- `Unit_printf_specifier` and `PrintfDynamic` continue to pass on every backend in CI (Intel CPU/GPU, PoCL on macOS, Mali via Salami).
and `unit-tests-llvm-18-release-salami`.
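The ordering problem described under "Why (2) is required" can be illustrated with a toy model. This is a simplified sketch, not LLVM's linker: `ToyModule`, `link_only_needed`, and the two scenario functions are hypothetical names, and the model assumes only that a link-only-needed step copies a library definition when the module already references the symbol at link time.

```cpp
#include <set>
#include <string>

// Toy module: symbols it references and symbols it defines.
struct ToyModule {
    std::set<std::string> referenced;
    std::set<std::string> defined;
};

// Models a link-only-needed step: a library definition is copied in only if
// the module already references that symbol when the link runs.
static void link_only_needed(ToyModule &M, const std::set<std::string> &Lib) {
    for (const auto &Sym : Lib)
        if (M.referenced.count(Sym) && !M.defined.count(Sym))
            M.defined.insert(Sym);
}

// Symbols that are referenced but have no definition in the module.
static std::set<std::string> unresolved(const ToyModule &M) {
    std::set<std::string> Out;
    for (const auto &Sym : M.referenced)
        if (!M.defined.count(Sym)) Out.insert(Sym);
    return Out;
}

// Link first, then a later pass inserts a call: the symbol stays unresolved.
static std::set<std::string> link_then_insert_call() {
    const std::set<std::string> Lib = {"_cl_print_str"};
    ToyModule Kernel;
    link_only_needed(Kernel, Lib);              // nothing referenced yet
    Kernel.referenced.insert("_cl_print_str");  // call inserted after linking
    return unresolved(Kernel);
}

// Same ordering, but the pass also defines the body when inserting the call.
static std::set<std::string> link_then_define_inline() {
    const std::set<std::string> Lib = {"_cl_print_str"};
    ToyModule Kernel;
    link_only_needed(Kernel, Lib);
    Kernel.referenced.insert("_cl_print_str");
    Kernel.defined.insert("_cl_print_str");     // body emitted by the pass
    return unresolved(Kernel);
}
```

In this model the first scenario leaves `_cl_print_str` unresolved and the second leaves nothing unresolved, which mirrors why the body is defined at the point where the call is inserted.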