Description
This issue can be closed by expanding the CUB Developer Guide section on symbol visibility with a simple but complete sketch of a template free function that ultimately invokes a kernel, e.g.,
template <typename ...>
auto bar(...){
  kernel<<<...>>>(...);
}
template <typename ...>
auto foo(...){
  return bar(...);
}
but with the appropriate annotations, arch-specific guarding, etc.
It should satisfy the following scenarios:
libA.so and libB.so both include <foo_kernel> and invoke foo with the same template arguments:
libA.so compiled with -arch=sm_80
#include <foo>
auto A(){ return foo(...);}
libB.so compiled with -arch=sm_90
#include <foo>
auto B(){ return foo(...);}
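// main.cu
#include <cuda_runtime.h>
#include <A.h>
#include <B.h>
int main(){
  // Query the compute capability of device 0 (error checking omitted).
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);
  const int arch = prop.major * 10 + prop.minor;
  if (arch == 80)
    A();
  else if (arch == 90)
    B();
}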
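Calling A() should always invoke the kernel compiled for sm_80, and likewise B() should always invoke the kernel compiled for sm_90, regardless of the order in which libA.so and libB.so are loaded.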
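As a starting point for discussion, here is a minimal host-only sketch of the two ingredients such an example would likely need: hidden visibility on every function in the call chain, and an inline namespace whose name encodes the target architecture so that the same template instantiation gets a different mangled name in each library. Everything named `DEMO_*` or `demo::` is hypothetical and is not CUB's actual macro set; `DEMO_ARCH_TAG` is assumed to be passed by the build (e.g. `-DDEMO_ARCH_TAG=sm_80`), and `kernel_stand_in` is an ordinary function standing in for the `__global__` kernel so the sketch compiles without nvcc.

```cpp
#include <string>

// Hypothetical: the build passes -DDEMO_ARCH_TAG=sm_80 (or sm_90, ...).
#ifndef DEMO_ARCH_TAG
#  define DEMO_ARCH_TAG sm_unknown
#endif

#define DEMO_CONCAT_IMPL(a, b) a##b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_IMPL(a, b)
#define DEMO_STR_IMPL(x) #x
#define DEMO_STR(x) DEMO_STR_IMPL(x)

// Hidden visibility keeps the symbol out of the shared library's dynamic
// symbol table, so the dynamic linker cannot merge it across libraries.
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
#  define DEMO_HIDDEN
#endif

namespace demo {
// The inline namespace name depends on the architecture tag, so a library
// built for sm_80 and one built for sm_90 get distinct mangled names for
// the same template arguments. Callers still write demo::foo(...).
inline namespace DEMO_CONCAT(abi_, DEMO_ARCH_TAG) {

// Stand-in for the kernel; in CUDA code this would be a __global__ function
// launched with <<<...>>>.
template <typename T>
DEMO_HIDDEN inline T kernel_stand_in(T x) {
  return x + T{1};
}

// Intermediate layer that would perform the kernel launch.
template <typename T>
DEMO_HIDDEN inline T bar(T x) {
  return kernel_stand_in(x);
}

// Public entry point, annotated the same way as everything below it.
template <typename T>
DEMO_HIDDEN inline T foo(T x) {
  return bar(x);
}

// Reports which architecture tag this translation unit was built with.
DEMO_HIDDEN inline std::string arch_tag() {
  return DEMO_STR(DEMO_ARCH_TAG);
}

} // inline namespace abi_<tag>
} // namespace demo
```

With this layout, libA.so built with `-DDEMO_ARCH_TAG=sm_80` would define `demo::abi_sm_80::foo<int>` and libB.so built with `-DDEMO_ARCH_TAG=sm_90` would define `demo::abi_sm_90::foo<int>`, so neither library can end up calling the other's instantiation. The real guide example would additionally need the `__global__` annotation, the kernel launch, and whatever changes CUDA 13 brings to this picture.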
This code example should be compiled and run as part of our regular CI. It should incorporate any testing that @gevtushenko developed in https://gitlab-master.nvidia.com/gevtushenko/cub_visibility that detects problematic scenarios.
This should pull in any additional learnings from @gevtushenko's original investigation in https://gitlab-master.nvidia.com/gevtushenko/cub_visibility, so that that repo can be closed and the developer guide becomes the source of truth. It should also reflect the updates in CUDA 13, where __global__ implies static.