Expand Developer Guide section on arch symbol visibility with complete example and past learnings  #4323

@jrhemstad

This issue can be closed by expanding the CUB Developer Guide section on symbol visibility with a simple but complete sketch of a free function template that ultimately launches a kernel, e.g.,

template <typename ...>
auto bar(...){
   kernel<<<...>>>(...);
}

template <typename ...>
auto foo(...){
   return bar(...);
}

but with the appropriate annotations, arch-specific guarding, etc.
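
As a starting point for discussion, here is a minimal sketch of what such an annotated example could look like. Everything prefixed `DEMO_` or living in namespace `demo` is hypothetical and is not CUB's actual API. It assumes the build passes `-DDEMO_ARCH_TAG=...` matching `-arch`, and it falls back to a plain host function when not compiled with nvcc so that the structure can be checked without a GPU. The CUDA branch omits error checking.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper macros for this sketch; these names do not come from CUB.
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
#  define DEMO_HIDDEN
#endif

#if defined(__CUDACC__)
#  define DEMO_KERNEL __global__ DEMO_HIDDEN
#else
#  define DEMO_KERNEL DEMO_HIDDEN  // host fallback: the "kernel" is a plain function
#endif

// Assumption: the build passes -DDEMO_ARCH_TAG=sm_80 (or sm_90, ...) next to -arch.
#ifndef DEMO_ARCH_TAG
#  define DEMO_ARCH_TAG sm_80
#endif
#define DEMO_STR_IMPL(x) #x
#define DEMO_STR(x) DEMO_STR_IMPL(x)

namespace demo {
// The inline namespace makes the target architecture part of every mangled name,
// so copies built for different architectures never share a symbol.
inline namespace DEMO_ARCH_TAG {

DEMO_HIDDEN inline const char* arch_tag() { return DEMO_STR(DEMO_ARCH_TAG); }

// Hidden visibility keeps the kernel out of the shared library's dynamic symbol table.
template <typename T>
DEMO_KERNEL void kernel(T* out, T value) { *out = value; }

// Every function on the path to the launch is hidden as well.
template <typename T>
DEMO_HIDDEN T bar(T value) {
  T result{};
#if defined(__CUDACC__)
  T* d_out = nullptr;
  cudaMalloc(&d_out, sizeof(T));
  kernel<<<1, 1>>>(d_out, value);
  cudaMemcpy(&result, d_out, sizeof(T), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
#else
  kernel(&result, value);  // direct call in the host-only build
#endif
  return result;
}

template <typename T>
DEMO_HIDDEN T foo(T value) { return bar(value); }

}  // inline namespace DEMO_ARCH_TAG
}  // namespace demo

// Smoke test of the kind a CI job could run.
inline void smoke_test() {
  assert(demo::foo(42) == 42);
  assert(std::strcmp(demo::arch_tag(), DEMO_STR(DEMO_ARCH_TAG)) == 0);
}
```

Callers still write `demo::foo(...)`; the inline namespace only changes the mangled name, and the hidden visibility keeps each shared library bound to its own copy.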

It should satisfy the following scenarios:

libA.so and libB.so both include <foo> and invoke foo with the same template arguments:

libA.so compiled with -arch=sm_80

#include <foo>
auto A(){ return foo(...);}

libB.so compiled with -arch=sm_90

#include <foo>
auto B(){ return foo(...);}

// main.cu
#include <A.h>
#include <B.h>

int main(){
   if( arch == 80 )
     A();
   else if(arch == 90)
     B();
}

Calling A() should always invoke the kernel compiled for sm_80, and likewise B() should always invoke the kernel compiled for sm_90.
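
The expected behavior can be mimicked in a single host-only file, which may help when reasoning about the test. This is only a simulation under stated assumptions: the macro `DEMO_DEFINE_FOO` stands in for including `<foo>` under a given `-arch`, the two namespaces stand in for the arch-dependent names each library would get when built separately, and the returned integer stands in for "which kernel ran". All names are hypothetical.

```cpp
#include <cassert>

#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
#  define DEMO_HIDDEN
#endif

// Stand-in for "#include <foo>" compiled with a given -arch: each expansion puts
// foo in its own namespace, so the two copies get different mangled names.
#define DEMO_DEFINE_FOO(ARCH_NS, ARCH_VALUE)          \
  namespace demo { namespace ARCH_NS {                \
    template <typename T>                             \
    DEMO_HIDDEN int foo(T) { return ARCH_VALUE; }     \
  } }

DEMO_DEFINE_FOO(sm_80, 80)  // what libA.so would contain
DEMO_DEFINE_FOO(sm_90, 90)  // what libB.so would contain

// Same template argument (int) in both libraries, as in the scenario.
int A() { return demo::sm_80::foo(0); }
int B() { return demo::sm_90::foo(0); }

// Mirrors main.cu: choose the library entry point from the device architecture.
int dispatch(int arch) {
  if (arch == 80) return A();
  if (arch == 90) return B();
  return -1;  // no matching library
}

// Smoke test: each entry point must reach its own copy of foo.
inline void smoke_test() {
  assert(dispatch(80) == 80);
  assert(dispatch(90) == 90);
}
```

If both copies shared one visible symbol, the dynamic linker could bind A() and B() to the same instantiation, which is the failure mode the real test needs to detect.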

This code example should be compiled and run as part of our regular CI. It should incorporate any tests that @gevtushenko developed in https://gitlab-master.nvidia.com/gevtushenko/cub_visibility to detect problematic scenarios.

The guide should also pull in any additional learnings from @gevtushenko's original investigation in https://gitlab-master.nvidia.com/gevtushenko/cub_visibility, so that that repository can be closed and the developer guide becomes the source of truth. It should reflect the update in CUDA 13 with __global__ implying static.
