Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Unresolved extern function 'cudaLaunchDevice' error while using NVCC 11.x and cub 2.10 with -G #692

Closed
lilohuang opened this issue May 17, 2023 · 3 comments

Comments

@lilohuang
Copy link
Contributor

Hi @senior-zero @allisonvacanti ,

I hope you are doing well. Today, I encountered a CUDA compilation failure, specifically an "Unresolved extern function 'cudaLaunchDevice'," while using CUDA 11.6.2 SDK in conjunction with thrust/cub 2.1.0 or trunk versions. This issue arose after enabling the device code debugging information (-G). It is worth noting that there are no problems with the default thrust/cub version included in CUDA SDK 11.6.2. Furthermore, I have observed that this particular compiler error does not exist in the CUDA 12.x compiler.

AFAIK, thrust/cub 2.1.0 should be compatible with CUDA 11.0 and above (i.e., https://github.com/NVIDIA/cub#supported-compilers). I have also uploaded a minimal bug reproducer to https://cuda.godbolt.org/z/7j39EToxh. Could you please take a look at it and provide guidance on how to resolve the bug using CUDA SDK 11.6.2 and thrust/cub 2.1.0?

#include <thrust/device_vector.h>
#include <cub/cub.cuh>

static_assert(THRUST_VERSION >= 200100, "thrust version must be >= 2.10");

int main() {
   constexpr const auto num_items = 10;
   thrust::device_vector<int> d_in(num_items, 1);
   thrust::device_vector<int> d_out(num_items);
   size_t temp_storage_bytes = 0;
   cub::DeviceScan::ExclusiveSum(
      nullptr,
      temp_storage_bytes,
      thrust::raw_pointer_cast(d_in.data()),
      thrust::raw_pointer_cast(d_out.data()),
      num_items);
   thrust::device_vector<char> d_temp_storage(temp_storage_bytes);
   cub::DeviceScan::ExclusiveSum(
      thrust::raw_pointer_cast(d_temp_storage.data()),
      temp_storage_bytes,
      thrust::raw_pointer_cast(d_in.data()),
      thrust::raw_pointer_cast(d_out.data()),
      num_items);
   for (auto value : d_out) {
      std::cout << value << std::endl;
   }

   return 0;
}
@gevtushenko
Copy link
Collaborator

Hello @lilohuang and thank you for reporting the issue! A bit simpler reproducer with the same behavior:

#include <nv/target>

template <class T> __device__ int device(T);

template <class T>
__device__ int doit_device(T val) {
  return device(val);
}

template <class T>
__host__ int doit_host(T val) {
  return 0;
}

template <class T>
int doit(T val) 
{
  NV_IF_TARGET(NV_IS_HOST,
                (return doit_host(val);),
                (return doit_device(val);));
}

int main() {
  return doit(42);
}

It seems that we have to guard the doit_device in Thrust to be something along the following lines:

template<class K, class... Args>
cudaError_t __device__
doit_device(K k, Args const&... args) const
{
  #ifdef THRUST_RDC_ENABLED
  const size_t size = argument_pack_size(0,args...);
  void *param_buffer = cudaGetParameterBuffer(64,size);
  fill_arguments((char*)param_buffer, 0, args...);
  return launch_device(k, param_buffer);
  #endif
}

@lilohuang
Copy link
Contributor Author

Awesome! Thank you @senior-zero. It worked! 💯 Please close this issue after merging NVIDIA/thrust#1939

@gevtushenko
Copy link
Collaborator

Merged NVIDIA/thrust#1939

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

2 participants