HCC development should not be abandoned #1098
Comments
> We'll continue to develop and support HIP, but not the hc:: API and its programming model.
Could you expand on what "other areas" means exactly? Does this potentially include an effort from AMD to support open standards such as SYCL, or perhaps ISO C++ executors?

Personally, HCC has been the single reason I started investing in GPU computing. As someone who usually writes modern C++, the hc:: API has been an absolute blast to work with: automatic memory management, the ability to use modern C++ idioms in kernels, and out-of-the-box support for various […]. Apart from all its proprietary baggage, the other reason I never invested heavily in CUDA was its C-like API. HCC filled a perhaps small niche (but one with large potential for growth, I believe) by combining modern C++ with the computational power of modern GPUs. It now seems we have to take a few steps back to a CUDA-like API with HIP, or perhaps OpenCL. This is a great disappointment.

So my question is: is AMD planning to support some sort of API based on modern C++ idioms, or is the plan now simply to follow the mantra of CUDA compatibility with HIP?
You can launch hcc-style kernels with lambdas in HIP with this function:

```cpp
#include <hip/hip_runtime.h>
#include <cassert>
#include <cstddef>

struct index
{
    std::size_t global;
    std::size_t local;
    std::size_t group;
};

template <class F>
__global__ void launcher(F f)
{
    index idx{blockIdx.x * blockDim.x + threadIdx.x, threadIdx.x, blockIdx.x};
    f(idx);
}

inline auto launch(hipStream_t stream, std::size_t global, std::size_t local = 64)
{
    return [=](auto f) {
        assert(local > 0);
        assert(global > 0);
        using f_type = decltype(f);
        dim3 nblocks(global / local); // assumes global is a multiple of local
        dim3 nthreads(local);
        hipLaunchKernelGGL((launcher<f_type>), nblocks, nthreads, 0, stream, f);
    };
}
```

Then you can launch kernels with:

```cpp
launch(stream, n)([=](auto idx) {
    y[idx.global] = x[idx.global];
});
```

It just won't take care of memory movement like hcc would, and there is still no C++ memory management.
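One caveat worth flagging (my observation, not part of the original comment): `global / local` truncates, so trailing work items are silently dropped whenever `global` is not a multiple of `local`. A minimal workaround under that assumption is to round the grid size up at the call site and guard inside the lambda; `round_up` is a hypothetical helper, not part of HIP:

```cpp
// Hypothetical helper: smallest multiple of `local` that covers n.
inline std::size_t round_up(std::size_t n, std::size_t local)
{
    return ((n + local - 1) / local) * local;
}

launch(stream, round_up(n, 64))([=](auto idx) {
    if (idx.global < n)                 // guard: the grid was rounded up past n
        y[idx.global] = x[idx.global];
});
```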
Actually, most of the things I mentioned, like automatic memory "movement" and RAII allocation/deallocation, can be built with standard C++ idioms on top of a C-like API such as HIP or HSA, as is done in hc's array_view. The only thing I was really afraid of losing was lambda kernels, but it is good to realise that even this is doable in C++ when the API does not directly support lambda functions. The key thing is having a single-source model and the C++ language in kernels, so what really matters is a compiler that supports this; I would not be surprised if HCC could be "simulated" by a header-only "library" solution over HIP. I would also not be surprised if SYCL could be implemented that way over HCC or HIP (this actually exists: https://gpuopen.com/compute-product/hipsycl/, which is not header-only, as it needs to add […]).

So I am interested in the future of ROCm. Right now HIP is a "library" over HCC, HCC provides the compiler that supports the single-source model, and it uses the HSA C API. Will HCC later be dropped in favour of a direct HIP compiler? Or will HIP be based on something else, like C++20/23 executors?

It also seems that with HSA it should be possible to have device-side enqueues (which OpenCL 2.0 supports), because kernel dispatch is done by simply writing some bytes into memory (which appears to be accessible from the GPU). I already tested this and it is great.
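To make the first point concrete, here is a minimal sketch (my own illustration, not code from this thread) of how RAII allocation/deallocation and host-to-device "movement" can be layered over HIP's C API, roughly what hc::array_view automated; `device_buffer` is a hypothetical name:

```cpp
#include <hip/hip_runtime.h>
#include <cstddef>
#include <stdexcept>

// Hypothetical RAII wrapper: owns a device allocation and frees it on scope
// exit, so deallocation no longer depends on remembering to call hipFree.
template <class T>
class device_buffer
{
    T* ptr_ = nullptr;
    std::size_t n_ = 0;

public:
    explicit device_buffer(std::size_t n) : n_(n)
    {
        if (hipMalloc(reinterpret_cast<void**>(&ptr_), n * sizeof(T)) != hipSuccess)
            throw std::runtime_error("hipMalloc failed");
    }
    ~device_buffer() { hipFree(ptr_); }           // automatic deallocation

    device_buffer(const device_buffer&) = delete; // sole owner, like unique_ptr
    device_buffer& operator=(const device_buffer&) = delete;

    T* data() { return ptr_; }
    std::size_t size() const { return n_; }

    // "Movement" helpers built on the C API.
    void copy_from_host(const T* src)
    {
        hipMemcpy(ptr_, src, n_ * sizeof(T), hipMemcpyHostToDevice);
    }
    void copy_to_host(T* dst) const
    {
        hipMemcpy(dst, ptr_, n_ * sizeof(T), hipMemcpyDeviceToHost);
    }
};
```

Unlike hc::array_view, this sketch copies only when asked; implicit synchronization on first use would need extra bookkeeping on top of it.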
HCC is a great language; I personally do not like the CUDA-like style used in HIP. Isn't HCC used as the backend for HIP? Won't HIP then be based on a deprecated, abandoned backend with no further development or bug fixing? HIP is missing many great features of HCC: a C++ host API, automatic memory "movement" (instead of manually copying between GPU and CPU with hipMemcpy), RAII smart pointers (array_view) with automatic deallocation, running kernels from lambda functions, lambda captures (capturing variables into kernels instead of manually passing them as kernel arguments), functors as kernels, and so on.
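For readers unfamiliar with the difference being described: in plain HIP every transfer is explicit, roughly as below (a sketch using a made-up kernel named `scale`, my own example), whereas hc::array_view moved the data implicitly when the kernel first touched it:

```cpp
#include <hip/hip_runtime.h>
#include <cstddef>
#include <vector>

__global__ void scale(float* x, float a, std::size_t n)
{
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main()
{
    std::size_t n = 1024;
    std::vector<float> host(n, 1.0f);

    float* dev = nullptr;
    hipMalloc(reinterpret_cast<void**>(&dev), n * sizeof(float));          // manual allocation
    hipMemcpy(dev, host.data(), n * sizeof(float), hipMemcpyHostToDevice); // manual "movement" in
    hipLaunchKernelGGL(scale, dim3(n / 256), dim3(256), 0, 0, dev, 2.0f, n);
    hipMemcpy(host.data(), dev, n * sizeof(float), hipMemcpyDeviceToHost); // manual "movement" out
    hipFree(dev);                                                          // manual deallocation
}
```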