
HCC development should not be abandoned #1098

Closed
misos1 opened this issue Mar 28, 2019 · 4 comments

Comments


misos1 commented Mar 28, 2019

HCC is a great language. I personally do not like the CUDA-like style used in HIP. Isn't HCC used as the backend for HIP? Won't HIP then be based on a deprecated and abandoned backend, with no further development or bug fixing? HIP is missing many great features of HCC, such as the C++ host API, automatic memory movement (instead of manually copying memory between GPU and CPU using hipMemcpy), RAII smart pointers (array_view) with automatic deallocation, launching kernels from lambda functions, lambda captures (capturing variables into kernels instead of manually passing them as kernel arguments), functors as kernels, and so on.

Collaborator

scchan commented Mar 29, 2019

We'll continue to develop and support HIP but not the hc:: API and its programming model.
We don't see how hc:: would fit into the future of C++ for GPUs. ISO C++ is heading in a completely different direction with the executor proposals. C++AMP never gained any traction and seems to be disappearing. We think this is the right moment for us to stop developing hc:: and focus on other areas.
This is an open source project so someone interested could create a fork and continue to develop it. :)

@scchan scchan closed this as completed Mar 29, 2019

bluescarni commented Apr 8, 2019

> We'll continue to develop and support HIP but not the hc:: API and its programming model.
> We don't see how hc:: would fit in the future of C++ for GPUs. ISO C++ is heading into a completely different direction with the executor proposals. C++AMP never gained any traction and seems to be disappearing. We think this is the right moment for us to stop developing hc:: and focus in other areas.

Could you expand on what "other areas" means exactly? Does this potentially include an effort from AMD to support open standards such as SYCL or perhaps ISO C++ executors?

Personally, HCC has been the single reason why I started investing in GPU computing. As someone who usually writes modern C++, the hc:: API has been an absolute blast to work with - automatic memory management, ability to use modern C++ idioms in the kernels, out-of-the-box support for various std classes, lambda functions, etc.

Apart from all its proprietary baggage, the other reason I never invested heavily in CUDA was the C-like API. HCC filled a perhaps small (but I believe with a large potential for growth) niche combining modern C++ with the computational power of modern GPUs. It seems like now we have to take a few steps back and go back to a CUDA-like API with HIP, or perhaps OpenCL. This is a great disappointment.

So my question is: is AMD planning to support some sort of API based on modern C++ idioms, or is the plan now just to follow the mantra of CUDA compatibility with HIP?

Contributor

pfultz2 commented Apr 8, 2019

You can launch hcc-style kernels with lambdas in HIP with this function:

```cpp
#include <cassert>
#include <cstddef>
#include <hip/hip_runtime.h>

// hcc-style index handed to the kernel functor
struct index
{
    std::size_t global;
    std::size_t local;
    std::size_t group;
};

// Generic kernel that simply invokes the captured functor with its index.
template <class F>
__global__ void launcher(F f)
{
    index idx{blockIdx.x * blockDim.x + threadIdx.x, threadIdx.x, blockIdx.x};
    f(idx);
}

inline auto launch(hipStream_t stream, std::size_t global, std::size_t local = 64)
{
    return [=](auto f) {
        assert(local > 0);
        assert(global > 0);
        assert(global % local == 0); // integer division below would otherwise drop threads
        using f_type = decltype(f);
        dim3 nblocks(global / local);
        dim3 nthreads(local);
        hipLaunchKernelGGL((launcher<f_type>), nblocks, nthreads, 0, stream, f);
    };
}
```

Then you can launch kernels with:

```cpp
launch(stream, n)([=](auto idx) {
    y[idx.global] = x[idx.global];
});
```

It just won't take care of memory movement like hcc would, and there is still no C++ memory management.

Author

misos1 commented Apr 22, 2019

Actually, most of the things I mentioned, like automatic memory movement and RAII allocation/deallocation, can be done with standard C++ idioms on top of a C-like API such as HIP or HSA, as is done in hc's array_view.

The only thing I was really afraid of losing was lambda kernels, but it is good to realise that even this is doable in C++ when the API does not directly support lambda functions. The key is having a single-source model with the C++ language in kernels, which requires compiler support, so I would not be surprised if HCC could be "simulated" by a header-only "library" over HIP. Likewise, I would not be surprised if SYCL could be implemented this way over HCC or HIP. This actually exists (https://gpuopen.com/compute-product/hipsycl/), though it is not header-only, since it needs to add __device__ attributes; a lightly annotated SYCL could be supported header-only. Also, that page should be updated, since it seems hipSYCL already supports non-annotated regular SYCL (https://github.com/illuhad/hipSYCL).

So I am interested in the future of ROCm. Right now HIP is a "library" over HCC, and HCC has a compiler that supports the single-source model and uses the HSA C API. Will HCC later be skipped in favour of a direct HIP compiler? Or will HIP be based on something else, like C++20/23 executors?
And will it then be possible to use the standard Clang compiler to compile parallel C++ code with "kernels" (using C++ executors) for AMD GPUs?
(Based on HSA? There definitely must be some support from the OS, like HSA in the Linux kernel, just as C++ threads and other features need OS support.)

It also seems that with HSA it should be possible to have device-side enqueues (which OpenCL 2.0 supports), because kernel dispatch is done by simply writing some bytes into memory (which appears to be accessible from the GPU). I already tested this and it is great.
