Add a method that copies memory only if it is in a different memory #28
Comments
We will try and tackle this for 0.9 since it was mentioned by CMS as nice-to-have.
Is this issue about zero-copying between two buffers, if they reside on the same device? If yes, this should be similar to
We definitely should have this functionality in alpaka. I needed it a couple of times already. I would propose an API similar to Kokkos, but use the target device explicitly:
auto buf = alpaka::allocBuf(devA, ...);
auto buf2 = alpaka::mirrorBuf(buf, devA); // buf2 shares the same memory with buf
alpaka::memcpy(buf, buf2); // zero-copy, does nothing
auto buf3 = alpaka::mirrorBuf(buf, devB); // buf3 may share the same memory, depending on whether devA and devB share memory
alpaka::memcpy(buf, buf3); // maybe zero-copy, maybe deep copy
The typical use case would be that
Hi @bernhardmgruber, I'm not familiar with Kokkos' approach, but if you write something like
auto buf3 = alpaka::mirrorBuf(buf, devB); // buf3 may share the same memory, depending on whether devA and devB share memory
I would expect that
Instead, from the description of the issue I understand that the intended behaviour is that the copy is a one-time action, and that it should simply be elided if the two buffers are on the same device? Something like
auto src = alpaka::allocBuf(devA, ...);
auto dst = alpaka::allocBuf(devB, ...);
// if the two buffers are on a different device, copy the content, otherwise alias the buffer
if constexpr(std::is_same_v<decltype(devA), decltype(devB)>) {
    if (devA == devB) {
        dst = src;
    } else {
        alpaka::memcpy(dst, src);
    }
} else {
    alpaka::memcpy(dst, src);
}
If I understood correctly, I think that adding a new kind of buffer or view may not be the best approach, and instead we could just add an alternative to
Something like (for lack of a better name)
template <...>
alpaka::convey(TQueue queue, TViewDst&& dst, TViewSrc const& src)
{
    ...
}
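The alias-or-copy idea above can be sketched without alpaka at all. The following is a minimal illustration, not the proposed implementation: `MockDev`, `MockBuf`, `allocMockBuf`, `mockMemcpy` and this `convey` are hypothetical stand-ins, the queue parameter is omitted, and both devices are assumed to have the same type (so the `if constexpr` dispatch is not needed).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT alpaka types.
struct MockDev
{
    int id;
    bool operator==(MockDev const& other) const { return id == other.id; }
};

struct MockBuf
{
    MockDev dev;
    std::shared_ptr<std::vector<int>> data; // shared ownership, like a buffer handle
};

inline MockBuf allocMockBuf(MockDev dev, std::size_t n)
{
    return MockBuf{dev, std::make_shared<std::vector<int>>(n, 0)};
}

// Deep copy: always transfers the elements.
inline void mockMemcpy(MockBuf& dst, MockBuf const& src)
{
    *dst.data = *src.data;
}

// Hypothetical "convey": alias when both buffers are on the same device,
// deep copy otherwise.
inline void convey(MockBuf& dst, MockBuf const& src)
{
    if(dst.dev == src.dev)
        dst.data = src.data; // zero-copy: dst now shares src's storage
    else
        mockMemcpy(dst, src);
}
```

Note that in the same-device case a later write through `dst` is also visible through `src`, which is exactly the kind of behavioural difference generic code would have to be aware of.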
Maybe something like |
@fwyzard That is roughly the implementation I had in mind! Maybe we can also allow e.g. different CPU devices to also have this sharing behavior or, whenever unified shared memory is implemented, have it for CPU/GPU as well. You are right that
Yes, no new buffer type. Yes, I want to have a smarter
Valid name! |
I agree... however we should keep in mind that this would be an API with a different behaviour depending on the accelerator. For example:
So one has to be aware of the difference when writing generic code! On the other hand, I wouldn't know how to make it more obvious from the API itself, apart from documenting the behaviour :-)
Adds an overload set `zero_memcpy` that only copies a buffer if the destination device requires the buffer in a different memory space. Otherwise, no copy is performed and just the handles are adjusted. Fixes: alpaka-group#28
IMO to model the ideas described in this issue we need memory visibility information within alpaka. My point is that we must first extend the description of our Buffers/Views to model a solution for this issue.
From what I just read, the Frontier/Crusher approach is not very dissimilar from unified/managed memory. The differences are that
That's an interesting approach, and it should be possible to handle it in the same way as unified/managed memory. So, between the different backends, the possible cases would be:
Standard and pinned host memory are handled by a
Standard device memory is handled by a
Managed memory probably needs a new kind of buffer type, in order to expose new methods to migrate the memory across the host and device(s). Maybe a more general approach could be to extend the interface of the buffer classes to include an additional device (or list of additional devices) from where the memory can be accessed?
But IMHO all this is orthogonal to the method discussed here: we would simply not implement the
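As a rough sketch of the "list of additional devices" idea, a buffer could carry the set of devices from which its memory is accessible, and a copy would only be needed when the destination is not in that set. Everything below is hypothetical (`VisibilityBuf`, `mapTo`, `unmapFrom`, `needsCopy`, and the use of plain integers as device ids); alpaka buffers are not known to expose such an interface.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical illustration; not an alpaka buffer.
struct VisibilityBuf
{
    int ownerDev;                 // device the memory was allocated on
    std::vector<int> alsoVisible; // additional devices that can access it

    bool accessibleFrom(int dev) const
    {
        return dev == ownerDev
               || std::find(alsoVisible.begin(), alsoVisible.end(), dev) != alsoVisible.end();
    }

    // Record that the memory has been mapped to another device.
    void mapTo(int dev)
    {
        if(!accessibleFrom(dev))
            alsoVisible.push_back(dev);
    }

    // Record that the memory is no longer mapped to that device.
    void unmapFrom(int dev)
    {
        alsoVisible.erase(std::remove(alsoVisible.begin(), alsoVisible.end(), dev), alsoVisible.end());
    }
};

// A copy is only required if the destination device cannot see the memory.
inline bool needsCopy(VisibilityBuf const& src, int dstDev)
{
    return !src.accessibleFrom(dstDev);
}
```

With such a query available, a conditional-copy method could ask the buffer rather than compare device handles, which would also cover pinned or managed memory that is visible from more than one device.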
Further comment from @fwyzard (#1820 (comment)):
Having thought about this for a bit, it should be the latter. If the user intends to have a full copy they should use |
Adds an overload set `makeAvailable` that only copies a buffer if the destination device requires the buffer in a different memory space. Otherwise, no copy is performed and just the handles are adjusted. Fixes: alpaka-group#28
I was wondering myself today and this led to the question what it actually means for a buffer to be e.g. a
I came across this question when considering what the type of
auto buf1 = alpaka::allocMappedBuf(devHost, ...);
auto buf2 = alpaka::makeAvailable(queue, devCuda, buf1);
Because
Notice that I did not start talking about managed memory yet and whether it needs its own data type or not. We somewhat have that problem today already. Well, we can ignore the problem largely if we don't merge and evolve #1820, but still: The buffer data type does not tell us where memory is accessible from. The reason we did not need to address this issue until now is that alpaka can type-erase all that information by calling
So, maybe managed memory needs a new kind of buffer type. And maybe such a general "MultiDeviceBuffer" is also needed to be returned from
@psychocoderHPC suggested that as well to me again yesterday. This may be a proper solution. It also covers the case that we can add and remove devices to this list, when we map/unmap memory to additional devices. However, what is the difference then between a
Yes and no. We can ignore that for now though and that is still useful, as is the state of #1820. There, we just perform a zero-copy if the source and destination device are the same. This already covers the important use case of avoiding the copies in a host->device copy, kernel, device->host copy program. However, going further and allowing
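The same-device zero-copy behaviour described here can be illustrated with a return-value style function. This is only a sketch under assumptions: `SketchDev`, `SketchBuf`, `allocSketchBuf` and `makeAvailableSketch` are hypothetical names, the queue argument is left out, and it does not reproduce the code in #1820.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical mock types for illustration; not the alpaka API.
struct SketchDev
{
    int id;
};

struct SketchBuf
{
    SketchDev dev;
    std::shared_ptr<std::vector<float>> storage;
};

inline SketchBuf allocSketchBuf(SketchDev dev, std::size_t n)
{
    return SketchBuf{dev, std::make_shared<std::vector<float>>(n)};
}

// Returns a buffer usable on dstDev.
// Same device: hand back the same storage (only the handle is copied).
// Different device: allocate on dstDev and copy the contents.
inline SketchBuf makeAvailableSketch(SketchDev dstDev, SketchBuf const& src)
{
    if(dstDev.id == src.dev.id)
        return src;
    SketchBuf dst = allocSketchBuf(dstDev, src.storage->size());
    *dst.storage = *src.storage;
    return dst;
}
```

In a host->device copy, kernel, device->host copy program that runs entirely on a CPU backend, both calls would take the first branch, so no element is ever copied.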
Some random ideas about what I think we may need to distinguish "make available for reading once" and "make available for efficiently working with"
For the first case, here called
For the second case, here called
In this table, On the other hand |
To allow some methods that have const memory to prevent double buffering on the host, a method copyIfDifferentMem should be implemented.