[Feature request] Shared host/external memory between multiple physical devices #614

Closed
larso0 opened this Issue Nov 7, 2017 · 5 comments

larso0 commented Nov 7, 2017

Implementing an efficient way to copy an image from one GPU to another is difficult. My current implementation creates a linear-tiling staging image on each GPU and memcpys between them. This increasingly becomes a bottleneck as resolution increases.
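For readers unfamiliar with the staging approach, the host-side step can be sketched roughly as below. This is only an illustration, not the poster's actual code: `copy_linear_rows` is a hypothetical helper, and the pitches are assumed to come from `vkGetImageSubresourceLayout` (`VkSubresourceLayout::rowPitch`) on each device, which may differ between GPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `height` rows of `row_bytes` bytes each between
 * two mapped linear-tiling staging images whose row pitches may differ. */
static void copy_linear_rows(uint8_t *dst, size_t dst_pitch,
                             const uint8_t *src, size_t src_pitch,
                             size_t row_bytes, size_t height)
{
    /* A row must fit inside the pitch of both images. */
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);

    /* Tightly packed on both sides: a single memcpy is enough. */
    if (dst_pitch == row_bytes && src_pitch == row_bytes) {
        memcpy(dst, src, row_bytes * height);
        return;
    }

    /* Otherwise copy row by row, skipping each image's padding. */
    for (size_t y = 0; y < height; ++y) {
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
    }
}
```

At 3840x2160 with 4 bytes per pixel this moves 33,177,600 bytes through the CPU per frame, on top of the two GPU-side copies, which is why the cost grows with resolution.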

I understand that copying from device-local memory to device-local memory on a different GPU might not be possible. However, I fail to see why two VkDevice objects can't share the same host-visible memory. What I want is the capability to copy images like this: device-local memory (GPU1) -> vkCmdCopyImage (GPU1) -> shared memory -> vkCmdCopyImage (GPU2) -> device-local memory (GPU2). I want to avoid having to map two staging images and copy between them.

I could achieve what I want with VK_KHR_external_memory*, were it not that every available external memory handle type requires the VkDevice objects using the handle to have a matching device UUID (VkPhysicalDeviceIDPropertiesKHR::deviceUUID). So I have to use the same GPU for all the VkDevice objects that use the external memory, which defeats the purpose of what I'm trying to achieve (unlinked multi-GPU parallel rendering).
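The matching-UUID requirement boils down to a byte-wise comparison of the two devices' `deviceUUID` arrays. A minimal sketch, assuming the UUIDs have already been queried from each physical device; `same_device_uuid` and `UUID_SIZE` are hypothetical names, with `UUID_SIZE` mirroring `VK_UUID_SIZE` (16):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16 /* assumption: mirrors VK_UUID_SIZE */

/* Hypothetical helper: true if both UUIDs identify the same device,
 * i.e. the case in which the KHR handle types permit sharing. */
static bool same_device_uuid(const uint8_t a[UUID_SIZE],
                             const uint8_t b[UUID_SIZE])
{
    return memcmp(a, b, UUID_SIZE) == 0;
}
```

For two different GPUs this check fails, so memory exported from one cannot be imported on the other through those handle types.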

I suggest expanding VK_KHR_external_memory* with another memory handle type:
VK_EXTERNAL_MEMORY_HANDLE_TYPE_GENERAL_BIT_KHR, VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_BIT_KHR, or something similar. Alongside the handle type, you could add a VkImportMemoryGeneralHandleKHR/VkImportMemoryHostHandleKHR structure, similar to VkImportMemoryWin32HandleInfoKHR, that allows importing a plain C pointer as the handle. Example usage:

void* externalMemory = malloc(SIZE);

VkImportMemoryGeneralHandleKHR import;
import.sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_GENERAL_HANDLE_KHR;
import.pNext = NULL;
import.handle = externalMemory;

VkMemoryAllocateInfo allocateInfo;
allocateInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocateInfo.pNext = &import;
allocateInfo.allocationSize = SIZE;
...

NicolBolas commented Nov 8, 2017

> However, I fail to see why two VkDevice objects can't share the same host-visible memory.

What if one device has different memory alignments than another? What if one device uses write-combined memory for a particular memory type while another does not?

What you want requires devices to explicitly cooperate on this sort of thing. That in turn requires some form of inter-device protocol, whereby a physical device driver can talk to the other physical device drivers on the system so that they can agree on how to allocate shareable memory.

Also, I think this would best be done by creating a whole new memory type flag for them. That way, you're not forcing drivers to allocate memory in less efficient ways, simply because a user might use that allocation for sharing.

Contributor

krOoze commented Nov 8, 2017

VK_MEMORY_PROPERTY_HOST_CACHED_BIT seems like it should be shareable (i.e. it seems to be plain old main memory). The cache should really be able to be std::moved to and from C++. Potentially I would want to use an output from Vulkan on the host without having to copy it out or keep it "mapped" (whatever that means if both the map and the memory are actually in main memory).

Then again it probably is not that useful for this case (as there are still too many potentially unnecessary middle-men for the copy). There should be a better way.


NicolBolas commented Nov 8, 2017

Cached memory is not enough. First, a device is not required to be able to use cached memory. The requirement is that there is at least one host-visible memory type. That doesn't have to be cached.

Second, even if two devices can use cached memory, that doesn't mean they can use each other's memory. The memory has to be allocated and set up in ways that the implementation expects and requires. That is, the separate devices may add additional parameters to the underlying allocation that are incompatible with each other.

That's why I think it should be a separate memory type. That way, implementations are able to put more restrictions on how that memory can be used.

> Potentially I would want to use an output from Vulkan on the host without having to copy it out or keep it "mapped" (whatever that means if both the map and the memory are actually in main memory).

Mapping, in the context of Vulkan, means making a piece of memory available for direct CPU access. From a hardware perspective, that means assigning CPU virtual memory pages to the memory allocation.

Author

larso0 commented Nov 9, 2017

A separate memory type would be nice. Maybe add a VK_MEMORY_PROPERTY_HOST_SHARED_BIT flag and some way to share the memory with another VkDevice. That's what I've been trying to do with VK_KHR_external_memory. One way or another, this might have to be an extension struct added to the VkMemoryAllocateInfo::pNext chain (correct me if I'm wrong). That's why I suggested adding another handle type to the external memory extension. But it might also be doable if the vkBind*Memory functions supported binding a VkDeviceMemory object (with the correct memory type) from a different device.


drakos-amd commented Dec 4, 2017

Depending on the target platform, some recently published EXT extensions allow sharing memory between different physical devices.

VK_EXT_external_memory_host enables importing host allocations or host-mapped foreign device memory using a host pointer as the handle.
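As a rough sketch of the host-side preparation for such an import: the extension requires the host pointer and size to be multiples of `VkPhysicalDeviceExternalMemoryHostPropertiesEXT::minImportedHostPointerAlignment`, so one allocation meant for two devices has to satisfy both values. The helper names below are hypothetical, and both alignments are assumed to have been queried already and to be powers of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round `value` up to a multiple of `alignment` (must be a power of two). */
static size_t round_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: allocate host memory whose address and size satisfy
 * the import alignment of two devices. For powers of two, the larger value
 * is a multiple of the smaller one. On success, *out_size receives the
 * padded size to use as the allocation size on both devices. */
static void *alloc_shared_staging(size_t size, size_t align_a, size_t align_b,
                                  size_t *out_size)
{
    size_t alignment = align_a > align_b ? align_a : align_b;
    size_t padded = round_up(size, alignment);
    void *ptr = aligned_alloc(alignment, padded); /* C11 */
    if (ptr != NULL) {
        *out_size = padded;
    }
    return ptr;
}
```

Each device would then import the pointer by chaining a `VkImportMemoryHostPointerInfoEXT` (with `handleType` set to `VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT` and `pHostPointer` set to the returned pointer) into `VkMemoryAllocateInfo::pNext`, after picking a memory type from `vkGetMemoryHostPointerPropertiesEXT`. Whether a given pair of drivers accepts the same pointer still has to be checked at runtime.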

VK_EXT_external_memory_dma_buf enables importing dma_buf handles on Linux, which may come from another physical device.

The spec now also has a table listing which external memory handle types require a matching physical device and which don't.
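An application could encode that distinction as a small lookup. The sketch below only reflects what this thread states (the opaque KHR handle types need a matching device, the host-pointer and dma_buf types do not). The enum is a hypothetical stand-in for `VkExternalMemoryHandleTypeFlagBits`, and the spec table remains the authority.

```c
#include <stdbool.h>

/* Hypothetical local enum, not the Vulkan one. */
enum handle_kind {
    HANDLE_OPAQUE_FD,           /* VK_KHR_external_memory_fd */
    HANDLE_OPAQUE_WIN32,        /* VK_KHR_external_memory_win32 */
    HANDLE_HOST_ALLOCATION,     /* VK_EXT_external_memory_host */
    HANDLE_HOST_MAPPED_FOREIGN, /* VK_EXT_external_memory_host */
    HANDLE_DMA_BUF              /* VK_EXT_external_memory_dma_buf */
};

/* True if exporter and importer must be the same physical device. */
static bool requires_matching_device(enum handle_kind kind)
{
    switch (kind) {
    case HANDLE_HOST_ALLOCATION:
    case HANDLE_HOST_MAPPED_FOREIGN:
    case HANDLE_DMA_BUF:
        return false;
    default:
        /* Be conservative for anything not known to be cross-device. */
        return true;
    }
}
```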

I'd also like to draw your attention to features that enable execution control across multiple physical devices. At least on Linux (and possibly other POSIX-based systems), semaphores and fences can be shared across physical devices if the FENCE_FD and SYNC_FD handle types are used. These are part of the KHR external semaphore/fence extensions.

larso0 closed this Dec 12, 2017
