vmaCreatePool failing at startup on some Nvidia gpu + driver combos #390

Open
alecazam opened this issue Dec 13, 2023 · 6 comments
Labels
question Further information is requested


@alecazam

alecazam commented Dec 13, 2023

Some of our users with Nvidia 2070, 3060, and 4090 GPUs that report a small coherent heap (210 MB) are failing to allocate a single 4 MB uniform buffer pool. This doesn't happen on all GPU/driver combos, and many of these cards, like my 3070, have Resizable BAR. All of our other allocations are CPU-only or GPU-only. My 3070 doesn't exhibit this problem, but it has 8 GB of coherent memory.

The pool is one of the first things we allocate at startup. No buffers, images, or render targets exist yet, so this OOM makes no sense to me. The only thing I can think of is that the command buffers may also be allocated out of this memory. I wish there were a way to identify heap usage in Vulkan.

Found memory type 4 for uniform pool
[E] Error: ERROR_OUT_OF_DEVICE_MEMORY at call vmaCreatePool in CreateDevice, exiting app..

I reviewed the vmaCreatePool code. The only OOM error would come from vkAllocateMemory, but we barely request any memory from the COHERENT heap. In total there is maybe this 4 MB plus 1 MB of other allocations. I can't make this fall back to another memory type, or other parts of our engine fail with asserts.


Here are the stats on the card, driver, and heaps:

Win10
GpuDevices NVIDIA GeForce RTX 2070 with Max-Q Design (0x10de:0x1f50)
DriverVersion:545.92.0.0 (0x88570000)

MemHeap0 size:7.83 gb, flags:DEVICE_LOCAL(0x1)
MemHeap0 flags:DEVICE_LOCAL(0x1)

MemHeap1 size:15.94 gb, flags:(0x0)
MemHeap1 flags:(0x0)
MemHeap1 flags:HOST_VISIBLE,HOST_COHERENT(0x6)
MemHeap1 flags:HOST_VISIBLE,HOST_COHERENT,HOST_CACHED(0xe)

MemHeap2 size:0.21 gb, flags:DEVICE_LOCAL(0x1)
MemHeap2 flags:DEVICE_LOCAL,HOST_VISIBLE,HOST_COHERENT(0x7) <- type 4

MemHeaps Local:8.04 Coherent:0.21 NonLocal:15.94

@alecazam
Author

alecazam commented Dec 13, 2023

If I can't rely on allocating any amount of memory from a heap that reports 210 MB, then I'm not quite sure how to build an app. I realize these are heap totals, not free memory, and that we're not the only app using the GPU, but having a basic allocation like this fail at startup is hard to resolve. Worst case, I can probably fix up the code to fall back to HOST memory instead of DEVICE_LOCAL memory if an OOM error occurs at startup.

@adam-sawicki-a
Contributor

adam-sawicki-a commented Dec 14, 2023

Regarding ReBAR: This is not something you either have or don't have. If supported by the GPU + motherboard + rest of the system, you can enable/disable it in the BIOS/UEFI at system startup, so you should be able to test it yourself without ReBAR enabled.

Regarding available memory: Vulkan does provide a way to query for available budget per memory heap - see extension VK_EXT_memory_budget, which is wrapped in VMA by function vmaGetHeapBudgets. Returned VmaBudget::budget should be the amount of memory you can probably safely allocate out of a specific heap (total, not free, so including the memory you already allocated), while VmaBudget::usage should be the amount already allocated by your process. According to my experiments, usage includes implicit allocations like descriptors, command lists, pipelines, etc.
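As a rough illustration of how those two numbers could be used before creating the pool, here is a minimal sketch. `HeapBudget` is a hypothetical stand-in that mirrors only the `usage` and `budget` fields of `VmaBudget` so the snippet compiles without the Vulkan SDK; in a real app the array would presumably be filled by `vmaGetHeapBudgets`. The helper names `remainingBytes` and `likelyFits` are made up for this example. The check is advisory only, since the driver can still refuse an allocation that appears to fit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two VmaBudget fields discussed above.
struct HeapBudget {
    uint64_t usage;   // bytes this process already uses in the heap
    uint64_t budget;  // bytes this process can probably use in total
};

// Bytes that can probably still be allocated from the heap (never negative).
inline uint64_t remainingBytes(const HeapBudget& b) {
    return b.budget > b.usage ? b.budget - b.usage : 0;
}

// Advisory check: does an allocation of `size` bytes fit the heap's budget?
inline bool likelyFits(const std::vector<HeapBudget>& budgets,
                       uint32_t heapIndex, uint64_t size) {
    assert(heapIndex < budgets.size());
    return size <= remainingBytes(budgets[heapIndex]);
}
```

Logging `remainingBytes` for every heap at startup, next to the heap sizes, would show whether the small DEVICE_LOCAL + HOST_VISIBLE heap is already nearly exhausted before the first pool is created.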

If you are pretty certain that this error code is returned from Vulkan, then this problem is not a bug in VMA. It is out of scope for this project, and I think it should rather be discussed in a repository about Vulkan.

Although VMA works cross-platform and cross-GPU-vendor, I work at AMD, so for issues specific to Nvidia I recommend reaching out to one of their Developer Technology Engineers. Hopefully they can provide more information about what is happening on their platform.

I think that when ReBAR is not enabled and you see only the classic 256 MB BAR memory (DEVICE_LOCAL + HOST_VISIBLE), you should not rely on it in your app. Please note that older Nvidia drivers didn't even expose this heap at all. It should be safe to fall back to non-DEVICE_LOCAL memory, as DEVICE_LOCAL is just a hint about the possible better performance when accessing the memory from the GPU and it doesn't provide any new capabilities that you wouldn't have without it.

You may find these articles useful:
https://asawicki.info/news_1740_vulkan_memory_types_on_pc_and_how_to_use_them
https://asawicki.info/news_1696_vulkan_with_dxgi_-_experiment_results

@alecazam
Author

Ah, the budget thing is useful. I'm initializing that for VMA, but have never looked at those numbers. It's never clear if I need to init it, since it's folded into 1.1 already. I'll try to add those budgets to my logs.

I'll also try to see if there's a control for ReBAR. I didn't see one in the HP BIOS last time I looked, but it may not have a consistent name.

I'll try to summarize any findings in this issue, so there's nothing to act upon until I get to the bottom of all this. I may just fall back to HOST memory instead of DEVICE_LOCAL for that pool if OOM occurs.

@adam-sawicki-a adam-sawicki-a added the question Further information is requested label Dec 15, 2023
@alecazam
Author

alecazam commented Feb 1, 2024

I'm back revisiting this. It seems to occur mostly on Win11. It's as if the system has locked down all of memory type 4, so VMA can't allocate our 4 MB uniform pool. It's unclear how to map memory type 4 to one of the heaps, but I assume the types just follow the heaps in sequential order. In all your examples, the ordinal starts at 0 on each heap, which makes it hard to relate what VMA is looking up for a COHERENT + UNIFORM_BIT lookup. I think I want memory type 2 if that fails, but I'm not sure how to sway vmaFindMemoryTypeIndexForBufferInfo to pick 2 as the fallback.

    PoolInfo infos[ Memory::Pool_MAX ] = {
        { "Uniforms", VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VMA_MEMORY_USAGE_AUTO, kUniformDynamicBufferSize }, // 4mb
    };
    // unclear why we need to set this with auto, but otherwise they're not host-coherent uniforms
    // Want memory type 4, and without this get back 1 which is only device_local.  That requires upload.
    createInfo.requiredFlags |= VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;

    for ( uint32_t i = 0; i < Memory::Pool_MAX; ++i )
    {
        const auto& info = infos[i];
        bufferInfo.usage = info.bufferUsage;
        createInfo.usage = info.memoryUsage;
        poolCreateInfo.flags = VMA_POOL_CREATE_IGNORE_BUFFER_IMAGE_GRANULARITY_BIT; // Since we don't mix textures in these pools

        poolCreateInfo.minBlockCount = 1;
        poolCreateInfo.maxBlockCount = 0;
        poolCreateInfo.blockSize = info.blockSize;

        vmaFindMemoryTypeIndexForBufferInfo( m_memory.allocator, &bufferInfo, &createInfo, &poolCreateInfo.memoryTypeIndex );

        GFX_INFO( "Found memory type %d for uniform pool", poolCreateInfo.memoryTypeIndex );

        VkResult res = vmaCreatePool( m_memory.allocator, &poolCreateInfo, &m_memory.pool[ i ] );

        // This is failing on a Nvidia 1650, 3070, 4090 trying to allocate type 4, find out why.
        // Don't want to assert, or state isn't set.
        if ( res == VK_ERROR_OUT_OF_DEVICE_MEMORY )
        {
            // TODO: here's where I need to drop to another memory type
            return false;
        }
    }

This is just one startup failure of many. But most of the Nvidia cards work properly. AMD and Intel don't hit this.

OSVersion: Win11 10.0.22631   <- It’s mostly Win11.
CpuMemory: 32.53
CpuName: 13th Gen Intel(R) Core(TM) i7-13700K, 3.40 GHz
CpuCores: 8/8, 16/24 HT
GpuDevices NVIDIA GeForce RTX 4090 (0x10de:0x2684) 4.00+ GB, Intel UHD Graphics 770 (0x8086:0xa780) 1.00 GB
Available GPUs
	gpu[0] = DeviceName:NVIDIA GeForce RTX 4090 (0x00002684) DeviceType:GpuDiscrete
	gpu[1] = DeviceName:Intel(R) UHD Graphics 770 (0x0000A780) DeviceType:GpuIntegrated

// local memory, this lives on gpu (can’t use for uniforms without an upload pass)
MemHeap0 size:23.58 gb, flags:DEVICE_LOCAL(0x1)
MemHeap0 flags:DEVICE_LOCAL(0x1) <- Type 1? 

// non local memory
MemHeap1 size:15.88 gb, flags:(0x0)
MemHeap1 flags:(0x0) <- no type?
MemHeap1 flags:HOST_VISIBLE,HOST_COHERENT(0x6) <- Type 2?  (can this be used for uniforms?)
MemHeap1 flags:HOST_VISIBLE,HOST_COHERENT,HOST_CACHED(0xe) <- Type 3? 

// device local, but system has somehow locked most of this down, and we can’t allocate 4mb block / 256
MemHeap2 size:0.21 gb, flags:DEVICE_LOCAL(0x1)
MemHeap2 flags:DEVICE_LOCAL,HOST_VISIBLE,HOST_COHERENT(0x7) <- Type 4? 

MemHeaps Local:23.79 Coherent:0.21 NonLocal:15.88
MemHeap Low coherent heap detected

Found memory type 4 for uniform pool
[E] Error: ERROR_OUT_OF_DEVICE_MEMORY at call vmaCreatePool in CreateDevice, exiting app..
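The "Type N?" guesses in the log above can be checked without relying on heap order. In Vulkan, memory types are numbered globally across all heaps, and each `VkMemoryType` carries a `heapIndex` that names its heap. The sketch below uses a stand-in `MemoryType` struct (mirroring `propertyFlags` and `heapIndex`) so it compiles without the Vulkan SDK. The table in `assumedNvidiaLayout` is an assumption reconstructed from this thread (type 1 is DEVICE_LOCAL only, types 2 and 4 are the host-coherent candidates), not output from a real device; a real app would read `VkPhysicalDeviceMemoryProperties::memoryTypes` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for VkMemoryType: the property flags plus the owning heap.
struct MemoryType {
    uint32_t propertyFlags;
    uint32_t heapIndex;
};

// Numeric values of the VkMemoryPropertyFlagBits seen in the log.
constexpr uint32_t kDeviceLocal  = 0x1;
constexpr uint32_t kHostVisible  = 0x2;
constexpr uint32_t kHostCoherent = 0x4;
constexpr uint32_t kHostCached   = 0x8;

// ASSUMPTION: type ordering inferred from the thread, not queried from a GPU.
inline std::vector<MemoryType> assumedNvidiaLayout() {
    return {
        {0, 1},                                            // type 0, heap 1
        {kDeviceLocal, 0},                                 // type 1, heap 0
        {kHostVisible | kHostCoherent, 1},                 // type 2, heap 1
        {kHostVisible | kHostCoherent | kHostCached, 1},   // type 3, heap 1
        {kDeviceLocal | kHostVisible | kHostCoherent, 2},  // type 4, heap 2
    };
}

// Heap that backs a given memory type index.
inline uint32_t heapOfType(const std::vector<MemoryType>& types, uint32_t typeIndex) {
    assert(typeIndex < types.size());
    return types[typeIndex].heapIndex;
}

// One log line per type: global type index, heap index, and flag names.
inline std::string describeType(const std::vector<MemoryType>& types, uint32_t typeIndex) {
    assert(typeIndex < types.size());
    const MemoryType& t = types[typeIndex];
    std::string s = "type " + std::to_string(typeIndex) +
                    " -> heap " + std::to_string(t.heapIndex) + " flags:";
    if (t.propertyFlags & kDeviceLocal)  s += " DEVICE_LOCAL";
    if (t.propertyFlags & kHostVisible)  s += " HOST_VISIBLE";
    if (t.propertyFlags & kHostCoherent) s += " HOST_COHERENT";
    if (t.propertyFlags & kHostCached)   s += " HOST_CACHED";
    return s;
}
```

Under that assumed layout, `heapOfType(layout, 4)` is 2 (the 0.21 GB heap) and `heapOfType(layout, 2)` is 1 (system memory). Printing `describeType` for every index in the startup log would remove the guesswork about which type a lookup returned.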

No fallback, so app quits. We can't use the split device/host buffer uniforms, since our code doesn't handle the copy from device to host on uniforms. I messed up the memory_budget test trying to fix Android, so I don't have that reported above. This isn't my card or setup, it's from a user. On our next release, I'll finally have that info on some of these crashes.
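The missing fallback at the TODO in the snippet above could be sketched roughly as follows. This is not VMA code: `Result` is a stand-in for the relevant `VkResult` values, and `tryCreate` is a hypothetical callback that would set `poolCreateInfo.memoryTypeIndex` to the given type and call `vmaCreatePool`. The candidate order (for example type 4 first, then type 2) is an assumption taken from this thread.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for the VkResult values that matter for the fallback decision.
enum class Result { Success, OutOfDeviceMemory, OtherError };

// Tries each candidate memory type in order and returns the first type for
// which pool creation succeeds, or -1 if none works. Only an out-of-memory
// result moves on to the next candidate; any other error stops immediately
// so that unrelated failures are not hidden by the fallback.
inline int32_t createPoolWithFallback(
    const std::vector<uint32_t>& candidateTypes,
    const std::function<Result(uint32_t)>& tryCreate)
{
    for (uint32_t typeIndex : candidateTypes) {
        const Result res = tryCreate(typeIndex);
        if (res == Result::Success) {
            return static_cast<int32_t>(typeIndex);
        }
        if (res != Result::OutOfDeviceMemory) {
            return -1;
        }
    }
    return -1;
}
```

One possible way to build the candidate list, assuming the `memoryTypeBits` field of `VmaAllocationCreateInfo` behaves as a mask of acceptable types, is to clear the bit of the type that just failed and call `vmaFindMemoryTypeIndexForBufferInfo` again.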

@alecazam
Author

alecazam commented Feb 1, 2024

Digging into the pool allocation: VMA finds type 2 and type 4, but picks 4 since its cost is 0 vs. 1. But since the heap behind type 4 is full, that doesn't work and the VMA pool allocation fails. So I think I need to set |= VMA_MEMORY_USAGE_PREFER_HOST so that it picks type 2 in the failure case.

That seems to work, and falls back to type 2. Note this is all on 1.3 drivers, and so is using the budget and buffer memory requirements function.
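The "cost is 0 vs. 1" behaviour can be illustrated with a simplified model of a cost-based pick. This is a sketch, not VMA's actual implementation, which may weigh more factors: here the cost is assumed to be the number of preferred flags a type lacks plus the number of not-preferred flags it has, and the function name `pickMemoryType` is hypothetical. The flag values are the standard Vulkan property bits (DEVICE_LOCAL 0x1, HOST_VISIBLE 0x2, HOST_COHERENT 0x4).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the set bits in v.
inline uint32_t countBits(uint32_t v) {
    uint32_t n = 0;
    while (v != 0) {
        v &= v - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Returns the index of the cheapest memory type that has every required
// flag, or -1 if no type qualifies. Ties go to the lowest index.
inline int pickMemoryType(const std::vector<uint32_t>& typeFlags,
                          uint32_t required,
                          uint32_t preferred,
                          uint32_t notPreferred)
{
    int best = -1;
    uint32_t bestCost = UINT32_MAX;
    for (std::size_t i = 0; i < typeFlags.size(); ++i) {
        const uint32_t flags = typeFlags[i];
        if ((flags & required) != required) {
            continue;  // missing a required property, not a candidate
        }
        const uint32_t cost = countBits(preferred & ~flags) +
                              countBits(notPreferred & flags);
        if (cost < bestCost) {
            bestCost = cost;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With type flags `{0x0, 0x1, 0x6, 0xe, 0x7}` (the layout assumed from the logs), requiring HOST_VISIBLE | HOST_COHERENT and preferring DEVICE_LOCAL selects type 4, while treating DEVICE_LOCAL as not preferred selects type 2. That matches the switch described above when the usage is changed to prefer host memory.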

@mbechard

mbechard commented Jun 5, 2024

Doesn't the do-while loop in VmaAllocator_T::AllocateMemory() try out type=2 if type=4 fails, though? You shouldn't need to explicitly set PREFER_HOST.
