This is based on an idea from @adam-sawicki-a and @papazhang66 which came up in #419.
Intro
Currently, when FindMemoryTypeIndex is called, it has to do quite a bit of work to find the right memory type for the specified creation parameters:
VkResult VmaAllocator_T::FindMemoryTypeIndex(
    uint32_t memoryTypeBits,
    const VmaAllocationCreateInfo* pAllocationCreateInfo,
    VkFlags bufImgUsage,
    uint32_t* pMemoryTypeIndex) const
{
    memoryTypeBits &= GetGlobalMemoryTypeBits();
    if(pAllocationCreateInfo->memoryTypeBits != 0)
    {
        memoryTypeBits &= pAllocationCreateInfo->memoryTypeBits;
    }
    VkMemoryPropertyFlags requiredFlags = 0, preferredFlags = 0, notPreferredFlags = 0;
    if(!FindMemoryPreferences(
        IsIntegratedGpu(),
        *pAllocationCreateInfo,
        bufImgUsage,
        requiredFlags, preferredFlags, notPreferredFlags))
    {
        return VK_ERROR_FEATURE_NOT_PRESENT;
    }
    *pMemoryTypeIndex = UINT32_MAX;
    uint32_t minCost = UINT32_MAX;
    for(uint32_t memTypeIndex = 0, memTypeBit = 1;
        memTypeIndex < GetMemoryTypeCount();
        ++memTypeIndex, memTypeBit <<= 1)
    {
        // This memory type is acceptable according to memoryTypeBits bitmask.
        if((memTypeBit & memoryTypeBits) != 0)
        {
            const VkMemoryPropertyFlags currFlags =
                m_MemProps.memoryTypes[memTypeIndex].propertyFlags;
            // This memory type contains requiredFlags.
            if((requiredFlags & ~currFlags) == 0)
            {
                // Calculate cost as number of bits from preferredFlags not present in this memory type.
                uint32_t currCost = VMA_COUNT_BITS_SET(preferredFlags & ~currFlags) +
                    VMA_COUNT_BITS_SET(currFlags & notPreferredFlags);
                // Remember memory type with lowest cost.
                if(currCost < minCost)
                {
                    *pMemoryTypeIndex = memTypeIndex;
                    if(currCost == 0)
                    {
                        return VK_SUCCESS;
                    }
                    minCost = currCost;
                }
            }
        }
    }
    return (*pMemoryTypeIndex != UINT32_MAX) ? VK_SUCCESS : VK_ERROR_FEATURE_NOT_PRESENT;
}
Improvement
The idea would be to cache the result of this lookup: hash all parameters used for buffer or image creation and store them in a std::unordered_map, with the parameters as the key and a std::uint32_t memory type index as the value. You can supply your own hashing function to std::unordered_map (by specifying it as the third template parameter). Using such a hash-based cache is very common for other Vulkan objects such as pipelines or descriptor set layouts (for example, here is a nice tutorial which describes an abstraction for descriptor set layouts: https://vkguide.dev/docs/extra-chapter/abstracting_descriptors).
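To make the idea more concrete, here is a minimal sketch of what such a cache could look like. Everything in it is an assumption for illustration, not existing VMA code: the names MemoryTypeKey, MemoryTypeKeyHash and MemoryTypeIndexCache are hypothetical, plain integers stand in for the Vulkan and VMA flag types so that it compiles without the SDK, and the exact set of fields that would need to go into the key has to be checked against what FindMemoryPreferences actually reads. Thread safety is not handled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical key: the inputs that influence the result of FindMemoryTypeIndex.
// Plain integers stand in for the Vulkan/VMA flag types (assumption for this sketch).
struct MemoryTypeKey
{
    uint32_t memoryTypeBits = 0; // already combined bitmask of allowed memory types
    uint32_t allocFlags = 0;     // stand-in for VmaAllocationCreateInfo::flags
    uint32_t usage = 0;          // stand-in for VmaAllocationCreateInfo::usage
    uint32_t requiredFlags = 0;  // stand-in for VmaAllocationCreateInfo::requiredFlags
    uint32_t preferredFlags = 0; // stand-in for VmaAllocationCreateInfo::preferredFlags
    uint32_t bufImgUsage = 0;    // buffer or image usage flags

    bool operator==(const MemoryTypeKey& rhs) const
    {
        return memoryTypeBits == rhs.memoryTypeBits &&
            allocFlags == rhs.allocFlags &&
            usage == rhs.usage &&
            requiredFlags == rhs.requiredFlags &&
            preferredFlags == rhs.preferredFlags &&
            bufImgUsage == rhs.bufImgUsage;
    }
};

// Custom hasher, passed as the third template parameter of std::unordered_map.
struct MemoryTypeKeyHash
{
    // Mixes one value into the running hash (the widely used hash_combine pattern).
    static void Combine(std::size_t& seed, uint32_t value)
    {
        seed ^= std::hash<uint32_t>{}(value) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
    }

    std::size_t operator()(const MemoryTypeKey& key) const
    {
        std::size_t seed = 0;
        Combine(seed, key.memoryTypeBits);
        Combine(seed, key.allocFlags);
        Combine(seed, key.usage);
        Combine(seed, key.requiredFlags);
        Combine(seed, key.preferredFlags);
        Combine(seed, key.bufImgUsage);
        return seed;
    }
};

// Hypothetical cache: maps creation parameters to a memory type index.
class MemoryTypeIndexCache
{
public:
    // Returns the cached index if the key was seen before. Otherwise calls
    // slowPath (e.g. a wrapper around the existing loop) and stores its result.
    template<typename SlowPath>
    uint32_t GetOrCompute(const MemoryTypeKey& key, SlowPath&& slowPath)
    {
        const auto it = m_Cache.find(key);
        if(it != m_Cache.end())
        {
            ++m_Hits;
            return it->second;
        }
        const uint32_t index = slowPath(key);
        m_Cache.emplace(key, index);
        return index;
    }

    std::size_t Hits() const { return m_Hits; }
    std::size_t Size() const { return m_Cache.size(); }

private:
    std::unordered_map<MemoryTypeKey, uint32_t, MemoryTypeKeyHash> m_Cache;
    std::size_t m_Hits = 0;
};
```

In this sketch the slow path is passed in as a callable, so the existing search loop would stay untouched and only run on a cache miss. Failed lookups (VK_ERROR_FEATURE_NOT_PRESENT) would need separate handling, for example by not caching them at all.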
How much this improves performance is something that would have to be measured. On the one hand, the hashing will take a little time, but so does calling FindMemoryTypeIndex currently. In general, performance is something that must always be measured instead of being estimated. This could easily be analyzed by writing a test for it when implementing this feature. There are also frameworks like Google Benchmark for more advanced benchmarking. It is likely that the exact performance improvement depends a lot on the actual parameters, the hardware, and other factors.
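A very simple way to get first numbers, before reaching for a benchmarking framework, could be a small timing helper based on std::chrono. This is only a sketch: the name AverageNanosecondsPerCall is hypothetical, and the callable is assumed to take the iteration index and return a uint32_t (for example the memory type index), so that the result can be written to a volatile variable and the call is not optimized away.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: calls func(i) for i in [0, iterations) and returns the
// average wall-clock time per call in nanoseconds.
template<typename Func>
double AverageNanosecondsPerCall(Func&& func, std::size_t iterations)
{
    if(iterations == 0)
    {
        return 0.0;
    }
    // Writing the results to a volatile keeps the compiler from removing the calls.
    volatile uint32_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for(std::size_t i = 0; i < iterations; ++i)
    {
        sink = sink + func(i);
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = end - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

The same helper could be run once with the current FindMemoryTypeIndex and once with a cached variant, using identical sequences of creation parameters, to compare the two. Results from such a loop are only a rough indication, since they ignore cache warm-up and run-to-run variance.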
This brings me to another question:
@adam-sawicki-a Are there other parts in the code where you think such a cache with hashed values could improve performance?
best regards
Johannes