-
-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix texture corruption due to buffer mis-align #200
Conversation
Tentative fix for gfx-rs/wgpu#323 and cloudhead/rx#14 |
I have only very basic knowledge of how this library works, so maybe my fix is naive 🤷♂️ - but it seems to have solved the issue for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There're few nits though :)
memory/src/allocator/dynamic.rs
Outdated
@@ -336,6 +338,9 @@ where | |||
let ref mut chunk = chunks[chunk_index as usize]; | |||
let block_index = chunk.acquire_blocks(count)?; | |||
let block_range = chunk.blocks_range(block_size, block_index, count); | |||
if block_range.start % align != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check will result in allocation failure when this chunk could provide blocks with required alignment.
For example:
Chunk has size 128
and blocks are of size 4
. Let's say chunk starts at offset 0
.
We request 4
blocks with alignment 16
.
If first blocks in chunk was allocated then we'd get block_range = 4..20
. It doesn't have required alignment.
But we could allocate range 16..32
from the chunk instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, yes! I've tried to fix this here e2982d1 - let me know if that is what you meant. The only question I have is whether the block_size
is correct here, since it's passed as a parameter, and I'm not sure how to verify if/how it matches the chunk's block_size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep it up!
memory/src/allocator/dynamic.rs
Outdated
@@ -307,14 +307,14 @@ where | |||
.next_back() | |||
{ | |||
// Allocate block for the chunk. | |||
let (block, allocated) = self.alloc_from_entry(device, chunk_size, 1)?; | |||
let (block, allocated) = self.alloc_from_entry(device, chunk_size, 1, None)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should pass block size here and below in this function to uphold invariant that all blocks are aligned to their size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done here 47e0b35
memory/src/allocator/dynamic.rs
Outdated
let mask = ((1 << count) - 1) << index; | ||
self.blocks &= !mask; | ||
Some(index) | ||
for i in index..MAX_BLOCKS_PER_CHUNK { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it all in one loop please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm realizing now that the code is wrong, but that the original code also doesn't make sense to me: it finds an index into the bit-array where there is a 1
, ie. a free slot, then creates a mask for count
bits, but assumes that there are count
* 1
bits at index
, which isn't verified anywhere? Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nevermind, I get it - that's what the first loop does 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting this to work is much more complicated than I expected. I think the new code here: 6c3313a is correct, but it's more complicated. I'm not sure how to simplify it, or if it's possible to figure out the right index without trying each index, eg. if block_size = 432411
and align = 16
, I don't know of a simple formula to solve for n
in (432411 * n) % 16 = 0
. If block_size < align
, then it's fairly easy though, but that's a special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, there is a case where for eg. you have blocks = 00000111100
, and count = 4
, index = 2
, which would work fine if it was all aligned. But now imagine index = 3
is the one that's correctly aligned, but there aren't enough blocks to fulfill that. That's why I'm checking each bit in the range 3..3+count
here. Again, I'm sure there's a faster way, but I want to make sure this is even correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I cleaned things up a bit - here's the current state: https://github.com/cloudhead/rendy/blob/887826ec0cb311acd862ef0b8a667a6c592a2ae5/memory/src/allocator/dynamic.rs#L633-L660
I'm not sure what you meant by making it in one loop - did you mean combining the loops together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still one problem left. And I added few suggestions.
Sorry for late review.
memory/src/allocator/dynamic.rs
Outdated
let mask: u64 = ((1 << count) - 1) << i; | ||
|
||
// Exit if we don't have enough free blocks at the current offset. | ||
if (self.blocks & mask).count_ones() < count { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check can be simplified to self.blocks & mask == mask
.
Or alternatively blocks & (1 << i) == (1 << i)
. So instead of looping over every index we can just iterate over 1s in blocks
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, good point. count_ones
is cheap though, as it uses the popcount
instruction.
memory/src/allocator/dynamic.rs
Outdated
|
||
// Exit if we don't have enough free blocks at the current offset. | ||
if (self.blocks & mask).count_ones() < count { | ||
return None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If check fails we should look further instead of returning.
For example if self.blocks
is b000110110
and we need 2
blocks with align = 2 * block_size
then first pair doesn't fit, but this check will fail at next indicex (and at one after it'd fail too) and another free pair that statisfies requirements won't be examined.
memory/src/allocator/dynamic.rs
Outdated
|
||
// We may not be aligned at `index`, but may find alignment | ||
// at subsequent indices. | ||
for i in index..MAX_BLOCKS_PER_CHUNK { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason to look past MAX_BLOCKS_PER_CHUNK - count
.
@omni-viral ready for another look 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it seems correct.
Would you mind considering optimization below?
memory/src/allocator/dynamic.rs
Outdated
|
||
// We may not be aligned at `index`, but may find alignment | ||
// at subsequent indices. | ||
for i in index..MAX_BLOCKS_PER_CHUNK - count { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating over indices of set bits should be faster than iterating over all and checking mask.
And checking alignment doesn't require division (% operator is compiled to division instruiction).
So I think this loop can be rewritten as
while blocks != 0 {
let index = blocks.trailing_zeros();
blocks &= !(1 << index);
if (index as u64 * block_size) & (align - 1) == 0 {
return Some(index);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I had to also mask self.blocks
- maybe there's a way to reuse the other mask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
This patch ensures that even when the dynamic allocator fulfills an allocation request by returning multiple small blocks, the memory is correctly aligned. This wasn't the case previously, as alignment was only guaranteed to match the block size, so when fulfilling a large allocation with multiple smaller blocks, the memory would be aligned to the smaller block size. The validation triggered by the Vulkan layer before this fix would look like this: VALIDATION [VUID-vkBindImageMemory-memoryOffset-01048 (0)]: vkBindImageMemory(): memoryOffset is 0x43300 but must be an integer multiple of the VkMemoryRequirements::alignment value 0x1000, returned from a call to vkGetImageMemoryRequirements with image. The Vulkan spec states: memoryOffset must be an integer multiple of the alignment member of the VkMemoryRequirements structure returned from a call to vkGetImageMemoryRequirements [...]
This patch ensures that even when the dynamic allocator
fulfills an allocation request by returning multiple small
blocks, the memory is correctly aligned.
This wasn't the case previously, as alignment was only
guaranteed to match the block size, so when fulfilling
a large allocation with multiple smaller blocks, the memory
would be aligned to the smaller block size.
The validation triggered by the Vulkan layer before this fix
would look like this: