Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix texture corruption due to buffer mis-align #200

Merged
merged 1 commit into from
Sep 16, 2019

Conversation

cloudhead
Copy link
Contributor

This patch ensures that even when the dynamic allocator
fulfills an allocation request by returning multiple small
blocks, the memory is correctly aligned.

This wasn't the case previously, as alignment was only
guaranteed to match the block size, so when fulfilling
a large allocation with multiple smaller blocks, the memory
would be aligned to the smaller block size.

The validation triggered by the Vulkan layer before this fix
would look like this:

VALIDATION [VUID-vkBindImageMemory-memoryOffset-01048 (0)]:
vkBindImageMemory(): memoryOffset is 0x43300 but must be an integer
                     multiple of the VkMemoryRequirements::alignment
                     value 0x1000, returned from a call to
                     vkGetImageMemoryRequirements with image. The
                     Vulkan spec states: memoryOffset must be an
                     integer multiple of the alignment member of the
                     VkMemoryRequirements structure returned from a
                     call to vkGetImageMemoryRequirements [...]

@cloudhead
Copy link
Contributor Author

Tentative fix for gfx-rs/wgpu#323 and cloudhead/rx#14

@cloudhead
Copy link
Contributor Author

cloudhead commented Sep 4, 2019

I have only very basic knowledge of how this library works, so maybe my fix is naive 🤷‍♂️ - but it seems to have solved the issue for me.

Copy link
Member

@zakarumych zakarumych left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

There're few nits though :)

memory/src/allocator/dynamic.rs Outdated Show resolved Hide resolved
@@ -336,6 +338,9 @@ where
let ref mut chunk = chunks[chunk_index as usize];
let block_index = chunk.acquire_blocks(count)?;
let block_range = chunk.blocks_range(block_size, block_index, count);
if block_range.start % align != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check will result in allocation failure when this chunk could provide blocks with required alignment.
For example:
Chunk has size 128 and blocks are of size 4. Let's say chunk starts at offset 0.
We request 4 blocks with alignment 16.
If first blocks in chunk was allocated then we'd get block_range = 4..20. It doesn't have required alignment.
But we could allocate range 16..32 from the chunk instead.

Copy link
Contributor Author

@cloudhead cloudhead Sep 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, yes! I've tried to fix this here e2982d1 - let me know if that is what you meant. The only question I have is whether the block_size is correct here, since it's passed as a parameter, and I'm not sure how to verify if/how it matches the chunk's block_size.

Copy link
Member

@zakarumych zakarumych left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep it up!

@@ -307,14 +307,14 @@ where
.next_back()
{
// Allocate block for the chunk.
let (block, allocated) = self.alloc_from_entry(device, chunk_size, 1)?;
let (block, allocated) = self.alloc_from_entry(device, chunk_size, 1, None)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should pass block size here and below in this function to uphold invariant that all blocks are aligned to their size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done here 47e0b35

let mask = ((1 << count) - 1) << index;
self.blocks &= !mask;
Some(index)
for i in index..MAX_BLOCKS_PER_CHUNK {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it all in one loop please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm realizing now that the code is wrong, but that the original code also doesn't make sense to me: it finds an index into the bit-array where there is a 1, ie. a free slot, then creates a mask for count bits, but assumes that there are count * 1 bits at index, which isn't verified anywhere? Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nevermind, I get it - that's what the first loop does 👍

Copy link
Contributor Author

@cloudhead cloudhead Sep 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting this to work is much more complicated than I expected. I think the new code here: 6c3313a is correct, but it's more complicated. I'm not sure how to simplify it, or if it's possible to figure out the right index without trying each index, eg. if block_size = 432411 and align = 16, I don't know of a simple formula to solve for n in (432411 * n) % 16 = 0. If block_size < align, then it's fairly easy though, but that's a special case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, there is a case where for eg. you have blocks = 00000111100, and count = 4, index = 2, which would work fine if it was all aligned. But now imagine index = 3 is the one that's correctly aligned, but there aren't enough blocks to fulfill that. That's why I'm checking each bit in the range 3..3+count here. Again, I'm sure there's a faster way, but I want to make sure this is even correct.

Copy link
Contributor Author

@cloudhead cloudhead Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I cleaned things up a bit - here's the current state: https://github.com/cloudhead/rendy/blob/887826ec0cb311acd862ef0b8a667a6c592a2ae5/memory/src/allocator/dynamic.rs#L633-L660

I'm not sure what you meant by making it in one loop - did you mean combining the loops together?

Copy link
Member

@zakarumych zakarumych left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still one problem left. And I added few suggestions.

Sorry for late review.

let mask: u64 = ((1 << count) - 1) << i;

// Exit if we don't have enough free blocks at the current offset.
if (self.blocks & mask).count_ones() < count {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check can be simplified to self.blocks & mask == mask.
Or alternatively blocks & (1 << i) == (1 << i). So instead of looping over every index we can just iterate over 1s in blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, good point. count_ones is cheap though, as it uses the popcount instruction.


// Exit if we don't have enough free blocks at the current offset.
if (self.blocks & mask).count_ones() < count {
return None;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If check fails we should look further instead of returning.
For example if self.blocks is b000110110 and we need 2 blocks with align = 2 * block_size then first pair doesn't fit, but this check will fail at next indicex (and at one after it'd fail too) and another free pair that statisfies requirements won't be examined.


// We may not be aligned at `index`, but may find alignment
// at subsequent indices.
for i in index..MAX_BLOCKS_PER_CHUNK {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason to look past MAX_BLOCKS_PER_CHUNK - count.

@cloudhead
Copy link
Contributor Author

@omni-viral ready for another look 👍

Copy link
Member

@zakarumych zakarumych left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it seems correct.
Would you mind considering optimization below?


// We may not be aligned at `index`, but may find alignment
// at subsequent indices.
for i in index..MAX_BLOCKS_PER_CHUNK - count {
Copy link
Member

@zakarumych zakarumych Sep 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating over indices of set bits should be faster than iterating over all and checking mask.
And checking alignment doesn't require division (% operator is compiled to division instruiction).

So I think this loop can be rewritten as

while blocks != 0 {
  let index = blocks.trailing_zeros();
  blocks &= !(1 << index);
  if (index as u64 * block_size) & (align - 1) == 0 {
    return Some(index);  
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I had to also mask self.blocks - maybe there's a way to reuse the other mask?

Copy link
Member

@zakarumych zakarumych left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

This patch ensures that even when the dynamic allocator
fulfills an allocation request by returning multiple small
blocks, the memory is correctly aligned.

This wasn't the case previously, as alignment was only
guaranteed to match the block size, so when fulfilling
a large allocation with multiple smaller blocks, the memory
would be aligned to the smaller block size.

The validation triggered by the Vulkan layer before this fix
would look like this:

    VALIDATION [VUID-vkBindImageMemory-memoryOffset-01048 (0)]:
    vkBindImageMemory(): memoryOffset is 0x43300 but must be an integer
                         multiple of the VkMemoryRequirements::alignment
                         value 0x1000, returned from a call to
                         vkGetImageMemoryRequirements with image. The
                         Vulkan spec states: memoryOffset must be an
                         integer multiple of the alignment member of the
                         VkMemoryRequirements structure returned from a
                         call to vkGetImageMemoryRequirements [...]
@zakarumych zakarumych merged commit 778b47f into amethyst:master Sep 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants