Fix texture corruption due to buffer mis-align #200

cloudhead · 2019-09-04T18:34:56Z

This patch ensures that even when the dynamic allocator
fulfills an allocation request by returning multiple small
blocks, the memory is correctly aligned.

This wasn't the case previously, as alignment was only
guaranteed to match the block size, so when fulfilling
a large allocation with multiple smaller blocks, the memory
would be aligned to the smaller block size.

The validation triggered by the Vulkan layer before this fix
would look like this:

VALIDATION [VUID-vkBindImageMemory-memoryOffset-01048 (0)]:
vkBindImageMemory(): memoryOffset is 0x43300 but must be an integer
                     multiple of the VkMemoryRequirements::alignment
                     value 0x1000, returned from a call to
                     vkGetImageMemoryRequirements with image. The
                     Vulkan spec states: memoryOffset must be an
                     integer multiple of the alignment member of the
                     VkMemoryRequirements structure returned from a
                     call to vkGetImageMemoryRequirements [...]

cloudhead · 2019-09-04T18:35:59Z

Tentative fix for gfx-rs/wgpu#323 and cloudhead/rx#14

cloudhead · 2019-09-04T18:38:17Z

I have only very basic knowledge of how this library works, so maybe my fix is naive 🤷‍♂️ - but it seems to have solved the issue for me.

zakarumych

Thanks!

There're few nits though :)

memory/src/allocator/dynamic.rs

zakarumych · 2019-09-05T16:28:20Z

memory/src/allocator/dynamic.rs

@@ -336,6 +338,9 @@ where
        let ref mut chunk = chunks[chunk_index as usize];
        let block_index = chunk.acquire_blocks(count)?;
        let block_range = chunk.blocks_range(block_size, block_index, count);
+        if block_range.start % align != 0 {


This check will result in allocation failure when this chunk could provide blocks with required alignment.
For example:
Chunk has size 128 and blocks are of size 4. Let's say chunk starts at offset 0.
We request 4 blocks with alignment 16.
If first blocks in chunk was allocated then we'd get block_range = 4..20. It doesn't have required alignment.
But we could allocate range 16..32 from the chunk instead.

This makes sense, yes! I've tried to fix this here e2982d1 - let me know if that is what you meant. The only question I have is whether the block_size is correct here, since it's passed as a parameter, and I'm not sure how to verify if/how it matches the chunk's block_size.

zakarumych

Keep it up!

zakarumych · 2019-09-06T19:32:13Z

memory/src/allocator/dynamic.rs

@@ -307,14 +307,14 @@ where
            .next_back()
        {
            // Allocate block for the chunk.
-            let (block, allocated) = self.alloc_from_entry(device, chunk_size, 1)?;
+            let (block, allocated) = self.alloc_from_entry(device, chunk_size, 1, None)?;


You should pass block size here and below in this function to uphold invariant that all blocks are aligned to their size.

Done here 47e0b35

zakarumych · 2019-09-06T19:35:23Z

memory/src/allocator/dynamic.rs

-            let mask = ((1 << count) - 1) << index;
-            self.blocks &= !mask;
-            Some(index)
+        for i in index..MAX_BLOCKS_PER_CHUNK {


Make it all in one loop please.

I'm realizing now that the code is wrong, but that the original code also doesn't make sense to me: it finds an index into the bit-array where there is a 1, ie. a free slot, then creates a mask for count bits, but assumes that there are count * 1 bits at index, which isn't verified anywhere? Am I missing something?

Ah nevermind, I get it - that's what the first loop does 👍

Getting this to work is much more complicated than I expected. ~~I think the new code here: 6c3313a is correct, but it's more complicated. I'm not sure how to simplify it~~, or if it's possible to figure out the right index without trying each index, eg. if block_size = 432411 and align = 16, I don't know of a simple formula to solve for n in (432411 * n) % 16 = 0. If block_size < align, then it's fairly easy though, but that's a special case.

Furthermore, there is a case where for eg. you have blocks = 00000111100, and count = 4, index = 2, which would work fine if it was all aligned. But now imagine index = 3 is the one that's correctly aligned, but there aren't enough blocks to fulfill that. That's why I'm checking each bit in the range 3..3+count here. Again, I'm sure there's a faster way, but I want to make sure this is even correct.

Ok, I cleaned things up a bit - here's the current state: https://github.com/cloudhead/rendy/blob/887826ec0cb311acd862ef0b8a667a6c592a2ae5/memory/src/allocator/dynamic.rs#L633-L660

I'm not sure what you meant by making it in one loop - did you mean combining the loops together?

zakarumych

There's still one problem left. And I added few suggestions.

Sorry for late review.

zakarumych · 2019-09-12T06:43:27Z

memory/src/allocator/dynamic.rs

+            let mask: u64 = ((1 << count) - 1) << i;
+
+            // Exit if we don't have enough free blocks at the current offset.
+            if (self.blocks & mask).count_ones() < count {


This check can be simplified to self.blocks & mask == mask.
Or alternatively blocks & (1 << i) == (1 << i). So instead of looping over every index we can just iterate over 1s in blocks.

Ah yes, good point. count_ones is cheap though, as it uses the popcount instruction.

zakarumych · 2019-09-12T06:44:48Z

memory/src/allocator/dynamic.rs

+
+            // Exit if we don't have enough free blocks at the current offset.
+            if (self.blocks & mask).count_ones() < count {
+                return None;


If check fails we should look further instead of returning.
For example if self.blocks is b000110110 and we need 2 blocks with align = 2 * block_size then first pair doesn't fit, but this check will fail at next indicex (and at one after it'd fail too) and another free pair that statisfies requirements won't be examined.

zakarumych · 2019-09-12T06:49:29Z

memory/src/allocator/dynamic.rs

+
+        // We may not be aligned at `index`, but may find alignment
+        // at subsequent indices.
+        for i in index..MAX_BLOCKS_PER_CHUNK {


No reason to look past MAX_BLOCKS_PER_CHUNK - count.

cloudhead · 2019-09-12T08:02:34Z

@omni-viral ready for another look 👍

zakarumych

Now it seems correct.
Would you mind considering optimization below?

zakarumych · 2019-09-15T12:11:54Z

memory/src/allocator/dynamic.rs

+
+        // We may not be aligned at `index`, but may find alignment
+        // at subsequent indices.
+        for i in index..MAX_BLOCKS_PER_CHUNK - count {


Iterating over indices of set bits should be faster than iterating over all and checking mask.
And checking alignment doesn't require division (% operator is compiled to division instruiction).

So I think this loop can be rewritten as

while blocks != 0 { let index = blocks.trailing_zeros(); blocks &= !(1 << index); if (index as u64 * block_size) & (align - 1) == 0 { return Some(index); } }

Done. I had to also mask self.blocks - maybe there's a way to reuse the other mask?

zakarumych

LGTM! Thanks!

This patch ensures that even when the dynamic allocator fulfills an allocation request by returning multiple small blocks, the memory is correctly aligned. This wasn't the case previously, as alignment was only guaranteed to match the block size, so when fulfilling a large allocation with multiple smaller blocks, the memory would be aligned to the smaller block size. The validation triggered by the Vulkan layer before this fix would look like this: VALIDATION [VUID-vkBindImageMemory-memoryOffset-01048 (0)]: vkBindImageMemory(): memoryOffset is 0x43300 but must be an integer multiple of the VkMemoryRequirements::alignment value 0x1000, returned from a call to vkGetImageMemoryRequirements with image. The Vulkan spec states: memoryOffset must be an integer multiple of the alignment member of the VkMemoryRequirements structure returned from a call to vkGetImageMemoryRequirements [...]

cloudhead mentioned this pull request Sep 4, 2019

Validation error when creating texture of size 64x32 gfx-rs/wgpu#323

Closed

zakarumych requested changes Sep 5, 2019

View reviewed changes

cloudhead force-pushed the master branch from f5f7a2e to e2982d1 Compare September 5, 2019 21:57

zakarumych requested changes Sep 7, 2019

View reviewed changes

cloudhead force-pushed the master branch from 47e0b35 to 1fdefa1 Compare September 7, 2019 20:46

zakarumych requested changes Sep 12, 2019

View reviewed changes

zakarumych reviewed Sep 15, 2019

View reviewed changes

cloudhead force-pushed the master branch from df5278c to d513d2b Compare September 15, 2019 13:45

zakarumych approved these changes Sep 15, 2019

View reviewed changes

zakarumych merged commit 778b47f into amethyst:master Sep 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix texture corruption due to buffer mis-align #200

Fix texture corruption due to buffer mis-align #200

cloudhead commented Sep 4, 2019

cloudhead commented Sep 4, 2019

cloudhead commented Sep 4, 2019 •

edited

Loading

zakarumych left a comment

zakarumych Sep 5, 2019

cloudhead Sep 5, 2019 •

edited

Loading

zakarumych left a comment

zakarumych Sep 6, 2019

cloudhead Sep 7, 2019

zakarumych Sep 6, 2019

cloudhead Sep 7, 2019

cloudhead Sep 7, 2019

cloudhead Sep 7, 2019 •

edited

Loading

cloudhead Sep 7, 2019

cloudhead Sep 8, 2019 •

edited

Loading

zakarumych left a comment

zakarumych Sep 12, 2019

cloudhead Sep 12, 2019

zakarumych Sep 12, 2019

zakarumych Sep 12, 2019

cloudhead commented Sep 12, 2019

zakarumych left a comment

zakarumych Sep 15, 2019 •

edited

Loading

cloudhead Sep 15, 2019

zakarumych left a comment

Fix texture corruption due to buffer mis-align #200

Fix texture corruption due to buffer mis-align #200

Conversation

cloudhead commented Sep 4, 2019

cloudhead commented Sep 4, 2019

cloudhead commented Sep 4, 2019 • edited Loading

zakarumych left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cloudhead Sep 5, 2019 • edited Loading

Choose a reason for hiding this comment

zakarumych left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cloudhead Sep 7, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cloudhead Sep 8, 2019 • edited Loading

Choose a reason for hiding this comment

zakarumych left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cloudhead commented Sep 12, 2019

zakarumych left a comment

Choose a reason for hiding this comment

zakarumych Sep 15, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zakarumych left a comment

Choose a reason for hiding this comment

cloudhead commented Sep 4, 2019 •

edited

Loading

cloudhead Sep 5, 2019 •

edited

Loading

cloudhead Sep 7, 2019 •

edited

Loading

cloudhead Sep 8, 2019 •

edited

Loading

zakarumych Sep 15, 2019 •

edited

Loading