Skip alloc when updating animation path cache #11330
Merged: alice-i-cecile merged 1 commit into bevyengine:main from nicopap:do-not-allocate-animation on Jan 13, 2024
Conversation
Not always, but skip it when the new length is smaller. For context, `path_cache` is a `Vec<Vec<Option<Entity>>>`.

Objective

Previously, when setting a new length for the `path_cache`, we would:
1. Deallocate all existing `Vec<Option<Entity>>`
2. Deallocate the `path_cache`
3. Allocate a new `Vec<Vec<Option<Entity>>>`, where each item is an empty `Vec` that would have to be allocated when pushed to.

This is a lot of allocations!

Solution

Use `Vec::resize_with`. With this change, what occurs is:
1. We `clear` each `Vec<Option<Entity>>`, keeping its allocation but making the memory of each `Vec` reusable
2. We only append new `Vec`s to `path_cache` when it is too small.

Note on performance

I didn't benchmark it; I just diffed the generated assembly (built with `--profile stress-test` and `--native`). This PR has 20 fewer instructions in `apply_animation` (out of 2504). On a purely abstract level, though, I can deduce that this leads to fewer allocations.

More information on profiling allocations in Rust: https://nnethercote.github.io/perf-book/heap-allocations.html

Future work

I think a [jagged vec](https://en.wikipedia.org/wiki/Jagged_array) would be much more pertinent, since it allocates everything in a single contiguous buffer. This would avoid dancing around allocations, remove the overhead of one `*mut T` and two `usize` per row, and remove a level of indirection, improving cache efficiency. I think it would improve both code quality and performance.
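A minimal sketch of the reuse pattern described above, assuming a standalone function with illustrative names (the real code lives in Bevy's animation system, where `T` would be `Option<Entity>`):

```rust
/// Resize a nested cache while reusing the inner allocations.
/// Sketch only: the name and generic `T` are illustrative, not the PR's exact code.
fn update_path_cache<T>(path_cache: &mut Vec<Vec<T>>, new_len: usize) {
    // Clear every existing row: its length drops to 0, but its capacity is kept,
    // so later pushes reuse the same heap buffer instead of reallocating.
    for row in path_cache.iter_mut() {
        row.clear();
    }
    // Append fresh, empty rows only when the cache is too small; if `new_len`
    // is smaller, this truncates rather than rebuilding the whole structure.
    path_cache.resize_with(new_len, Vec::new);
}
```

And a rough illustration of the jagged-array idea from the future-work note, assuming a hand-rolled type (not something that exists in Bevy today): one contiguous data buffer plus a list of row boundaries, so there is a single allocation and no per-row `Vec` header.

```rust
/// Hypothetical jagged array: all rows share one buffer, and `row_ends[i]`
/// is the exclusive end index of row `i` within `data`.
struct JaggedVec<T> {
    data: Vec<T>,
    row_ends: Vec<usize>,
}

impl<T> JaggedVec<T> {
    fn new() -> Self {
        Self { data: Vec::new(), row_ends: Vec::new() }
    }

    /// Append a row; its elements land directly in the shared buffer.
    fn push_row(&mut self, row: impl IntoIterator<Item = T>) {
        self.data.extend(row);
        self.row_ends.push(self.data.len());
    }

    /// Borrow row `i` as a slice, if it exists.
    fn row(&self, i: usize) -> Option<&[T]> {
        let end = *self.row_ends.get(i)?;
        let start = if i == 0 { 0 } else { self.row_ends[i - 1] };
        Some(&self.data[start..end])
    }
}
```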
alice-i-cecile added the C-Performance (A change motivated by improving speed, memory usage or compile times) and A-Animation (Make things move and change over time) labels on Jan 13, 2024
nicopap added a commit to nicopap/bevy that referenced this pull request on Jan 13, 2024
In bevyengine#11330 I found out that `Parent::get` didn't get inlined, **even with LTO on**! Not sure what's up with that, but marking functions that consist of a single call as `inline(always)` has no downside. `inline(always)` may increase compilation time proportional to how many times the function is called **and the size of the function marked with `inline`**. Since we only mark no-op-sized functions as `inline`, there is no cost to it. I also took the opportunity to `inline` other functions. I'm not as confident that marking functions that call other functions as `inline` works as well as it does for very simple functions, so I used `inline` over `inline(always)`.
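To illustrate the single-call case, here is a hedged sketch of that shape of accessor, with a placeholder `Entity` so the snippet stands alone (the real types live in `bevy_hierarchy`/`bevy_ecs` and may differ in detail):

```rust
// Placeholder so the snippet compiles on its own; not Bevy's real Entity.
#[derive(Clone, Copy)]
struct Entity(u64);

struct Parent(Entity);

impl Parent {
    // The body is a single field read, so forcing inlining cannot bloat the
    // caller, and it guarantees no cross-crate `call` instruction is emitted.
    #[inline(always)]
    fn get(&self) -> Entity {
        self.0
    }
}
```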
atlv24 approved these changes on Jan 13, 2024
Thanks!! Would love to see perf numbers, this is obviously better but by how much?
This doesn't happen in any perf-sensitive scenario, so it probably doesn't have any impact... still, it's probably better.
dmyyy approved these changes on Jan 13, 2024
github-merge-queue bot pushed a commit that referenced this pull request on Jan 13, 2024
# Objective

In #11330 I found out that `Parent::get` didn't get inlined, **even with LTO on**! This means that just to access a field, we get an instruction cache invalidation: we move some registers to the stack, jump to new instructions, move the field into a register, then do the same dance in the other direction to return to the call site.

## Solution

Mark trivial functions as `#[inline]`.

`inline(always)` may increase compilation time proportional to how many times the function is called **and the size of the function marked with `inline`**. Since the functions we mark as `inline` consist of a single instruction, the cost is negligible. I also took the opportunity to `inline` other functions. I'm not as confident that marking functions that call other functions as `inline` works as well as it does for very simple functions, so I used `inline`, which doesn't have the same downsides as `inline(always)`.

More information on inlining in Rust: https://nnethercote.github.io/perf-book/inlining.html
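For the second case mentioned in the solution, a sketch with made-up names (nothing here is Bevy API): a small helper that delegates to other functions gets the plain `#[inline]` hint rather than `#[inline(always)]`.

```rust
#[derive(Clone, Copy)]
struct Point3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Point3 {
    // More than one instruction, and it calls another function (`sqrt`), so
    // prefer the `#[inline]` hint over demanding inlining at every call site.
    #[inline]
    fn length(self) -> f32 {
        (self.x * self.x + self.y * self.y + self.z * self.z).sqrt()
    }
}
```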
Labels
A-Animation
Make things move and change over time
C-Performance
A change motivated by improving speed, memory usage or compile times