Skip alloc when updating animation path cache #11330
Merged: alice-i-cecile merged 1 commit into bevyengine:main from nicopap:do-not-allocate-animation on Jan 13, 2024
Conversation
Not always, but skip it when the new length is smaller. For context, `path_cache` is a `Vec<Vec<Option<Entity>>>`.

Objective

Previously, when setting a new length for the `path_cache`, we would:
1. Deallocate all existing `Vec<Option<Entity>>`
2. Deallocate the `path_cache`
3. Allocate a new `Vec<Vec<Option<Entity>>>`, where each item is an empty `Vec` that would have to be allocated when pushed to.

This is a lot of allocations!

Solution

Use `Vec::resize_with`. With this change, what occurs is:
1. We `clear` each `Vec<Option<Entity>>`, keeping its allocation but making the memory of each `Vec` reusable
2. We only append new `Vec`s to `path_cache` when it is too small.

Note on performance

I didn't benchmark it; I just diffed the generated assembly (built with `--profile stress-test` and `--native`). This PR has 20 fewer instructions in `apply_animation` (out of 2504). On a purely abstract level, though, I can deduce that this leads to fewer allocations.

More information on profiling allocations in Rust: https://nnethercote.github.io/perf-book/heap-allocations.html

Future work

I think a [jagged vec](https://en.wikipedia.org/wiki/Jagged_array) would be much more pertinent, since it allocates everything in a single contiguous buffer. This would avoid dancing around allocations, remove the overhead of one `*mut T` and two `usize` per row, and remove a level of indirection, improving cache efficiency. I think it would improve both code quality and performance.
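A minimal sketch of the reuse pattern described above, assuming a standalone function with illustrative names (the real code lives in Bevy's animation system, where `T` would be `Option<Entity>`):

```rust
/// Resize a nested cache while reusing the inner allocations.
/// Sketch only: the name and generic `T` are illustrative, not the PR's exact code.
fn update_path_cache<T>(path_cache: &mut Vec<Vec<T>>, new_len: usize) {
    // Clear every existing row: its length drops to 0, but its capacity is kept,
    // so later pushes reuse the same heap buffer instead of reallocating.
    for row in path_cache.iter_mut() {
        row.clear();
    }
    // Append fresh, empty rows only when the cache is too small; if `new_len`
    // is smaller, this truncates rather than rebuilding the whole structure.
    path_cache.resize_with(new_len, Vec::new);
}
```

And a rough illustration of the jagged-array idea from the future-work note, assuming a hand-rolled type (not something that exists in Bevy today): one contiguous data buffer plus a list of row boundaries, so there is a single allocation and no per-row `Vec` header.

```rust
/// Hypothetical jagged array: all rows share one buffer, and `row_ends[i]`
/// is the exclusive end index of row `i` within `data`.
struct JaggedVec<T> {
    data: Vec<T>,
    row_ends: Vec<usize>,
}

impl<T> JaggedVec<T> {
    fn new() -> Self {
        Self { data: Vec::new(), row_ends: Vec::new() }
    }

    /// Append a row; its elements land directly in the shared buffer.
    fn push_row(&mut self, row: impl IntoIterator<Item = T>) {
        self.data.extend(row);
        self.row_ends.push(self.data.len());
    }

    /// Borrow row `i` as a slice, if it exists.
    fn row(&self, i: usize) -> Option<&[T]> {
        let end = *self.row_ends.get(i)?;
        let start = if i == 0 { 0 } else { self.row_ends[i - 1] };
        Some(&self.data[start..end])
    }
}
```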
alice-i-cecile added the C-Performance (A change motivated by improving speed, memory usage or compile times) and A-Animation (Make things move and change over time) labels on Jan 13, 2024
nicopap added a commit to nicopap/bevy that referenced this pull request on Jan 13, 2024
In bevyengine#11330 I found out that `Parent::get` didn't get inlined, **even with LTO on**! Not sure what's up with that, but marking functions that consist of a single call as `inline(always)` has no downside. `inline(always)` may increase compilation time proportional to how many times the function is called **and the size of the function marked with `inline`**. Since we only mark no-op-sized functions as `inline`, there is no cost to it. I also took the opportunity to `inline` other functions. I'm not as confident that marking functions that call other functions as `inline` works as well as it does for very simple functions, so I used `inline` over `inline(always)`.
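To illustrate the single-call case, here is a hedged sketch of that shape of accessor, with a placeholder `Entity` so the snippet stands alone (the real types live in `bevy_hierarchy`/`bevy_ecs` and may differ in detail):

```rust
// Placeholder so the snippet compiles on its own; not Bevy's real Entity.
#[derive(Clone, Copy)]
struct Entity(u64);

struct Parent(Entity);

impl Parent {
    // The body is a single field read, so forcing inlining cannot bloat the
    // caller, and it guarantees no cross-crate `call` instruction is emitted.
    #[inline(always)]
    fn get(&self) -> Entity {
        self.0
    }
}
```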
atlv24 approved these changes on Jan 13, 2024
Thanks!! Would love to see perf numbers, this is obviously better but by how much?
This doesn't happen in any perf-sensitive scenario, so it probably doesn't have any impact... still, it's probably better.
dmyyy approved these changes on Jan 13, 2024
github-merge-queue bot pushed a commit that referenced this pull request on Jan 13, 2024
# Objective

In #11330 I found out that `Parent::get` didn't get inlined, **even with LTO on**! This means that just to access a field, we get an instruction cache invalidation: we move some registers to the stack, jump to new instructions, move the field into a register, then do the same dance in the other direction to return to the call site.

## Solution

Mark trivial functions as `#[inline]`.

`inline(always)` may increase compilation time proportional to how many times the function is called **and the size of the function marked with `inline`**. Since the functions we mark as `inline` consist of a single instruction, the cost is negligible. I also took the opportunity to `inline` other functions. I'm not as confident that marking functions that call other functions as `inline` works as well as it does for very simple functions, so I used `inline`, which doesn't have the same downsides as `inline(always)`.

More information on inlining in Rust: https://nnethercote.github.io/perf-book/inlining.html
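For the second case mentioned in the solution, a sketch with made-up names (nothing here is Bevy API): a small helper that delegates to other functions gets the plain `#[inline]` hint rather than `#[inline(always)]`.

```rust
#[derive(Clone, Copy)]
struct Point3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Point3 {
    // More than one instruction, and it calls another function (`sqrt`), so
    // prefer the `#[inline]` hint over demanding inlining at every call site.
    #[inline]
    fn length(self) -> f32 {
        (self.x * self.x + self.y * self.y + self.z * self.z).sqrt()
    }
}
```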
Labels
A-Animation
Make things move and change over time
C-Performance
A change motivated by improving speed, memory usage or compile times