Double the capacity when BlobVec is full #11167

Merged
merged 1 commit into from Jan 22, 2024

Conversation

garychia
Contributor

@garychia garychia commented Jan 1, 2024

Objective

Solution

  • Double the capacity of a full BlobVec before pushing a new element.

@mockersf mockersf added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels Jan 1, 2024
@stepancheg
Contributor

stepancheg commented Jan 1, 2024

This solution won't fix most of the issues.

In many cases, bevy calls BlobVec::reserve_exact which has exactly the same problem, and after reserve_exact, push won't double the capacity.

A proper fix would be (basically, copy Vec's behavior):

  • add a reserve function (in addition to reserve_exact) that doubles the capacity (or maybe does something smarter, see Vec)
  • call reserve wherever reserve_exact is called today (perhaps in a separate PR)
  • have push call reserve(1) instead of reserve_exact(1)
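To illustrate the difference between the two strategies, here is a minimal sketch. ToyVec is a hypothetical stand-in that tracks only len and capacity (it is not BlobVec and allocates nothing); it counts how often the backing storage would have to be resized when pushing one element at a time.

```rust
/// Hypothetical stand-in for a growable buffer. It tracks bookkeeping only
/// and never allocates; `reallocations` counts simulated resizes.
#[derive(Default)]
struct ToyVec {
    len: usize,
    capacity: usize,
    reallocations: usize,
}

impl ToyVec {
    /// Grows by exactly the missing amount.
    fn reserve_exact(&mut self, additional: usize) {
        let available = self.capacity - self.len;
        if available < additional {
            self.capacity += additional - available;
            self.reallocations += 1;
        }
    }

    /// Grows by at least the current capacity, so the capacity at least doubles.
    fn reserve(&mut self, additional: usize) {
        let available = self.capacity - self.len;
        if available < additional {
            let extra = self.capacity.max(additional - available);
            self.capacity += extra;
            self.reallocations += 1;
        }
    }

    fn push_exact(&mut self) {
        self.reserve_exact(1);
        self.len += 1;
    }

    fn push_amortized(&mut self) {
        self.reserve(1);
        self.len += 1;
    }
}

/// Pushes `pushes` elements one by one and returns the number of resizes.
fn count_reallocations(pushes: usize, amortized: bool) -> usize {
    let mut v = ToyVec::default();
    for _ in 0..pushes {
        if amortized {
            v.push_amortized();
        } else {
            v.push_exact();
        }
    }
    v.reallocations
}

fn main() {
    println!("reserve_exact(1) per push: {}", count_reallocations(1024, false));
    println!("reserve(1) per push:       {}", count_reallocations(1024, true));
}
```

With reserve_exact(1) every push resizes the buffer, so the count grows linearly with the number of pushes; with the doubling reserve it grows only logarithmically.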

@garychia
Contributor Author

garychia commented Jan 2, 2024

Sure. I will leave push mostly unchanged and implement the reserve function instead. At this point I'm not able to come up with a fancier solution. My reserve function basically just ensures that the capacity at least doubles if there is not enough space.

Comment on lines 127 to 129
let extra_space = self.capacity.max(additional - available_space);
// SAFETY: `additional - available_space > 0` so `extra_space` is non-zero
let increment = unsafe { NonZeroUsize::new_unchecked(extra_space) };
Contributor

  • Let's call it extra_capacity because "space" can mean both "len" and "capacity"
  • Also, let's give extra_space and increment the same name, because they hold the same value
  • Also, new_unchecked is not really needed here; safe new() + unwrap should work equally well
    • the compiler is able to get rid of this trivial check
    • but even if it doesn't, this code is executed rarely anyway
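A minimal sketch of what these suggestions could look like together. The free function extra_capacity and its parameters are hypothetical names used only for illustration; they are not taken from the PR.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper mirroring the reviewed snippet with the safe constructor.
/// The caller is assumed to have checked that `additional > capacity - len`.
fn extra_capacity(capacity: usize, len: usize, additional: usize) -> NonZeroUsize {
    let available_capacity = capacity - len;
    // Grow by at least the current capacity so that the capacity at least doubles.
    let extra_capacity = capacity.max(additional - available_capacity);
    // Safe constructor: panics instead of invoking undefined behavior
    // if the precondition above is ever violated.
    NonZeroUsize::new(extra_capacity)
        .expect("additional exceeds available capacity, so the increment is non-zero")
}

fn main() {
    // A full buffer of capacity 8 asked for one more slot.
    println!("{}", extra_capacity(8, 8, 1));
}
```

Reusing one name (shadowing it with the NonZeroUsize value would also work) avoids having two variables for the same quantity.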

@stepancheg
Contributor

Overall, looks good.

@stepancheg
Contributor

stepancheg commented Jan 3, 2024

If we want to polish this code further, one more change would be to split the reserve function into reserve and do_reserve.

The reserve body would look like this:

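// Hot path: a cheap capacity check that can be inlined into push.
#[inline]
fn reserve(&mut self, additional: usize) {
  if self.cap - self.len < additional {
    self.do_reserve(additional);
  }
}

// Cold path: the actual growth, kept out of line.
#[cold]
fn do_reserve(&mut self, additional: usize) { ... }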

The idea is this: push is marked #[inline], and it should be fast. reserve is not marked #[inline], so it might not be inlined into push, even though in most cases it amounts to a simple integer subtraction and comparison.

If we mark reserve #[inline] as is, it will be inlined into push, but the resulting push function might be too large to be inlined into the code that calls push.

Splitting the reserve function fixes this issue: only the cheap check is inlined, and the rarely taken growth path stays out of line.

Member

@james7132 james7132 left a comment

The status quo seems to have been an artifact of #1525, when we forked from hecs. hecs has also used a doubling strategy since July 2021: Ralith/hecs@a8545a2, which was roughly 4-5 months after ECS V2. Checked this with @Ralith and @cart on Discord: https://discord.com/channels/691052431525675048/749335865876021248/1192580492164341820.

With the history here established, I think this is a good idea in general, but if you check the usage of BlobVec::push, it's only used in Column::push, which is only used in ComponentSparseSet::insert, Resource::insert, and their variants. This won't impact any of the table based storage, which is what almost all components use, but this should improve performance when mass inserting/spawning SparseSet components.

Code generally looks good, though there are some things I want addressed.

/// Similar to `reserve_exact`. This method ensures that the capacity will grow at least `self.capacity()` if there is no
/// enough space to hold `additional` more elements.
#[cold]
fn do_reserve(&mut self, additional: usize) {
Member

I'd follow the style the Rust project uses for their Vec implementation, and scope the cold function within the reserve function itself: https://github.com/rust-lang/rust/blob/6bc08a725f888a06ea3c6844f3d0cc2d2ebc5142/library/alloc/src/raw_vec.rs#L294.
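A minimal sketch of that layout, assuming a hypothetical Buffer type that tracks only len and capacity (it is not BlobVec, and the growth step only updates the bookkeeping). The cold function is declared inside reserve, so nothing else can call it without the capacity check having run first.

```rust
/// Hypothetical stand-in type; only the bookkeeping is modeled.
struct Buffer {
    len: usize,
    capacity: usize,
}

impl Buffer {
    #[inline]
    fn reserve(&mut self, additional: usize) {
        /// Rarely taken growth path, scoped to `reserve` and kept out of line.
        /// It assumes the caller already checked that growth is required.
        #[cold]
        fn do_reserve(slf: &mut Buffer, additional: usize) {
            let increment = slf.capacity.max(additional - (slf.capacity - slf.len));
            slf.capacity += increment;
        }

        // Hot path: one subtraction and one comparison.
        if self.capacity - self.len < additional {
            do_reserve(self, additional);
        }
    }
}

fn main() {
    let mut buffer = Buffer { len: 3, capacity: 4 };
    buffer.reserve(2);
    println!("capacity after reserve: {}", buffer.capacity);
}
```

Because the nested function cannot take self as a receiver, it takes the buffer as an explicit slf parameter, which matches the slf naming in the snippets quoted in this review.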

#[cold]
fn do_reserve(&mut self, additional: usize) {
let available_space = self.capacity - self.len;
if available_space < additional && self.item_layout.size() > 0 {
Member

If we are already doing the checks in reserve, we don't need to be repeating them here. We can just assume we have already met the requisite conditions.

Comment on lines 129 to 134
if slf.item_layout.size() > 0 {
    let increment = slf.capacity.max(additional - (slf.capacity - slf.len));
    let increment = NonZeroUsize::new(increment).unwrap();
    // SAFETY: not called for ZSTs
    unsafe { slf.grow_exact(increment) };
}
Contributor

After #10799 merged, this code is actually meant to call grow_exact for ZST.

Contributor Author

Does that mean we no longer need that check and can just call grow_exact directly?

Contributor

Yes, we don't need to check whether slf.item_layout.size() > 0.

Contributor

@atlv24 atlv24 left a comment

logic checks out

@atlv24
Contributor

atlv24 commented Jan 20, 2024

benchmarks are a bit noisy but we do win in the big sparse_set case

$ critcmp before after
group                                     after                                  before
-----                                     -----                                  ------
add_remove/sparse_set                     1.00   494.9±11.07µs        ? ?/sec    1.02   504.0±31.49µs        ? ?/sec
add_remove/table                          1.02   766.3±14.54µs        ? ?/sec    1.00   753.0±14.05µs        ? ?/sec
add_remove_big/sparse_set                 1.00   497.3±11.90µs        ? ?/sec    1.08   535.2±76.61µs        ? ?/sec
add_remove_big/table                      1.01  1751.1±35.25µs        ? ?/sec    1.00  1726.0±23.07µs        ? ?/sec
added_archetypes/archetype_count/100      1.03     38.0±0.24µs        ? ?/sec    1.00     36.9±0.18µs        ? ?/sec
added_archetypes/archetype_count/1000     1.02    403.9±5.28µs        ? ?/sec    1.00    395.1±2.28µs        ? ?/sec
added_archetypes/archetype_count/10000    1.02      7.1±0.17ms        ? ?/sec    1.00      7.0±0.22ms        ? ?/sec
added_archetypes/archetype_count/200      1.00     73.5±0.65µs        ? ?/sec    1.00     73.6±0.51µs        ? ?/sec
added_archetypes/archetype_count/2000     1.02   831.5±10.76µs        ? ?/sec    1.00    816.4±8.05µs        ? ?/sec
added_archetypes/archetype_count/500      1.02    197.7±1.11µs        ? ?/sec    1.00    194.4±1.10µs        ? ?/sec
added_archetypes/archetype_count/5000     1.02      2.7±0.08ms        ? ?/sec    1.00      2.6±0.07ms        ? ?/sec
insert_simple/base                        1.01    251.4±3.25µs        ? ?/sec    1.00    247.8±2.45µs        ? ?/sec
insert_simple/unbatched                   1.01   581.4±19.07µs        ? ?/sec    1.00   578.4±14.18µs        ? ?/sec
no_archetypes/system_count/0              1.00      4.7±0.05ns        ? ?/sec    1.00      4.7±0.03ns        ? ?/sec
no_archetypes/system_count/100            1.00   838.1±13.51ns        ? ?/sec    1.00   836.7±10.64ns        ? ?/sec
no_archetypes/system_count/20             1.00    163.3±1.67ns        ? ?/sec    1.04    169.6±6.60ns        ? ?/sec
no_archetypes/system_count/40             1.01    347.0±5.50ns        ? ?/sec    1.00   343.8±10.40ns        ? ?/sec
no_archetypes/system_count/60             1.00    503.1±9.43ns        ? ?/sec    1.01    506.0±8.72ns        ? ?/sec
no_archetypes/system_count/80             1.01    678.1±5.77ns        ? ?/sec    1.00    674.6±9.58ns        ? ?/sec

one run was particularly harsh on these two benches:

group                                     after                                  before
-----                                     -----                                  ------
insert_simple/unbatched                   1.09   629.3±14.79µs        ? ?/sec    1.00   578.4±14.18µs        ? ?/sec
no_archetypes/system_count/40             1.28    440.7±1.86ns        ? ?/sec    1.00   343.8±10.40ns        ? ?/sec

growing by 1.5x instead of 2x looks pretty much the same
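For intuition on why the two growth factors can land so close together, here is a small sketch that only counts simulated reallocations for one-at-a-time pushes. It is bookkeeping, not a benchmark; the reallocations function and its parameters are hypothetical, and it says nothing about allocator behavior or copy cost.

```rust
/// Counts simulated reallocations for `pushes` single-element pushes.
/// On growth the capacity increases by `capacity * num / den` (at least 1),
/// so 1/1 models 2x growth and 1/2 models 1.5x growth.
fn reallocations(pushes: usize, num: usize, den: usize) -> usize {
    let (mut len, mut capacity, mut count) = (0usize, 0usize, 0usize);
    for _ in 0..pushes {
        if len == capacity {
            let extra = (capacity * num / den).max(1);
            capacity += extra;
            count += 1;
        }
        len += 1;
    }
    count
}

fn main() {
    for pushes in [1_000usize, 100_000] {
        println!(
            "{pushes} pushes: 2x -> {} reallocations, 1.5x -> {} reallocations",
            reallocations(pushes, 1, 1),
            reallocations(pushes, 1, 2),
        );
    }
}
```

Both factors keep the number of reallocations logarithmic in the number of pushes; 1.5x reallocates somewhat more often but leaves less unused capacity.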

$ critcmp before after
group                                     after                                  before
-----                                     -----                                  ------
add_remove/sparse_set                     1.00   486.2±12.53µs        ? ?/sec    1.04   504.0±31.49µs        ? ?/sec
add_remove/table                          1.00   751.5±15.29µs        ? ?/sec    1.00   753.0±14.05µs        ? ?/sec
add_remove_big/sparse_set                 1.00   499.5±12.71µs        ? ?/sec    1.07   535.2±76.61µs        ? ?/sec
add_remove_big/table                      1.02  1768.9±29.77µs        ? ?/sec    1.00  1726.0±23.07µs        ? ?/sec
added_archetypes/archetype_count/100      1.02     37.8±0.62µs        ? ?/sec    1.00     36.9±0.18µs        ? ?/sec
added_archetypes/archetype_count/1000     1.02    402.5±3.34µs        ? ?/sec    1.00    395.1±2.28µs        ? ?/sec
added_archetypes/archetype_count/10000    1.00      7.0±0.20ms        ? ?/sec    1.00      7.0±0.22ms        ? ?/sec
added_archetypes/archetype_count/200      1.02     74.7±0.37µs        ? ?/sec    1.00     73.6±0.51µs        ? ?/sec
added_archetypes/archetype_count/2000     1.02    830.4±9.54µs        ? ?/sec    1.00    816.4±8.05µs        ? ?/sec
added_archetypes/archetype_count/500      1.02    198.0±1.64µs        ? ?/sec    1.00    194.4±1.10µs        ? ?/sec
added_archetypes/archetype_count/5000     1.02      2.6±0.06ms        ? ?/sec    1.00      2.6±0.07ms        ? ?/sec
insert_simple/base                        1.00    245.4±4.32µs        ? ?/sec    1.01    247.8±2.45µs        ? ?/sec
insert_simple/unbatched                   1.00   576.1±15.01µs        ? ?/sec    1.00   578.4±14.18µs        ? ?/sec
no_archetypes/system_count/0              1.00      4.7±0.03ns        ? ?/sec    1.00      4.7±0.03ns        ? ?/sec
no_archetypes/system_count/100            1.02    850.8±6.63ns        ? ?/sec    1.00   836.7±10.64ns        ? ?/sec
no_archetypes/system_count/20             1.00    168.6±6.18ns        ? ?/sec    1.01    169.6±6.60ns        ? ?/sec
no_archetypes/system_count/40             1.00    342.3±9.41ns        ? ?/sec    1.00   343.8±10.40ns        ? ?/sec
no_archetypes/system_count/60             1.00   500.7±10.08ns        ? ?/sec    1.01    506.0±8.72ns        ? ?/sec
no_archetypes/system_count/80             1.01    678.4±7.95ns        ? ?/sec    1.00    674.6±9.58ns        ? ?/sec

@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Jan 21, 2024
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jan 22, 2024
Merged via the queue into bevyengine:main with commit 8ad1b93 Jan 22, 2024
23 checks passed
Development

Successfully merging this pull request may close these issues.

BlobVec::push is linear
7 participants