Skip to content

Amortize the cost of freeing entities#22658

Merged
alice-i-cecile merged 14 commits intobevyengine:mainfrom
ElliottjPierce:amortize-the-cost-of-freeing-entities
Feb 3, 2026
Merged

Amortize the cost of freeing entities#22658
alice-i-cecile merged 14 commits intobevyengine:mainfrom
ElliottjPierce:amortize-the-cost-of-freeing-entities

Conversation

@ElliottjPierce
Copy link
Contributor

Objective

The biggest drawback of #18670 was that it made freeing Entity's back to the allocator 4x slower. That meant a 20% regression in despawn performance. This PR vastly improves the performance of the entity allocator for freeing entities.

Solution

Add a local free list in pace in the main entity allocator. This is an ArrayVec called quick_free. When an entity is freed, add it to the quick_free. If it is full, flush the array to the full shared allocator.

Currently the array has length 64, taking 512 bytes. Since this is directly included in the already massive World type, I don't think this is an issue, and I would guess boxing it would hurt performance here. It also means that there will be at most 64 freed entities that simply can't be allocated. This reduces the worst case maximum entity count from 4,294,967,296 to 4,294,967,232 (big deal).

This also adds a new free_many function that is very fast compared to doing them one by one.

Testing

  • CI and benches.

Showcase

Here are some rough benchmarks on my M2 MAX:

group                                        post_quick_free_list                   pre_quick_free_list                    pre_remote_reservation
-----                                        --------------------                   -------------------                    ----------------------
entity_allocator_free/10000_entities         1.00     29.7±0.48µs        ? ?/sec    1.31     38.9±0.97µs        ? ?/sec    1.00     29.8±0.85µs        ? ?/sec
entity_allocator_free/100_entities           1.00   393.3±26.21ns        ? ?/sec    1.35   531.8±26.34ns        ? ?/sec    1.14   446.7±11.32ns        ? ?/sec
entity_allocator_free/1_entities             1.00      4.6±2.17ns        ? ?/sec    42.27  195.3±32.49ns        ? ?/sec    4.25     19.6±8.67ns        ? ?/sec
entity_allocator_free_bulk/10000_entities    1.00      8.7±0.36µs        ? ?/sec
entity_allocator_free_bulk/100_entities      1.00   240.9±31.01ns        ? ?/sec
entity_allocator_free_bulk/1_entities        1.00   206.8±39.95ns        ? ?/sec

Looking at the cost of freeing 1,000 entities, this makes the new allocator exactly as fast as the pre-#18670 one, 30% faster than main. The new free_many takes 8.7µs to free 1,000 entities where the optimized free takes 29.7, so another big win there.

This should make up the 20% regression to despawning. It might be even faster than pre-#18670 if we increase 64 to 128 or something, but I think that's unnecessary. This could also much improve performance for despawning scenes if we can find a way to make use of free_many, but that's a different task.

@ElliottjPierce ElliottjPierce added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jan 23, 2026
@ElliottjPierce
Copy link
Contributor Author

Ah, this is playing with the order entities are allocated in, which some tests still depend on. I'll fix that real quick.

Copy link
Contributor

@chescock chescock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! We really need doc comments for quick_free, and I'm going to wait for those before approving, but the rest of my comments are just nits and possibilities.

/// This is in contrast to the [`RemoteAllocator`], which may be cloned freely.
pub(super) struct Allocator {
shared: Arc<SharedAllocator>,
quick_free: ArrayVec<Entity, 64>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use an ordinary Vec? You can use Vec::with_capacity(64) and then check quick_free.len() == quick_free.capacity(). Storing these on a separate heap allocation instead of inline in World might even be better for cache locality, since most uses of World won't need this data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned about cache locality, but you're not wrong. IMO, we either do a bigger heap Vec or a smaller/similarly sized ArrayVec. I'm totally fine with either; just depends on what we are more interested in optimizing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm also fine with either! This is just another "prefer the simpler thing", where std types like Vec feel simpler than crates like ArrayVec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried boxing it and doubling its capacity:

  • Much worse perf for freeing less than 64 entities.
  • Much better perf for freeing between 64 and 128 entities.
  • Broke even for 1000 entities.

So I don't see a strong case to not box it, so now it's boxed. I did try just a Vec but that made performance ~50% worse unfortunately.

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections in principle and this is generally well-made. More docs are always good though, and @chescock's review suggestions are unsurprisingly excellent.

@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Feb 3, 2026
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Feb 3, 2026
Merged via the queue into bevyengine:main with commit 4cc92a0 Feb 3, 2026
42 checks passed
viridia pushed a commit to viridia/bevy that referenced this pull request Feb 3, 2026
# Objective

The biggest drawback of bevyengine#18670 was that it made freeing `Entity`'s back
to the allocator 4x slower. That meant a 20% regression in despawn
performance. This PR vastly improves the performance of the entity
allocator for freeing entities.

## Solution

Add a local free list in pace in the main entity allocator. This is an
`ArrayVec` called `quick_free`. When an entity is freed, add it to the
`quick_free`. If it is full, flush the array to the full shared
allocator.

Currently the array has length 64, taking 512 bytes. Since this is
directly included in the already massive `World` type, I don't think
this is an issue, and I would guess boxing it would hurt performance
here. It also means that there will be at most 64 freed entities that
simply can't be allocated. This reduces the worst case maximum entity
count from 4,294,967,296 to 4,294,967,232 (big deal).

This also adds a new `free_many` function that is very fast compared to
doing them one by one.

## Testing

- CI and benches.

---

## Showcase

Here are some rough benchmarks on my M2 MAX:

```txt
group                                        post_quick_free_list                   pre_quick_free_list                    pre_remote_reservation
-----                                        --------------------                   -------------------                    ----------------------
entity_allocator_free/10000_entities         1.00     29.7±0.48µs        ? ?/sec    1.31     38.9±0.97µs        ? ?/sec    1.00     29.8±0.85µs        ? ?/sec
entity_allocator_free/100_entities           1.00   393.3±26.21ns        ? ?/sec    1.35   531.8±26.34ns        ? ?/sec    1.14   446.7±11.32ns        ? ?/sec
entity_allocator_free/1_entities             1.00      4.6±2.17ns        ? ?/sec    42.27  195.3±32.49ns        ? ?/sec    4.25     19.6±8.67ns        ? ?/sec
entity_allocator_free_bulk/10000_entities    1.00      8.7±0.36µs        ? ?/sec
entity_allocator_free_bulk/100_entities      1.00   240.9±31.01ns        ? ?/sec
entity_allocator_free_bulk/1_entities        1.00   206.8±39.95ns        ? ?/sec
```

Looking at the cost of freeing 1,000 entities, this makes the new
allocator exactly as fast as the pre-bevyengine#18670 one, 30% faster than main.
The new `free_many` takes 8.7µs to free 1,000 entities where the
optimized `free` takes `29.7`, so another big win there.

This should make up the 20% regression to despawning. It might be even
faster than pre-bevyengine#18670 if we increase 64 to 128 or something, but I
think that's unnecessary. This could also much improve performance for
despawning scenes if we can find a way to make use of `free_many`, but
that's a different task.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants