Reduced `TableRow` `as` Casting #10811

bushrat011899 · 2023-11-30T03:26:07Z

Objective

Fixes Change TableRow::new to accept u32 #10806

Solution

Replaced new and index methods for both TableRow and TableId with from_* and as_* methods. These remove the need to perform casting at call sites, reducing the total number of casts in the Bevy codebase. Within these methods, an appropriate debug_assertion ensures the cast will behave in an expected manner (no wrapping, etc.). I am using a debug_assertion instead of an assert to reduce any possible runtime overhead, however minimal. This choice is something I am open to changing (or leaving up to another PR) if anyone has any strong arguments for it.

Changelog

ComponentSparseSet::sparse stores a TableRow instead of a u32 (private change)
Replaced TableRow::new and TableRow::index methods with TableRow::from_* and TableRow::as_*, with debug_assertions protecting any internal casting.
Replaced TableId::new and TableId::index methods with TableId::from_* and TableId::as_*, with debug_assertions protecting any internal casting.
All TableId methods are now const

Migration Guide

TableRow::new -> TableRow::from_usize
TableRow::index -> TableRow::as_usize
TableId::new -> TableId::from_usize
TableId::index -> TableId::as_usize

Notes

I have chosen to remove the index and new methods for the following chain of reasoning:

Across the codebase, new was called with a mixture of u32 and usize values. Likewise for index.
Choosing new to either be usize or u32 would break half of these call-sites, requiring as casting at the site.
Adding a second method new_u32 or new_usize avoids the above, bu looks visually inconsistent.
Therefore, they should be replaced with from_* and as_* methods instead.

Worth noting is that by updating ComponentSparseSet, there are now zero instances of interacting with the inner value of TableRow as a u32, it is exclusively used as a usize value (due to interactions with methods like len and slice indexing). I have left the as_u32 and from_u32 methods as the "proper" constructors/getters.

Localizes all instances of `as` casting into a pair of properly checked functions with `debug_assertions`. By using `debug_assertions` any runtime costs are avoided.

alice-i-cecile · 2023-11-30T03:27:26Z

@stepancheg, could I get your review here?

alice-i-cecile

Much improved type safety, faithful translation to the new API, and even some nice new const fn annotations.

Totally content to defer to you on the exact names for the methods. Thanks!

stepancheg · 2023-11-30T03:52:33Z

debug_assert does not look right.

debug_assert is meant to verify internal invariants, while here we use it mostly blindly: if you didn't read the documentation of the function, you can easily make a mistake.

This is kind of problem. We run debug build, don't trigger any assertions, but then run code in production and something breaks, because in production data size is usually larger. Like for one user in thousand the program will crash in non-reproducible use case.

For example, it is very easy to get overflow in this fictional scenario:

let index: usize = ... // some valid index, happened to be equal to u32::MAX
let next_index = index + 1;
let next_index = TableRow::from_usize(next_index);
if next_index.as_usize() == table_len { ... }

here we quietly missed integer overflow.

I believe, it won't be possible to observe overhead of assert!: it is extremely cheap compared to all other things this code is doing.

If we have to use debug_assert, we can split from_usize into two functions:

from_usize with assert!
from_usize_unchecked with debug_assert! where it's clear at call site that the overflow may happen

Another way to deal with it is to just switch all the indices to u32. For example, switch QueryIterationCursor.current_row to u32 since we cannot have more than u32::MAX entities even in sparse set IIUC.

crates/bevy_ecs/src/storage/table.rs

stepancheg · 2023-11-30T04:10:13Z

Another possibility is instead to switch TableRow to usize. As far as I see, it is not stored anywhere, only used as temporary value, so increasing the size won't create memory overhead.

stepancheg · 2023-11-30T04:11:22Z

Also TableRow::from_xxx should probably also check that the index is not equal to TableRow::INVALID.

bushrat011899 · 2023-11-30T04:29:02Z

debug_assert does not look right.

debug_assert is meant to verify internal invariants, while here we use it mostly blindly: if you didn't read the documentation of the function, you can easily make a mistake.

This is kind of problem. We run debug build, don't trigger any assertions, but then run code in production and something breaks, because in production data size is usually larger. Like for one user in thousand the program will crash in non-reproducible use case.

As I see it, debug_assert is for hot-path assert where the assertion may not be required. In release, with the assertion disabled, this code is still safe and free of undefined behaviour, so the worst case scenario is reading the value of the wrong Entity in a Table. A bug, but would require 4,294,967,296 entities of the same archetype to induce. I suspect something else in Bevy would break first! Let alone the performance implications of trying to Query 4 billion entities from a single table.

Further, end users do have the option of compiling in release mode with debug_assertions enabled. This will decrease performance, because assertions do add overhead, but there are many instances in Bevy of problems which cannot be observed without debug_assertions enabled (~100 from a quick search). As an example, you can create the Mesh of a negative-radius Cylinder in release mode, but not in debug.

I believe, it won't be possible to observe overhead of assert!: it is extremely cheap compared to all other things this code is doing.

I see this PR as a stepping stone to better safety, and maybe not the final goal. The changes I've made are definitely zero impact in standard release builds, while assert is likely to be zero impact. I personally agree with you that assert is probably fine to use here, but it is a part of the looped hot path in several locations within Bevy (Query will run this method for every Entity in the world for every iter as one example). By accepting this PR with debug_assert, it becomes trivial to make a followup benchmarking a full application with either debug_assert or assert.

If we have to use debug_assert, we can split from_usize into two functions:
* `from_usize` with `assert!`

* `from_usize_unchecked` with `debug_assert!` where it's caller responsibility to guarantee that the value does not overflow

Personally, I don't like adding unchecked methods where it can be avoided. I'm not saying that's wrong here, but I personally don't want to add more "unsafe" methods.

Another way to deal with it is to just switch all the indices to u32. For example, switch QueryIterationCursor.current_row to u32 since we cannot have more than u32::MAX entities even in sparse set IIUC.

That's not practically possible because of how it's used in relation to container len and indexing [..] methods, which always work with usize. Eventually, someone will need to cast TableRow into a usize for one reason or another. By bringing those casts into TableRow, we at least have oversight on where it happens and can discuss appropriate safety.

Another possibility is instead to switch TableRow to usize. As far as I see, it is not stored anywhere, only used as temporary value, so increasing the size won't create memory overhead.

It's implicitly stored in ComponentSparseSet::sparse. In main, it's stored by-proxy as a u32, whereas in this PR I replace it with the proper TableRow value instead. I do agree that using usize internally would sidestep the issue, but I suspect the choice is deliberate based on an assumption that no table will contain more than u32::MAX entities. Someone else would need to weigh in to confirm or deny that though.

Also TableRow::from_xxx should probably also check that the index is not equal to TableRow::INVALID.

Perhaps, but that's definitely some extra new behaviour that should also still be in a debug_assert.

Using forward-reverse casting to ensure value is invariant.

stepancheg · 2023-11-30T04:44:16Z

Further, end users do have the option of compiling in release mode with debug_assertions enabled.

Developers using Bevy have this option. Users of apps developed with Bevy, not so much.

This will decrease performance, because assertions do add overhead

Different assertions have different overhead. Many debug assertions in bevy are expensive (some debug assertions even add fields to structs).

This one is cheap. In particular, every time Vec is used (like a[i]), there's range check. And there is a lot of such checks. Including in code this PR modifies. (Moreover, check against constant is even cheaper than check against variable, because there's no memory dependency.)

A bug, but would require 4,294,967,296 entities of the same archetype to induce. I suspect something else in Bevy would break first!

This is not very convincing argument.

Let alone the performance implications of trying to Query 4 billion entities from a single table.

I can imagine another scenario: create 4 million entries slowly accidentally forgetting to delete them. I expect Bevy to crash in this scenario rather than quietly overflow because there's some undetected bug during next refactoring.

That's not practically possible because of how it's used in relation to container len and indexing [..] methods

I'm sorry, I don't understand why it is not possible to switch indices to u32. When usize is needed, just cast it to usize.

Perhaps, but that's definitely some extra new behaviour that should also still be in a debug_assert

Do you have any evidence that unconditional assert for integer range is expensive?

Anyway, just in case, I'm not project maintainer, I just post my opinion. The decision how to do it should be made by someone else.

bushrat011899 · 2023-11-30T05:49:04Z

I can imagine another scenario: create 4 million entries slowly accidentally forgetting to delete them. I expect Bevy to crash in this scenario rather than quietly overflow because there's some undetected bug during next refactoring.

This will create a single table storing 32GB of data at a minimum (An Entity is 8 bytes). Let alone if these entities had any components. Again, I'm not saying it can't happen, but it will be incredibly clear something has gone wrong even without the assertion.

I don't want to get into an argument over this, because I do agree with you that assert should be fine in this context. I benched with debug_assert and assert and couldn't measure any more a 3% performance difference in the iter_simple/system bench (properly warmed up, randomised run order, repeated over 30 minutes on a single machine). I just don't have a good proof for that <3% for the same reasons you've previously outlined.

I personally believe this PR still provides added safety even with only a debug_assert. This removes 8 as statements across bevy_ecs that each could've caused this kind of issue. Now, all those cases are centralised to a single area behind a debug_assert. I know there are some out there who would want more proof on the performance impact (or lack there of) for an additional standard assert in the hot path. By putting the debug_assert there now, it opens up the door for a very simple PR to turn these into assert, making the performance difference much easier to measure. I think it might even be a good PR to go over all the debug_assert statements in general and see which ones can be promoted into assert statements.

stepancheg · 2023-11-30T06:59:20Z

but it will be incredibly clear something has gone wrong even without the assertion

Not necessarily. Operating system will just start swapping memory in the beginning of the table, and will happily allocate much more before it will be killed.

couldn't measure any more a 3% performance difference

3% is too much. It cannot be that bad.

Can I try to replicate your test? Can you give me instructions what you did, so I would try it myself?

bushrat011899 · 2023-11-30T07:40:37Z

Can I try to replicate your test? Can you give me instructions what you did, so I would try it myself?

As said, I'm not confident in my testing, but what I did was create a second branch which used assert instead of debug_assert in those 4 functions across TableRow and TableId.

Then I ran:

// On assert branch
cargo bench --bench ecs -- iter_simple/system --save-baseline assert

// On debug_assert branch
cargo bench --bench ecs -- iter_simple/system --save-baseline debug_assert

// Finally
cargo bench --bench ecs -- iter_simple/system --load-baseline debug_assert --baseline assert

I generated the baselines in a loop over 10 minutes each, took the average value from each and then generated again until I found that value. Another option would be to save every baseline with a unique name.

The final result was always less than a 3% deviation, and criterion claimed the result was within the noise threshold, indicating a difference not statistically significant, as you and I would expect.

I chose iter_simple/system as it takes about 9us on my laptop to complete, so a better measurement than just the individual function on its own.

Please feel free to bench this however you think is appropriate, I haven't switch to assert because I don't feel confident enough in my testing to say there's no difference, so I'm aiming for a low controversy change just in case.

stepancheg · 2023-11-30T07:51:59Z

The final result was always less than a 3% deviation, and criterion claimed the result was within the noise threshold, indicating a difference not statistically significant

Right.

Please feel free to bench this however you think is appropriate

For benchmarking I use my absh tool. But it requires too binaries. I'll try to extract iter_simple test to a binary and run the benchmark.

crates/bevy_ecs/src/storage/table.rs

stepancheg · 2023-11-30T08:32:34Z

OK, how I benchmarked.

examples/bb.rs

Build with:

cargo build --example bb --release --target aarch64-apple-darwin

(on Apple M1 laptop)

Diff A -> B

(Version A is with debug_assert! and version B is with assert!)

Running test as:

absh -im -a '~/tmp/bb-a' -b '~/tmp/bb-b'

Ran it for 10 minutes, with 95% confidence, it is not worse than 0.7% regression:

Time (in seconds):
A: n=200 mean=3.917 std=0.125 se=0.008 min=3.773 max=5.047 med=3.892
B: n=200 mean=3.923 std=0.100 se=0.007 min=3.742 max=4.671 med=3.904
A: distr=[  ▃▂▆▆█▅▆▄▂▁▁▁                                           ]
B: distr=[  ▁▃▄▇▅▆▄▆▃▃▃ ▁                                          ]
B/A: 1.001 0.996..1.007 (95% conf)

To detect 0.1% regression it needs to be run for hours.

But with 0.1% perf difference such benchmarks cannot be trusted, because compiler optimizations reorder instructions, inline functions differently etc, which may result in perf change in any direction.

alice-i-cecile · 2023-11-30T15:34:59Z

With reliable benchmark results, let's swap over to full asserts here. Thanks for grabbing those.

SkiFire13 · 2023-11-30T16:53:06Z

Both benchmarks showed here are using for loops, could you also provide one with a simple for_each (for example iter_simple_foreach)? The reason being that for_each is known to optimize better, sometimes it can even be autovectorized (see #6547), so introducing a branch there may prevent those optimizations and be more noticeable, while in for loops the same optimizations may have never been performed.

I also noticed that most usages of TableRow::from_usize are bounded by a table.entity_count(), which in turn can never be more than 2^32 due to entities themselves being limited to 2^32 (since their index is a u32). Now, you might argue this argument isn't convincing enough to only have debug_assert in TableRow::from_usize (after all we might miss something, introduce a bug somewhere etc etc). However this could be used to hint the compiler that TableRow::from_usize may never fail. For example, take this code from QueryState::for_each_unchecked_manual:

for row in 0..table.entity_count() { // `table.entity_count()` is a `usize`
    let entity = entities.get_unchecked(row);
    let row = TableRow::from_usize(row); // For the compiler `row` may overflow a `u32`, so it can't elide the internal `assert`

But if instead it was:

let entity_count = table.entity_count();
assert!(entity_count <= 1 << 32);
for row in 0..entity_count {
    let entity = entities.get_unchecked(row);
    let row = TableRow::from_usize(row); // Now it's locally known that `row < entity_count <= 2^32`, so the compiler can elide the check
}

That is, we just added an assert outside the loop that (we hope) allows the compiler to elide the assert in the loop body.

stepancheg · 2023-11-30T17:20:21Z

For the record, I ran the benchmark above running for a night (with -r flags to randomize execution order A/B or B/A), and the result is:

Time (in seconds):
A: n=2917 mean=3.725 std=0.046 se=0.000 min=3.526 max=4.062 med=3.729
B: n=2917 mean=3.729 std=0.048 se=0.000 min=3.540 max=4.048 med=3.733
A: distr=[             ▁▁▁▂▂▃▄▇▇██▆▄▃▂▁                             ]
B: distr=[            ▁▁▁▁▁▂▃▅▅▆██▇▅▄▂▁                             ]
B/A: 1.001 1.001..1.002 (95% conf)

So full assertions is 0.1% regression

which might be is randomness introduced by the compiler
or it isacceptable price to pay for peace of mind in my subjective opinion
plus see a comment above by @SkiFire13 about assertion elision

SkiFire13 · 2023-11-30T17:35:53Z

Just to be clear, I don't think the compiler is currently eliding the assert (it currently has no information to know that table.entities_count() is bounded by 2^32). Instead I'm suggesting to introduce an assert outside the loop that should hint the compiler that it can elide the ones inside.

stepancheg · 2023-11-30T17:39:54Z

Instead I'm suggesting to introduce an assert outside the loop that should hint the compiler that it can elide the ones inside

Right.

Here is compiler explorer.

For code:

    assert!(x <= 1 << 32);
    for i in 0..x {
        assert!(i as u32 as usize == i);
        foo(i as u32);
    }

there's no assertion in the loop, the loop machine code is:

.LBB0_3:
        leaq    1(%rdi), %r15
        callq   *%r14
        movq    %r15, %rdi
        cmpq    %r15, %rbx
        jne     .LBB0_3

(one conditional branch which is exit from loop on iteration variable).

Also removed `as_*` assertion since `usize` is guaranteed to be `u32` or larger in Bevy.

bushrat011899 · 2023-11-30T21:23:32Z

Thanks for the detailed testing @stepancheg! I've updated this PR to use assert instead of debug_assert since I am now confident that there is no practical performance difference.

crates/bevy_ecs/src/storage/table.rs

Also removed `as_*` assertion since `usize` is guaranteed to be `u32` or larger in Bevy.

…/bevy into TableU32Operations

stepancheg · 2023-12-01T08:58:07Z

@bushrat011899 can you please add assertions to the loops as proposed by @SkiFire13? Perhaps as a separate PR.

Added `assert` in `QueryIter::fold_over_table_range` out of the loop to hint compiler on optimisation.

bushrat011899 · 2023-12-02T01:12:30Z

@bushrat011899 can you please add assertions to the loops as proposed by @SkiFire13? Perhaps as a separate PR.

Had to resolve a merge conflict anyway, so I've added in 1 assert statement outside a for-loop where I believe it would still be appropriate.

# Objective - Since #10811,Bevy uses `assert `in the hot path of iteration. The `for_each `method has an assert in the outer loop to help the compiler remove unnecessary branching in the internal loop. - However , ` for` style iterations do not receive the same treatment. it still have a branch check in the internal loop, which could potentially hurt performance. ## Solution - use `TableRow::from_u32 ` instead of ` TableRow::from_usize` to avoid unnecessary branch. Before ![image](https://github.com/bevyengine/bevy/assets/45868716/f6d2a1ac-2129-48ff-97bf-d86713ddeaaf) After ---------------------------------------------------------------------------- ![image](https://github.com/bevyengine/bevy/assets/45868716/bfe5a9ee-ba6c-4a80-85b0-1c6d43adfe8c)

# Objective - Since bevyengine#10811,Bevy uses `assert `in the hot path of iteration. The `for_each `method has an assert in the outer loop to help the compiler remove unnecessary branching in the internal loop. - However , ` for` style iterations do not receive the same treatment. it still have a branch check in the internal loop, which could potentially hurt performance. ## Solution - use `TableRow::from_u32 ` instead of ` TableRow::from_usize` to avoid unnecessary branch. Before ![image](https://github.com/bevyengine/bevy/assets/45868716/f6d2a1ac-2129-48ff-97bf-d86713ddeaaf) After ---------------------------------------------------------------------------- ![image](https://github.com/bevyengine/bevy/assets/45868716/bfe5a9ee-ba6c-4a80-85b0-1c6d43adfe8c)

Replaced TableRow new and index with from_* and as_*

58e5dcd

Localizes all instances of `as` casting into a pair of properly checked functions with `debug_assertions`. By using `debug_assertions` any runtime costs are avoided.

alice-i-cecile added A-ECS Entities, components, systems, and events C-Code-Quality A section of code that is hard to understand or change C-Breaking-Change A breaking change to Bevy's public API that needs to be noted in a migration guide labels Nov 30, 2023

alice-i-cecile approved these changes Nov 30, 2023

View reviewed changes

stepancheg reviewed Nov 30, 2023

View reviewed changes

crates/bevy_ecs/src/storage/table.rs Outdated Show resolved Hide resolved

Updated TableRow from_* and as_* assertions

3acff73

Using forward-reverse casting to ensure value is invariant.

Bluefinger reviewed Nov 30, 2023

View reviewed changes

crates/bevy_ecs/src/storage/table.rs Outdated Show resolved Hide resolved

Updated TableRow from_* assertions to assert

5a9b2c8

Also removed `as_*` assertion since `usize` is guaranteed to be `u32` or larger in Bevy.

stepancheg reviewed Nov 30, 2023

View reviewed changes

crates/bevy_ecs/src/storage/table.rs Outdated Show resolved Hide resolved

bushrat011899 added 2 commits December 1, 2023 08:28

Updated TableRow from_* assertions to assert

147bb13

Also removed `as_*` assertion since `usize` is guaranteed to be `u32` or larger in Bevy.

Merge branch 'TableU32Operations' of https://github.com/bushrat011899…

b8d7f5d

…/bevy into TableU32Operations

alice-i-cecile approved these changes Nov 30, 2023

View reviewed changes

Bluefinger approved these changes Dec 1, 2023

View reviewed changes

stepancheg approved these changes Dec 1, 2023

View reviewed changes

alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Dec 1, 2023

bushrat011899 added 2 commits December 2, 2023 11:55

Merge remote-tracking branch 'upstream/main' into TableU32Operations

6da634b

Resolve upstream conflicts

f8612ca

Added `assert` in `QueryIter::fold_over_table_range` out of the loop to hint compiler on optimisation.

alice-i-cecile added this pull request to the merge queue Dec 5, 2023

Merged via the queue into bevyengine:main with commit 72adf2a Dec 5, 2023
22 checks passed

nicopap mentioned this pull request Jan 29, 2024

0.13 announcement post bevyengine/bevy-website#891

Merged

re0312 mentioned this pull request Apr 2, 2024

Remove unnecessary branch in query iteration #12844

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduced `TableRow` `as` Casting #10811

Reduced `TableRow` `as` Casting #10811

bushrat011899 commented Nov 30, 2023

alice-i-cecile commented Nov 30, 2023

alice-i-cecile left a comment

stepancheg commented Nov 30, 2023 •

edited

Loading

stepancheg commented Nov 30, 2023

stepancheg commented Nov 30, 2023

bushrat011899 commented Nov 30, 2023

stepancheg commented Nov 30, 2023 •

edited

Loading

bushrat011899 commented Nov 30, 2023

stepancheg commented Nov 30, 2023

bushrat011899 commented Nov 30, 2023

stepancheg commented Nov 30, 2023

stepancheg commented Nov 30, 2023 •

edited

Loading

alice-i-cecile commented Nov 30, 2023

SkiFire13 commented Nov 30, 2023

stepancheg commented Nov 30, 2023

SkiFire13 commented Nov 30, 2023

stepancheg commented Nov 30, 2023

bushrat011899 commented Nov 30, 2023

stepancheg commented Dec 1, 2023

bushrat011899 commented Dec 2, 2023

Reduced TableRow as Casting #10811

Reduced TableRow as Casting #10811

Conversation

bushrat011899 commented Nov 30, 2023

Objective

Solution

Changelog

Migration Guide

Notes

alice-i-cecile commented Nov 30, 2023

alice-i-cecile left a comment

Choose a reason for hiding this comment

stepancheg commented Nov 30, 2023 • edited Loading

stepancheg commented Nov 30, 2023

stepancheg commented Nov 30, 2023

bushrat011899 commented Nov 30, 2023

stepancheg commented Nov 30, 2023 • edited Loading

bushrat011899 commented Nov 30, 2023

stepancheg commented Nov 30, 2023

bushrat011899 commented Nov 30, 2023

stepancheg commented Nov 30, 2023

stepancheg commented Nov 30, 2023 • edited Loading

alice-i-cecile commented Nov 30, 2023

SkiFire13 commented Nov 30, 2023

stepancheg commented Nov 30, 2023

SkiFire13 commented Nov 30, 2023

stepancheg commented Nov 30, 2023

bushrat011899 commented Nov 30, 2023

stepancheg commented Dec 1, 2023

bushrat011899 commented Dec 2, 2023

Reduced `TableRow` `as` Casting #10811

Reduced `TableRow` `as` Casting #10811

stepancheg commented Nov 30, 2023 •

edited

Loading

stepancheg commented Nov 30, 2023 •

edited

Loading

stepancheg commented Nov 30, 2023 •

edited

Loading