Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduced TableRow as Casting #10811

Merged
merged 7 commits into from
Dec 5, 2023

Conversation

bushrat011899
Copy link
Contributor

Objective

Solution

Replaced new and index methods for both TableRow and TableId with from_* and as_* methods. These remove the need to perform casting at call sites, reducing the total number of casts in the Bevy codebase. Within these methods, an appropriate debug_assertion ensures the cast will behave in an expected manner (no wrapping, etc.). I am using a debug_assertion instead of an assert to reduce any possible runtime overhead, however minimal. This choice is something I am open to changing (or leaving up to another PR) if anyone has any strong arguments for it.


Changelog

  • ComponentSparseSet::sparse stores a TableRow instead of a u32 (private change)
  • Replaced TableRow::new and TableRow::index methods with TableRow::from_* and TableRow::as_*, with debug_assertions protecting any internal casting.
  • Replaced TableId::new and TableId::index methods with TableId::from_* and TableId::as_*, with debug_assertions protecting any internal casting.
  • All TableId methods are now const

Migration Guide

  • TableRow::new -> TableRow::from_usize
  • TableRow::index -> TableRow::as_usize
  • TableId::new -> TableId::from_usize
  • TableId::index -> TableId::as_usize

Notes

I have chosen to remove the index and new methods for the following chain of reasoning:

  • Across the codebase, new was called with a mixture of u32 and usize values. Likewise for index.
  • Choosing new to either be usize or u32 would break half of these call-sites, requiring as casting at the site.
  • Adding a second method new_u32 or new_usize avoids the above, bu looks visually inconsistent.
  • Therefore, they should be replaced with from_* and as_* methods instead.

Worth noting is that by updating ComponentSparseSet, there are now zero instances of interacting with the inner value of TableRow as a u32, it is exclusively used as a usize value (due to interactions with methods like len and slice indexing). I have left the as_u32 and from_u32 methods as the "proper" constructors/getters.

Localizes all instances of `as` casting into a pair of properly checked functions with `debug_assertions`. By using `debug_assertions` any runtime costs are avoided.
@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events C-Code-Quality A section of code that is hard to understand or change C-Breaking-Change A breaking change to Bevy's public API that needs to be noted in a migration guide labels Nov 30, 2023
@alice-i-cecile
Copy link
Member

@stepancheg, could I get your review here?

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much improved type safety, faithful translation to the new API, and even some nice new const fn annotations.

Totally content to defer to you on the exact names for the methods. Thanks!

@stepancheg
Copy link
Contributor

stepancheg commented Nov 30, 2023

debug_assert does not look right.

debug_assert is meant to verify internal invariants, while here we use it mostly blindly: if you didn't read the documentation of the function, you can easily make a mistake.

This is kind of problem. We run debug build, don't trigger any assertions, but then run code in production and something breaks, because in production data size is usually larger. Like for one user in thousand the program will crash in non-reproducible use case.

For example, it is very easy to get overflow in this fictional scenario:

let index: usize = ... // some valid index, happened to be equal to u32::MAX
let next_index = index + 1;
let next_index = TableRow::from_usize(next_index);
if next_index.as_usize() == table_len { ... }

here we quietly missed integer overflow.

I believe, it won't be possible to observe overhead of assert!: it is extremely cheap compared to all other things this code is doing.

If we have to use debug_assert, we can split from_usize into two functions:

  • from_usize with assert!
  • from_usize_unchecked with debug_assert! where it's clear at call site that the overflow may happen

Another way to deal with it is to just switch all the indices to u32. For example, switch QueryIterationCursor.current_row to u32 since we cannot have more than u32::MAX entities even in sparse set IIUC.

@stepancheg
Copy link
Contributor

Another possibility is instead to switch TableRow to usize. As far as I see, it is not stored anywhere, only used as temporary value, so increasing the size won't create memory overhead.

@stepancheg
Copy link
Contributor

Also TableRow::from_xxx should probably also check that the index is not equal to TableRow::INVALID.

@bushrat011899
Copy link
Contributor Author

debug_assert does not look right.

debug_assert is meant to verify internal invariants, while here we use it mostly blindly: if you didn't read the documentation of the function, you can easily make a mistake.

This is kind of problem. We run debug build, don't trigger any assertions, but then run code in production and something breaks, because in production data size is usually larger. Like for one user in thousand the program will crash in non-reproducible use case.

As I see it, debug_assert is for hot-path assert where the assertion may not be required. In release, with the assertion disabled, this code is still safe and free of undefined behaviour, so the worst case scenario is reading the value of the wrong Entity in a Table. A bug, but would require 4,294,967,296 entities of the same archetype to induce. I suspect something else in Bevy would break first! Let alone the performance implications of trying to Query 4 billion entities from a single table.

Further, end users do have the option of compiling in release mode with debug_assertions enabled. This will decrease performance, because assertions do add overhead, but there are many instances in Bevy of problems which cannot be observed without debug_assertions enabled (~100 from a quick search). As an example, you can create the Mesh of a negative-radius Cylinder in release mode, but not in debug.

I believe, it won't be possible to observe overhead of assert!: it is extremely cheap compared to all other things this code is doing.

I see this PR as a stepping stone to better safety, and maybe not the final goal. The changes I've made are definitely zero impact in standard release builds, while assert is likely to be zero impact. I personally agree with you that assert is probably fine to use here, but it is a part of the looped hot path in several locations within Bevy (Query will run this method for every Entity in the world for every iter as one example). By accepting this PR with debug_assert, it becomes trivial to make a followup benchmarking a full application with either debug_assert or assert.

If we have to use debug_assert, we can split from_usize into two functions:

* `from_usize` with `assert!`

* `from_usize_unchecked` with `debug_assert!` where it's caller responsibility to guarantee that the value does not overflow

Personally, I don't like adding unchecked methods where it can be avoided. I'm not saying that's wrong here, but I personally don't want to add more "unsafe" methods.

Another way to deal with it is to just switch all the indices to u32. For example, switch QueryIterationCursor.current_row to u32 since we cannot have more than u32::MAX entities even in sparse set IIUC.

That's not practically possible because of how it's used in relation to container len and indexing [..] methods, which always work with usize. Eventually, someone will need to cast TableRow into a usize for one reason or another. By bringing those casts into TableRow, we at least have oversight on where it happens and can discuss appropriate safety.

Another possibility is instead to switch TableRow to usize. As far as I see, it is not stored anywhere, only used as temporary value, so increasing the size won't create memory overhead.

It's implicitly stored in ComponentSparseSet::sparse. In main, it's stored by-proxy as a u32, whereas in this PR I replace it with the proper TableRow value instead. I do agree that using usize internally would sidestep the issue, but I suspect the choice is deliberate based on an assumption that no table will contain more than u32::MAX entities. Someone else would need to weigh in to confirm or deny that though.

Also TableRow::from_xxx should probably also check that the index is not equal to TableRow::INVALID.

Perhaps, but that's definitely some extra new behaviour that should also still be in a debug_assert.

Using forward-reverse casting to ensure value is invariant.
@stepancheg
Copy link
Contributor

stepancheg commented Nov 30, 2023

Further, end users do have the option of compiling in release mode with debug_assertions enabled.

Developers using Bevy have this option. Users of apps developed with Bevy, not so much.

This will decrease performance, because assertions do add overhead

Different assertions have different overhead. Many debug assertions in bevy are expensive (some debug assertions even add fields to structs).

This one is cheap. In particular, every time Vec is used (like a[i]), there's range check. And there is a lot of such checks. Including in code this PR modifies. (Moreover, check against constant is even cheaper than check against variable, because there's no memory dependency.)

A bug, but would require 4,294,967,296 entities of the same archetype to induce. I suspect something else in Bevy would break first!

This is not very convincing argument.

Let alone the performance implications of trying to Query 4 billion entities from a single table.

I can imagine another scenario: create 4 million entries slowly accidentally forgetting to delete them. I expect Bevy to crash in this scenario rather than quietly overflow because there's some undetected bug during next refactoring.

That's not practically possible because of how it's used in relation to container len and indexing [..] methods

I'm sorry, I don't understand why it is not possible to switch indices to u32. When usize is needed, just cast it to usize.

Perhaps, but that's definitely some extra new behaviour that should also still be in a debug_assert

Do you have any evidence that unconditional assert for integer range is expensive?

Anyway, just in case, I'm not project maintainer, I just post my opinion. The decision how to do it should be made by someone else.

@bushrat011899
Copy link
Contributor Author

I can imagine another scenario: create 4 million entries slowly accidentally forgetting to delete them. I expect Bevy to crash in this scenario rather than quietly overflow because there's some undetected bug during next refactoring.

This will create a single table storing 32GB of data at a minimum (An Entity is 8 bytes). Let alone if these entities had any components. Again, I'm not saying it can't happen, but it will be incredibly clear something has gone wrong even without the assertion.

I don't want to get into an argument over this, because I do agree with you that assert should be fine in this context. I benched with debug_assert and assert and couldn't measure any more a 3% performance difference in the iter_simple/system bench (properly warmed up, randomised run order, repeated over 30 minutes on a single machine). I just don't have a good proof for that <3% for the same reasons you've previously outlined.

I personally believe this PR still provides added safety even with only a debug_assert. This removes 8 as statements across bevy_ecs that each could've caused this kind of issue. Now, all those cases are centralised to a single area behind a debug_assert. I know there are some out there who would want more proof on the performance impact (or lack there of) for an additional standard assert in the hot path. By putting the debug_assert there now, it opens up the door for a very simple PR to turn these into assert, making the performance difference much easier to measure. I think it might even be a good PR to go over all the debug_assert statements in general and see which ones can be promoted into assert statements.

@stepancheg
Copy link
Contributor

but it will be incredibly clear something has gone wrong even without the assertion

Not necessarily. Operating system will just start swapping memory in the beginning of the table, and will happily allocate much more before it will be killed.

couldn't measure any more a 3% performance difference

3% is too much. It cannot be that bad.

Can I try to replicate your test? Can you give me instructions what you did, so I would try it myself?

@bushrat011899
Copy link
Contributor Author

Can I try to replicate your test? Can you give me instructions what you did, so I would try it myself?

As said, I'm not confident in my testing, but what I did was create a second branch which used assert instead of debug_assert in those 4 functions across TableRow and TableId.

Then I ran:

// On assert branch
cargo bench --bench ecs -- iter_simple/system --save-baseline assert

// On debug_assert branch
cargo bench --bench ecs -- iter_simple/system --save-baseline debug_assert

// Finally
cargo bench --bench ecs -- iter_simple/system --load-baseline debug_assert --baseline assert

I generated the baselines in a loop over 10 minutes each, took the average value from each and then generated again until I found that value. Another option would be to save every baseline with a unique name.

The final result was always less than a 3% deviation, and criterion claimed the result was within the noise threshold, indicating a difference not statistically significant, as you and I would expect.

I chose iter_simple/system as it takes about 9us on my laptop to complete, so a better measurement than just the individual function on its own.

Please feel free to bench this however you think is appropriate, I haven't switch to assert because I don't feel confident enough in my testing to say there's no difference, so I'm aiming for a low controversy change just in case.

@stepancheg
Copy link
Contributor

The final result was always less than a 3% deviation, and criterion claimed the result was within the noise threshold, indicating a difference not statistically significant

Right.

Please feel free to bench this however you think is appropriate

For benchmarking I use my absh tool. But it requires too binaries. I'll try to extract iter_simple test to a binary and run the benchmark.

@stepancheg
Copy link
Contributor

stepancheg commented Nov 30, 2023

OK, how I benchmarked.

examples/bb.rs

Build with:

cargo build --example bb --release --target aarch64-apple-darwin

(on Apple M1 laptop)

Diff A -> B

(Version A is with debug_assert! and version B is with assert!)

Running test as:

absh -im -a '~/tmp/bb-a' -b '~/tmp/bb-b'

Ran it for 10 minutes, with 95% confidence, it is not worse than 0.7% regression:

Time (in seconds):
A: n=200 mean=3.917 std=0.125 se=0.008 min=3.773 max=5.047 med=3.892
B: n=200 mean=3.923 std=0.100 se=0.007 min=3.742 max=4.671 med=3.904
A: distr=[  ▃▂▆▆█▅▆▄▂▁▁▁                                           ]
B: distr=[  ▁▃▄▇▅▆▄▆▃▃▃ ▁                                          ]
B/A: 1.001 0.996..1.007 (95% conf)

To detect 0.1% regression it needs to be run for hours.

But with 0.1% perf difference such benchmarks cannot be trusted, because compiler optimizations reorder instructions, inline functions differently etc, which may result in perf change in any direction.

@alice-i-cecile
Copy link
Member

With reliable benchmark results, let's swap over to full asserts here. Thanks for grabbing those.

@SkiFire13
Copy link
Contributor

Both benchmarks showed here are using for loops, could you also provide one with a simple for_each (for example iter_simple_foreach)? The reason being that for_each is known to optimize better, sometimes it can even be autovectorized (see #6547), so introducing a branch there may prevent those optimizations and be more noticeable, while in for loops the same optimizations may have never been performed.

I also noticed that most usages of TableRow::from_usize are bounded by a table.entity_count(), which in turn can never be more than 2^32 due to entities themselves being limited to 2^32 (since their index is a u32). Now, you might argue this argument isn't convincing enough to only have debug_assert in TableRow::from_usize (after all we might miss something, introduce a bug somewhere etc etc). However this could be used to hint the compiler that TableRow::from_usize may never fail. For example, take this code from QueryState::for_each_unchecked_manual:

for row in 0..table.entity_count() { // `table.entity_count()` is a `usize`
    let entity = entities.get_unchecked(row);
    let row = TableRow::from_usize(row); // For the compiler `row` may overflow a `u32`, so it can't elide the internal `assert`

But if instead it was:

let entity_count = table.entity_count();
assert!(entity_count <= 1 << 32);
for row in 0..entity_count {
    let entity = entities.get_unchecked(row);
    let row = TableRow::from_usize(row); // Now it's locally known that `row < entity_count <= 2^32`, so the compiler can elide the check
}

That is, we just added an assert outside the loop that (we hope) allows the compiler to elide the assert in the loop body.

@stepancheg
Copy link
Contributor

For the record, I ran the benchmark above running for a night (with -r flags to randomize execution order A/B or B/A), and the result is:

Time (in seconds):
A: n=2917 mean=3.725 std=0.046 se=0.000 min=3.526 max=4.062 med=3.729
B: n=2917 mean=3.729 std=0.048 se=0.000 min=3.540 max=4.048 med=3.733
A: distr=[             ▁▁▁▂▂▃▄▇▇██▆▄▃▂▁                             ]
B: distr=[            ▁▁▁▁▁▂▃▅▅▆██▇▅▄▂▁                             ]
B/A: 1.001 1.001..1.002 (95% conf)

So full assertions is 0.1% regression

  • which might be is randomness introduced by the compiler
  • or it isacceptable price to pay for peace of mind in my subjective opinion
  • plus see a comment above by @SkiFire13 about assertion elision

@SkiFire13
Copy link
Contributor

Just to be clear, I don't think the compiler is currently eliding the assert (it currently has no information to know that table.entities_count() is bounded by 2^32). Instead I'm suggesting to introduce an assert outside the loop that should hint the compiler that it can elide the ones inside.

@stepancheg
Copy link
Contributor

Instead I'm suggesting to introduce an assert outside the loop that should hint the compiler that it can elide the ones inside

Right.

Here is compiler explorer.

For code:

    assert!(x <= 1 << 32);
    for i in 0..x {
        assert!(i as u32 as usize == i);
        foo(i as u32);
    }

there's no assertion in the loop, the loop machine code is:

.LBB0_3:
        leaq    1(%rdi), %r15
        callq   *%r14
        movq    %r15, %rdi
        cmpq    %r15, %rbx
        jne     .LBB0_3

(one conditional branch which is exit from loop on iteration variable).

Also removed `as_*` assertion since `usize` is guaranteed to be `u32` or larger in Bevy.
@bushrat011899
Copy link
Contributor Author

Thanks for the detailed testing @stepancheg! I've updated this PR to use assert instead of debug_assert since I am now confident that there is no practical performance difference.

Also removed `as_*` assertion since `usize` is guaranteed to be `u32` or larger in Bevy.
@stepancheg
Copy link
Contributor

@bushrat011899 can you please add assertions to the loops as proposed by @SkiFire13? Perhaps as a separate PR.

@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Dec 1, 2023
Added `assert` in `QueryIter::fold_over_table_range` out of the loop to hint compiler on optimisation.
@bushrat011899
Copy link
Contributor Author

@bushrat011899 can you please add assertions to the loops as proposed by @SkiFire13? Perhaps as a separate PR.

Had to resolve a merge conflict anyway, so I've added in 1 assert statement outside a for-loop where I believe it would still be appropriate.

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Dec 5, 2023
Merged via the queue into bevyengine:main with commit 72adf2a Dec 5, 2023
22 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Apr 2, 2024
# Objective

- Since #10811,Bevy uses `assert `in the hot path of iteration. The
`for_each `method has an assert in the outer loop to help the compiler
remove unnecessary branching in the internal loop.
- However , ` for` style iterations do not receive the same treatment.
it still have a branch check in the internal loop, which could
potentially hurt performance.

## Solution

- use `TableRow::from_u32 ` instead of ` TableRow::from_usize` to avoid
unnecessary branch.

Before


![image](https://github.com/bevyengine/bevy/assets/45868716/f6d2a1ac-2129-48ff-97bf-d86713ddeaaf)



After
----------------------------------------------------------------------------


![image](https://github.com/bevyengine/bevy/assets/45868716/bfe5a9ee-ba6c-4a80-85b0-1c6d43adfe8c)
chompaa pushed a commit to chompaa/bevy that referenced this pull request Apr 5, 2024
# Objective

- Since bevyengine#10811,Bevy uses `assert `in the hot path of iteration. The
`for_each `method has an assert in the outer loop to help the compiler
remove unnecessary branching in the internal loop.
- However , ` for` style iterations do not receive the same treatment.
it still have a branch check in the internal loop, which could
potentially hurt performance.

## Solution

- use `TableRow::from_u32 ` instead of ` TableRow::from_usize` to avoid
unnecessary branch.

Before


![image](https://github.com/bevyengine/bevy/assets/45868716/f6d2a1ac-2129-48ff-97bf-d86713ddeaaf)



After
----------------------------------------------------------------------------


![image](https://github.com/bevyengine/bevy/assets/45868716/bfe5a9ee-ba6c-4a80-85b0-1c6d43adfe8c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Breaking-Change A breaking change to Bevy's public API that needs to be noted in a migration guide C-Code-Quality A section of code that is hard to understand or change S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change TableRow::new to accept u32
5 participants