@alice-i-cecile
Member

Please feel free to amend the technical descriptions liberally! I wasn't sure about the right level of detail to provide.

Fixes #1995.

Contributor

@JMS55 JMS55 left a comment

I like the level of technical detail; I think it strikes a good balance between interesting and understandable.

In many cases, this is the overwhelming majority of objects: level geometry and props are not typically moving around each frame!
We're now propagating a "dirty bit" up the hierarchy towards ancestors, allowing transform propagation to ignore entire subtrees of the hierarchy if it encounters an entity without the dirty bit.
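
As an illustration of the dirty-bit idea described above, here is a minimal, self-contained sketch. It is not Bevy's implementation: the `Node` and `Hierarchy` types, the one-dimensional `f32` "transform", and the split between a `changed` flag and a `dirty` flag are all assumptions made to keep the example small.

```rust
/// Toy hierarchy node. All names here are hypothetical, not Bevy types.
#[derive(Debug)]
struct Node {
    parent: Option<usize>,
    children: Vec<usize>,
    local: f32,    // 1D "transform" relative to the parent
    global: f32,   // cached world-space value
    changed: bool, // this node's own local value changed
    dirty: bool,   // this node or one of its descendants changed
}

#[derive(Debug, Default)]
struct Hierarchy {
    nodes: Vec<Node>,
}

impl Hierarchy {
    /// Add a node; new nodes count as changed so they get a global value.
    fn add(&mut self, parent: Option<usize>, local: f32) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node {
            parent,
            children: Vec::new(),
            local,
            global: 0.0,
            changed: true,
            dirty: false,
        });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        self.mark_dirty(id);
        id
    }

    /// Propagate the dirty bit up towards the root. Stops early at a node
    /// that is already dirty, since its ancestors must be dirty too.
    fn mark_dirty(&mut self, id: usize) {
        let mut cur = Some(id);
        while let Some(i) = cur {
            if self.nodes[i].dirty {
                break;
            }
            self.nodes[i].dirty = true;
            cur = self.nodes[i].parent;
        }
    }

    fn set_local(&mut self, id: usize, value: f32) {
        self.nodes[id].local = value;
        self.nodes[id].changed = true;
        self.mark_dirty(id);
    }

    /// Walk down from `id`, skipping any subtree whose root is not dirty
    /// (unless an ancestor changed). Returns the number of nodes visited.
    fn propagate(&mut self, id: usize, parent_global: f32, parent_changed: bool) -> usize {
        if !parent_changed && !self.nodes[id].dirty {
            return 0; // static subtree: nothing below here needs work
        }
        let recompute = parent_changed || self.nodes[id].changed;
        if recompute {
            self.nodes[id].global = parent_global + self.nodes[id].local;
        }
        self.nodes[id].dirty = false;
        self.nodes[id].changed = false;
        let global = self.nodes[id].global;
        let children = self.nodes[id].children.clone();
        let mut visited = 1;
        for child in children {
            visited += self.propagate(child, global, recompute);
        }
        visited
    }
}
```

In this sketch, a fully static hierarchy costs a single check at the root each frame, and moving one entity only touches that entity's ancestors and descendants.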

The results speak for themselves: taken together, our testing on the incredibly beefy [Caldera Hotel] from Call of Duty: Warzone shows that transform propagation took 1.1 ms in 0.15, and 0.1 ms after these changes in 0.16.
Contributor

We should note that Caldera is entirely static.

Tbh we should probably check how much perf impact we had on entirely dynamic scenes...

Member Author

Yeah I can test many_foxes or something.

Member

It is significantly faster thanks to parallel transform propagation.

Member

@aevyrie aevyrie Mar 31, 2025

3-4x faster was observed on caldera in the parallel propagation PR - this is without static subtree optimizations, with all transforms being recomputed every frame.

3-10x faster was observed using the hierarchy stress tests: bevyengine/bevy#17840 (comment)

The static optimizations did regress some benchmarks, but they were still either on par or faster than 0.15.

alice-i-cecile and others added 2 commits March 30, 2025 01:47
Co-authored-by: JMS55 <47158642+JMS55@users.noreply.github.com>
@alice-i-cecile alice-i-cecile requested a review from JMS55 March 30, 2025 05:53
Co-authored-by: JMS55 <47158642+JMS55@users.noreply.github.com>

The results speak for themselves: taken together, our testing on the incredibly beefy [Caldera Hotel] from Call of Duty: Warzone shows that transform propagation took 1.1 ms in 0.15, and 0.1 ms after these changes in 0.16.
While that's an impressive 11x performance improvement, the absolute magnitude of the time saved is the key metric.
With about 16 ms per frame at 60 FPS, that's 6% of your *entire* game's CPU budget saved, making huge open worlds or incredibly complex CAD assemblies more viable than ever before.
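
The 6% figure follows from the numbers quoted above: 1.1 ms minus 0.1 ms is 1.0 ms saved, out of a 1000 / 60 ≈ 16.7 ms frame. A small helper makes the arithmetic explicit; the function name is hypothetical and only serves this illustration.

```rust
/// Percentage of one frame's time budget freed by a speedup.
/// `before_ms` and `after_ms` are per-frame costs in milliseconds.
fn budget_saved_percent(before_ms: f64, after_ms: f64, target_fps: f64) -> f64 {
    let frame_budget_ms = 1000.0 / target_fps; // about 16.67 ms at 60 FPS
    (before_ms - after_ms) / frame_budget_ms * 100.0
}
```

Calling `budget_saved_percent(1.1, 0.1, 60.0)` gives approximately 6.0.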
Member

The numbers are more impressive when you consider that for most users it was taking much, much longer, around 4 ms on 0.15. The M4 Max is just stupidly fast. In pcwalton's screenshots, it was taking up ~1/4 of the frame time, which is what really motivated the work.

Member

It might be nice to find a more representative profile comparison on caldera from 0.15; there were so many optimizations, and the aggregate difference is seriously impressive.

@alice-i-cecile alice-i-cecile force-pushed the faster-transform-prop branch from 11d070f to 0109b9f Compare April 1, 2025 04:02
@alice-i-cecile alice-i-cecile requested a review from aevyrie April 1, 2025 04:04
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Apr 1, 2025
Merged via the queue into bevyengine:main with commit 044752c Apr 1, 2025
10 checks passed
@alice-i-cecile alice-i-cecile deleted the faster-transform-prop branch April 1, 2025 17:49
Development

Successfully merging this pull request may close these issues.

Write release notes for PR #17840: Parallel Transform Propagation

3 participants