Faster transform propagation release notes #2041
release-content/0.16/release-notes/17840_Parallel_Transform_Propagation.md
JMS55
left a comment
I like the level of technical detail, I think it strikes a good balance between interesting and understandable.
release-content/0.16/release-notes/17840_Parallel_Transform_Propagation.md
In many cases, this is the overwhelming majority of objects: level geometry and props are not typically moving around each frame!
We're now propagating a "dirty bit" up the hierarchy towards ancestors, allowing transform propagation to ignore entire subtrees of the hierarchy if it encounters an entity without the dirty bit.

The results speak for themselves: taken together, our testing on the incredibly beefy [Caldera Hotel] from Call of Duty: Warzone shows that transform propagation took 1.1 ms in 0.15, and 0.1 ms after these changes in 0.16.
We should note that Caldera is entirely static.
Tbh we should probably check how much perf impact we had on entirely dynamic scenes...
Yeah I can test many_foxes or something.
It is significantly faster thanks to parallel transform propagation.
3-4x faster was observed on caldera in the parallel propagation PR - this is without static subtree optimizations, with all transforms being recomputed every frame.
3-10x faster was observed using the hierarchy stress tests: bevyengine/bevy#17840 (comment)
The static optimizations did regress some benchmarks, but they were still either on par or faster than 0.15.
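For readers following this thread, the dirty-bit idea quoted from the release note can be sketched in plain Rust. This is a toy model under stated assumptions, not Bevy's implementation: `Node`, `Hierarchy`, `mark_dirty`, and `propagate_all` are hypothetical names, a single `f32` offset stands in for a full transform, and the traversal is single-threaded. `propagate_all` returns the number of nodes visited, which makes the effect of skipping clean subtrees visible.

```rust
/// One entity in a toy hierarchy. A 1-D offset stands in for a full transform.
struct Node {
    parent: Option<usize>,
    children: Vec<usize>,
    local: f32,
    global: f32,
    /// This node's own local offset changed since the last propagation.
    local_changed: bool,
    /// This node or one of its descendants changed (the "dirty bit").
    subtree_dirty: bool,
}

#[derive(Default)]
struct Hierarchy {
    nodes: Vec<Node>,
}

impl Hierarchy {
    /// Adds a node under `parent` (or as a root) and marks it dirty.
    fn add(&mut self, parent: Option<usize>, local: f32) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node {
            parent,
            children: Vec::new(),
            local,
            global: 0.0,
            local_changed: false,
            subtree_dirty: false,
        });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        self.mark_dirty(id);
        id
    }

    fn set_local(&mut self, id: usize, local: f32) {
        self.nodes[id].local = local;
        self.mark_dirty(id);
    }

    /// Sets the dirty bit on `id` and walks up towards the root.
    /// Stops early at an already-dirty node, whose ancestors are dirty too.
    fn mark_dirty(&mut self, id: usize) {
        self.nodes[id].local_changed = true;
        let mut current = Some(id);
        while let Some(i) = current {
            if self.nodes[i].subtree_dirty {
                break;
            }
            self.nodes[i].subtree_dirty = true;
            current = self.nodes[i].parent;
        }
    }

    /// Propagates from every root; returns how many nodes were visited.
    fn propagate_all(&mut self) -> usize {
        let mut visited = 0;
        for id in 0..self.nodes.len() {
            if self.nodes[id].parent.is_none() {
                visited += self.propagate(id, 0.0, false);
            }
        }
        visited
    }

    fn propagate(&mut self, id: usize, parent_global: f32, parent_changed: bool) -> usize {
        // A clean subtree under an unchanged parent is skipped entirely.
        if !parent_changed && !self.nodes[id].subtree_dirty {
            return 0;
        }
        let changed = parent_changed || self.nodes[id].local_changed;
        if changed {
            self.nodes[id].global = parent_global + self.nodes[id].local;
        }
        self.nodes[id].local_changed = false;
        self.nodes[id].subtree_dirty = false;
        let global = self.nodes[id].global;
        let mut visited = 1;
        for i in 0..self.nodes[id].children.len() {
            let child = self.nodes[id].children[i];
            visited += self.propagate(child, global, changed);
        }
        visited
    }
}
```

In this sketch a fully static scene costs nothing after the first pass, because every root is clean and is skipped. A fully dynamic scene still visits every node, which is why the dynamic-scene measurements discussed above depend on the parallel traversal rather than on the dirty bit.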
release-content/0.16/release-notes/17840_Parallel_Transform_Propagation.md
The results speak for themselves: taken together, our testing on the incredibly beefy [Caldera Hotel] from Call of Duty: Warzone shows that transform propagation took 1.1 ms in 0.15, and 0.1 ms after these changes in 0.16.
While that's an impressive 11x performance improvement, the absolute magnitude of the time saved is the key metric.
With about 16 ms per frame at 60 FPS, that's 6% of your *entire* game's CPU budget saved, making huge open worlds or incredibly complex CAD assemblies more viable than ever before.
The numbers are more impressive when you consider that for most users, it was taking a much, much longer time, like, 4ms on 0.15. The M4 Max is just stupidly fast. In pcwalton's screenshots, it was taking up ~1/4 of the frame time, which is what really motivated the work.
It might be nice to find a more representative profile comparison on caldera from 0.15, there were so many optimizations, and the aggregate difference is seriously impressive.
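The arithmetic behind the quoted "11x" and "6%" figures can be checked with a few lines of Rust. This is only a sketch of the calculation: the function names are hypothetical, and the inputs are the 1.1 ms and 0.1 ms timings and the 60 FPS target stated in the note.

```rust
/// Ratio of the old timing to the new one.
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

/// Time available per frame at a given frame rate.
fn frame_budget_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Share of the per-frame budget freed by the improvement, in percent.
fn budget_saved_percent(before_ms: f64, after_ms: f64, fps: f64) -> f64 {
    (before_ms - after_ms) / frame_budget_ms(fps) * 100.0
}
```

With the note's numbers, `speedup(1.1, 0.1)` is about 11 and `budget_saved_percent(1.1, 0.1, 60.0)` is about 6. Plugging in the roughly 4 ms figure mentioned above for slower machines would give a larger share, but the 0.16 timing on such machines is not stated in this thread.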
11d070f to 0109b9f
Please feel free to amend the technical descriptions liberally! I wasn't sure about the right level of detail to provide.
Fixes #1995.