Meshlet remove per-cluster data upload #13125

@JMS55 JMS55 commented Apr 28, 2024

Objective

  • Uploading per-cluster data (a cluster is an instance of a meshlet) is ridiculously expensive in both CPU and GPU time. At 8 bytes per cluster and millions of clusters, you very quickly run into PCIe bandwidth limits, on top of lots of CPU-side copies and allocations.
  • We need to be uploading only per-instance/entity data. Anything else needs to be done on the GPU.
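
For a rough sense of scale, here is a back-of-the-envelope sketch of the upload cost. Only the 8 bytes per cluster comes from the description above; the function names, the cluster count, and the frame rate used in the note below are hypothetical and chosen purely for illustration.

```rust
/// Bytes uploaded each frame when every element (cluster or instance)
/// carries `bytes_per_element` bytes of data.
fn upload_bytes_per_frame(element_count: u64, bytes_per_element: u64) -> u64 {
    element_count * bytes_per_element
}

/// Sustained upload rate in bytes per second at a given frame rate.
fn upload_bytes_per_second(bytes_per_frame: u64, frames_per_second: u64) -> u64 {
    bytes_per_frame * frames_per_second
}
```

With an assumed 4,000,000 clusters at 8 bytes each, `upload_bytes_per_frame` gives 32,000,000 bytes per frame, and at an assumed 60 frames per second `upload_bytes_per_second` gives 1,920,000,000 bytes per second, before counting any CPU-side copies.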

Solution

  • Per instance, upload:
    • meshlet_instance_meshlet_counts_prefix_sum - An exclusive prefix sum over the count of how many clusters each instance has.
    • meshlet_instance_meshlet_slice_starts - The starting index of the meshlets for each instance within the meshlets buffer.
  • A new fill_cluster_buffers pass runs once at the start of the frame with one thread per cluster. Each thread binary-searches meshlet_instance_meshlet_counts_prefix_sum to find which instance its cluster belongs to, and then uses that plus meshlet_instance_meshlet_slice_starts to find which meshlet within the instance it is. The shader then writes out the per-cluster instance ID and meshlet ID buffers for later passes to read from quickly.
  • I've gone from 45 -> 180 FPS in my stress test scene, and saved ~30ms/frame of overall CPU/GPU time.
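
The lookup described in the Solution can be sketched on the CPU in plain Rust. This is not the shader from this PR, only an illustration of the idea under stated assumptions: the function names `build_instance_buffers`, `resolve_cluster`, and the Rust version of `fill_cluster_buffers` are hypothetical, all values are assumed to be `u32`, each instance is described by a hypothetical `(slice_start, meshlet_count)` pair, and the loop stands in for what would be one GPU thread per cluster.

```rust
/// Builds the two per-instance arrays from (slice_start, meshlet_count) pairs.
/// Returns (exclusive prefix sum of counts, slice starts, total cluster count).
fn build_instance_buffers(instances: &[(u32, u32)]) -> (Vec<u32>, Vec<u32>, u32) {
    let mut counts_prefix_sum = Vec::with_capacity(instances.len());
    let mut slice_starts = Vec::with_capacity(instances.len());
    let mut running_total = 0u32;
    for &(slice_start, meshlet_count) in instances {
        // Exclusive prefix sum: record the total *before* adding this instance.
        counts_prefix_sum.push(running_total);
        slice_starts.push(slice_start);
        running_total += meshlet_count;
    }
    (counts_prefix_sum, slice_starts, running_total)
}

/// What a single thread would compute for one cluster.
/// Assumes `cluster_id` is less than the total cluster count.
fn resolve_cluster(cluster_id: u32, counts_prefix_sum: &[u32], slice_starts: &[u32]) -> (u32, u32) {
    // Binary search for the last instance whose prefix sum is <= cluster_id.
    // The first entry is always 0, so the partition point is at least 1.
    let instance_id = counts_prefix_sum.partition_point(|&sum| sum <= cluster_id) - 1;
    // Position of this cluster among the instance's own meshlets.
    let local_index = cluster_id - counts_prefix_sum[instance_id];
    // Index into the shared meshlets buffer.
    let meshlet_id = slice_starts[instance_id] + local_index;
    (instance_id as u32, meshlet_id)
}

/// Sequential stand-in for the GPU pass: one loop iteration per cluster.
fn fill_cluster_buffers(
    counts_prefix_sum: &[u32],
    slice_starts: &[u32],
    cluster_count: u32,
) -> (Vec<u32>, Vec<u32>) {
    let mut cluster_instance_ids = Vec::with_capacity(cluster_count as usize);
    let mut cluster_meshlet_ids = Vec::with_capacity(cluster_count as usize);
    for cluster_id in 0..cluster_count {
        let (instance_id, meshlet_id) = resolve_cluster(cluster_id, counts_prefix_sum, slice_starts);
        cluster_instance_ids.push(instance_id);
        cluster_meshlet_ids.push(meshlet_id);
    }
    (cluster_instance_ids, cluster_meshlet_ids)
}
```

For three hypothetical instances `[(10, 3), (0, 2), (10, 3)]`, where the first and last share the same mesh, the prefix sum is `[0, 3, 5]` and the total is 8 clusters. The filled buffers are then instance IDs `[0, 0, 0, 1, 1, 2, 2, 2]` and meshlet IDs `[10, 11, 12, 0, 1, 10, 11, 12]`. The CPU only touches per-instance data; everything per-cluster is derived from it.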

@JMS55 JMS55 mentioned this pull request Apr 28, 2024
@JMS55 JMS55 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Apr 28, 2024
@JMS55 JMS55 added this to the 0.14 milestone Apr 28, 2024
@JMS55 JMS55 requested a review from IceSentry April 30, 2024 02:04
@JMS55 JMS55 requested a review from superdump May 1, 2024 06:07
@JMS55 JMS55 requested review from Elabajaba and pcwalton and removed request for pcwalton May 3, 2024 04:52

@IceSentry IceSentry left a comment

I don't fully understand everything, but the code works, doesn't affect anything unrelated and is way faster than main. So LGTM


@atlv24 atlv24 left a comment

This is quite ingenious; it took me a bit to understand how it works.

@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label May 4, 2024
@alice-i-cecile alice-i-cecile added this pull request to the merge queue May 4, 2024
Merged via the queue into bevyengine:main with commit 77ebabc May 4, 2024
27 checks passed
4 participants