Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize extract_clusters and prepare_clusters systems #10633

Merged
merged 1 commit into from
Jan 29, 2024

Conversation

rafalh
Copy link
Contributor

@rafalh rafalh commented Nov 18, 2023

Objective

When developing my game I realized extract_clusters and prepare_clusters systems are taking a lot of time despite me creating very little lights. Reducing number of clusters from the default 4096 to 2048 or less greatly improved performance and stabilized FPS (~300 -> 1000+). I debugged it and found out that the main reason for this is cloning VisiblePointLights in extract_clusters system. It contains light entities grouped by clusters that they affect. The problem is that we clone 4096 (assuming the default clusters configuration) vectors every frame. If many of them happen to be non-empty it starts to be a bottleneck because there is a lot of heap allocation. It wouldn't be a problem if we reused those vectors in following frames but we don't.

Solution

Avoid cloning multiple vectors and instead build a single vector containing data for all clusters.

I've recorded a trace in 3d_scene example with disabled v-sync before and after the change.
Mean FPS went from 424 to 990. Mean time for extract_clusters system was reduced from 210 us to 24 us and prepare_clusters from 189 us to 87 us.

image


Changelog

  • Improved performance of extract_clusters and prepare_clusters systems for scenes where lights affect a big part of it.

@rafalh rafalh force-pushed the extract-clusters-opt branch 2 times, most recently from fa3c8af to 5526e88 Compare November 18, 2023 16:51
@JMS55 JMS55 requested a review from superdump November 18, 2023 18:35
@ItsDoot ItsDoot added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Nov 19, 2023
@JMS55 JMS55 added this to the 0.13 milestone Nov 23, 2023
@JMS55 JMS55 self-requested a review November 23, 2023 05:04
Avoid cloning VisiblePointLights struct in extract_clusters because
depending on scene it may result in cloning of thousands non-empty vectors
which ends in a lot of heap allocations and memory copying every frame.
Instead build allocate a single vector and fill it with data from all
source vectors which is much faster.
Copy link
Member

@james7132 james7132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run this on some of our more intensive stress test examples? In this particular case, many_lights might be a better test than 3d_scene.

@rafalh
Copy link
Contributor Author

rafalh commented Nov 27, 2023

Can you run this on some of our more intensive stress test examples? In this particular case, many_lights might be a better test than 3d_scene.

Here you are - tracy comparison for many_lights:
image

I showed data for 3d_scene first (which may be not something that we want to optimize) because it has small amount of lights (just 1) that has entire scene in range. As I understand it it is a pessimistic case where nearly all clusters are in range of a point light causing a lot of Vec allocations.
In many_lights example we have a lot of very small lights (radius = 0.3) so most of the clusters are empty and in such case cloning vectors is not a bottleneck.
I am not sure how much this optimization would affect real projects. I think it may affect some with indoor areas or if someone tries to model sun as a point light. But even if it was for simple examples like 3d_scene I think it is worth merging because of high impact on the FPS. Low FPS (when compared to other engines) in simple scenes may discourage people from using Bevy.

Copy link
Contributor

@kristoff3r kristoff3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have hit this is as well multiple times, so it would be great to get it fixed.

I read this pretty thoroughly and it looks correct and like a good approach to me.

Copy link
Contributor

@atlv24 atlv24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good find!

@rparrett rparrett added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Jan 25, 2024
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jan 29, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2024
@alice-i-cecile
Copy link
Member

CI failure appears to be a network error; retrying the merge.

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jan 29, 2024
Merged via the queue into bevyengine:main with commit 16ce8c6 Jan 29, 2024
23 checks passed
tjamaan pushed a commit to tjamaan/bevy that referenced this pull request Feb 6, 2024
)

# Objective

When developing my game I realized `extract_clusters` and
`prepare_clusters` systems are taking a lot of time despite me creating
very little lights. Reducing number of clusters from the default 4096 to
2048 or less greatly improved performance and stabilized FPS (~300 ->
1000+). I debugged it and found out that the main reason for this is
cloning `VisiblePointLights` in `extract_clusters` system. It contains
light entities grouped by clusters that they affect. The problem is that
we clone 4096 (assuming the default clusters configuration) vectors
every frame. If many of them happen to be non-empty it starts to be a
bottleneck because there is a lot of heap allocation. It wouldn't be a
problem if we reused those vectors in following frames but we don't.

## Solution

Avoid cloning multiple vectors and instead build a single vector
containing data for all clusters.

I've recorded a trace in `3d_scene` example with disabled v-sync before
and after the change.
Mean FPS went from 424 to 990. Mean time for `extract_clusters` system
was reduced from 210 us to 24 us and `prepare_clusters` from 189 us to
87 us.


![image](https://github.com/bevyengine/bevy/assets/160391/ab66aa9d-1fa7-4993-9827-8be76b530972)

---

## Changelog

- Improved performance of `extract_clusters` and `prepare_clusters`
systems for scenes where lights affect a big part of it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants