Sleeper exhibits performance cliff at >4,000,000 active bodies #284

Open
RossNordby opened this issue Aug 2, 2023 · 1 comment

@RossNordby
Member

RossNordby commented Aug 2, 2023

The IslandSleeper works by starting a constraint graph traversal at candidate bodies. The traversal terminates when either a full island has been found or a body that is not ready to sleep is encountered (which forces the whole island to stay active).
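A minimal Python sketch of that kind of traversal. `find_sleepable_island`, the adjacency dict and the `can_sleep` predicate are hypothetical stand-ins, not the library's API; the real IslandSleeper is C# and also tracks visited constraints, while this sketch only follows bodies. The visited set is a bitset sized by the body count, mirroring the IndexSet approach, so just creating it costs time proportional to the largest body index no matter how small the island turns out to be.

```python
from collections import deque


def find_sleepable_island(start, adjacency, can_sleep, body_count):
    """Sketch of a sleeper-style island traversal (hypothetical helper).

    adjacency:  dict mapping a body index to the body indices it is constrained to.
    can_sleep:  predicate saying whether a body is ready to sleep.
    body_count: one past the largest body index; sizes the visited bitset.

    Returns the bodies of the island if all of them can sleep, or None as soon
    as a body that cannot sleep is reached.
    """
    # One bit per possible body index. Allocating the zeroed bytearray touches
    # every byte: this is the per-traversal "clear" whose cost the issue is about.
    visited = bytearray((body_count + 7) // 8)

    def mark(i):
        visited[i >> 3] |= 1 << (i & 7)

    def is_marked(i):
        return visited[i >> 3] & (1 << (i & 7)) != 0

    island = []
    queue = deque([start])
    mark(start)
    while queue:
        body = queue.popleft()
        if not can_sleep(body):
            return None  # early out: the whole island stays active
        island.append(body)
        for other in adjacency.get(body, ()):
            if not is_marked(other):
                mark(other)
                queue.append(other)
    return island


# Bodies 0-1-2 form one island; body 3 has no constraints.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
print(find_sleepable_island(0, adjacency, lambda b: True, 4))
print(find_sleepable_island(0, adjacency, lambda b: b != 2, 4))
```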

The traversal marks visited bodies and constraints using a couple of locally allocated IndexSet instances, each sized to span the largest BodyHandle or ConstraintHandle value. For a simulation with 8 million bodies, the visited-bodies set would be about a megabyte (1 bit per body).
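The one-bit-per-handle sizing can be checked with a tiny helper. `bitset_bytes` is a hypothetical name, and the counts below are simply the body counts that come up in this issue.

```python
def bitset_bytes(handle_count):
    """Bytes for a bitset with one bit per handle, rounded up to whole bytes."""
    return (handle_count + 7) // 8


for bodies in (2_000_000, 4_000_000, 8_000_000):
    print(f"{bodies:,} bodies -> {bitset_bytes(bodies):,} bytes per visited set")
```

That gives 250,000, 500,000 and 1,000,000 bytes respectively, so the set that has to be cleared grows in lockstep with the simulation.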

While allocating that space isn't much of a problem, clearing it for each new traversal can be:

  1. By default, the sleeper tries to analyze 1% of all active bodies per timestep. When there are 8 million active bodies, that is 80,000 traversals per timestep, and so 80,000 clears.
  2. Each clear touches the entire ~1 MB set: 80,000 * (~1e6 bytes per traversal) / 100e9 bytes per second of memory bandwidth = ~0.8 seconds per timestep.

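In other words, the cost of the IslandSleeper is weakly quadratic: both the number of traversals per timestep and the size of each cleared set grow roughly linearly with the number of bodies.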
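The estimate above can be written as a small bandwidth-bound model. This is a back-of-envelope sketch, not a measurement: `sleeper_clear_seconds` is a hypothetical helper, `tested_fraction` stands in for the sleeper's default 1% tested fraction, the 100e9 bytes per second figure is the bandwidth assumed above, and the model assumes every clear misses cache.

```python
def sleeper_clear_seconds(active_bodies, tested_fraction=0.01,
                          bandwidth_bytes_per_second=100e9):
    """Bandwidth-bound estimate of time spent clearing visited sets per timestep.

    Assumes one traversal per tested body and that each traversal clears a
    bitset with one bit per body that does not fit in cache.
    """
    traversals = active_bodies * tested_fraction  # grows linearly with N
    bytes_per_clear = active_bodies / 8           # also grows linearly with N
    return traversals * bytes_per_clear / bandwidth_bytes_per_second


for bodies in (2_000_000, 4_000_000, 8_000_000):
    print(f"{bodies:,} bodies -> ~{sleeper_clear_seconds(bodies):.2f} s per timestep")
```

Because both factors scale with `active_bodies`, doubling the body count quadruples the estimate. Under this model 2 million bodies would come out at ~0.05 seconds; anything faster in practice is consistent with the smaller set staying in cache, which the model ignores.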

The reason it's not a problem at smaller sizes is that a smaller IndexSet can stay entirely in core-local cache. The moment the set gets large enough to be evicted, you see the bandwidth-bound cost. That's why a smaller simulation of, say, 2 million active bodies takes 0.008 seconds to run the sleeper, rather than the ~0.05 seconds the same bandwidth-bound estimate would predict (20,000 traversals * ~250e3 bytes / 100e9 bytes per second).

This is not exactly a major near-term priority. While CPUs and the library are both improving, we're still at least a factor of 10 off from needing to worry about simulations of 8 million active bodies in real time use cases.

Something to consider later, or if we have a compelling offline use case. People testing ridiculous simulations for funsies might get confused, I guess.

As a workaround, disabling the sleeper or dramatically reducing the aggressiveness of the sleeper (IslandSleeper.TestedFractionPerFrame and friends) would work.

@RossNordby
Member Author

Note for potential attempts at addressing this early: the IslandSleeper would benefit from a revamp. At the moment, it cannot multithread individual traversals, so even at more reasonable simulation scales, a single pile of 30,000 nearly-sleepy bodies can take a hefty chunk of frame time.

The changes required for a multithreaded traversal would also affect how body marking works, so any fix for the performance cliff alone might get obviated by later changes.
