GPU rigidbodies exhibit strange behavior and can be indeterministic #99

proc-sim · 2023-02-22T15:01:07Z

proc-sim
Feb 22, 2023

I'm testing CUDA-accelerated sims in PhysX 5.1. I believe I've set all the proper sim flags and I've confirmed everything is running on the GPU as expected but the results I'm getting can be pretty unstable and unpredictable.

Firstly, there are many cases where GPU results don't even come close to matching CPU results. GPU sims often exhibit large, random impulse forces and unstable angular momentum. Since PhysX GPU is a black box in a DLL, I have no way of figuring out why this is occurring...best I've tried to do is setting all GPU memory variables to 8x their default, to ensure all contacts are being processed...but this doesn't seem to have much of an effect.

Here are some examples:

In this first example, 200 boxes are initialized in an inter-penetrating state and after 30 frames converted from kinematic to dynamic. All simulation settings are identical between all examples, except for the relevant CPU/GPU and PGS/TGS solver switches. Notice how much more explosive the GPU sim is (might be hard to see but the GPU velocities generated are almost double the CPU PGS counterparts), and in the TGS GPU sim, many boxes spin erratically at the end without coming to a stop:

pgs_tgs.mp4

In this second example, you can see as I refresh this GPU rigidbody simulation (note the blue progress bar on the bottom) that the simulation is indeterministic and its end state is different nearly every time. This indeterminism is not present in the CPU version of the same simulation. In my experiments, this indeterminism shows up in basically all GPU sims with a moderate number of rigidbodies (1000+ rigidbodies), but is not always present in smaller simulations. This is with 8x the default PxgDynamicsMemoryConfig values, as mentioned above, so I doubt contacts are being lost.

indeterm.mp4

These are just some basic examples...but the problems seem to permeate all GPU setups of moderate complexity in my tests. Is this behavior expected for GPU sims? I was under the impression the GPU solver was just a faster way of doing the same thing that the CPU solver is doing, but it seems like the differences between the two are not superficial...

preist-nvidia · 2023-02-23T09:30:27Z

preist-nvidia
Feb 23, 2023
Maintainer

Hi @Tysoni - thank you for the detailed report. Our goal is to provide determinacy for rigid-body simulation - the results are expected to vary between compute platforms, but two GPU simulations running identical API calls should be deterministic. I assume you are adding the cube actors in the same order each time and into an empty scene - if not, you should check out the enhanced determinism scene flag.

I am not 100% sure if initial interpenetration is expected to break determinacy, however, and have to double-check with the team here.

Lastly, the CPU vs GPU behavior differences for TGS look not ok. Would you happen to have a PhysX snippet with the setup for reproducing these results that we could look at?

0 replies

PierreTerdiman · 2023-02-24T17:17:18Z

PierreTerdiman
Feb 24, 2023
Collaborator

I made a test with just some initially overlapping boxes and so far:

a) they all pretty much behave the same, in a similar "explosive" way. One way to avoid this initial explosion is to use the "max depenetration velocity" on the rigid bodies. In any case for me GPU / CPU / TGS / PGS all explode in similar ways.

b) the initial penetration does not break determinism in any of the combinations either.

I didn't start from kinematic bodies though so maybe it's the switch from kinematic to dynamic bodies that breaks something.

0 replies

proc-sim · 2023-02-24T17:24:40Z

proc-sim
Feb 24, 2023
Author

Thanks for testing this @PierreTerdiman.

I'm currently working on trying to reproduce the behavior I'm seeing in a simpler snippet. Currently my PhysX implementation is in tyFlow which is a plugin for 3ds Max, which is a pretty complicated framework on its own...so it's tricky breaking it down into a simpler example while capturing all of the same variables and same resulting artifacts.

I don't think the kinematic to dynamic switch is the cause...I see the same issue without that switch. I only added it so it would be easier to see the beginning configuration in the video - otherwise it would only be displayed for 1 frame.

I'll keep this thread updated with my results and hopefully a reproducible snippet soon...

0 replies

proc-sim · 2023-02-26T05:23:00Z

proc-sim
Feb 26, 2023
Author

@preist-nvidia @PierreTerdiman

Ok, after more than a dozen hours of testing over the last few days, I have managed to track down the cause of the behavior (the indeterminism, at least) and create a reproducible example. I've logged it as a bug here:

#102

This bug is clearly responsible for the indeterminism I was experiencing...but I'm not sure if it's responsible for the discrepancy between solver velocities. It does tend to introduce gigantic velocities into the sim in some configurations (even NaNs), so it's possible that in my earlier tests in this thread it was responsible for the faster moving CUDA ribidbodies. I will follow up with my findings once it's fixed.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU rigidbodies exhibit strange behavior and can be indeterministic #99

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 4 comments

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

GPU rigidbodies exhibit strange behavior and can be indeterministic #99

proc-sim Feb 22, 2023

Replies: 4 comments

preist-nvidia Feb 23, 2023 Maintainer

PierreTerdiman Feb 24, 2023 Collaborator

proc-sim Feb 24, 2023 Author

proc-sim Feb 26, 2023 Author

proc-sim
Feb 22, 2023

preist-nvidia
Feb 23, 2023
Maintainer

PierreTerdiman
Feb 24, 2023
Collaborator

proc-sim
Feb 24, 2023
Author

proc-sim
Feb 26, 2023
Author