
Infinite fall memory fix for #14 #30

Merged: 9 commits, Nov 10, 2020

Conversation

robert-hrusecky
Contributor

This is just a simple de-allocation strategy that removes SAPRegions immediately when all proxies exit them. This appears to plug up the memory leak.

Since the regions are stored in a HashMap, I have a suspicion that this may be all we need performance-wise. New allocations of regions should only require actual memory allocations when the HashMap internally resizes. So if the number of regions stays mostly constant, I expect to see little overhead.
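
To make the general idea concrete, here is a minimal sketch (not the actual rapier code; `RegionKey`, `Region`, and the `proxy_count` field are hypothetical stand-ins for the real broad-phase types):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real broad-phase types; the actual
// SAPRegion in rapier stores more state than a proxy count.
type RegionKey = (i32, i32, i32);

struct Region {
    proxy_count: usize,
    // endpoints, proxy lists, etc. would live here as well
}

struct BroadPhase {
    regions: HashMap<RegionKey, Region>,
}

impl BroadPhase {
    fn update_regions(&mut self) {
        // Drop a region as soon as its last proxy has left it. Because the
        // regions live in a HashMap, removing and later re-inserting a key
        // does not allocate for the map itself unless it has to resize.
        self.regions.retain(|_, region| region.proxy_count > 0);
    }
}
```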

I'm making this PR now mostly to get the benchmarks to run. If you are happy with it already, go ahead and merge it. I can also work on adding the opt-out-of-de-allocation feature discussed in #14.

Closes #14

@robert-hrusecky
Contributor Author

Since the CI benchmark failed and I'm not quite sure how to rerun it or what the issue is, I ran a few of the benchmarks locally with the `simd-stable` feature enabled. The change seems to have little impact on performance. Let me know what you think, thanks.

```diff
@@ -640,11 +651,18 @@ impl BroadPhase {
 }
 }

 fn update_regions(&mut self) {
```

Just a thought/question about this: it seems that currently, as soon as `deleted_any` is true, you go through all regions to check which need an update and which should be discarded. That seems like a good start, but isn't there a scalability issue for a large world in which many regions don't change at all while a small set changes a lot? An alternative could be to collect only the regions that need an update or deletion, and touch only those. Since I don't have the full design in mind, maybe there is a trivial reason why that would not work, but this question came to mind when I saw this code, so I'm writing it down here in case it is useful.

Contributor Author

Unless I am misunderstanding you, I don't believe this change adds a major scalability issue, at least from an algorithmic viewpoint. The previous version and the version with my changes both iterate over all the regions, which is of course O(number of regions). Perhaps you mean that we could avoid making multiple passes? That is probably correct, but it would require a mutate-while-iterating pattern, which is usually tricky to do in Rust.


Indeed, your change does not change the complexity. I put a note here because, while reading this code, I realised this potential scalability problem, which was already present. Sorry, I should have commented on the original code instead. I have now created an issue with that thought.

What prompted this note was that, assuming we only process objects that are awake, if a large world has a lot of objects and only a subset of them is awake, we still process all regions. This is in contrast to a tree structure that keeps track of whether there is any awake object in its sub-tree, which could have better scalability properties. But that is probably an optimisation for later...


Regarding the double loop, we could avoid it by storing the points (keys) of regions with 0 proxies into a temporary vector, then iterating over this vector and deleting the corresponding entries from the regions hashmap. I am not sure it would be more performant; I guess it depends on the ratio of regions with 0 proxies to the total number of regions.
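
Roughly something like this (just a sketch; `RegionKey`, `Region`, and `proxy_count` are hypothetical names, not the real types):

```rust
use std::collections::HashMap;

// Hypothetical types; the real SAPRegion and its keys look different.
type RegionKey = (i32, i32, i32);

struct Region {
    proxy_count: usize,
}

// Collect the keys of empty regions first, then remove them in a second
// pass, so the map is never mutated while it is being iterated.
fn remove_empty_regions(regions: &mut HashMap<RegionKey, Region>) {
    let empty_keys: Vec<RegionKey> = regions
        .iter()
        .filter(|(_, region)| region.proxy_count == 0)
        .map(|(&key, _)| key)
        .collect();

    for key in empty_keys {
        regions.remove(&key);
    }
}
```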

Contributor Author

Ah yes! Sorry I misunderstood. That's definitely a good thought in my opinion.

@sebcrozet
Member

Thank you for this PR!
I triggered the benchmark runs manually because there seems to be an issue with our GitHub Actions workflow: it does not seem to work for pull requests originating from forks.

On this PR's benchmark run, two graphs stand out:

[two benchmark graphs: the Boxes scenario and the Ball-Joint scenario]

The first one (the Boxes scenario) is really odd because it means that this change causes the boxes to fall asleep sooner than before. We could see this as an improvement, though I will need to investigate exactly why it happens to make sure we are not just breaking something.

The second graph shows that there appears to be a small performance regression in the Ball-Joint scenario. This scenario is interesting because it should result in lots of broad-phase cells being freed and then reallocated later, because it simulates some kind of hanging tissue (left: initial configuration, right: during the simulation):

[screenshots of the Ball-Joint scenario: initial configuration and during the simulation]

So maybe we are paying the price of the additional allocations here? This is worth investigating more closely as well, since this may be a case where the automatic deallocation behavior is not desired (just like @stephanemagnenat suggested in #14 (comment)).

@robert-hrusecky
Contributor Author

I think that something to keep in mind here is that the initial cost of allocation/deallocation of regions should be fairly cheap for the reason I mentioned above. I have a feeling the performance regressions we are seeing in the jointBall scenario come after reallocation of regions, when the vectors inside newly allocated regions need to be resized to add more elements. Therefore I think we should try having a pool of "unused" regions that have been moved out of the region HashMap. Although these regions are empty, their internal vectors would hold onto any capacity generated during the simulation. When allocating, we can check the pool first for regions before allocating fresh ones. Kinda like a thrift store for memory allocations.
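
Very roughly, what I have in mind is something like this (again only a sketch with hypothetical names, not the actual rapier types):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: the real SAPRegion owns vectors whose capacity
// we would like to keep around after the region empties out.
type RegionKey = (i32, i32, i32);

#[derive(Default)]
struct Region {
    proxies: Vec<u32>, // the capacity survives a `clear()`
}

struct BroadPhase {
    regions: HashMap<RegionKey, Region>,
    region_pool: Vec<Region>, // emptied regions kept for reuse
}

impl BroadPhase {
    // When a region empties out, move it into the pool instead of dropping
    // it, so its internal allocations can be reused later.
    fn recycle_region(&mut self, key: RegionKey) {
        if let Some(mut region) = self.regions.remove(&key) {
            region.proxies.clear(); // drops the contents, keeps the capacity
            self.region_pool.push(region);
        }
    }

    // When a proxy enters a cell with no region yet, reuse a pooled region
    // if one is available before allocating a fresh one.
    fn get_or_insert_region(&mut self, key: RegionKey) -> &mut Region {
        let pool = &mut self.region_pool;
        self.regions
            .entry(key)
            .or_insert_with(|| pool.pop().unwrap_or_default())
    }
}
```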

A simpler option would be to just allocate the vectors with larger initial capacity, but this does not avoid the initial memory allocations and adds memory usage to simulations which may not need it.

I could be completely wrong though, as my changes do cause a few more region updates to happen. We could just be seeing the effects of those. Perhaps deallocation is actually more costly than I'm realizing. I'll try out option 1 soon and see if it helps.

For the boxes scenario, I'll double check that there isn't any obvious visual difference in simulation playback.

@robert-hrusecky
Contributor Author

Hm... I am unable to reproduce the issue locally.

[local benchmark plot: jointBall_nightly]

@robert-hrusecky
Contributor Author

robert-hrusecky commented Nov 3, 2020

Hello!

I tried out my region pool idea, and it seems to improve performance slightly on a few of the benchmarks, so I went ahead and pushed the changes here.

I am still unable to reproduce the issues showing up in the public benchmarks. When I run them on my machine, I get these results (averaged over 10 runs to reduce noise):

Boxes benchmark

[benchmark plot: boxes3]

Joint Ball benchmark

[benchmark plot: jointBall3]

Here are a few benchmarks where the region pool appears to slightly help performance:

Joint Revolute benchmark

[benchmark plot: jointRevolute3]

Keva Tower benchmark

[benchmark plot: kevaTower3]

Here, the green lines are the current code on this branch, the blue lines are the code before adding the region pool, and the yellow lines are the current master.

@sebcrozet, would you trigger the benchmarks to run again for this PR, please? I think we should make sure the initial run was not some weird fluke. I noticed you merged a fix for a determinism bug recently. I haven't looked at it in any amount of depth, but perhaps that could be the cause of the discrepancy?

Also, do you think it would be a good idea to include some sort of cap on the number of empty regions stored in the pool to avoid hanging on to too much memory? Perhaps regions in the pool could be deallocated if they are unused for a certain number of timesteps. This could eventually be made configurable by the user, similar to what @stephanemagnenat suggested.
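
For the cap, even something as simple as this sketch could work (`MAX_POOLED_REGIONS` is an arbitrary placeholder value, not something that exists in the code today):

```rust
// A hard cap on the pool size; the value is an arbitrary placeholder and
// could eventually be made configurable by the user.
const MAX_POOLED_REGIONS: usize = 256;

fn trim_region_pool<R>(region_pool: &mut Vec<R>) {
    // Drop any pooled regions beyond the cap so their memory is released
    // instead of being held indefinitely.
    region_pool.truncate(MAX_POOLED_REGIONS);
}
```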

@sebcrozet
Member

sebcrozet commented Nov 3, 2020

> @sebcrozet, would you trigger the benchmarks to run again for this PR, please? I think we should make sure the initial run was not some weird fluke. I noticed you merged a fix for a determinism bug recently. I haven't looked at it in any amount of depth, but perhaps that could be the cause of the discrepancy?

@robert-hrusecky Unfortunately, I'm having some random issues with my benchmarking hardware recently (it is moderately old; I may have to replace it completely or switch to a VPS), so I can't start a new benchmark right now. I will probably investigate this hardware problem next week.

@sebcrozet
Member

@robert-hrusecky Alright, I changed my benchmarking server. I tried a VPS (with reserved vcores), but the performance differences between two benchmark runs were too large for it to be useful. So I built a new home server, deleted all the previous benchmark results, ran the benchmarks again for all tags on master, and manually started the benchmark for your PR.

Now the only weird result I see in the benchmarks is that the objects appear to fall asleep sooner than before in the Boxes demo, though I am not sure why your modification would have this effect. One guess is that, with your changes, some contacts are deleted sooner than before (because the `self.update_count = 2` results in some other contact pairs being correctly removed). This may produce a slightly different simulation where things end up sleeping slightly sooner…

Anyway, this all looks OK to me. I don't think there is a reason to make this optional for the end user, considering it does not affect performance.

Merging this now.

sebcrozet merged commit 2102aeb into dimforge:master on Nov 10, 2020
Successfully merging this pull request may close these issues.

Unbounded memory usage when an object falls indefinitely