Infinite fall memory fix for #14 #30
Conversation
SAPRegions which became empty in the last frame need to be updated one more time in order to remove the last proxy.
Since the CI benchmark failed and I'm not quite sure how to rerun it or what the issue is, I ran a few of the benchmarks locally with the feature …
```
@@ -640,11 +651,18 @@ impl BroadPhase {
        }
    }

    fn update_regions(&mut self) {
```
Just a thought/question about this: it seems that currently, as soon as `deleted_any` is true, you go through all regions to check which need an update and which are to be discarded. That seems like a good start, but isn't there a scalability issue for a large world in which many regions have no changes while a small set changes a lot? An alternative could be to collect the regions that need an update or deletion, and touch only those. Since I do not have the full design in mind, maybe there is a trivial reason why that would not work, but this question came to mind when I saw this code, so I wrote it down here in case it is useful.
Unless I am misunderstanding you, I don't believe this change adds a major scalability issue, at least from an algorithmic viewpoint. The previous version and the version with my changes both iterate over all the regions, which is of course O(number of regions). Perhaps you mean that we could avoid making multiple passes? That is probably correct, but it would require a mutate-while-iterating pattern, which is usually tricky to do in Rust.
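One way to sidestep the mutate-while-iterating problem in safe Rust is `HashMap::retain`, which can update each entry and drop the empty ones in a single pass. A minimal sketch under that assumption (the `Region` type and its `update` method are hypothetical stand-ins, not rapier's actual `SAPRegion` API):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for rapier's SAPRegion.
struct Region {
    proxies: Vec<usize>,
}

impl Region {
    // Pretend per-frame update: here it just reports whether the
    // region ended up empty.
    fn update(&mut self) -> bool {
        self.proxies.is_empty()
    }
}

fn update_regions(regions: &mut HashMap<(i32, i32), Region>) {
    // retain lets us mutate each region and delete it in the same
    // pass, avoiding a separate "collect keys, then remove" loop.
    regions.retain(|_key, region| {
        let empty = region.update();
        !empty // keep only non-empty regions
    });
}

fn main() {
    let mut regions = HashMap::new();
    regions.insert((0, 0), Region { proxies: vec![1] });
    regions.insert((1, 0), Region { proxies: vec![] });
    update_regions(&mut regions);
    assert_eq!(regions.len(), 1);
    assert!(regions.contains_key(&(0, 0)));
}
```

Whether this single pass actually beats two passes would still need benchmarking, since `retain` visits every bucket either way.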
Indeed, your change does not change the complexity. I left a note here because, while reading this code, I realised this potential scalability problem, which was already present. Sorry, I should have commented on the original code instead; I have now created an issue with that thought.

What came to mind was that, assuming we only process objects that are awake, if a large world has a lot of objects and only a subset is awake, we still process all regions. This is in contrast to a tree structure that keeps track of whether there are any awake objects in its sub-tree, which could have better scalability properties. But that is probably an optimisation for later...
Regarding the double loop, we could avoid it by storing the keys (points) of regions with 0 proxies in a temporary vector, and then iterating over this vector and deleting the corresponding entries from the `regions` hashmap. I am not sure it would be more performant; I guess it depends on the ratio of regions with 0 proxies to the total number of regions.
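The temporary-key-vector approach described above could look roughly like this (the region type is a simplified placeholder, not rapier's actual `SAPRegion`):

```rust
use std::collections::HashMap;

// Hypothetical region type: just its list of proxy ids.
type Region = Vec<usize>;

fn remove_empty_regions(regions: &mut HashMap<(i32, i32), Region>) {
    // Pass 1: collect the keys of regions with 0 proxies. This ends
    // the shared borrow of the map before we mutate it.
    let empty_keys: Vec<(i32, i32)> = regions
        .iter()
        .filter(|(_, proxies)| proxies.is_empty())
        .map(|(key, _)| *key)
        .collect();

    // Pass 2: delete only those entries from the map.
    for key in empty_keys {
        regions.remove(&key);
    }
}

fn main() {
    let mut regions = HashMap::new();
    regions.insert((0, 0), vec![7]);
    regions.insert((5, 2), vec![]);
    remove_empty_regions(&mut regions);
    assert_eq!(regions.len(), 1);
    assert!(regions.contains_key(&(0, 0)));
}
```

The temporary vector costs an allocation per frame in the worst case, which is part of why it is unclear whether this would actually be faster.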
Ah yes! Sorry I misunderstood. That's definitely a good thought in my opinion.
Thank you for this PR! On this PR's benchmark run, two graphs stand out.

The first one (the Boxes scenario) is really odd because it means that this change causes the boxes to fall asleep sooner than before. We could see this as an improvement, though I will need to investigate exactly why it happens to make sure we are not just breaking something.

The second graph shows that there appears to be a small performance regression for the Ball-Joint scenario. This scenario is interesting because it should result in lots of broad-phase cells being freed and then reallocated later, because it simulates some kind of hanging tissue (left: initial configuration, right: during the simulation). So maybe we are paying the price of the additional allocations here? This is worth investigating more closely, as this may be a case where the automatic deallocation behavior is not desired (just like @stephanemagnenat suggested in #14 (comment)).
I think something to keep in mind here is that the initial cost of allocating/deallocating regions should be fairly cheap, for the reason I mentioned above. I have a feeling the performance regressions we are seeing in the jointBall scenario come after the reallocation of regions, when the vectors inside newly allocated regions need to be resized to add more elements. Therefore I think we should try keeping a pool of "unused" regions that have been moved out of the `regions` map.

A simpler option would be to just allocate the vectors with a larger initial capacity, but this does not avoid the initial memory allocations, and it adds memory usage to simulations that may not need it.

I could be completely wrong, though, as my changes do cause a few more region updates to happen; we could just be seeing the effects of those. Perhaps deallocation is actually more costly than I realize. I'll try out option 1 soon and see if it helps. For the boxes scenario, I'll double-check that there isn't any obvious visual difference in simulation playback.
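The pool idea boils down to parking emptied regions in a `Vec` and reusing them, so their already-grown buffers survive. A rough sketch, with simplified placeholder types rather than rapier's real `BroadPhase`/`SAPRegion`:

```rust
use std::collections::HashMap;

// Simplified stand-in for SAPRegion: the point is that its Vec keeps
// its capacity while the region sits in the pool.
#[derive(Default)]
struct Region {
    proxies: Vec<usize>,
}

struct BroadPhase {
    regions: HashMap<(i32, i32), Region>,
    pool: Vec<Region>, // emptied regions kept around for reuse
}

impl BroadPhase {
    // When a region becomes empty, move it into the pool instead of
    // dropping it (and its allocations).
    fn retire_region(&mut self, key: (i32, i32)) {
        if let Some(region) = self.regions.remove(&key) {
            self.pool.push(region);
        }
    }

    // When a new region is needed, reuse a pooled one (with its
    // already-grown Vec) before allocating a fresh one.
    fn get_or_insert_region(&mut self, key: (i32, i32)) -> &mut Region {
        let pool = &mut self.pool;
        self.regions
            .entry(key)
            .or_insert_with(|| pool.pop().unwrap_or_default())
    }
}

fn main() {
    let mut bp = BroadPhase { regions: HashMap::new(), pool: Vec::new() };
    let region = bp.get_or_insert_region((0, 0));
    region.proxies.push(42);
    region.proxies.clear();
    bp.retire_region((0, 0));
    assert_eq!(bp.pool.len(), 1);
    bp.get_or_insert_region((3, 1)); // reuses the pooled region
    assert!(bp.pool.is_empty());
}
```

The trade-off is that the pool itself holds on to memory until something evicts it, which is exactly the capping question raised later in this thread.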
Fix merge conflict resulting from "axii" spelling correction
Hello! I tried out my region pool idea, and it seems to improve performance slightly on a few of the benchmarks, so I went ahead and pushed the changes here. I am still unable to reproduce the issues showing up in the public benchmarks. When I run them on my machine, I get these results (averaged over 10 runs to reduce noise):

[graphs: Boxes benchmark, Joint Ball benchmark]

Here are a few benchmarks where the region pool appears to slightly help performance:

[graphs: Joint Revolute benchmark, Keva Tower benchmark]

Here, the green lines are the current code on this branch, the blue lines are the code before adding the region pool, and the yellow lines are the current master.

@sebcrozet, would you trigger the benchmarks to run again for this PR, please? I think we should make sure the initial run was not some weird fluke. I noticed you merged a fix for a determinism bug recently; I haven't looked at it in any depth, but perhaps that could be the cause of the discrepancy?

Also, do you think it would be a good idea to include some sort of cap on the number of empty regions stored in the pool, to avoid hanging on to too much memory? Perhaps regions in the pool could be deallocated if they are unused for a certain number of timesteps. This could eventually be made configurable by the user, similar to what @stephanemagnenat suggested.
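The capping idea from the last paragraph could be sketched as a per-timestep pruning pass over the pool. The constants, field names, and policy below are hypothetical illustrations, not anything rapier implements:

```rust
// Hypothetical eviction policy: keep at most MAX_POOLED empty regions,
// and drop any region that sits unused for too many timesteps.
const MAX_POOLED: usize = 64;
const MAX_IDLE_STEPS: u32 = 100;

struct PooledRegion {
    proxies: Vec<usize>,
    idle_steps: u32, // timesteps since this region entered the pool
}

fn prune_pool(pool: &mut Vec<PooledRegion>) {
    // Age every pooled region by one timestep.
    for region in pool.iter_mut() {
        region.idle_steps += 1;
    }
    // Drop regions that have been idle too long...
    pool.retain(|r| r.idle_steps <= MAX_IDLE_STEPS);
    // ...and enforce the hard cap on pool size; truncated regions are
    // dropped, actually freeing their memory.
    pool.truncate(MAX_POOLED);
}

fn main() {
    let mut pool: Vec<PooledRegion> = (0..80)
        .map(|_| PooledRegion { proxies: Vec::new(), idle_steps: 0 })
        .collect();
    prune_pool(&mut pool);
    assert_eq!(pool.len(), MAX_POOLED);
    // After enough idle timesteps, everything is evicted.
    for _ in 0..MAX_IDLE_STEPS {
        prune_pool(&mut pool);
    }
    assert!(pool.is_empty());
}
```

Making `MAX_POOLED`/`MAX_IDLE_STEPS` user-configurable fields would match the configurability suggested in the thread.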
@robert-hrusecky Unfortunately, I'm having some random issues with my benchmarking hardware recently (it is moderately old; I may have to replace it completely or switch to a VPS), so I can't start a new benchmark now. I will probably investigate this hardware problem next week.
@robert-hrusecky Alright, I changed my benchmarking server. I tried a VPS (with reserved vcores), but the performance differences between two benchmark runs were too significant for it to be useful. So I built a new home server, deleted all the previous benchmark results, and ran the benchmarks again for all tags in …

Now the only weird result I see in the benchmarks is that the objects appear to fall asleep sooner than before in the cubes demo, though I am not sure why your modification would have this effect. One guess is that with your changes, some contacts are deleted sooner than before (because of the …).

Anyways, this all looks OK to me. I don't think there is a reason to make this optional for the end-user considering it does not affect performance. Merging this now.
This is just a simple de-allocation strategy that removes `SAPRegion`s immediately when all proxies exit them. This appears to plug the memory leak.

Since the regions are stored in a `HashMap`, I have a suspicion that this may be all we need performance-wise. New allocations of regions should only need to do actual memory allocations when the `HashMap` internally resizes. So if the number of regions stays mostly constant, I expect to see little overhead.
I'm making this PR now mostly to get the benchmarks to run. If you are happy with it already, go ahead and merge it. I can also work on adding the opt-out-of-de-allocation feature discussed in #14.
Closes #14