Force garbage collection after running FUN. #124
Conversation
Thanks, that seems really helpful.
How much does this affect performance? I seem to recall garbage collection can be pretty slow sometimes. Maybe this should be an option when calling `MulticoreParam`?
Also, something I always wanted to investigate but never got around to doing: is there a risk that garbage collection and finalizers in a child process break the shared memory and trigger memory copies? Was the GC written with forked processing in mind? Can it make things worse? Should there be a way to disable the GC in forks?
@DarwinAwardWinner my feeling is that the coarse-grain granularity of `bplapply` makes the performance consequences of `gc()` minimal.

@HenrikBengtsson this seems orthogonal to the PR -- if finalizers or `gc()` in forked processes are a problem, then it's probably better to have an explicit garbage collection and expose these problems than to rely on intermittent garbage collection / errors.
Yes, I set […]

A safer solution would be to not second-guess the GC and just cap the heap for workers. This might be possible for […]
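One way the heap-capping idea could look in practice -- this is a sketch of my own, not anything BiocParallel does, and it assumes the CRAN `unix` package is installed -- is to set `RLIMIT_AS` inside each forked child so the operating system, rather than GC heuristics, bounds per-worker memory:

```r
library(parallel)  # mclapply() forks workers on Unix-alikes

## Sketch only: cap each forked worker's address space. Assumes the
## CRAN 'unix' package; rlimit_as() applies to the calling process,
## so invoking it inside the child caps that child alone, and
## allocations beyond the cap fail loudly instead of spilling to swap.
capped_mclapply <- function(X, FUN, cap_bytes = 2e9, cores = 2L) {
    mclapply(X, function(x) {
        unix::rlimit_as(cap_bytes)
        FUN(x)
    }, mc.cores = cores)
}
```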
- set whether to force R garbage collection (expensive!) on every call to FUN()
- change default behavior -- force only for MulticoreParam
- improves #124

- set whether to force R garbage collection (expensive!) on every call to FUN()
- change default behavior -- force only for MulticoreParam
- improves #124
- only TRUE or FALSE allowed, with appropriate defaults in constructor
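If this ships as a constructor option as the commits above describe, usage might look like the following sketch. The argument name `force.GC` is what released BiocParallel versions expose, but check `?MulticoreParam` for the exact spelling and default in your version:

```r
library(BiocParallel)

## Sketch: opt in to (or out of) per-FUN garbage collection when
## building the backend; per the commit message, forcing is the
## default only for MulticoreParam.
p_gc    <- MulticoreParam(workers = 4, force.GC = TRUE)
p_no_gc <- MulticoreParam(workers = 4, force.GC = FALSE)

res <- bplapply(1:8, sqrt, BPPARAM = p_gc)
```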
This provides one solution to the problem discussed today on Slack, to wit: construct a large object `big` and then run a `bplapply` over it. This should use 4 GB for `big` plus another 80 MB per worker, totalling just under 5 GB. Indeed, my laptop reports about 6 GB RAM used after `big` is constructed, consistent with a bit of OS overhead. However, running the `bplapply` causes my laptop to go into swap, despite having a total of 16 GB RAM that should be more than enough to handle the 800 MB across all workers.

I think the underlying problem is that each forked process believes that the entirety of the parent's heap is available. When allocations are made in one child, I assume that the affected space doesn't show up as being used in another child; rather, the pages are copied so that the second child can still use that space for its own allocations. This increases the overall memory usage as expected, but the real issue is that this slips past the garbage collector. Within each child, there is no reason to trigger the GC as - for all it knows - it has plenty of remaining memory in the heap to work with, so why bother? As a consequence, each worker uses a profligate amount of memory across repeated `FUN` calls, roughly equivalent to the size of the heap in the parent.

This PR works towards a solution by forcing garbage collection at the end of each evaluation of `FUN`, which allows the code above to proceed without entering swap - total RAM usage hovers around ~7 GB, consistent with the budgeting above. There is probably a better place to put this instead of `.composeTry`; that was just for convenience. I could also imagine an additional generic to only perform garbage collection in certain parallelization contexts (e.g., forking only) and in a user-tunable manner.
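The mechanism described above can be illustrated outside of the package internals. This is not the actual `.composeTry` change, just a sketch of wrapping `FUN` so that `gc()` runs after every evaluation in the worker; the wrapper name `composeWithGC` is hypothetical:

```r
## Illustrative sketch only: wrap a user function so the child's
## garbage collector runs after every call, returning freed pages
## rather than letting each fork grow toward the parent's heap size.
composeWithGC <- function(FUN) {
    force(FUN)
    function(...) {
        # collect after FUN returns, even if it throws an error
        on.exit(gc(verbose = FALSE), add = TRUE)
        FUN(...)
    }
}

## usage: lapply-style iteration with per-call collection
res <- lapply(1:3, composeWithGC(function(i) i^2))
```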