Memory usage #186
Just adding some more information: here is what happens when I use HyMD on Saga (running with 10 MPI processes; the plot was created using memory-profiler). These are the packages that I load:
And these are the packages installed in my Python environment:
Not sure if this is the problem, but some people reported leaks with HDF5 1.12.1, and a workaround is given here. On my laptop the leak is not so brutal, but I'm using different packages and Python versions (even though HDF5 is the same version), and I also ran shorter simulations with fewer threads. Still, there seems to be a small leak somewhere. Tracemalloc shows a lot of stuff being deallocated and allocated in each MD step, so debugging this will be a bit trickier than I expected.
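For reference, a minimal sketch of the kind of per-step snapshot diffing tracemalloc supports; `md_step` and the step counts are stand-ins, not HyMD code:

```python
import tracemalloc

def md_step(buffers):
    # Stand-in for one MD step; the real loop would integrate forces etc.
    # It allocates and drops arrays so the snapshot diff has something to show.
    buffers.append(bytearray(100_000))
    if len(buffers) > 5:
        buffers.pop(0)

tracemalloc.start(25)                      # keep up to 25 stack frames per allocation
buffers = []
previous = tracemalloc.take_snapshot()
for step in range(1, 1001):
    md_step(buffers)
    if step % 200 == 0:
        current = tracemalloc.take_snapshot()
        # Report the source lines whose allocations grew most since the last snapshot.
        for stat in current.compare_to(previous, "lineno")[:5]:
            print(f"step {step}: {stat}")
        previous = current
```

Comparing snapshots a few hundred steps apart tends to filter out the per-step churn and leave only the allocations that actually accumulate.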
Nice graph! 😅
Unfortunately, it's not available... but I'll check if there's an EasyBuild recipe for it :)
Maybe this software could be useful?
This is the profiling of the main loop for about 2000 steps, run on my machine using Blackfire: https://blackfire.io/profiles/4b4e2cd8-6985-49a3-97d5-d6925326f515/graph
The memory profiling seems to point to
Which makes sense, since from the Blackfire report it seemed the problem was around this call:

```python
dt = MPI.BYTE.Create_contiguous(itemsize)
dt.Commit()
dtype = numpy.dtype((data.dtype, data.shape[1:]))
recvbuffer = numpy.empty(self.recvlength, dtype=dtype, order='C')
self.comm.Barrier()
# now fire
rt = self.comm.Alltoallv((buffer, (self.sendcounts, self.sendoffsets), dt),
                         (recvbuffer, (self.recvcounts, self.recvoffsets), dt))
dt.Free()
self.comm.Barrier()
```

Maybe you don't need to assign the variable `rt`?
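If the per-call datatype handling ever turns out to matter, one thing that could be tested is creating and committing the contiguous datatype once per item size instead of on every exchange. This is purely a sketch, not pmesh's actual code, and the helper names are made up:

```python
from mpi4py import MPI

_dtype_cache = {}

def contiguous_byte_type(itemsize):
    """Return a committed MPI datatype of `itemsize` bytes, creating it only once."""
    dt = _dtype_cache.get(itemsize)
    if dt is None:
        dt = MPI.BYTE.Create_contiguous(itemsize)
        dt.Commit()
        _dtype_cache[itemsize] = dt
    return dt

def free_cached_types():
    """Free the cached datatypes, e.g. right before MPI finalizes."""
    for dt in _dtype_cache.values():
        dt.Free()
    _dtype_cache.clear()
```

Whether this changes anything for the leak is only a hypothesis; it mainly removes the create/commit/free cycle from the hot path.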
Removing the assignment was the first thing I tried when I saw that call, but unfortunately it didn't help. I'm trying a different OpenMPI version (I'm having to recompile some stuff) to see if that fixes the problem. There were some fixes in OpenMPI 4.1.2 onwards concerning leaks.
More evidence points to this OpenMPI version as the culprit of the leak. Running
The full flamegraph is in this .html file. I opened a ticket with the Sigma2 people requesting a newer toolchain with a more recent OpenMPI.
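A self-contained reproducer along these lines could help confirm whether the growth comes from the MPI layer itself. This is only a sketch modelled on the snippet above, not HyMD or pmesh code; the counts, sizes, and step numbers are arbitrary:

```python
# Run with e.g.: mpirun -n 4 python leak_check.py
import resource
import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

counts = numpy.full(size, 1024, dtype='i')        # elements sent to each rank
offsets = numpy.arange(size, dtype='i') * 1024    # displacements in datatype units
sendbuf = numpy.ones(size * 1024, dtype=numpy.float64)
recvbuf = numpy.empty_like(sendbuf)
itemsize = sendbuf.dtype.itemsize

for step in range(1, 20001):
    # Mimic the create/commit/Alltoallv/free pattern from the snippet above.
    dt = MPI.BYTE.Create_contiguous(itemsize)
    dt.Commit()
    comm.Alltoallv((sendbuf, (counts, offsets), dt),
                   (recvbuf, (counts, offsets), dt))
    dt.Free()
    if comm.Get_rank() == 0 and step % 5000 == 0:
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"step {step}: max RSS {rss}")       # kB on Linux, bytes on macOS
```

If the resident set keeps climbing under the Saga toolchain but stays flat with a newer OpenMPI, that would support the hypothesis.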
👋 Hi @hmcezar, I am one of the authors of Thanks a lot for your consideration and for helping us improve the profiler :)
When running simulations for a lot of steps, it looks like there are memory leaks which make the memory usage increase with time.
We should use tracemalloc and other tools to see how we can fix this problem.
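As a starting point, here is a small sketch of using memory-profiler's Python API (one of those "other tools", and the one behind the plots in this thread). `fake_simulation` is a placeholder, not the real MD driver:

```python
from memory_profiler import memory_usage

def fake_simulation(n_steps):
    data = []
    for _ in range(n_steps):
        data.append(bytearray(10**6))   # deliberately grow memory each "step"
    return len(data)

# Sample resident memory every 0.1 s while fake_simulation(200) runs; values are in MiB.
samples = memory_usage((fake_simulation, (200,), {}), interval=0.1)
print(f"{len(samples)} samples, peak {max(samples):.1f} MiB")
```

Plotting the returned samples over time gives the same kind of memory-versus-time curve shown earlier in the thread.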