Memory consumption in large-scale hp-adaptive applications #9598
Comments
@marcfehling Have you had a look at the number of non-zeros in the matrices? I would guess that in 3D and at high order this might be a problem.
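A back-of-the-envelope sketch of how the matrix stencil grows with the polynomial degree p for continuous elements in 3D; the coupling pattern and the ~12 bytes per stored entry are illustrative assumptions, not measurements from the application:

```cpp
// Rough estimate of sparse-matrix memory per interior DoF for a continuous
// Q_p discretization of a scalar problem in 3D. Assumes a DoF couples with
// all DoFs of the adjacent cells, i.e. roughly (2p+1)^3 non-zeros per row,
// and ~12 bytes per stored entry (8-byte value + 4-byte column index).
#include <cstdio>

int main()
{
  for (unsigned int p = 1; p <= 4; ++p)
    {
      const unsigned int nnz_per_row = (2 * p + 1) * (2 * p + 1) * (2 * p + 1);
      const double kb_per_row        = 12.0 * nnz_per_row / 1024.0;
      std::printf("p = %u: ~%4u non-zeros per row, ~%.1f kB matrix memory per DoF\n",
                  p, nnz_per_row, kb_per_row);
    }
}
```

At p = 3 this already gives roughly 4 kB of matrix memory per DoF, which is why higher degrees quickly dominate the total footprint.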
Yes, @peterrum is right. For most problems with moderate polynomial degrees (say, 2 or 3), you end up with a few kB per unknown accumulated over all data structures. Say it is 3 kB per unknown; then you would be able to fit about 40M unknowns into 128 GB, or about 1.2B unknowns onto 32 such nodes. That is about the size you see. But if you increase the polynomial degree, you may end up with substantially more memory in the matrix, and it would not surprise me if you can fit substantially fewer than 1B unknowns into your 32 nodes. That should not stop you from investigating, but your numbers do not seem outlandish to me.
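The capacity estimate spelled out, under the same assumption of roughly 3 kB of total memory per unknown (the 40M / 1.2B figures above round down and leave some headroom on top of this):

```cpp
// Rough capacity estimate under the assumption of ~3 kB total memory per
// unknown: how many unknowns fit onto a 128 GB node, and onto 32 such nodes.
#include <cstdio>

int main()
{
  const double       bytes_per_unknown = 3.0 * 1024.0; // assumed 3 kB/unknown
  const double       bytes_per_node    = 128.0e9;      // 128 GB of RAM per node
  const unsigned int n_nodes           = 32;

  const double unknowns_per_node = bytes_per_node / bytes_per_unknown; // ~42M
  const double unknowns_total    = n_nodes * unknowns_per_node;        // ~1.3B

  std::printf("~%.0fM unknowns per node, ~%.2fB unknowns on %u nodes\n",
              unknowns_per_node / 1e6, unknowns_total / 1e9, n_nodes);
}
```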
Just a side comment: I have not seen a large performance increase when running 2 MPI ranks per physical core. Computation- and memory-heavy loads might be a little faster, but anything involving MPI communication is a little slower (you have twice as many ranks, and latency might go up?). You will also be a lot more memory constrained this way. This assumes good process pinning if you run with one rank per physical core.
I don't think there is much left to do here, so I see no reason to leave this issue open - can you verify, @marcfehling? In 3D, one should really not use sparse matrices beyond moderate polynomial degrees.
I agree with you, @kronbichler, especially since we have a matrix-free version running. I'm closing this issue.
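For context, a minimal sketch of what a matrix-free operator application looks like, in the spirit of deal.II's step-37 tutorial: no sparse matrix is assembled, so the memory footprint is dominated by vectors and the MatrixFree setup data rather than O(nnz) matrix entries. Class and function names follow recent deal.II releases and may need adjustment for other versions; the hp case needs additional per-degree handling that is omitted here.

```cpp
// Sketch of a matrix-free Laplace operator: the operator is applied cell by
// cell, so no global sparse matrix (and none of its memory) is ever stored.
#include <deal.II/lac/la_parallel_vector.h>
#include <deal.II/matrix_free/fe_evaluation.h>
#include <deal.II/matrix_free/matrix_free.h>

using namespace dealii;

template <int dim, int fe_degree>
class LaplaceOperator
{
public:
  using VectorType = LinearAlgebra::distributed::Vector<double>;

  LaplaceOperator(const MatrixFree<dim, double> &matrix_free)
    : data(matrix_free)
  {}

  void vmult(VectorType &dst, const VectorType &src) const
  {
    dst = 0.;
    data.cell_loop(&LaplaceOperator::local_apply, this, dst, src);
  }

private:
  void local_apply(const MatrixFree<dim, double>               &mf_data,
                   VectorType                                  &dst,
                   const VectorType                            &src,
                   const std::pair<unsigned int, unsigned int> &cell_range) const
  {
    FEEvaluation<dim, fe_degree> phi(mf_data);
    for (unsigned int cell = cell_range.first; cell < cell_range.second; ++cell)
      {
        phi.reinit(cell);
        phi.read_dof_values(src);
        phi.evaluate(EvaluationFlags::gradients);
        for (unsigned int q = 0; q < phi.n_q_points; ++q)
          phi.submit_gradient(phi.get_gradient(q), q);
        phi.integrate(EvaluationFlags::gradients);
        phi.distribute_local_to_global(dst);
      }
  }

  const MatrixFree<dim, double> &data;
};
```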
While running scaling tests on hp-adaptive problems for my PhD project on our local supercomputer, I ran out of memory in large-scale scenarios. For example, this was the case for a problem with about 1 billion degrees of freedom, for which I was occupying 32 nodes with 128 GB of RAM each. Each node has two Intel Xeon E5 processors installed (2x12 cores, SMT enabled, 48 MPI processes per node).
I would like to investigate how the memory demand changes when we switch from `DoFHandler` to `hp::DoFHandler`, and find out if there are any parts which blow up over time.
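One possible way to locate the parts that blow up, a sketch assuming the usual deal.II facilities: most large objects provide `memory_consumption()`, and `Utilities::System::get_memory_stats()` reports the per-process figures from /proc. The object names here (`dof_handler`, `triangulation`) are placeholders for the ones in the actual application.

```cpp
// Print deal.II-internal memory bookkeeping (summed over ranks) next to the
// operating-system view of the process (max VmPeak over ranks).
#include <deal.II/base/mpi.h>
#include <deal.II/base/utilities.h>

#include <iostream>

template <typename DoFHandlerType, typename TriangulationType>
void print_memory_stats(const DoFHandlerType    &dof_handler,
                        const TriangulationType &triangulation,
                        const MPI_Comm           communicator)
{
  using namespace dealii;

  // deal.II-internal bookkeeping, summed over all MPI ranks (bytes).
  const double dof_handler_memory =
    Utilities::MPI::sum(static_cast<double>(dof_handler.memory_consumption()),
                        communicator);
  const double tria_memory =
    Utilities::MPI::sum(static_cast<double>(triangulation.memory_consumption()),
                        communicator);

  // Operating-system view of the current process (kB), max over all ranks.
  Utilities::System::MemoryStats stats;
  Utilities::System::get_memory_stats(stats);
  const double max_vm_peak =
    Utilities::MPI::max(static_cast<double>(stats.VmPeak), communicator);

  if (Utilities::MPI::this_mpi_process(communicator) == 0)
    std::cout << "DoFHandler:    " << dof_handler_memory / (1 << 30) << " GB\n"
              << "Triangulation: " << tria_memory / (1 << 30) << " GB\n"
              << "max VmPeak:    " << max_vm_peak / (1 << 20) << " GB\n";
}
```

Calling such a routine after each refinement cycle would show whether the growth comes from the `hp::DoFHandler` itself or from other data structures such as the constraints or the matrix.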