Memory consumption in large scale hp-adaptive applications. #9598

Closed · marcfehling opened this issue Mar 2, 2020 · 5 comments

While running scaling tests on hp-adaptive problems for my PhD project on our local supercomputer, I ran out of memory in large-scale scenarios. For example, this was the case for a problem with about 1 billion degrees of freedom, for which I occupied 32 nodes with 128 GB of RAM each. Each node has two Intel Xeon E5 processors installed (2x12 cores, SMT enabled, 48 MPI processes per node).

I would like to investigate how the memory demand changes when we switch from DoFHandler to hp::DoFHandler, and find out whether there are any data structures that blow up over time.
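
Roughly what I have in mind for the measurement (a sketch only; `triangulation`, `dof_handler`, and `system_matrix` are placeholder names for the objects of the actual application):

```cpp
// Sketch: report deal.II's own bookkeeping next to the OS-level resident set
// size, accumulated over all MPI ranks.
#include <deal.II/base/mpi.h>
#include <deal.II/base/utilities.h>
#include <deal.II/distributed/tria.h>
#include <deal.II/hp/dof_handler.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>

#include <iostream>

using namespace dealii;

void report_memory(const MPI_Comm                                 mpi_communicator,
                   const parallel::distributed::Triangulation<3> &triangulation,
                   const hp::DoFHandler<3> &                      dof_handler,
                   const TrilinosWrappers::SparseMatrix &         system_matrix)
{
  // Library-internal memory consumption of the individual objects, in MB.
  const double tria_mb =
    Utilities::MPI::sum(triangulation.memory_consumption() / 1048576., mpi_communicator);
  const double dofh_mb =
    Utilities::MPI::sum(dof_handler.memory_consumption() / 1048576., mpi_communicator);
  const double matrix_mb =
    Utilities::MPI::sum(system_matrix.memory_consumption() / 1048576., mpi_communicator);

  // What the operating system sees for the processes (VmRSS is in kB).
  Utilities::System::MemoryStats stats;
  Utilities::System::get_memory_stats(stats);
  const double rss_mb = Utilities::MPI::sum(stats.VmRSS / 1024., mpi_communicator);

  if (Utilities::MPI::this_mpi_process(mpi_communicator) == 0)
    std::cout << "triangulation: " << tria_mb << " MB\n"
              << "dof_handler  : " << dofh_mb << " MB\n"
              << "system_matrix: " << matrix_mb << " MB\n"
              << "total VmRSS  : " << rss_mb << " MB" << std::endl;
}
```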

@peterrum (Member) commented Mar 2, 2020

@marcfehling Have you had a look at the number of non-zeros in the matrices? I would guess that in 3D and at high order this might be a problem.
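
Something along these lines would show it (a sketch, assuming the assembled matrix is a distributed TrilinosWrappers::SparseMatrix named `system_matrix`):

```cpp
// Sketch: how dense the sparse matrix actually is, and a rough estimate of
// the memory it takes (one double value plus one int column index per entry).
#include <deal.II/lac/trilinos_sparse_matrix.h>

#include <iostream>

void print_matrix_stats(const dealii::TrilinosWrappers::SparseMatrix &system_matrix)
{
  const auto n_rows = system_matrix.m();                  // global number of rows
  const auto nnz    = system_matrix.n_nonzero_elements(); // global number of non-zeros

  const double approx_gb = nnz * (sizeof(double) + sizeof(int)) / 1.e9;

  std::cout << "rows                 : " << n_rows << '\n'
            << "non-zeros            : " << nnz << '\n'
            << "average non-zeros/row: " << static_cast<double>(nnz) / n_rows << '\n'
            << "approx. matrix memory: " << approx_gb << " GB" << std::endl;
}
```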

@bangerth (Member) commented Mar 2, 2020

Yes, @peterrum is right. For most problems with moderate polynomial degrees (say, 2 or 3), you end up with a few kB per unknown accumulated over all data structures. Say it is 3 kB per unknown; then you'd be able to fit about 40M unknowns into 128 GB, or about 1.2B unknowns onto 32 such nodes. That's about the size you see.

But if you increase the polynomial degree, you may end up with substantially more memory in the matrix, and it would not surprise me if you can fit substantially less than 1B unknowns into your 32 nodes.

That shouldn't stop you from investigating, but your numbers don't seem outlandish to me.
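
To spell out the back-of-envelope estimate (the 3 kB/unknown figure and the stencil-size bound below are rough assumptions from this discussion, not measured values):

```cpp
// Back-of-envelope only: unknowns that fit into the given RAM at ~3 kB each,
// and how the matrix row length of a continuous FE_Q element grows in 3D.
#include <cstdio>

int main()
{
  const double bytes_per_unknown = 3. * 1024;                 // ~3 kB per unknown
  const double ram_per_node      = 128. * 1024 * 1024 * 1024; // 128 GB per node
  const int    n_nodes           = 32;

  std::printf("unknowns per node    : %.2e\n", ram_per_node / bytes_per_unknown);
  std::printf("unknowns on %d nodes : %.2e\n",
              n_nodes,
              n_nodes * ram_per_node / bytes_per_unknown);

  // A vertex DoF of a continuous FE_Q element of degree p couples with every
  // DoF on the 2x2x2 patch of adjacent cells, i.e. up to (2p+1)^3 entries in
  // its matrix row; this is why higher degrees blow up the sparse matrix.
  for (int p = 1; p <= 4; ++p)
    std::printf("degree %d: up to %d non-zeros per row\n",
                p,
                (2 * p + 1) * (2 * p + 1) * (2 * p + 1));

  return 0;
}
```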

@tjhei (Member) commented Mar 4, 2020

> (2x12 cores, SMT enabled, 48 MPI processes per node)

Just a side comment: I have not seen a large performance increase when running 2 MPI ranks per physical core. Computation- and memory-heavy loads might be a little faster, but anything involving MPI communication is a little slower (you have twice as many ranks, and latency might go up?). You will also be a lot more memory constrained this way. This assumes good process pinning when you run with one rank per physical core.

@kronbichler (Member) commented

I don't think there is much left to do here, nor much reason to leave this issue open - can you verify, @marcfehling? In 3D, one should really not use sparse matrices for polynomial degrees >= 3 because the coupling between unknowns is too dense; it only leads to disappointments. And regarding performance, even for p=2 one leaves a factor of 3-5 on the table compared to matrix-free methods.
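
For reference, a minimal sketch of what the matrix-free alternative looks like in deal.II for a Laplace-type operator, with a fixed polynomial degree for brevity (`mapping`, `dof_handler`, and `constraints` are assumed to come from the surrounding program; the hp case needs additional machinery):

```cpp
// Sketch: a matrix-free Laplace operator instead of an assembled sparse
// matrix. Nothing here ever stores matrix entries.
#include <deal.II/base/quadrature_lib.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/fe/mapping.h>
#include <deal.II/fe/update_flags.h>
#include <deal.II/lac/affine_constraints.h>
#include <deal.II/lac/la_parallel_vector.h>
#include <deal.II/matrix_free/matrix_free.h>
#include <deal.II/matrix_free/operators.h>

#include <memory>

using namespace dealii;

constexpr int dim       = 3;
constexpr int fe_degree = 2;

using VectorType = LinearAlgebra::distributed::Vector<double>;

void setup_and_apply(const Mapping<dim> &             mapping,
                     const DoFHandler<dim> &          dof_handler,
                     const AffineConstraints<double> &constraints)
{
  typename MatrixFree<dim, double>::AdditionalData additional_data;
  additional_data.mapping_update_flags = update_gradients | update_JxW_values;

  const auto matrix_free = std::make_shared<MatrixFree<dim, double>>();
  matrix_free->reinit(mapping,
                      dof_handler,
                      constraints,
                      QGauss<1>(fe_degree + 1),
                      additional_data);

  MatrixFreeOperators::LaplaceOperator<dim, fe_degree, fe_degree + 1, 1, VectorType>
    laplace_operator;
  laplace_operator.initialize(matrix_free);
  laplace_operator.compute_diagonal(); // e.g. for a Jacobi/Chebyshev smoother

  // The operator provides vmult() and can be handed directly to a Krylov
  // solver such as SolverCG.
  VectorType dst, src;
  matrix_free->initialize_dof_vector(dst);
  matrix_free->initialize_dof_vector(src);
  laplace_operator.vmult(dst, src);
}
```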

@marcfehling (Member, Author) commented

> I don't think there is much left to do here, nor much reason to leave this issue open.

I agree with you, @kronbichler, especially since we have a matrix-free version running.

I'm closing this issue.
