Grid code does not work with large l quantum numbers, especially with stress tensor #1785

Closed
fstein93 opened this issue Dec 8, 2021 · 6 comments · Fixed by #2787

fstein93 (Contributor) commented Dec 8, 2021

The grid code breaks on the GPU for larger l quantum numbers (especially with RI basis sets). In general, the grid code complains about insufficient shared memory and aborts deliberately. Additional files are available on request. I am working on Piz Daint (P100 GPUs).
With stress tensor calculations, this issue already occurs at smaller angular quantum numbers.

fstein93 (Contributor, Author) commented Dec 9, 2021

Until this issue is fixed, a possible workaround is to enforce the REF backend by setting:

&GLOBAL
  ...Your other options...
  &GRID
    BACKEND REF
  &END GRID
&END GLOBAL

annahehn (Contributor) commented May 3, 2022

Thanks a lot, Alfio, I will try!

oschuett (Member) commented May 4, 2022

The grid GPU backend always processes entire atom pairs at once. For very large basis sets it thereby runs out of memory. The solution is to split the work into smaller tasks.
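
A minimal CUDA sketch of that idea, not CP2K's actual kernel: rather than keeping a pair's full coefficient array resident in shared memory, stream it through a fixed-size staging buffer so the per-block shared memory no longer grows with the basis set. The names pair_sum_chunked and CHUNK are hypothetical, and the "work" is just a sum standing in for the real collocation/integration.

```cuda
// Hedged sketch, not CP2K code: bounded shared memory regardless of pair size.
#include <cstdio>
#include <cuda_runtime.h>

#define CHUNK 256  // staging buffer size in doubles, independent of l

__global__ void pair_sum_chunked(const double *coeffs, int ncoeff, double *result) {
  __shared__ double buf[CHUNK];  // fixed-size shared memory, whatever ncoeff is
  double acc = 0.0;
  for (int base = 0; base < ncoeff; base += CHUNK) {
    int n = min(CHUNK, ncoeff - base);
    if (threadIdx.x < n) buf[threadIdx.x] = coeffs[base + threadIdx.x];
    __syncthreads();
    // Stand-in for the real per-chunk work (collocation / integration):
    if (threadIdx.x < n) acc += buf[threadIdx.x];
    __syncthreads();
  }
  atomicAdd(result, acc);  // double atomicAdd needs compute capability >= 6.0 (P100 is fine)
}

int main() {
  const int ncoeff = 4096;  // e.g. a large pair coefficient block
  double *coeffs, *result, host_result = 0.0;
  cudaMalloc(&coeffs, ncoeff * sizeof(double));
  cudaMalloc(&result, sizeof(double));
  cudaMemset(result, 0, sizeof(double));

  double *h = new double[ncoeff];
  for (int i = 0; i < ncoeff; ++i) h[i] = 1.0;  // all ones, so the sum is checkable
  cudaMemcpy(coeffs, h, ncoeff * sizeof(double), cudaMemcpyHostToDevice);

  pair_sum_chunked<<<1, CHUNK>>>(coeffs, ncoeff, result);
  cudaMemcpy(&host_result, result, sizeof(double), cudaMemcpyDeviceToHost);
  printf("sum = %.1f (expected %d)\n", host_result, ncoeff);

  delete[] h;
  cudaFree(coeffs);
  cudaFree(result);
  return 0;
}
```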

Even if one could magically get past the memory problem, the performance for large basis sets would still be terrible because the loops are currently only unrolled up to lp <= 6 to reduce register pressure. The solution is to create multiple specialized kernels.
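
A hedged illustration of the "multiple specialized kernels" idea, not CP2K's actual code: instantiate one kernel per small compile-time lp so the inner loop is fully unrolled, and fall back to a generic kernel with a runtime bound otherwise. The kernels here evaluate a toy polynomial instead of the real grid math, and all names (collocate_lp, collocate_generic, launch_collocate) are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

template <int LP>
__global__ void collocate_lp(const double *alpha, double *out, int npts) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= npts) return;
  double x = (double)i / npts, xn = 1.0, val = 0.0;
  #pragma unroll  // LP is a compile-time constant, so this loop unrolls fully
  for (int k = 0; k <= LP; ++k) { val += alpha[k] * xn; xn *= x; }
  out[i] = val;
}

__global__ void collocate_generic(const double *alpha, double *out, int npts, int lp) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= npts) return;
  double x = (double)i / npts, xn = 1.0, val = 0.0;
  for (int k = 0; k <= lp; ++k) { val += alpha[k] * xn; xn *= x; }  // runtime bound
  out[i] = val;
}

// Host-side dispatch: pick a specialized kernel when one exists (only a few shown).
void launch_collocate(int lp, const double *alpha, double *out, int npts) {
  dim3 b(128), g((npts + 127) / 128);
  switch (lp) {
    case 4: collocate_lp<4><<<g, b>>>(alpha, out, npts); break;
    case 6: collocate_lp<6><<<g, b>>>(alpha, out, npts); break;
    default: collocate_generic<<<g, b>>>(alpha, out, npts, lp); break;
  }
}

int main() {
  const int npts = 1024, lp = 6;
  double h_alpha[16] = {1, 1, 1, 1, 1, 1, 1};  // simple polynomial coefficients
  double *alpha, *out;
  cudaMalloc(&alpha, sizeof(h_alpha));
  cudaMalloc(&out, npts * sizeof(double));
  cudaMemcpy(alpha, h_alpha, sizeof(h_alpha), cudaMemcpyHostToDevice);
  launch_collocate(lp, alpha, out, npts);
  cudaDeviceSynchronize();
  printf("dispatched lp=%d kernel\n", lp);
  cudaFree(alpha); cudaFree(out);
  return 0;
}
```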

I plan to address both shortcomings via a larger refactoring towards the end of this year.

juerghutter (Contributor) commented May 4, 2022 via email

oschuett (Member) commented May 4, 2022

> Is this problem only related to the GPU version of the code?

Yes, the above limitations for large basis sets apply only to the GPU backend. The CPU backend also has a limit, but it can easily be raised.

> Optimization is of course important, but definitely comes after functionality.

Well, it's optimization of the GPU usage ;-)

Currently, I'm focusing on getting the tensor code ready for the LUMI pilot, but I'm happy to prioritize the grid GPU code instead.

mtaillefumier (Contributor) commented:
The GPU code runs out of shared memory because all coefficients for a given Gaussian pair are stored in shared memory, which is also used for other intermediate quantities.

So for large l we can run out of shared memory relatively easily. The CUDA backend suffers more from this than the HIP backend, mostly because the HIP backend splits the computation of the coefficients from the collocation / integration.
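
A back-of-the-envelope sketch, under the assumption (not CP2K's exact layout) that the per-pair coefficient block scales like ncoset(la) * ncoset(lb) doubles, where ncoset(l) = (l+1)(l+2)(l+3)/6 is the number of Cartesian functions up to angular momentum l. It shows how quickly such a block outgrows the 48 KB of shared memory per thread block on a P100.

```cuda
// Rough size estimate only; the real grid code stores additional arrays as well.
#include <cstdio>

int ncoset(int l) { return (l + 1) * (l + 2) * (l + 3) / 6; }  // Cartesian functions up to l

int main() {
  const double shared_kb = 48.0;  // default shared memory per block on a P100
  for (int l = 3; l <= 7; ++l) {
    double kb = ncoset(l) * ncoset(l) * sizeof(double) / 1024.0;  // assumed block size
    printf("la = lb = %d: ~%6.1f KB coefficient block%s\n",
           l, kb, kb > shared_kb ? "  <-- exceeds 48 KB" : "");
  }
  return 0;
}
```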
