Grid code does not work with large l quantum numbers, especially with stress tensor #1785
Comments
Until this issue is fixed, a possible workaround is to enforce the use of the REF backend.
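For reference, a sketch of how the REF backend can be forced in the CP2K input. The exact setting was not preserved in this thread; the snippet below assumes the `BACKEND` keyword in the `GLOBAL / GRID` input section, so check the input reference for your CP2K version:

```
&GLOBAL
  &GRID
    BACKEND REF
  &END GRID
&END GLOBAL
```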
Thanks a lot Alfio, I will try!
The grid GPU backend always processes entire atom pairs at once. For very large basis sets it thereby runs out of memory (https://github.com/cp2k/cp2k/blob/532c618f7484ce7740d11b370a85beb5e63a9c9a/src/grid/gpu/grid_gpu_integrate.cu#L469). The solution is to split the work into smaller tasks. And even if one could magically get past the memory problem, the performance for large basis sets would still be terrible because the loops are currently only unrolled up to lp <= 6 (https://github.com/cp2k/cp2k/blob/532c618f7484ce7740d11b370a85beb5e63a9c9a/src/grid/gpu/grid_gpu_collocate.cu#L156) to reduce register pressure. The solution is to create multiple specialized kernels. I plan to address both shortcomings via a larger refactoring towards the end of this year.
Is this problem only related to the GPU version of the code?
In any case we need the grid code to work for all use cases (GPU/CPU, basis set sizes, distributed/replicated grids,
general/orthorhombic grids).
Optimization is of course important, but definitely comes after functionality.
Yes, the above limitations for large basis sets apply only to the GPU backend. The CPU backend also has a limit, but it can easily be raised.
Well, it's an optimization of the GPU usage ;-) Currently I'm focusing on getting the tensor code ready for the LUMI pilot, but I'm happy to prioritize the grid GPU code instead.
The GPU code runs out of shared memory because all coefficients for a given Gaussian pair are stored in shared memory, which is also used to compute other things. So for large l we can run out of shared memory relatively easily. The GPU (CUDA) backend suffers more from this than the HIP backend, mostly because the HIP backend splits the computation of the coefficients from the collocation / integration.
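The shared-memory blow-up described above can be illustrated with a back-of-the-envelope estimate. The model below is an assumption, not taken from the CP2K sources: it supposes the backend keeps a matrix of roughly ncoset(la) × ncoset(lb) double-precision pair coefficients resident in shared memory, where ncoset(l) counts Cartesian functions up to angular momentum l. The actual layout in `grid_gpu_integrate.cu` differs in detail, but the growth rate is the point:

```python
def ncoset(l):
    """Number of Cartesian Gaussian functions with angular momentum <= l."""
    return (l + 1) * (l + 2) * (l + 3) // 6

def pair_coef_bytes(la, lb):
    """Estimated bytes for one pair's coefficient matrix in float64
    (illustrative model, not the exact CP2K layout)."""
    return ncoset(la) * ncoset(lb) * 8

# Typical per-block shared memory budget on a P100 (the machine in this report).
SHARED_MEM_LIMIT = 48 * 1024

for l in range(3, 9):
    size = pair_coef_bytes(l, l)
    status = "fits" if size <= SHARED_MEM_LIMIT else "exceeds 48 KiB"
    print(f"la = lb = {l}: {size / 1024:6.1f} KiB  ({status})")
```

Under this model the footprint stays under 48 KiB up to la = lb = 5 but jumps past it at l = 6 (ncoset(6) = 84, so 84 × 84 × 8 bytes ≈ 55 KiB), which is consistent with the observation that large l quantum numbers, as in RI basis sets, hit the limit quickly.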
The grid code breaks on the GPU for larger l quantum numbers (especially with RI basis sets). In general, the grid code complains about insufficient shared memory and crashes deliberately. Additional files available on request. I work on Piz Daint (P100).
With stress tensor calculations, this issue occurs already with smaller angular quantum numbers.