Grid code does not work with large l quantum numbers, especially with stress tensor #1785
Comments
Until this issue is fixed, a possible workaround is to enforce the use of the REF backend.
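For reference, a sketch of how the REF backend can be forced in the CP2K input. The exact setting was not preserved in this thread; the snippet below assumes the `BACKEND` keyword in the `GLOBAL / GRID` input section, so check the input reference for your CP2K version:

```
&GLOBAL
  &GRID
    BACKEND REF
  &END GRID
&END GLOBAL
```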
Thanks a lot Alfio, I will try!
The grid GPU backend always processes entire atom pairs at once. For very large basis sets it thereby runs out of memory (https://github.com/cp2k/cp2k/blob/532c618f7484ce7740d11b370a85beb5e63a9c9a/src/grid/gpu/grid_gpu_integrate.cu#L469). The solution is to split the work into smaller tasks. And even if one could magically get past the memory problem, the performance for large basis sets would still be terrible because the loops are currently only unrolled up to lp <= 6 (https://github.com/cp2k/cp2k/blob/532c618f7484ce7740d11b370a85beb5e63a9c9a/src/grid/gpu/grid_gpu_collocate.cu#L156) to reduce register pressure. The solution is to create multiple specialized kernels. I plan to address both shortcomings via a larger refactoring towards the end of this year.
Is this problem only related to the GPU version of the code?
In any case we need the grid code to work for all use cases (GPU/CPU, basis set sizes, distributed/replicated grids,
general/orthorhombic grids).
Optimization is of course important, but definitely comes after functionality.
Yes, the above limitations for large basis sets apply only to the GPU backend. The CPU backend also has a limit, but it can easily be raised.
Well, it's an optimization of the GPU usage ;-) Currently I'm focusing on getting the tensor code ready for the LUMI pilot, but I'm happy to prioritize the grid GPU code instead.
The GPU code runs out of shared memory because all coefficients for a given Gaussian pair are stored in shared memory, which is also used to compute other things. So for large l we can run out of shared memory relatively easily. The GPU (CUDA) backend suffers more from this than the HIP backend, mostly because the HIP backend splits the computation of the coefficients from the collocation / integration.
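The shared-memory blow-up described above can be illustrated with a back-of-the-envelope estimate. The model below is an assumption, not taken from the CP2K sources: it supposes the backend keeps a matrix of roughly ncoset(la) × ncoset(lb) double-precision pair coefficients resident in shared memory, where ncoset(l) counts Cartesian functions up to angular momentum l. The actual layout in `grid_gpu_integrate.cu` differs in detail, but the growth rate is the point:

```python
def ncoset(l):
    """Number of Cartesian Gaussian functions with angular momentum <= l."""
    return (l + 1) * (l + 2) * (l + 3) // 6

def pair_coef_bytes(la, lb):
    """Estimated bytes for one pair's coefficient matrix in float64
    (illustrative model, not the exact CP2K layout)."""
    return ncoset(la) * ncoset(lb) * 8

# Typical per-block shared memory budget on a P100 (the machine in this report).
SHARED_MEM_LIMIT = 48 * 1024

for l in range(3, 9):
    size = pair_coef_bytes(l, l)
    status = "fits" if size <= SHARED_MEM_LIMIT else "exceeds 48 KiB"
    print(f"la = lb = {l}: {size / 1024:6.1f} KiB  ({status})")
```

Under this model the footprint stays under 48 KiB up to la = lb = 5 but jumps past it at l = 6 (ncoset(6) = 84, so 84 × 84 × 8 bytes ≈ 55 KiB), which is consistent with the observation that large l quantum numbers, as in RI basis sets, hit the limit quickly.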
The grid code breaks on the GPU for larger l quantum numbers (especially with RI basis sets). In general, the grid code complains about insufficient shared memory and crashes deliberately. Additional files available on request. I work on Piz Daint (P100).
With stress tensor calculations, this issue occurs already with smaller angular quantum numbers.