[Fix] Eliminate unnecessary vkb CPU allocation on GPU path by Cstandardlib · Pull Request #7296 · deepmodeling/abacus-develop

Cstandardlib · 2026-04-28T15:55:57Z

Summary

Problem: vkb.create(nkb, npwx) in pseudopot_cell_vnl::init() always allocates a CPU-side ComplexMatrix, even on the GPU path, where it is never populated. The GPU computation in getvnl() writes directly to the GPU buffers c_vkb/z_vkb. The only thing still read from vkb on that path was vkb.nc (the column dimension, equal to npwx), which is used as the leading dimension in gemm/gemv.
Impact: This wastes nkb × npwx × 16 bytes of CPU memory (~3.2 GB for large systems).
Fix: Skip vkb.create() on the GPU path, store the column dimension in a new vkbnc member, and add a lazy-allocation guard for the GPU Velocity path.
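
The guard described in the fix can be sketched roughly as below. This is a simplified illustration, not the code from the PR: `ComplexMatrixSketch` and `VnlSketch` are hypothetical stand-ins for ABACUS's `ComplexMatrix` and `pseudopot_cell_vnl`, and only the names `vkb`, `vkbnc`, `use_gpu_`, `create()` and `init()` come from the description above.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ABACUS's ComplexMatrix (row-major, nr x nc).
struct ComplexMatrixSketch {
    int nr = 0;
    int nc = 0;
    std::vector<std::complex<double>> data;

    void create(int rows, int cols) {
        assert(rows >= 0 && cols >= 0);
        nr = rows;
        nc = cols;
        data.assign(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols),
                    std::complex<double>(0.0, 0.0));
    }
};

// Hypothetical stand-in for pseudopot_cell_vnl, reduced to the members named in this PR.
struct VnlSketch {
    ComplexMatrixSketch vkb; // CPU-side matrix, left empty on the GPU path
    int vkbnc = 0;           // column dimension, valid even when vkb is empty
    bool use_gpu_ = false;

    void init(int nkb, int npwx) {
        vkbnc = npwx;              // always record the leading dimension
        if (!use_gpu_) {
            vkb.create(nkb, npwx); // allocate only on the CPU path
        }
    }
};
```

With `use_gpu_` set to true, `init()` leaves `vkb` empty while `vkbnc` still holds `npwx`, so callers that only need the leading dimension keep working.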

Changes

| File | Change |
| --- | --- |
| `vnl_pw.h` | Add `int vkbnc = 0` public member |
| `vnl_pw.cpp` | Guard `vkb.create()` behind `!use_gpu_`, set `vkbnc = npwx` |
| `op_pw_nl.cpp` | 4× `vkb.nc` → `vkbnc` |
| `hamilt_pw.cpp` | 4× `vkb.nc` → `vkbnc` |
| `vnl_pw_grad.cpp` | Lazy-allocate `vkb` in `getgradq_vnl()` for GPU Velocity path |
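
To illustrate why the `vkb.nc` → `vkbnc` replacements are sufficient at the GEMM/GEMV call sites, here is a minimal sketch of a matrix-vector product over a raw buffer with an explicit leading dimension. The function name `gemv_rowmajor`, the row-major layout and the plain loop are assumptions made for illustration; ABACUS's own gemm/gemv wrappers and their argument conventions are not reproduced here.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Computes y = A * x, where A is stored row-major in `a` with `rows` rows,
// `cols` used columns, and a stride of `lda` elements between consecutive rows.
// Only the pointer and the integer `lda` are needed, not a matrix object.
std::vector<cplx> gemv_rowmajor(const cplx* a, int rows, int cols, int lda,
                                const std::vector<cplx>& x) {
    assert(rows >= 0 && cols >= 0 && lda >= cols);
    assert(x.size() == static_cast<std::size_t>(cols));
    std::vector<cplx> y(static_cast<std::size_t>(rows), cplx(0.0, 0.0));
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            const std::size_t idx = static_cast<std::size_t>(i) * static_cast<std::size_t>(lda)
                                    + static_cast<std::size_t>(j);
            y[static_cast<std::size_t>(i)] += a[idx] * x[static_cast<std::size_t>(j)];
        }
    }
    return y;
}
```

Because the kernel reads the stride from an integer argument, that integer can live in a standalone member such as `vkbnc` while the buffer itself sits on the GPU.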

Test Plan

✅ GPU build (buniverse.sh --cuda --test): success
✅ Kernel unit tests: 28/28 passed (incl. cal_vnl_op_gpu, cal_vkb1_nl_op_gpu)
✅ GPU integration tests: 38/40 passed (2 pre-existing failures: scf_bpcg, scf_out_wf — identical on clean develop)

Memory Savings

For typical large systems (nkb≈2000, npwx≈100000): ~3.2 GB CPU memory saved on GPU path.
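
The figure above follows directly from the matrix size. As a quick check, the sketch below computes it; the helper name `vkb_cpu_bytes` is hypothetical, and it assumes double-precision complex entries of 16 bytes each, as stated in the summary.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <limits>

// Bytes needed for an nkb x npwx matrix of double-precision complex values.
std::uint64_t vkb_cpu_bytes(std::uint64_t nkb, std::uint64_t npwx) {
    static_assert(sizeof(std::complex<double>) == 16,
                  "expects 16-byte double-precision complex values");
    // Guard against overflow of the 64-bit product.
    assert(npwx == 0 ||
           nkb <= std::numeric_limits<std::uint64_t>::max() / 16 / npwx);
    return nkb * npwx * sizeof(std::complex<double>);
}
```

For nkb = 2000 and npwx = 100000 this gives 3,200,000,000 bytes, i.e. about 3.2 GB.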

…ll_vnl

On GPU path, vkb.create(nkb, npwx) allocates CPU ComplexMatrix memory that is never used: getvnl() writes directly to GPU buffers (c_vkb/z_vkb). The only consumer of vkb.nc metadata is the leading dimension in gemm/gemv. This wastes nkb*npwx*16 bytes of CPU memory (~3.2 GB for large systems).

Changes:
- Add vkbnc member to store column dimension independently
- Guard vkb.create() behind !use_gpu_ in init()
- Replace all ppcell->vkb.nc with ppcell->vkbnc (op_pw_nl.cpp, hamilt_pw.cpp)
- Add lazy-allocation guard in getgradq_vnl() for GPU Velocity path

Tested: GPU build + 28/28 kernel UTs + 38/40 GPU integration tests (2 pre-existing failures: scf_bpcg, scf_out_wf)

Copilot

Pull request overview

This PR reduces CPU memory usage in the plane-wave nonlocal pseudopotential (VNL) GPU execution path by avoiding allocation of the large CPU-side vkb ComplexMatrix when it isn’t populated, while preserving the needed leading-dimension metadata for GEMM/GEMV.

Changes:

- Add vkbnc to store vkb’s intended column dimension (npwk_max) even when vkb is not allocated.
- Skip vkb.create(nkb, npwx) in pseudopot_cell_vnl::init() when running on GPU, and use vkbnc where vkb.nc was previously used as GEMM/GEMV leading dimension.
- Add lazy CPU allocation of vkb in getgradq_vnl() for the GPU Velocity path.
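
The lazy allocation mentioned in the last item could look roughly like the sketch below. The actual guard in `getgradq_vnl()` is not shown on this page, so the type `CpuMatrixSketch`, the helper `ensure_allocated` and its shape-comparison condition are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side matrix that starts out unallocated.
struct CpuMatrixSketch {
    int nr = 0;
    int nc = 0;
    std::vector<std::complex<double>> data;
};

// Allocates the matrix on first use. Returns true if this call allocated,
// false if the matrix already had the requested shape.
bool ensure_allocated(CpuMatrixSketch& m, int nkb, int vkbnc) {
    assert(nkb > 0 && vkbnc > 0);
    if (m.nr == nkb && m.nc == vkbnc) {
        return false; // already allocated with the right shape
    }
    m.nr = nkb;
    m.nc = vkbnc;
    m.data.assign(static_cast<std::size_t>(nkb) * static_cast<std::size_t>(vkbnc),
                  std::complex<double>(0.0, 0.0));
    return true;
}
```

The point of this pattern is that GPU runs which never reach the code path needing a CPU copy pay nothing, while those that do pay the allocation once.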

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| source/source_pw/module_pwdft/vnl_pw.h | Introduces `vkbnc` to retain `vkb` column dimension without allocating CPU `vkb` on GPU runs. |
| source/source_pw/module_pwdft/vnl_pw.cpp | Sets `vkbnc` and guards CPU `vkb` allocation behind `!use_gpu_`. |
| source/source_pw/module_pwdft/op_pw_nl.cpp | Switches GEMM/GEMV leading-dimension argument from `vkb.nc` to `vkbnc`. |
| source/source_pw/module_pwdft/hamilt_pw.cpp | Switches GEMM/GEMV leading-dimension argument from `vkb.nc` to `vkbnc`. |
| source/source_pw/module_pwdft/vnl_pw_grad.cpp | Lazily allocates CPU `vkb` when needed for gradient/Velocity workflows on GPU path. |



mohanchen

LGTM

Copilot AI review requested due to automatic review settings April 28, 2026 15:55

Copilot started reviewing on behalf of Cstandardlib April 28, 2026 15:56

Copilot AI reviewed Apr 28, 2026


Comment thread source/source_pw/module_pwdft/vnl_pw_grad.cpp

Comment thread source/source_pw/module_pwdft/vnl_pw_grad.cpp

Cstandardlib and others added 4 commits April 29, 2026 01:06

Update source/source_pw/module_pwdft/vnl_pw_grad.cpp

a0f43dc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update source/source_pw/module_pwdft/vnl_pw_grad.cpp

b71e3fe

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Revert "Update source/source_pw/module_pwdft/vnl_pw_grad.cpp"

b963e3b

This reverts commit b71e3fe.

Revert "Update source/source_pw/module_pwdft/vnl_pw_grad.cpp"

0999a78

This reverts commit a0f43dc.

mohanchen added the Memory (Memory issues) and Refactor (Refactor ABACUS codes) labels Apr 29, 2026

mohanchen approved these changes Apr 29, 2026


mohanchen merged commit 4b847b6 into deepmodeling:develop Apr 29, 2026
15 checks passed

mohanchen merged 5 commits into deepmodeling:develop from Cstandardlib:fix/vkb-gpu-memory

3 participants
