Removed the temporary variable DMRGint_full when transitioning from 2D block parallelism to serial in Hcontainer(develop) #6489

zgn-26714 · 2025-09-06T13:07:42Z

Split PR #6459 into several smaller PRs.
This pull request addresses issue #6377

Modified the transfer function transfer_dm_2d_to_gint to remove the temporary variable dm_full that was used when transitioning from 2D block parallelism to serial in HContainer, and updated gint_old.cpp accordingly.
Instead of a full matrix, a small temporary container DM2D_tmp is used: the 2D-block-parallel DM is split into smaller blocks, each of which is held in DM2D_tmp, and transferParallels2Serials is called directly to convert DM2D_tmp into dm_gint. This avoids the large memory consumption of the previous approach, which first gathered the entire DM2D into one large matrix dm_full and only then split it into the smaller dm_gint matrices.

source/source_lcao/module_gint/gint_old.cpp

source/source_lcao/module_gint/temp_gint/gint_common.cpp


…D block parallelism to serial in Hcontainer(develop) (deepmodeling#6489)
* delete tem Hcontainer to reduce memory usage
* simplify the compute code
* change DM2D_tmp to dm2d_tmp, use vector instead of new

…w. (#6490)
* Feature: add DFT-1/2 and shell DFT-1/2, currently only support PW esolver_ks_pw.
* Added Sep, Sep_Cell, and VSep to organize the self-energy potential of DFT-1/2
* Added a new effective potential pot_sep for calculating the self-energy potential
* Added initialization of the self-energy potential in the esolver_ks_pw control flow
* Added the keyword SEP_FILES in the STRU file for reading self-energy potential files
* Added the dfthalf_type keyword in INPUT to enable DFT-1/2 and shell DFT-1/2
* Fix: Compilation error in DeepKS unit tests after adding DFT-1/2
* Fix: Add the additional files to Makefile.Objects
* Build(deps): Bump actions/setup-python from 5 to 6 (#6492) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5...v6) --- updated-dependencies: - dependency-name: actions/setup-python dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [Refactor] Move hardware initializer out from esolver code (#6494)
* Move hardware initializer out from esolver
* Remove useless codes
* Remove finalize code out
* Feature: support NVTX profiling via timer_enable_nvtx flag (#6495)
* Feature: support NVTX profiling via timer_enable_nvtx flag Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Add timer_enable_nvtx section in markdown Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Fix: Use __USE_NVTX macro to avoid NVTX linking errors in tests. Clarify in docs that timer_enable_nvtx parameter only takes effect on CUDA platforms. Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Perf: Optimize Davidson by fusing operators, offloading CPU computation to GPU, and reducing memory transfers (#6493)
* Perf: Optimize Diago_DavSubspace with GPU operators by adding and fusing custom kernels. Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Perf: reduce memory allocation and copy in Diago_DavSubspace::diag_zhegvx Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Perf: Replace loop-based 2D copy and memset with memcpy_2d_op, memset_2d_op Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Perf: use warp reduce instead of shared memory for better efficiency Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Fix compilation error Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Fix: resolve compile error with USE_ELPA=OFF + BUILD_TESTING=ON and switch to nvtx3 headers when CUDA_VERSION >= 12090 (#6497)
* Fix: switch to nvtx3 headers when CUDA_VERSION >= 12090 Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Fix: resolve compile error with USE_ELPA=OFF + BUILD_TESTING=ON Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Fix dsp compilation problem (#6499)
* Fix: Fix crash in Debug build with multi-GPU due to forced cudaSetDevice(0) (#6498) Signed-off-by：Tianxiang Wang<tianxiang.wang@metax-tech.com>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Removed the temporary variable DMRGint_full when transitioning from 2D block parallelism to serial in Hcontainer(develop) (#6489)
* delete tem Hcontainer to reduce memory usage
* simplify the compute code
* change DM2D_tmp to dm2d_tmp, use vector instead of new
* Update version to 3.9.0.14 (#6504)
* Refactor: Remove the GlobalC from sep_cell and vsep_cell
* Removed GlobalC::sep_cell and GlobalC::vsep_cell from GlobalC
* Integrated sep_cell into UnitCell
* Integrated vsep_cell into esolver_ks_pw
* Added empty constructors and destructors for Sep_Pot and Sep_Cell to facilitate unit testing compilation

---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Critsium <tsfxwbbzxy@163.com>
Co-authored-by: Tianxiang Wang <mail.txwang@qq.com>
Co-authored-by: zgn-26714 <3022939753@qq.com>
Co-authored-by: Erjie Wu <110683255+ErjieWu@users.noreply.github.com>
Co-authored-by: Mohan Chen <mohanchen@pku.edu.cn>


zgn-26714 added 2 commits August 21, 2025 21:13

delete tem Hcontainer to reduce memory usage

7d4fe5a

simplify the compute code

e6a1983

mohanchen added the Refactor label (Refactor ABACUS codes) Sep 8, 2025

mohanchen reviewed Sep 8, 2025


source/source_lcao/module_gint/gint_old.cpp (outdated, resolved)

dzzz2001 reviewed Sep 8, 2025


source/source_lcao/module_gint/temp_gint/gint_common.cpp (outdated, resolved)

zgn-26714 added 3 commits September 10, 2025 00:02

Merge branch 'develop' into dev-1

3e883d7

change DM2D_tmp to dm2d_tmp, use vector instead of new

6d2c711

Merge branch 'dev-1' of https://github.com/zgn-26714/abacus-develop into dev-1

1f3488e

mohanchen added the Performance label (Issues related to fail running ABACUS) Sep 11, 2025

mohanchen approved these changes Sep 11, 2025


mohanchen merged commit 0817e32 into deepmodeling:develop Sep 11, 2025
14 checks passed

dzzz2001 mentioned this pull request Sep 19, 2025

Fix: fix memory leak in gint module #6515 (Merged)

3 participants

Removed the temporary variable DMRGint_full when transitioning from 2D block parallelism to serial in Hcontainer(develop) #6489

Removed the temporary variable DMRGint_full when transitioning from 2D block parallelism to serial in Hcontainer(develop) #6489

Uh oh!

Conversation

zgn-26714 commented Sep 6, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants