
dzzz2001 commented Feb 25, 2025

Linked Issue

Fix #4247

What's changed?

  • add OpenMP support to cal_f_delta and cal_pdm
  • inline the hotspot helper functions (from base_matrix and intarray) called in cal_f_delta and cal_pdm
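The description does not show the parallelized loops themselves. As a rough illustration of one common OpenMP pattern, assuming (hypothetically) that a force loop scatters pair contributions into per-atom entries, the sketch below uses an OpenMP 4.5 array-section reduction so that no two threads update the same element concurrently. `accumulate_pair_terms` and its data layout are invented for illustration; this is not the ABACUS implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not ABACUS code: accumulate pairwise terms into a
// per-atom array. Different iterations of the outer loop write to the same
// element f[jat], so a plain `parallel for` would have a data race.
// The array-section reduction gives each thread a private copy of
// f[0:natom] and sums the copies when the loop finishes.
std::vector<double> accumulate_pair_terms(const std::vector<double>& pair_term, int natom)
{
    assert(pair_term.size() == static_cast<std::size_t>(natom) * natom);
    std::vector<double> force(natom, 0.0);
    double* f = force.data(); // array sections need a pointer or array, not a std::vector

#pragma omp parallel for reduction(+ : f[0:natom]) schedule(dynamic)
    for (int iat = 0; iat < natom; ++iat)
    {
        for (int jat = 0; jat < natom; ++jat)
        {
            const double t = pair_term[static_cast<std::size_t>(iat) * natom + jat];
            f[iat] += t; // contribution to atom iat
            f[jat] -= t; // reaction on atom jat: shared across outer iterations
        }
    }
    return force;
}
```

Compiled without `-fopenmp`, the pragma is ignored and the loop runs serially with the same result. When each iteration writes only to its own output slot, a plain `#pragma omp parallel for` without the reduction clause is enough.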

Perf Improvement

test case: https://github.com/deepmodeling/abacus-develop/files/15456955/cusolver_mpi_openmp.zip
test command: OMP_NUM_THREADS=6 mpirun -n 2 abacus

| version | cal_f_delta | cal_pdm | total run |
|---|---|---|---|
| before | 658.98 s | 50.09 s | 815.75 s |
| after inline functions | 205.87 s | 29.56 s | 343.35 s |
| after inline and OpenMP | 28.07 s | 4.97 s | 140.46 s |
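The speedups implied by the timings above, and the time spent outside the two optimized functions, can be checked directly from the table (S is the ratio of the "before" time to the final time):

```latex
\begin{aligned}
S_{\text{cal\_f\_delta}} &= \frac{658.98\,\text{s}}{28.07\,\text{s}} \approx 23.5, \qquad
S_{\text{cal\_pdm}} = \frac{50.09\,\text{s}}{4.97\,\text{s}} \approx 10.1, \qquad
S_{\text{total}} = \frac{815.75\,\text{s}}{140.46\,\text{s}} \approx 5.81 \\
T_{\text{rest}} &= T_{\text{total}} - T_{\text{cal\_f\_delta}} - T_{\text{cal\_pdm}}
  = 106.68\,\text{s},\ 107.92\,\text{s},\ 107.42\,\text{s}
  \quad \text{(before, after inlining, after inlining and OpenMP)}
\end{aligned}
```

The remaining time stays at roughly 107 s in all three runs, so after both changes it accounts for about 76% of the total (107.42 / 140.46), compared with about 13% before (106.68 / 815.75).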

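One lesson learned from issue #4247 is that small, frequently called functions are best defined in header files. Without link-time optimization, the compiler can only inline a function whose definition is visible in the calling translation unit, and here inlining alone cut cal_f_delta from 658.98 s to 205.87 s.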
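A minimal sketch of the header-inlining idea, assuming a simple row-major matrix type. `SimpleMatrix` and `frobenius_dot` are hypothetical names used for illustration; they are not the actual base_matrix or intarray interfaces. The file is shown as one unit, with comments marking what would live in the header and what would live in another source file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- Would live in a header, e.g. "simple_matrix.h" (hypothetical name) ----
// Member functions defined inside the class body are implicitly inline.
// Every .cpp file that includes the header sees the body, so the compiler
// can inline operator() in hot loops without link-time optimization.
class SimpleMatrix
{
  public:
    SimpleMatrix(int nr, int nc) : nr_(nr), nc_(nc), c_(static_cast<std::size_t>(nr) * nc, 0.0) {}

    // Row-major element access: small enough that inlining removes the call entirely.
    double& operator()(int ir, int ic) { return c_[static_cast<std::size_t>(ir) * nc_ + ic]; }
    double operator()(int ir, int ic) const { return c_[static_cast<std::size_t>(ir) * nc_ + ic]; }

    int nr() const { return nr_; }
    int nc() const { return nc_; }

  private:
    int nr_;
    int nc_;
    std::vector<double> c_;
};

// If operator() were only declared in the header and defined in a separate
// "simple_matrix.cpp", a loop in another translation unit would see an opaque
// external call for every element access (unless LTO is enabled).

// ---- Would live in some other .cpp file: a hot loop over many elements ----
// Frobenius inner product: sum over i, j of a(i, j) * b(i, j).
double frobenius_dot(const SimpleMatrix& a, const SimpleMatrix& b)
{
    assert(a.nr() == b.nr() && a.nc() == b.nc());
    double sum = 0.0;
    for (int i = 0; i < a.nr(); ++i)
    {
        for (int j = 0; j < a.nc(); ++j)
        {
            sum += a(i, j) * b(i, j);
        }
    }
    return sum;
}
```

With `-O2`, GCC and Clang typically inline an in-class accessor like this one; an out-of-line definition in another source file can usually only be inlined when link-time optimization (for example `-flto`) is enabled.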

@dzzz2001 dzzz2001 requested a review from mohanchen February 25, 2025 13:18
@mohanchen added the labels "GPU & DCU & HPC" and "Machine Learning" Feb 26, 2025
@mohanchen mohanchen merged commit bcca5b3 into deepmodeling:develop Feb 26, 2025
14 checks passed
@dzzz2001 dzzz2001 deleted the deepks-opt branch February 26, 2025 15:20
Fisherd99 pushed a commit to Fisherd99/abacus-BSE that referenced this pull request Mar 31, 2025
…#5933)

* inline base_matrix function

* add openmp to deepks_force and deepks_pdm

* inline more functions in base_matrix

* inline functions of intarray

* initialize some variables

* fix some format

* fix format

* fix a bug


Development

Successfully merging this pull request may close these issues.

When using cusolver, multithreading is slower than multiprocessing
