Skip to content

Commit

Permalink
GPU: Add multi-device support for ekinetic hpsi (#1428)
Browse files Browse the repository at this point in the history
* add a device template for class psi

* add headfile STL string

* address comments

* add a description for function memory::abacus_resize_memory

* seperate class psi

* fix CI errors

* fix CI errors

* add destructor for psi

* fix cuda building errors

* add copy constructor for template class

* fix CI tests

* add device template for class diag_cg

* GPU: removing std::vector from class diago_cg (#1366)

* removing std::vector from class diago_cg

* fix UT errors

* add multi device support for function ddot_real

* add UT for the new method!

* use template functors as the interface of multi device implementations

* add comments for head files

* fix UT's error

* update functions of math_kernel

* Add UTs for all functors within module_psi

* fix CI error

* Update source/module_hsolver/include/math_kernel.h

* Update source/module_hsolver/src/math_kernel.cpp

* fix cuda complie bug

* add vector_div_constant_op_cpu UT

* fix Hsolver_UTs bug

* update Hsolvers_UTs:
tests for double math_kernel funcitons

* Close USE_CUDA

* replace ZEROS&COPYARRAY function in diago_cg.cpp

* fix format

* add gemm function to math_kernel

* add cgemm_ to blas_connector

* replace some function in davidson

* fix CI error

* Update CMakeLists.txt

* add template for class DiagoDavid

* Add multi-device support for ekinetic hpsi

* fix CI error

* Add multi device support for HPsi(nonlocal_pw)

* add .idea to .gitignore

* fix CI error

* GPU: Address comments of PR#1428 (#1440)

* docs: skeletion of developer docs

* update titles in docs

* adjust titles

* docs: adjust contribution guide

* update again contribution guide

* adjust docs' skeleton

* remove documentation and notes temporarily

* remote documentation title

* add an item in faq

* trial title

* trials of index

* add contribution process

* update contribution guide

* fix typo

* update skeleton

* update hands_on.md

* adjust file position

* separate fictitious force from total force in md_lgv (#1412)

* refactor: append output of H/S(R) matrix

* test: update test 207_NO_KP_OH2

* fix: separate fictitious force in md_lgv

* refactor: remove __LCAO in module_md

* delete useless files in module_md

* init md velocity using Gaussian random

* test: update MD_func_test

* update tests of MD_func again

* fix: fix bugs during merge

* fix: output the cartesian coordinate in cube file (#1426)

* Update docs/community/contribution_guide.md

Co-authored-by: Chun Cai <amoycaic@gmail.com>

* modify words in faq.md

* Build: abacus can compile pw version with cmake (#1397)

* Build: revise some typos

* build:abacus can compile pw version with cmake; add workflow

* change Makefile.vars

* fix wrong tag in build_test.yml

* fix wrong indentations

* merge

* Build: change modules to cmake (#1431)

* fix conclicts

* use {} instead of std::complex<double>

* use Operator_PW as class template

* fix CI error

* remove unused headfile import

Co-authored-by: hongriTianqi <z.hao.1@163.com>
Co-authored-by: Tianqi Zhao <hongriTianqi@users.noreply.github.com>
Co-authored-by: Yu Liu <77716030+YuLiu98@users.noreply.github.com>
Co-authored-by: pxlxingliang <91927439+pxlxingliang@users.noreply.github.com>
Co-authored-by: Chun Cai <amoycaic@gmail.com>
Co-authored-by: Qianrui <76200646+Qianruipku@users.noreply.github.com>

Co-authored-by: Qianrui <76200646+Qianruipku@users.noreply.github.com>
Co-authored-by: haozhihan <haozhi.han@stu.pku.edu.cn>
Co-authored-by: North <haozhi.han@outlook.com>
Co-authored-by: dyzheng <zhengdy@dp.tech>
Co-authored-by: hongriTianqi <z.hao.1@163.com>
Co-authored-by: Tianqi Zhao <hongriTianqi@users.noreply.github.com>
Co-authored-by: Yu Liu <77716030+YuLiu98@users.noreply.github.com>
Co-authored-by: pxlxingliang <91927439+pxlxingliang@users.noreply.github.com>
Co-authored-by: Chun Cai <amoycaic@gmail.com>
  • Loading branch information
10 people committed Oct 26, 2022
1 parent 02f0658 commit 736ccf7
Show file tree
Hide file tree
Showing 70 changed files with 5,259 additions and 1,214 deletions.
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -15,4 +15,5 @@ STRU_READIN_ADJUST.cif
*.egg
*.egg-info
build
dist
dist
.idea
5 changes: 2 additions & 3 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,6 @@ if(ENABLE_LCAO)
find_package(Cereal REQUIRED)
include_directories(${CEREAL_INCLUDE_DIR})
add_compile_definitions(USE_CEREAL_SERIALIZATION)

find_package(ELPA REQUIRED)
include_directories(${ELPA_INCLUDE_DIR})
target_link_libraries(${ABACUS_BIN_NAME} ELPA::ELPA)
Expand Down Expand Up @@ -163,6 +162,7 @@ if(USE_CUDA OR USE_CUSOLVER_LCAO)
include_directories(${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
if (USE_CUDA)
add_compile_definitions(__CUDA)
add_compile_definitions(__UT_USE_CUDA)
endif()
if (USE_CUSOLVER_LCAO)
add_compile_definitions(__CUSOLVER_LCAO)
Expand Down Expand Up @@ -193,6 +193,7 @@ if(USE_ROCM)
roc::hipblas
)
add_compile_definitions(__ROCM)
add_compile_definitions(__UT_USE_ROCM)
endif()

if(ENABLE_ASAN)
Expand Down Expand Up @@ -409,5 +410,3 @@ endif()
if(ENABLE_COVERAGE)
coverage_evaluate()
endif()


1 change: 1 addition & 0 deletions source/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ OPTS = -Ofast -march=native -std=c++14 -pedantic -m64 ${INCLUDES}
HONG = -D__LCAO
OBJ_DIR = obj
BIN_DIR = ../bin
SOURCE_DIR = .
ifeq ($(findstring mpi, $(CC)), mpi)
# We do not support EXX in sequential version temporarily.
HONG += -D__MPI -D__EXX -DEXX_H_COMM=2 -DUSE_CEREAL_SERIALIZATION -DEXX_DM=3 -DEXX_H_COMM=2 -DTEST_EXX_LCAO=0 -DTEST_EXX_RADIAL=1
Expand Down
10 changes: 10 additions & 0 deletions source/Makefile.Objects
Original file line number Diff line number Diff line change
Expand Up @@ -24,12 +24,15 @@ VPATH=./src_global:\
./module_xc:\
./module_esolver:\
./module_hsolver:\
./module_hsolver/src:\
./module_hsolver/genelpa:\
./module_elecstate:\
./module_psi:\
./module_psi/src:\
./module_hamilt:\
./module_hamilt/ks_pw:\
./module_hamilt/ks_lcao:\
./module_hamilt/src:\
./module_gint:\
./src_pw:\
./src_lcao:\
Expand Down Expand Up @@ -193,6 +196,10 @@ OBJS_GINT=gint.o\
grid_technique.o\

OBJS_HAMILT=hamilt_pw.o\
operator.o\
operator_pw.o\
ekinetic.o\
nonlocal.o\
ekinetic_pw.o\
veff_pw.o\
nonlocal_pw.o\
Expand All @@ -215,6 +222,7 @@ OBJS_HSOLVER=diago_cg.o\
hsolver_pw.o\
hsolver_pw_sdft.o\
diago_iter_assist.o\
math_kernel.o\

OBJS_HSOLVER_LCAO=hsolver_lcao.o\
diago_blas.o\
Expand Down Expand Up @@ -256,6 +264,8 @@ OBJS_ORBITAL=ORB_atomic.o\
parallel_orbitals.o\

OBJS_PSI=psi.o\
memory_psi.o\
device.o\

OBJS_PW=fft.o\
pw_basis.o\
Expand Down
7 changes: 7 additions & 0 deletions source/module_base/blas_connector.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,10 @@ extern "C"
void dgemv_(const char*const transa, const int*const m, const int*const n,
const double*const alpha, const double*const a, const int*const lda, const double*const x, const int*const incx,
const double*const beta, double*const y, const int*const incy);

void cgemv_(const char *trans, const int *m, const int *n, const std::complex<float> *alpha,
const std::complex<float> *a, const int *lda, const std::complex<float> *x, const int *incx,
const std::complex<float> *beta, std::complex<float> *y, const int *incy);

void zgemv_(const char *trans, const int *m, const int *n, const std::complex<double> *alpha,
const std::complex<double> *a, const int *lda, const std::complex<double> *x, const int *incx,
Expand All @@ -63,6 +67,9 @@ extern "C"
void dgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k,
const double *alpha, const double *a, const int *lda, const double *b, const int *ldb,
const double *beta, double *c, const int *ldc);
void cgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k,
const std::complex<float> *alpha, const std::complex<float> *a, const int *lda, const std::complex<float> *b, const int *ldb,
const std::complex<float> *beta, std::complex<float> *c, const int *ldc);
void zgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k,
const std::complex<double> *alpha, const std::complex<double> *a, const int *lda, const std::complex<double> *b, const int *ldb,
const std::complex<double> *beta, std::complex<double> *c, const int *ldc);
Expand Down
4 changes: 2 additions & 2 deletions source/module_elecstate/test/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ remove_definitions(-D__DEEPKS)
if(ENABLE_LCAO)
AddTest(
TARGET EState_updaterhok_pw
LIBS ${math_libs} planewave_serial base_serial
LIBS ${math_libs} planewave_serial base_serial psi
SOURCES updaterhok_pw_test.cpp
../elecstate_pw.cpp ../elecstate.cpp
../../src_pw/charge.cpp ../../src_parallel/parallel_reduce.cpp
Expand All @@ -18,7 +18,7 @@ if(ENABLE_LCAO)

AddTest(
TARGET EState_psiToRho_lcao
LIBS ${math_libs} ELPA::ELPA base orb cell neighbor planewave
LIBS ${math_libs} ELPA::ELPA base orb cell neighbor planewave psi
SOURCES elecstate_lcao_test.cpp ../elecstate_lcao.cpp ../dm2d_to_grid.cpp
../../src_parallel/parallel_global.cpp ../../src_parallel/parallel_common.cpp ../../src_parallel/parallel_reduce.cpp
../../src_lcao/local_orbital_charge.cpp ../../src_lcao/DM_gamma.cpp ../../src_lcao/DM_k.cpp
Expand Down
12 changes: 6 additions & 6 deletions source/module_esolver/esolver_ks_pw.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -306,15 +306,15 @@ namespace ModuleESolver
// be careful that istep start from 0 and iter start from 1
if((istep==0||istep==1)&&iter==1)
{
hsolver::DiagoIterAssist::need_subspace = false;
hsolver::DiagoIterAssist<double>::need_subspace = false;
}
else
{
hsolver::DiagoIterAssist::need_subspace = true;
hsolver::DiagoIterAssist<double>::need_subspace = true;
}

hsolver::DiagoIterAssist::PW_DIAG_THR = ethr;
hsolver::DiagoIterAssist::PW_DIAG_NMAX = GlobalV::PW_DIAG_NMAX;
hsolver::DiagoIterAssist<double>::PW_DIAG_THR = ethr;
hsolver::DiagoIterAssist<double>::PW_DIAG_NMAX = GlobalV::PW_DIAG_NMAX;
this->phsol->solve(this->p_hamilt, this->psi[0], this->pelec, GlobalV::KS_SOLVER);

// transform energy for print
Expand Down Expand Up @@ -703,8 +703,8 @@ namespace ModuleESolver
{
if(this->phsol != nullptr)
{
hsolver::DiagoIterAssist::need_subspace = false;
hsolver::DiagoIterAssist::PW_DIAG_THR = ethr;
hsolver::DiagoIterAssist<double>::need_subspace = false;
hsolver::DiagoIterAssist<double>::PW_DIAG_THR = ethr;
this->phsol->solve(this->p_hamilt, this->psi[0], this->pelec, GlobalV::KS_SOLVER, true);
}
else
Expand Down
14 changes: 7 additions & 7 deletions source/module_esolver/esolver_sdft_pw.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -120,14 +120,14 @@ void ESolver_SDFT_PW::hamilt2density(int istep, int iter, double ethr)
// be careful that istep start from 0 and iter start from 1
if(istep==0&&iter==1)
{
hsolver::DiagoIterAssist::need_subspace = false;
hsolver::DiagoIterAssist<double>::need_subspace = false;
}
else
{
hsolver::DiagoIterAssist::need_subspace = true;
hsolver::DiagoIterAssist<double>::need_subspace = true;
}
hsolver::DiagoIterAssist::PW_DIAG_THR = ethr;
hsolver::DiagoIterAssist::PW_DIAG_NMAX = GlobalV::PW_DIAG_NMAX;
hsolver::DiagoIterAssist<double>::PW_DIAG_THR = ethr;
hsolver::DiagoIterAssist<double>::PW_DIAG_NMAX = GlobalV::PW_DIAG_NMAX;
this->phsol->solve(this->p_hamilt, this->psi[0], this->pelec,this->stowf, istep, iter, GlobalV::KS_SOLVER);
// transform energy for print
GlobalC::en.eband = this->pelec->eband;
Expand Down Expand Up @@ -163,9 +163,9 @@ void ESolver_SDFT_PW::postprocess()
{
int iter = 1;
int istep = 0;
hsolver::DiagoIterAssist::PW_DIAG_NMAX = GlobalV::PW_DIAG_NMAX;
hsolver::DiagoIterAssist::PW_DIAG_THR = std::max(std::min(1e-5, 0.1 * GlobalV::SCF_THR / std::max(1.0, GlobalC::CHR.nelec)),1e-12);
hsolver::DiagoIterAssist::need_subspace = false;
hsolver::DiagoIterAssist<double>::PW_DIAG_NMAX = GlobalV::PW_DIAG_NMAX;
hsolver::DiagoIterAssist<double>::PW_DIAG_THR = std::max(std::min(1e-5, 0.1 * GlobalV::SCF_THR / std::max(1.0, GlobalC::CHR.nelec)),1e-12);
hsolver::DiagoIterAssist<double>::need_subspace = false;
this->phsol->solve(this->p_hamilt, this->psi[0], this->pelec,this->stowf,istep, iter, GlobalV::KS_SOLVER, true);
((hsolver::HSolverPW_SDFT*)phsol)->stoiter.cleanchiallorder();//release lots of memories
}
Expand Down
15 changes: 15 additions & 0 deletions source/module_hamilt/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,12 +1,17 @@
add_subdirectory(ks_pw)
add_subdirectory(ks_lcao)

list(APPEND objects
operator.cpp
hamilt_pw.cpp
src/ekinetic.cpp
src/nonlocal.cpp
ks_pw/ekinetic_pw.cpp
ks_pw/veff_pw.cpp
ks_pw/nonlocal_pw.cpp
ks_pw/meta_pw.cpp
ks_pw/velocity_pw.cpp
ks_pw/operator_pw.cpp
)

if(ENABLE_LCAO)
Expand All @@ -24,6 +29,12 @@ if(ENABLE_LCAO)
)
endif()

if (USE_CUDA)
list(APPEND objects src/cuda/ekinetic.cu src/cuda/nonlocal.cu)
elseif(USE_ROCM)
list(APPEND objects src/rocm/ekinetic.cu src/cuda/nonlocal.cu)
endif()

add_library(
hamilt
OBJECT
Expand All @@ -33,3 +44,7 @@ add_library(
if(ENABLE_COVERAGE)
add_coverage(hamilt)
endif()

if (BUILD_TESTING)
add_subdirectory(test)
endif()
10 changes: 6 additions & 4 deletions source/module_hamilt/hamilt_pw.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,11 @@ HamiltPW::HamiltPW()

if (GlobalV::T_IN_H)
{
Operator<std::complex<double>>* ekinetic = new Ekinetic<OperatorPW>(
// Operator<double>* ekinetic = new Ekinetic<OperatorLCAO<double>>
Operator<std::complex<double>>* ekinetic = new Ekinetic<OperatorPW<double>>(
tpiba2,
gk2,
GlobalC::wfcpw->nks,
GlobalC::wfcpw->npwk_max
);
if(this->ops == nullptr)
Expand All @@ -41,7 +43,7 @@ HamiltPW::HamiltPW()
}
if (GlobalV::VL_IN_H)
{
Operator<std::complex<double>>* veff = new Veff<OperatorPW>(
Operator<std::complex<double>>* veff = new Veff<OperatorPW<double>>(
isk,
&(GlobalC::pot.vr_eff),
GlobalC::wfcpw
Expand All @@ -57,7 +59,7 @@ HamiltPW::HamiltPW()
}
if (GlobalV::VNL_IN_H)
{
Operator<std::complex<double>>* nonlocal = new Nonlocal<OperatorPW>(
Operator<std::complex<double>>* nonlocal = new Nonlocal<OperatorPW<double>>(
isk,
&GlobalC::ppcell,
&GlobalC::ucell
Expand All @@ -71,7 +73,7 @@ HamiltPW::HamiltPW()
this->ops->add(nonlocal);
}
}
Operator<std::complex<double>>* meta = new Meta<OperatorPW>(
Operator<std::complex<double>>* meta = new Meta<OperatorPW<double>>(
tpiba,
isk,
&GlobalC::pot.vofk,
Expand Down
37 changes: 37 additions & 0 deletions source/module_hamilt/include/ekinetic.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
#ifndef MODULE_HAMILT_EKINETIC_H
#define MODULE_HAMILT_EKINETIC_H

#include "module_psi/psi.h"
#include <complex>

namespace hamilt {
template <typename FPTYPE, typename Device>
struct ekinetic_pw_op {
void operator() (
const Device* dev,
const int& nband,
const int& npw,
const int& max_npw,
const FPTYPE& tpiba2,
const FPTYPE* gk2_ik,
std::complex<FPTYPE>* tmhpsi,
const std::complex<FPTYPE>* tmpsi_in);
};

#if __CUDA || __UT_USE_CUDA || __ROCM || __UT_USE_ROCM
// Partially specialize functor for psi::GpuDevice.
template <typename FPTYPE>
struct ekinetic_pw_op<FPTYPE, psi::DEVICE_GPU> {
void operator() (
const psi::DEVICE_GPU* dev,
const int& nband,
const int& npw,
const int& max_npw,
const FPTYPE& tpiba2,
const FPTYPE* gk2_ik,
std::complex<FPTYPE>* tmhpsi,
const std::complex<FPTYPE>* tmpsi_in);
};
#endif // __CUDA || __UT_USE_CUDA || __ROCM || __UT_USE_ROCM
} // namespace hamilt
#endif //MODULE_HAMILT_EKINETIC_H
49 changes: 49 additions & 0 deletions source/module_hamilt/include/nonlocal.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
#ifndef MODULE_HAMILT_NONLOCAL_H
#define MODULE_HAMILT_NONLOCAL_H

#include "module_psi/psi.h"
#include <complex>

namespace hamilt {
template <typename FPTYPE, typename Device>
struct nonlocal_pw_op {
void operator() (
const Device* dev,
const int& l1,
const int& l2,
const int& l3,
int& sum,
int& iat,
const int& spin,
const int& nkb,
const int& deeq_x,
const int& deeq_y,
const int& deeq_z,
const FPTYPE* deeq,
std::complex<FPTYPE>* ps,
const std::complex<FPTYPE>* becp);
};

#if __CUDA || __UT_USE_CUDA || __ROCM || __UT_USE_ROCM
// Partially specialize functor for psi::GpuDevice.
template <typename FPTYPE>
struct nonlocal_pw_op<FPTYPE, psi::DEVICE_GPU> {
void operator() (
const psi::DEVICE_GPU* dev,
const int& l1,
const int& l2,
const int& l3,
int& sum,
int& iat,
const int& spin,
const int& nkb,
const int& deeq_x,
const int& deeq_y,
const int& deeq_z,
const FPTYPE* deeq,
std::complex<FPTYPE>* ps,
const std::complex<FPTYPE>* becp);
};
#endif // __CUDA || __UT_USE_CUDA || __ROCM || __UT_USE_ROCM
} // namespace hamilt
#endif //MODULE_HAMILT_NONLOCAL_H
20 changes: 17 additions & 3 deletions source/module_hamilt/ks_pw/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,11 +1,25 @@
add_library(
operator_ks_pw
OBJECT
list(APPEND operator_ks_pw_srcs
operator_pw.cpp
ekinetic_pw.cpp
veff_pw.cpp
nonlocal_pw.cpp
meta_pw.cpp
velocity_pw.cpp
../operator.cpp
../src/ekinetic.cpp
../src/nonlocal.cpp
)

if (USE_CUDA)
list(APPEND operator_ks_pw_srcs ../src/cuda/ekinetic.cu ../src/cuda/nonlocal.cu)
elseif(USE_ROCM)
list(APPEND operator_ks_pw_srcs ../src/rocm/ekinetic.cu ../src/rocm/nonlocal.cu)
endif()

add_library(
operator_ks_pw
OBJECT
${operator_ks_pw_srcs}
)

if(ENABLE_COVERAGE)
Expand Down

0 comments on commit 736ccf7

Please sign in to comment.