-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor: use less memory to calculate stress in pw base #4047
base: develop
Are you sure you want to change the base?
Conversation
Has the efficiency of the new algorithm been tested? Are there any test data available? |
I have not tested many cases, in Mg16Al16 case , time of |
Perhaps the QE code can be used as a reference. |
if (gx[ig].norm2() > 1e-9) { | ||
for(int ig = 0;ig< ngy;ig++) | ||
{ | ||
FPTYPE norm2 = gx[ig * 3] * gx[ig * 3] + gx[ig * 3 + 1] * gx[ig * 3 + 1] + gx[ig * 3 + 2] * gx[ig * 3 + 2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ig * 3 can be calculated and stored first
{ | ||
FPTYPE norm2 = gx[ig * 3] * gx[ig * 3] + gx[ig * 3 + 1] * gx[ig * 3 + 1] + gx[ig * 3 + 2] * gx[ig * 3 + 2]; | ||
dg [ig] = delta * sqrt(norm2) ; | ||
if (norm2 > 1e-9) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why set this value to 1e-9?
@@ -49,7 +49,8 @@ void Stress_PW<FPTYPE, Device>::stress_us(ModuleBase::matrix& sigma, | |||
ModuleBase::matrix dylmk0(ppcell_in->lmaxq * ppcell_in->lmaxq, npw); | |||
for (int ipol = 0; ipol < 3; ipol++) | |||
{ | |||
this->dylmr2(ppcell_in->lmaxq * ppcell_in->lmaxq, npw, rho_basis->gcar, dylmk0, ipol); | |||
double* gcar_ptr = reinterpret_cast<double*>(rho_basis->gcar); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the type of gcar? do you have to use cast?
@@ -634,30 +815,32 @@ void Stress_Func<FPTYPE, Device>::dylmr2 ( | |||
// gx = g +/- dg | |||
|
|||
|
|||
ModuleBase::Vector3<FPTYPE> *gx = new ModuleBase::Vector3<FPTYPE> [ngy]; | |||
FPTYPE *gx = new FPTYPE[3 * ngy]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use instead of new?
// loop over all G-vectors | ||
for(int ig=0;ig<npw;ig++) | ||
{ | ||
vkb_ptr[ig] -= 2.0 * ylm_ptr[ig] * vq_deri_ptr[ig] * sk_in[ig] * pref_in[ih] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ig * 3 can be calculated first? multiple of 2.0 can be done after this for loop
for (int ig = 0; ig < npw; ig++) | ||
{ | ||
vq_ptr[ig] = this->Polynomial_Interpolation_nl( | ||
GlobalC::ppcell.tab, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not use GlobalC
std::vector<FPTYPE> Stress_Func<FPTYPE, Device>::cal_vq_deri(int it, const FPTYPE* gk, int npw) | ||
{ | ||
// calculate beta in G-space using an interpolation table | ||
const int nbeta = GlobalC::ucell.atoms[it].ncpp.nbeta; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not use GlobalC
std::vector<FPTYPE> Stress_Func<FPTYPE, Device>::cal_vq(int it, const FPTYPE* gk, int npw) | ||
{ | ||
// calculate beta in G-space using an interpolation table | ||
const int nbeta = GlobalC::ucell.atoms[it].ncpp.nbeta; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not use GlobalC
gk[ig*3+2] = tmp.z; | ||
FPTYPE norm = sqrt(tmp.norm2()); | ||
gk[3 * npw + ig] = norm * GlobalC::ucell.tpiba; | ||
gk[4 * npw + ig] = norm<1e-8?0.0:1.0/norm*GlobalC::ucell.tpiba; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GlobalC likes a cancer
template <typename FPTYPE, typename Device> | ||
std::vector<FPTYPE> Stress_Func<FPTYPE, Device>::cal_gk(int ik, ModulePW::PW_Basis_K* wfc_basis) | ||
{ | ||
int npw = wfc_basis->npwk[ik]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a const
ModuleBase::timer::tick("Stress_Func","stress_nl"); | ||
} | ||
// cal_gk | ||
template <typename FPTYPE, typename Device> | ||
std::vector<FPTYPE> Stress_Func<FPTYPE, Device>::cal_gk(int ik, ModulePW::PW_Basis_K* wfc_basis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function need unit test?
} | ||
// cal_vq | ||
template <typename FPTYPE, typename Device> | ||
std::vector<FPTYPE> Stress_Func<FPTYPE, Device>::cal_vq(int it, const FPTYPE* gk, int npw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function need unit test?
|
||
// cal_vq_deri | ||
template <typename FPTYPE, typename Device> | ||
std::vector<FPTYPE> Stress_Func<FPTYPE, Device>::cal_vq_deri(int it, const FPTYPE* gk, int npw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function need unit test?
} | ||
// cal_ylm | ||
template <typename FPTYPE, typename Device> | ||
std::vector<FPTYPE> Stress_Func<FPTYPE, Device>::cal_ylm(int lmax, int npw, const FPTYPE* gk_in) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function need unit test?
// cal_vkb | ||
// cpu version first, gpu version later | ||
template <typename FPTYPE, typename Device> | ||
void Stress_Func<FPTYPE, Device>::cal_vkb( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function need unit test?
// cal_vkb | ||
// cpu version first, gpu version later | ||
template <typename FPTYPE, typename Device> | ||
void Stress_Func<FPTYPE, Device>::cal_vkb_deri( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function need unit test?
I will work with @grysgreat to accelerate performance in GPU/DCU, change this PR to draft. |
@@ -162,6 +181,23 @@ struct cal_stress_nl_op<FPTYPE, psi::DEVICE_GPU> { | |||
FPTYPE* stress); | |||
}; | |||
|
|||
// cpu version first, gpu version later | |||
template <typename FPTYPE> | |||
struct cal_vkb_op<FPTYPE, psi::DEVICE_GPU>{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct cal_vkb_op<FPTYPE, psi::DEVICE_GPU>{
Did this function called by any other functions ?
This PR has merged changes from @grysgreat and @Religious-J , not ready for review now, I will refactor it later. |
I have refactored stress code structure in this PR.
In case Mg16Al16, the memory cost of stress calculation from 16752 MB to 194 MB.
Linked Issue
Close #3714
Close #4158
Close #3710
Close #4026
Close #3931
Close #4031
Unit Tests and/or Case Tests for my changes
What's changed?
Any changes of core modules? (ignore if not applicable)