Offload short range in CoulombPBCAA #3842
Test this please
Two very minor questions:
- I noticed a couple of pragma unrolls. Did you see any benefit when adding these? In 2022 I would hope that the compilers can make a sensible choice for a small fixed size loop.
- Unless codecov is confused, the coverage on the change is a little low. Are you able to easily expand the coverage to get >44%?
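For readers unfamiliar with the pattern in question, here is a minimal sketch of a small fixed-size loop with and without an unroll hint. Everything in it is hypothetical: `SKETCH_UNROLL`, `DIM`, `dist2Hinted` and `dist2Plain` are illustration names and are not taken from the PR, and the exact pragma spelling used in the PR may differ. The sketch makes no performance claim; whether the hint helps can only be settled by benchmarking both variants.

```cpp
#include <array>

// Compiler-specific unroll hints, hidden behind a hypothetical macro so the
// sketch compiles cleanly everywhere. Clang is checked first because it also
// defines __GNUC__.
#if defined(__clang__)
#define SKETCH_UNROLL _Pragma("unroll")
#elif defined(__GNUC__)
#define SKETCH_UNROLL _Pragma("GCC unroll 3")
#else
#define SKETCH_UNROLL
#endif

constexpr int DIM = 3; // small, compile-time-known trip count

// Squared distance with an explicit unroll hint on the dimension loop.
inline double dist2Hinted(const std::array<double, DIM>& a, const std::array<double, DIM>& b)
{
  double r2 = 0.0;
  SKETCH_UNROLL
  for (int d = 0; d < DIM; ++d)
  {
    const double dx = a[d] - b[d];
    r2 += dx * dx;
  }
  return r2;
}

// Same computation with no hint; the optimizer is free to unroll on its own.
inline double dist2Plain(const std::array<double, DIM>& a, const std::array<double, DIM>& b)
{
  double r2 = 0.0;
  for (int d = 0; d < DIM; ++d)
  {
    const double dx = a[d] - b[d];
    r2 += dx * dx;
  }
  return r2;
}
```

Both functions return the same value, so the hint can be dropped without changing results if timing shows no benefit.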
Approving to help get through the PR backlog. Still interested in unroll & test coverage info.
Test this please
This looks pretty good to me.
I made a couple of suggestions but they are just nice to have.
@@ -122,7 +124,15 @@ void HamiltonianFactory::addCoulombPotential(xmlNodePtr cur)
   }
 #else
   if (applyPBC)
-    targetH->addOperator(std::make_unique<CoulombPBCAA>(*ptclA, quantum, doForces), title, physical);
+  {
+    if (use_gpu.empty())
Input handling is pretty inconsistent. At a certain point it would be better to spend the time on an input class.
@@ -51,7 +51,7 @@ TEST_CASE("Coulomb PBC A-A Ewald3D", "[hamiltonian]")
   LRCoulombSingleton::CoulombHandler->initBreakup(ions);

-  CoulombPBCAA caa = CoulombPBCAA(ions, false);
+  CoulombPBCAA caa(ions, false, false, false);
There are so many flags in a row it might be worth defining an options struct. Reading all these is painful.
struct CoulombPBOptions {
  bool active;
  bool computeForces;
  bool use_offload;
};
...
CoulombPBOptions options;
options.active = false;
options.computeForces = false;
options.use_offload = false;
CoulombPBCAA caa(ions, options);
Good suggestion. I saw computeForces labelled as deprecated. I think the list won't grow further.
computeDist is a simplified version of computeDistancesOffload, so the unroll was kept. Both places need to be assessed before deciding whether to remove it.
Code coverage still needs to be run with offload to host.
Review after #3839
Proposed changes
Offload the short-range part of CoulombPBCAA. The particle range is chopped into chunks, and the pair distances and short-range Coulomb contribution are computed chunk by chunk. The reason for doing this is to avoid storing the full AA table on the device, while a large enough chunk size should still ensure a heavy enough workload per offload kernel. The hard-coded chunk size of 64 should be reasonable but needs further investigation.
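The chunking idea can be sketched on the host as follows. This is only an illustration under stated assumptions, not the PR's implementation: `Pos`, `shortRange`, `pairSumChunked` and `pairSumDirect` are hypothetical names, a bare 1/r stands in for the real short-range function, open boundaries are assumed (no minimum-image convention), and the loops run on the CPU where the PR would launch offload kernels.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical position type; not a QMCPACK class.
struct Pos
{
  double x, y, z;
};

// Assumed stand-in for the short-range radial function.
inline double shortRange(double r) { return 1.0 / r; }

// Sum q_i q_j f(r_ij) over all pairs j < i, one chunk of particles at a time.
// Only a chunk_size x n scratch block of distances exists at any moment,
// instead of the full n x n table.
double pairSumChunked(const std::vector<Pos>& R, const std::vector<double>& q, std::size_t chunk_size = 64)
{
  chunk_size = std::max<std::size_t>(chunk_size, 1); // guard against a zero chunk
  const std::size_t n = R.size();
  std::vector<double> dist(chunk_size * n); // scratch reused by every chunk
  double total = 0.0;
  for (std::size_t first = 0; first < n; first += chunk_size)
  {
    const std::size_t last = std::min(first + chunk_size, n);
    // Step 1: distances from each particle in the chunk to all earlier particles.
    // In an offloaded version this loop nest would be the work of one kernel.
    for (std::size_t i = first; i < last; ++i)
      for (std::size_t j = 0; j < i; ++j)
      {
        const double dx = R[i].x - R[j].x;
        const double dy = R[i].y - R[j].y;
        const double dz = R[i].z - R[j].z;
        dist[(i - first) * n + j] = std::sqrt(dx * dx + dy * dy + dz * dz);
      }
    // Step 2: short-range contribution of this chunk.
    for (std::size_t i = first; i < last; ++i)
      for (std::size_t j = 0; j < i; ++j)
        total += q[i] * q[j] * shortRange(dist[(i - first) * n + j]);
  }
  return total;
}

// Reference without chunking or a distance table, for comparison.
double pairSumDirect(const std::vector<Pos>& R, const std::vector<double>& q)
{
  double total = 0.0;
  for (std::size_t i = 0; i < R.size(); ++i)
    for (std::size_t j = 0; j < i; ++j)
    {
      const double dx = R[i].x - R[j].x;
      const double dy = R[i].y - R[j].y;
      const double dz = R[i].z - R[j].z;
      total += q[i] * q[j] * shortRange(std::sqrt(dx * dx + dy * dy + dz * dz));
    }
  return total;
}
```

The scratch buffer grows with `chunk_size * n` rather than `n * n`, which is the memory saving described above; a larger chunk means more work per pass at the cost of a larger buffer, and the result does not depend on the chunk size.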
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
epyc-server
Checklist