enable OpenMP for prod_force and prod_virial #1360
About 1 ms can be saved in each training step.
Codecov Report
@@            Coverage Diff             @@
##            devel    #1360      +/-   ##
==========================================
- Coverage   75.53%   74.32%   -1.22%
==========================================
  Files          91       91
  Lines        7506     7482      -24
==========================================
- Hits         5670     5561     -109
- Misses       1836     1921      +85
@@ -36,14 +36,17 @@ prod_force_a_cpu(

memset(force, 0.0, sizeof(FPTYPE) * nall * 3);
// compute force of a frame
#pragma omp parallel
Two threads with different i_idx may write to the same element force[j_idx * 3 + xxx], which gives unpredictable results.
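Here is a minimal sketch of the race described above. It assumes a simplified kernel in which each center atom i has nnei neighbor slots in nlist (-1 marks an empty slot) and adds a precomputed 3-vector contrib to each neighbor. The function name scatter_force_atomic, the contrib array, and the data layout are hypothetical and are not the DeePMD-kit code. When the outer loop is split across threads, two centers that share a neighbor j update force[j * 3 + d] concurrently. Marking the update with #pragma omp atomic makes each read-modify-write indivisible, so the sum no longer depends on thread timing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified force scatter (not the DeePMD-kit kernel).
// nlist[i * nnei + jj] is the jj-th neighbor of center atom i (-1 = empty),
// contrib[(i * nnei + jj) * 3 + d] is that pair's contribution to component d.
std::vector<double> scatter_force_atomic(const std::vector<int>& nlist,
                                         const std::vector<double>& contrib,
                                         int nloc, int nnei, int nall) {
  assert(static_cast<int>(nlist.size()) == nloc * nnei);
  assert(static_cast<int>(contrib.size()) == nloc * nnei * 3);
  std::vector<double> force(static_cast<std::size_t>(nall) * 3, 0.0);
  double* f = force.data();
  // The loop over center atoms is shared across threads, so different i
  // may target the same neighbor j at the same time.
#pragma omp parallel for
  for (int i = 0; i < nloc; ++i) {
    for (int jj = 0; jj < nnei; ++jj) {
      const int j = nlist[i * nnei + jj];
      if (j < 0) continue;  // skip empty neighbor slots
      for (int d = 0; d < 3; ++d) {
        // atomic makes this read-modify-write a single indivisible update
#pragma omp atomic
        f[j * 3 + d] += contrib[(i * nnei + jj) * 3 + d];
      }
    }
  }
  return force;
}
```

Atomics keep the outer loop parallel but add contention on shared neighbors. A common alternative is to give each thread its own force buffer and sum the buffers afterwards.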
I realized that, which is why the pragma omp for is placed inside this loop, directly before the inner loop. The current version passes the tests.
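The sketch below shows one way to read the layout described in this reply. It uses the same hypothetical simplified kernel, and scatter_force_inner_for is an illustrative name. It omits the center-atom term of the real force. A single parallel region wraps the loop over center atoms, and omp for shares out only the neighbor loop of the current atom. Every thread walks all i, and the implicit barrier at the end of each omp for keeps the threads on the same i. The code is race-free only if no index j repeats within one row of nlist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: parallel region outside the loop over center atoms,
// work-sharing ("omp for") only on the neighbor loop of the current atom.
// Safe only when each row of nlist contains no repeated neighbor index.
std::vector<double> scatter_force_inner_for(const std::vector<int>& nlist,
                                            const std::vector<double>& contrib,
                                            int nloc, int nnei, int nall) {
  assert(static_cast<int>(nlist.size()) == nloc * nnei);
  assert(static_cast<int>(contrib.size()) == nloc * nnei * 3);
  std::vector<double> force(static_cast<std::size_t>(nall) * 3, 0.0);
  double* f = force.data();
#pragma omp parallel
  for (int i = 0; i < nloc; ++i) {  // every thread executes this loop
    // Neighbor slots of atom i are divided among the threads; the implicit
    // barrier at the end of "omp for" finishes atom i before moving to i + 1.
#pragma omp for
    for (int jj = 0; jj < nnei; ++jj) {
      const int j = nlist[i * nnei + jj];
      if (j < 0) continue;  // skip empty neighbor slots
      for (int d = 0; d < 3; ++d) {
        f[j * 3 + d] += contrib[(i * nnei + jj) * 3 + d];
      }
    }
  }
  return force;
}
```

Creating the thread team once, outside the outer loop, avoids entering a new parallel region for every center atom. That may be why omp parallel sits where it does, although the thread does not state the reason.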
OK, I understand that you process the neighbors of the same atom with multiple threads. What I do not understand is why you need omp parallel to spawn threads here, but not at L49, where multi-threading is really needed.
Sometimes, when the box is quite small (i.e. box size < 2 * rcut), the same atom may appear more than once in the neighbor list. This causes inaccurate results when using OMP.
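The 1D sketch below shows why a small box produces repeated neighbors. It assumes a plain cutoff test |dx| < rcut along one axis, and count_images_within_cutoff is a hypothetical helper. Periodic images of atom j are spaced box apart. If box >= 2 * rcut, any two images are at least 2 * rcut apart, so at most one of them can lie within rcut of atom i. Once box < 2 * rcut, two images can both fall inside the cutoff. If both are recorded under j's index, the same j appears twice in one row. The omp for over neighbors then has two threads updating force[j * 3 + d] at once.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 1D helper: counts the periodic images of atom j (at xj) that
// lie strictly within rcut of atom i (at xi) in a periodic box of length box.
int count_images_within_cutoff(double xi, double xj, double box, double rcut) {
  assert(box > 0.0 && rcut > 0.0);
  // Any image within rcut needs |n| < rcut / box + 1 when xi, xj are in
  // [0, box), so scanning |n| <= ceil(rcut / box) + 1 covers every candidate.
  const int nmax = static_cast<int>(std::ceil(rcut / box)) + 1;
  int count = 0;
  for (int n = -nmax; n <= nmax; ++n) {
    const double dx = xj + n * box - xi;  // separation to the n-th image
    if (std::fabs(dx) < rcut) ++count;
  }
  return count;
}
```

For example, with box = 10 and rcut = 6, atom i at x = 0 sees two images of atom j at x = 5 (dx = 5 and dx = -5). With rcut = 4, it sees none.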
* revert prod_force OMP in #1360
  Sometimes, when the box is quite small (i.e. box size < 2 * rcut), the same atom may appear more than once in the neighbor list. This causes inaccurate results when using OMP.
* do not update pip
* revert pinning pip; setting env for setuptools>=64

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>