do some small optimization to ops #943
Conversation
1. Avoid concat or add in loops. Instead, append tensors to a list, and concat or accumulate_n after the loop.
2. Remove a duplicated reshape.
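A minimal NumPy sketch of the first optimization (hypothetical, not the actual deepmd-kit code, which uses TensorFlow's tf.concat / tf.accumulate_n): concatenating inside the loop re-copies the growing array on every iteration, while appending to a Python list and concatenating once after the loop copies each chunk only once.

```python
import numpy as np

def concat_in_loop(chunks):
    # Anti-pattern: each iteration copies everything accumulated so far,
    # giving O(n^2) total copying over the loop.
    out = np.empty((0,), dtype=float)
    for c in chunks:
        out = np.concatenate([out, c])
    return out

def concat_after_loop(chunks):
    # Preferred pattern: collect references in a list, combine once.
    parts = []
    for c in chunks:
        parts.append(c)
    return np.concatenate(parts)  # single copy at the end

chunks = [np.arange(3, dtype=float) for _ in range(4)]
assert np.array_equal(concat_in_loop(chunks), concat_after_loop(chunks))
```

The same reasoning applies to repeated `add` in a loop versus a single `accumulate_n` after it: one fused reduction op instead of a chain of binary adds in the graph.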
Codecov Report
@@ Coverage Diff @@
## devel #943 +/- ##
==========================================
+ Coverage 75.67% 75.68% +0.01%
==========================================
Files 92 92
Lines 7671 7671
==========================================
+ Hits 5805 5806 +1
+ Misses 1866 1865 -1
Have you benchmarked these optimizations? Do they improve efficiency?
I just benchmarked it. The answer is no😂
I think these optimizations may matter more on CPUs than on GPUs. I will recheck this PR.
@@ -797,12 +798,12 @@ def _filter(
bavg = bavg,
trainable = trainable,
suffix = "_"+str(type_i))
if type_i == 0:
Did we have a bug here? If type_i == 0 and (type_input, type_i) was in self.exclude_types, ret was still accumulated.
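A hypothetical plain-Python sketch of the concern raised above (the names and logic are illustrative, not the actual _filter implementation): if the first type unconditionally initializes the accumulator, an excluded (type_input, type_i) pair can still leak into the result; skipping excluded types before touching the accumulator avoids this.

```python
def accumulate_buggy(values, excluded):
    # Suspected old pattern: type_i == 0 initializes ret unconditionally,
    # so an excluded type 0 still ends up in the sum.
    ret = 0
    for type_i, v in enumerate(values):
        if type_i == 0:
            ret = v
        elif type_i not in excluded:
            ret = ret + v
    return ret

def accumulate_fixed(values, excluded):
    # Skip excluded types entirely; the first non-excluded value
    # initializes the accumulator.
    ret = None
    for type_i, v in enumerate(values):
        if type_i in excluded:
            continue
        ret = v if ret is None else ret + v
    return 0 if ret is None else ret

# With type 0 excluded, its value must not appear in the sum.
assert accumulate_fixed([10, 1, 2], excluded={0}) == 3
assert accumulate_buggy([10, 1, 2], excluded={0}) == 13  # leaks the 10
```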
@denghuilu Would the revised code be faster on GPUs?
I think one cannot see any difference if there are only one or two elements. A system with at least 10 atom types should be tested.
There is a slight performance penalty on a V100 GPU with the water benchmark system (timing tables for the optimize-ops and devel branches were attached as images and are not reproduced here).
Maybe the GPU implementation did not use the stream parallelization.
Why is the testing time of optimize-ops so long?
It was fixed by #1419 -- this branch is behind devel.
merge from devel
It did have some benefits (the supporting numbers were attached and are not reproduced here).