
do some small optimization to ops #943

Merged: 5 commits merged into deepmodeling:devel from the optimize-ops branch on Feb 11, 2022

Conversation

@njzjz (Member) commented Aug 10, 2021

1. Avoid concat or add in loops. Instead, append tensors to a list and concat or accumulate_n them after the loop (see the sketch below).
2. Remove a duplicated reshape.
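
For illustration, a minimal sketch of the list-then-reduce pattern, using toy tensors rather than the actual deepmd-kit code (ntypes and branches below are made-up stand-ins):

import tensorflow as tf

# Toy stand-ins for the per-type tensors produced inside the loop.
ntypes = 4
branches = [tf.fill([2, 3], float(i)) for i in range(ntypes)]

# Before (schematic): ret = ret + xyz, or ret = tf.concat([ret, xyz], ...),
# inside the loop adds one graph op per iteration.

# After: append to a Python list in the loop and reduce once afterwards.
acc = []
for type_i in range(ntypes):
    acc.append(branches[type_i])
ret_sum = tf.math.accumulate_n(acc)   # one op instead of ntypes - 1 adds
ret_cat = tf.concat(acc, axis=1)      # one op instead of repeated concats
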
@codecov-commenter commented Aug 10, 2021

Codecov Report

Merging #943 (30f8e7c) into devel (0d8fe0a) will increase coverage by 0.01%.
The diff coverage is 81.25%.

@@            Coverage Diff             @@
##            devel     #943      +/-   ##
==========================================
+ Coverage   75.67%   75.68%   +0.01%     
==========================================
  Files          92       92              
  Lines        7671     7671              
==========================================
+ Hits         5805     5806       +1     
+ Misses       1866     1865       -1     
Impacted Files              Coverage Δ
deepmd/fit/polar.py         49.75% <50.00%> (+0.48%) ⬆️
deepmd/descriptor/se_a.py   94.17% <100.00%> (ø)
deepmd/fit/dipole.py        93.24% <100.00%> (ø)
deepmd/fit/ener.py          90.90% <100.00%> (ø)

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d8fe0a...30f8e7c.

@amcadmus (Member) commented

Have you benchmarked these optimizations? Do they help improve efficiency?

@njzjz marked this pull request as draft August 10, 2021 02:40
@njzjz (Member, Author) commented Aug 10, 2021

Have you benchmarked these optimizations? Do they help improve efficiency?

I just benchmarked it. The answer is no😂

@njzjz closed this Aug 10, 2021
@njzjz reopened this Jan 13, 2022
@njzjz removed the request for review from denghuilu January 13, 2022 13:50
@njzjz (Member, Author) commented Jan 13, 2022

I think these optimizations may matter more for CPUs than for GPUs. I will recheck this PR.

@njzjz (Member, Author) commented Jan 14, 2022

Did some profiling here:

(1) + vs accumulate_n
Applying + one by one produces more ops in the graph than a single accumulate_n.

[profiling screenshot: chained +]

[profiling screenshot: accumulate_n]

(2) concat

[profiling screenshot: concat]

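To make the op-count difference concrete, here is a small experiment one could run with plain TensorFlow (not deepmd-kit code); the chained + graph ends up with one add node per extra operand, while accumulate_n collapses the sum into a single node:

import tensorflow as tf

# Graph built with chained +: one add op per extra operand.
g_plus = tf.Graph()
with g_plus.as_default():
    xs = [tf.constant([float(i)]) for i in range(8)]
    total = xs[0]
    for x in xs[1:]:
        total = total + x
print("chained +   :", len(g_plus.get_operations()), "ops")

# Graph built with a single accumulate_n over the same operands.
g_acc = tf.Graph()
with g_acc.as_default():
    xs = [tf.constant([float(i)]) for i in range(8)]
    total = tf.math.accumulate_n(xs)
print("accumulate_n:", len(g_acc.get_operations()), "ops")
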
@njzjz marked this pull request as ready for review January 14, 2022 07:36
@@ -797,12 +798,12 @@ def _filter(
bavg = bavg,
trainable = trainable,
suffix = "_"+str(type_i))
if type_i == 0:
A Collaborator commented on this diff:

Did we have a bug here? If type_i == 0 and (type_input, type_i) in self.exclude_types, ret was still accumulated.
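
A hedged sketch of the accumulation pattern this question is about; build_branch, type_input, and exclude_types below are illustrative stand-ins, not the actual _filter() code:

import tensorflow as tf

ntypes = 3
type_input = 0
exclude_types = {(0, 0)}          # hypothetical excluded type pair

def build_branch(type_i):
    # stand-in for the per-type network output inside _filter()
    return tf.fill([2, 4], float(type_i + 1))

# List-based accumulation (as in this PR): an excluded pair is skipped
# before anything is appended, so it never contributes to ret.
ret_list = []
for type_i in range(ntypes):
    if (type_input, type_i) in exclude_types:
        continue
    ret_list.append(build_branch(type_i))
ret = tf.math.accumulate_n(ret_list)

# The question above concerns the previous branch-based code, where the
# `if type_i == 0:` branch initialized ret unconditionally, so an excluded
# (type_input, 0) pair could still be accumulated into ret.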

@wanghan-iapcm (Collaborator) commented Jan 15, 2022

@denghuilu Would the revised code be faster on GPUs?

@njzjz (Member, Author) commented Jan 16, 2022

I don't think one can see any difference with only one or two atom types. A system with at least 10 atom types should be tested.

@denghuilu (Member) commented

There is a slight performance penalty on V100 GPU with the water benchmark system:

optimize-ops branch


DEEPMD INFO    batch     100 training time 3.36 s, testing time 2.34 s
DEEPMD INFO    batch     200 training time 1.73 s, testing time 2.32 s
DEEPMD INFO    batch     300 training time 1.75 s, testing time 2.32 s
DEEPMD INFO    batch     400 training time 1.73 s, testing time 2.41 s
DEEPMD INFO    batch     500 training time 1.72 s, testing time 2.37 s
DEEPMD INFO    batch     600 training time 1.74 s, testing time 2.36 s
DEEPMD INFO    batch     700 training time 1.76 s, testing time 2.43 s
DEEPMD INFO    batch     800 training time 1.77 s, testing time 2.48 s
DEEPMD INFO    batch     900 training time 1.75 s, testing time 2.47 s
DEEPMD INFO    batch    1000 training time 1.72 s, testing time 2.41 s

devel branch

DEEPMD INFO    batch     100 training time 3.03 s, testing time 0.02 s
DEEPMD INFO    batch     200 training time 1.60 s, testing time 0.02 s
DEEPMD INFO    batch     300 training time 1.63 s, testing time 0.02 s
DEEPMD INFO    batch     400 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     500 training time 1.58 s, testing time 0.02 s
DEEPMD INFO    batch     600 training time 1.62 s, testing time 0.02 s
DEEPMD INFO    batch     700 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     800 training time 1.58 s, testing time 0.02 s
DEEPMD INFO    batch     900 training time 1.60 s, testing time 0.02 s

Maybe the GPU implementation does not use stream parallelization.

@wanghan-iapcm (Collaborator) commented

There is a slight performance penalty on V100 GPU with the water benchmark system: […]

Why is the testing time of optimize-ops so long?

@njzjz (Member, Author) commented Feb 10, 2022

Why is the testing time of optimize-ops so long?

It was fixed by #1419 -- this branch is behind devel.

@denghuilu (Member) commented

It did have some benefits:

DEEPMD INFO    batch     200 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     300 training time 1.56 s, testing time 0.02 s
DEEPMD INFO    batch     400 training time 1.57 s, testing time 0.02 s
DEEPMD INFO    batch     500 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     600 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     700 training time 1.60 s, testing time 0.02 s
DEEPMD INFO    batch     800 training time 1.60 s, testing time 0.02 s
DEEPMD INFO    batch     900 training time 1.60 s, testing time 0.02 s
DEEPMD INFO    batch    1000 training time 1.57 s, testing time 0.02 s

@wanghan-iapcm merged commit 82c787d into deepmodeling:devel on Feb 11, 2022