This repository has been archived by the owner on May 27, 2021. It is now read-only.

Add some timer outputs. #365

Merged
merged 4 commits into from Mar 22, 2019

maleadt
Member

@maleadt maleadt commented Mar 21, 2019

#354

[ Info: Testing using device GeForce GTX TITAN
────────────────────────────────────────────────────────
                                          Time          
                                  ──────────────────────
        Tot / % measured:               109s / 11.6%    

Section                   ncalls     time   %tot     avg
────────────────────────────────────────────────────────
LLVM IR generation           250    5.95s  47.3%  23.8ms
Julia IR generation          256    2.90s  23.1%  11.3ms
CUDA object generation       170    2.52s  20.0%  14.8ms
PTX assembly generation      210    1.20s  9.57%  5.72ms
────────────────────────────────────────────────────────
Test Summary: | Pass  Total
CUDAnative    |  279    279

Seems reasonable.

@vchuravy
Member

@lcw @ali-ramadhan, this might help you understand the compilation costs of CLIMA.

@maleadt
Member Author

maleadt commented Mar 21, 2019

Note that this branch builds on #362, which requires JuliaLang/julia#31428.
After adding some more detailed timers:

 ──────────────────────────────────────────────────────────
                                             Time          
                                     ──────────────────────
          Tot / % measured:                106s / 11.0%    

 Section                     ncalls     time   %tot     avg
 ──────────────────────────────────────────────────────────
 LLVM middle-end                290    6.16s  52.4%  21.2ms
   IR generation                290    4.71s  40.1%  16.2ms
   optimization                 284    1.32s  11.2%  4.65ms
   device library               249   54.0ms  0.46%   217μs
   runtime library              249   42.2ms  0.36%   169μs
   verification                 289   8.20ms  0.07%  28.4μs
   strip debug info              65    212μs  0.00%  3.26μs
 Julia front-end                296    2.53s  21.6%  8.55ms
 CUDA object generation         170    1.83s  15.6%  10.8ms
   linking                      170    1.38s  11.7%  8.11ms
   compilation                  170    455ms  3.87%  2.67ms
 LLVM back-end                  210    1.22s  10.4%  5.80ms
   machine-code generation      207    307ms  2.62%  1.48ms
   preparation                  210   70.6ms  0.60%   336μs
 ──────────────────────────────────────────────────────────
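
For readers unfamiliar with this kind of table: its layout matches what the TimerOutputs.jl package prints for nested timing sections. The following is a minimal sketch under that assumption, not the code added in this PR. The timer name `to`, the function `compile_stub`, and the `sleep` calls standing in for real compilation work are all hypothetical; only the section labels are borrowed from the table above.

```julia
using TimerOutputs

# Hypothetical global timer; the name `to` is an assumption for this sketch.
const to = TimerOutput()

# Stand-in for one compilation job. The sleeps replace the real work;
# nesting @timeit blocks produces the indented sub-sections in the table.
function compile_stub()
    @timeit to "LLVM middle-end" begin
        @timeit to "IR generation" sleep(0.02)
        @timeit to "optimization" sleep(0.01)
    end
    @timeit to "LLVM back-end" begin
        @timeit to "machine-code generation" sleep(0.005)
    end
    return nothing
end

# Run the stub a few times so that ncalls and avg become meaningful.
for _ in 1:3
    compile_stub()
end

# Print the accumulated timings as a table.
print_timer(to)
```

Each section accumulates a call count and total time across invocations, so a parent section's time includes that of its children plus any untimed work in between.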

@ali-ramadhan

This looks awesome! Can't wait to try it out!

@maleadt maleadt force-pushed the tb/dynamic_parallelism branch 2 times, most recently from b1ecb30 to 8d2131f Compare March 22, 2019 09:54
@bors bors bot closed this Mar 22, 2019
@maleadt maleadt changed the base branch from tb/dynamic_parallelism to master March 22, 2019 14:48
@maleadt maleadt reopened this Mar 22, 2019
@maleadt maleadt merged commit 1d310b1 into master Mar 22, 2019
@bors bors bot deleted the tb/timer_outputs branch March 22, 2019 14:49
@maleadt
Member Author

maleadt commented Mar 22, 2019

@ali-ramadhan you should be able to test this on Julia 1.0/1.1/master now.

@ali-ramadhan

Thanks for the heads up! Should help a lot with CliMA/Oceananigans.jl#66

Will update packages and give it a try.
