Use 4x more threads for FFTW #1120

Merged
merged 2 commits into from
Nov 24, 2020

Conversation

ali-ramadhan
Member

Resolves #1113

Needs new multithreading benchmarks. Will do as part of #1088 (need to add a nice multithreading benchmark script on the ar/benchmarks branch).

@codecov

codecov bot commented Oct 30, 2020

Codecov Report

Merging #1120 (9db165f) into master (6112c6c) will increase coverage by 1.53%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master    #1120      +/-   ##
==========================================
+ Coverage   55.78%   57.31%   +1.53%     
==========================================
  Files         171      162       -9     
  Lines        4005     3882     -123     
==========================================
- Hits         2234     2225       -9     
+ Misses       1771     1657     -114     
Impacted Files Coverage Δ
src/Oceananigans.jl 66.66% <0.00%> (ø)
src/Fields/field_tuples.jl 72.09% <0.00%> (-1.72%) ⬇️
src/Fields/Fields.jl 25.00% <0.00%> (ø)
src/Models/Models.jl 100.00% <0.00%> (ø)
src/TimeSteppers/store_tendencies.jl 72.72% <0.00%> (ø)
src/TimeSteppers/clock.jl
...dels/IncompressibleModels/non_dimensional_model.jl
...ncompressibleModels/update_hydrostatic_pressure.jl
.../Models/ShallowWaterModels/calculate_tendencies.jl
.../IncompressibleModels/show_incompressible_model.jl
... and 28 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@navidcy requested a review from @glwagner on November 1, 2020, 06:39
Member

@glwagner glwagner left a comment


Moar speed

@navidcy
Collaborator

navidcy commented Nov 15, 2020

@glwagner, should FourierFlows.jl be doing this also?

@glwagner
Member

I think it's irrelevant if you aren't also using Julia's built-in multithreading. But FourierFlows could use it as a default in case any subsidiary models use multithreading at some point.

@ali-ramadhan
Member Author

Ran some multithreading benchmarks on Tartarus and Satori but the results weren't really different from previous benchmarks. Perhaps it's still good to merge in case this PR improves performance on other machines?

Multithreading on Tartarus

Oceananigans v0.44.1
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)
  GPU: TITAN V
                              Multithreading benchmarks
┌──────┬─────────┬──────────┬──────────┬──────────┬──────────┬────────────┬──────────┐
│ size │ threads │      min │   median │     mean │      max │     memory │   allocs │
├──────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────────┼──────────┤
│  512 │       1 │ 38.207 s │ 38.207 s │ 38.207 s │ 38.207 s │ 294.28 KiB │     1930 │
│  512 │       2 │ 31.129 s │ 31.129 s │ 31.129 s │ 31.129 s │ 158.41 MiB │ 10341843 │
│  512 │       4 │ 13.182 s │ 13.182 s │ 13.182 s │ 13.182 s │  59.70 MiB │  3877803 │
│  512 │       8 │  7.637 s │  7.637 s │  7.637 s │  7.637 s │  32.84 MiB │  2109253 │
│  512 │      16 │  4.633 s │  4.680 s │  4.680 s │  4.728 s │  17.21 MiB │  1062196 │
│  512 │      32 │  3.950 s │  3.955 s │  3.955 s │  3.960 s │   9.64 MiB │   517538 │
│  512 │      48 │  3.908 s │  4.012 s │  4.012 s │  4.115 s │  10.23 MiB │   472979 │
└──────┴─────────┴──────────┴──────────┴──────────┴──────────┴────────────┴──────────┘
             Multithreading speedup
┌──────┬─────────┬─────────┬─────────┬─────────┐
│ size │ threads │ speedup │  memory │  allocs │
├──────┼─────────┼─────────┼─────────┼─────────┤
│  512 │       1 │     1.0 │     1.0 │     1.0 │
│  512 │       2 │  1.2274 │   551.2 │ 5358.47 │
│  512 │       4 │  2.8984 │ 207.742 │ 2009.22 │
│  512 │       8 │ 5.00278 │ 114.286 │ 1092.88 │
│  512 │      16 │  8.1637 │ 59.8949 │ 550.361 │
│  512 │      32 │ 9.65944 │ 33.5302 │ 268.154 │
│  512 │      48 │ 9.52408 │ 35.6085 │ 245.067 │
└──────┴─────────┴─────────┴─────────┴─────────┘

Multithreading on Satori

Oceananigans v0.44.1
Julia Version 1.5.3
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (powerpc64le-unknown-linux-gnu)
  CPU: unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, pwr9)
  GPU: Tesla V100-SXM2-32GB
                              Multithreading benchmarks
┌──────┬─────────┬──────────┬──────────┬──────────┬──────────┬────────────┬─────────┐
│ size │ threads │      min │   median │     mean │      max │     memory │  allocs │
├──────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────────┼─────────┤
│  512 │       1 │ 41.167 s │ 41.167 s │ 41.167 s │ 41.167 s │ 320.84 KiB │    1930 │
│  512 │       2 │ 33.025 s │ 33.025 s │ 33.025 s │ 33.025 s │  73.95 MiB │ 4804429 │
│  512 │       4 │ 19.869 s │ 19.869 s │ 19.869 s │ 19.869 s │  42.63 MiB │ 2749985 │
│  512 │       8 │ 10.202 s │ 10.202 s │ 10.202 s │ 10.202 s │  19.33 MiB │ 1204214 │
│  512 │      16 │  6.337 s │  6.337 s │  6.337 s │  6.337 s │  11.51 MiB │  651120 │
│  512 │      32 │  4.657 s │  4.658 s │  4.658 s │  4.660 s │   9.25 MiB │  418554 │
│  512 │      64 │  3.979 s │  4.035 s │  4.035 s │  4.091 s │   7.85 MiB │  158974 │
│  512 │     128 │  3.953 s │  3.984 s │  3.984 s │  4.015 s │  12.55 MiB │  130658 │
│  512 │     160 │  3.584 s │  3.591 s │  3.591 s │  3.597 s │  14.82 MiB │  125631 │
└──────┴─────────┴──────────┴──────────┴──────────┴──────────┴────────────┴─────────┘
             Multithreading speedup
┌──────┬─────────┬─────────┬─────────┬─────────┐
│ size │ threads │ speedup │  memory │  allocs │
├──────┼─────────┼─────────┼─────────┼─────────┤
│  512 │       1 │     1.0 │     1.0 │     1.0 │
│  512 │       2 │ 1.24655 │ 236.003 │ 2489.34 │
│  512 │       4 │ 2.07194 │ 136.066 │ 1424.86 │
│  512 │       8 │ 4.03504 │ 61.6839 │ 623.945 │
│  512 │      16 │ 6.49599 │ 36.7483 │ 337.368 │
│  512 │      32 │ 8.83747 │ 29.5136 │ 216.867 │
│  512 │      64 │ 10.2021 │ 25.0563 │ 82.3699 │
│  512 │     128 │ 10.3332 │ 40.0493 │ 67.6984 │
│  512 │     160 │ 11.4654 │ 47.2913 │ 65.0938 │
└──────┴─────────┴─────────┴─────────┴─────────┘

@ali-ramadhan
Member Author

Hmmm, using 2 threads seems to allocate the most memory by far. Seems pretty weird.
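
The ratio columns in the speedup tables above look like each row divided by the single-thread row, with the speedup taken from the median time. Here is a minimal Python sketch under that assumption, using three Tartarus rows copied from the table. The names `tartarus` and `ratios` are illustrative only and are not part of Oceananigans or its benchmark scripts.

```python
KIB_PER_MIB = 1024

# Rows copied from the Tartarus benchmark table: median time, memory, allocations.
# Memory is converted to KiB so that all rows share one unit.
tartarus = {
    1: {"median_s": 38.207, "memory_kib": 294.28, "allocs": 1930},
    2: {"median_s": 31.129, "memory_kib": 158.41 * KIB_PER_MIB, "allocs": 10341843},
    4: {"median_s": 13.182, "memory_kib": 59.70 * KIB_PER_MIB, "allocs": 3877803},
}


def ratios(rows, threads, baseline=1):
    """Return speedup, memory, and allocation ratios relative to the baseline row.

    Assumption: speedup is baseline time over row time, while memory and
    allocs are row value over baseline value.
    """
    base, row = rows[baseline], rows[threads]
    return {
        "speedup": base["median_s"] / row["median_s"],
        "memory": row["memory_kib"] / base["memory_kib"],
        "allocs": row["allocs"] / base["allocs"],
    }


for n in sorted(tartarus):
    r = ratios(tartarus, n)
    print(f"{n:>2} threads: speedup {r['speedup']:.4f}, "
          f"memory {r['memory']:.1f}x, allocs {r['allocs']:.2f}x")
```

Read this way, the 2-thread run on Tartarus allocates roughly 551 times the memory of the single-thread run for a speedup of only about 1.23, which is the oddity noted above.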

@navidcy
Collaborator

navidcy commented Nov 24, 2020

Wait... were these benchmarks with the latest commit from this PR?
The benchmarks before the PR are in the Docs I guess?

@ali-ramadhan
Member Author

Ah sorry I had the old benchmarks open in a different tab and forgot to copy paste them here: #869 (comment)

Tartarus: Julia 1.5.0 + Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
 1 thread:  34.60 s
 4 threads: 12.00 s (2.88x)
 8 threads:  7.00 s (4.94x)
16 threads:  4.93 s (7.02x)
24 threads:  4.59 s (7.54x)
32 threads:  4.25 s (8.14x)
40 threads:  4.06 s (8.52x)
48 threads:  4.19 s (8.26x) [some of the 48 cores were in use]

Satori: Julia 1.4.1 + IBM Power System AC922 (8335-GTH)
  1 thread:  47.20 s
  4 threads: 21.70 s (2.18x)
  8 threads: 11.30 s (4.18x)
 16 threads:  7.16 s (6.59x)
 32 threads:  5.40 s (8.74x)
 64 threads:  4.29 s (11.0x)
128 threads:  4.14 s (11.4x)
160 threads:  4.02 s (11.7x)
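
One way to check the "not really different" reading is to compare the old and new Tartarus timings at the thread counts both runs share. The sketch below is a rough illustration: it uses the median times from the new table, and the variable names are hypothetical. Note that the two runs used different Julia versions (1.5.0 vs 1.5.2) and mostly single samples, so small differences may well be noise.

```python
# Old Tartarus timings (seconds) from the list above, keyed by thread count.
old_s = {1: 34.60, 4: 12.00, 8: 7.00, 16: 4.93, 32: 4.25, 48: 4.19}

# New Tartarus median timings (seconds) from the benchmark table in this PR.
new_s = {1: 38.207, 4: 13.182, 8: 7.637, 16: 4.680, 32: 3.955, 48: 4.012}

# Relative change in wall time: positive means the new run was slower.
change = {n: (new_s[n] - old_s[n]) / old_s[n] for n in old_s}

for n in sorted(change):
    print(f"{n:>2} threads: {old_s[n]:6.2f} s -> {new_s[n]:6.3f} s "
          f"({change[n] * 100:+.1f}%)")
```

On these numbers the new run is slower at 1 to 8 threads and slightly faster at 16 threads and above, with every difference within about 10%.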


Successfully merging this pull request may close these issues.

When multithreading use 4 times more threads for FFTW
3 participants