Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multithreading performance #191

Closed
nsailor opened this issue Dec 10, 2023 · 5 comments
Closed

Multithreading performance #191

nsailor opened this issue Dec 10, 2023 · 5 comments

Comments

@nsailor
Copy link

nsailor commented Dec 10, 2023

Hello,

I have been trying to evaluate the performance of this package with multiple threads but unfortunately setting multithreded=true seems to result in a slowdown (running on a 6-core CPU):

$ julia -t 6
...
julia> using Pigeons

(omitting first run)

julia> pigeons(target = toy_mvn_target(100), multithreaded=false)
┌ Info: Neither traces, disk, nor online recorders included.
│    You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└    To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: More than one threads are available, but explore!() loop is not parallelized as inputs.multithreaded == false
└ @ Pigeons ~/.julia/packages/Pigeons/rBo9q/src/pt/checks.jl:12
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2        7.5   5.64e-05   1.28e+04       -122   2.66e-15      0.167
        4       5.59   6.76e-05   1.47e+04       -119    2.6e-07      0.378
        8       6.04   0.000102   1.86e+04       -115    0.00129      0.329
       16       7.27    0.00016   2.42e+04       -118     0.0134      0.193
       32       6.97    0.00025   3.05e+04       -114      0.107      0.225
       64       7.03   0.000439   4.13e+04       -117     0.0531      0.219
      128       7.23   0.000797   6.08e+04       -114     0.0944      0.196
      256       7.05    0.00142   6.77e+04       -115       0.13      0.217
      512       7.14    0.00259   7.18e+04       -115      0.171      0.207
 1.02e+03       7.19    0.00494   7.91e+04       -115      0.172      0.201
────────────────────────────────────────────────────────────────────────────
PT(checkpoint = false, ...)

julia> pigeons(target = toy_mvn_target(100), multithreaded=true)
┌ Info: Neither traces, disk, nor online recorders included.
│    You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└    To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2        7.5     0.0227   7.49e+05       -122   2.66e-15      0.167
        4       5.59     0.0223   7.57e+05       -119    2.6e-07      0.378
        8       6.04   0.000379   5.62e+04       -115    0.00129      0.329
       16       7.27   0.000604   9.92e+04       -118     0.0134      0.193
       32       6.97   0.000983    1.8e+05       -114      0.107      0.225
       64       7.03    0.00175   3.42e+05       -117     0.0531      0.219
      128       7.23    0.00184   6.61e+05       -114     0.0944      0.196
      256       7.05     0.0036   1.27e+06       -115       0.13      0.217
      512       7.14    0.00811   2.48e+06       -115      0.171      0.207
 1.02e+03       7.19     0.0172   4.91e+06       -115      0.172      0.201
────────────────────────────────────────────────────────────────────────────
PT(checkpoint = false, ...)

julia>

Is this expected?
I tried changing the problem by increasing the number of chains, changing the dimension of the problem, and increasing the number of rounds but the results remain similar.
For instance:

julia> pigeons(target = toy_mvn_target(500), multithreaded=true, n_rounds=17)
┌ Info: Neither traces, disk, nor online recorders included.
│    You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└    To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2       7.59   0.000312   2.13e+04       -586   1.31e-48      0.157
        4       8.05   0.000293   3.04e+04       -591   7.42e-36      0.106
        8       8.59   0.000431   4.91e+04       -586   2.12e-32      0.046
       16       8.49   0.000672   8.77e+04       -583   7.84e-20      0.057
       32       8.71    0.00133   1.65e+05       -586   6.39e-25     0.0326
       64       8.58    0.00256   3.19e+05       -580   3.93e-17     0.0466
      128       8.75    0.00441   6.21e+05       -575   2.85e-18     0.0281
      256       8.76    0.00961   1.24e+06       -575   1.33e-17     0.0263
      512       8.86     0.0206   2.44e+06       -575   8.15e-15     0.0152
 1.02e+03       8.83     0.0285   4.89e+06       -575   2.57e-14     0.0192
 2.05e+03       8.88     0.0468   9.72e+06       -576   1.69e-12     0.0135
  4.1e+03       8.88     0.0902   1.94e+07       -574      3e-13     0.0138
 8.19e+03       8.89      0.149   3.87e+07       -576   1.75e-10     0.0121
 1.64e+04       8.89      0.289   7.74e+07       -573   1.05e-08     0.0119
 3.28e+04        8.9      0.575   1.55e+08       -574   2.01e-09     0.0115
 6.55e+04        8.9       1.16   3.09e+08       -574   7.14e-08      0.011
 1.31e+05       8.91       2.31   6.19e+08       -574   1.53e-07     0.0104
────────────────────────────────────────────────────────────────────────────
PT(checkpoint = false, ...)

julia> pigeons(target = toy_mvn_target(500), multithreaded=false, n_rounds=17)
┌ Info: Neither traces, disk, nor online recorders included.
│    You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└    To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: More than one threads are available, but explore!() loop is not parallelized as inputs.multithreaded == false
└ @ Pigeons ~/.julia/packages/Pigeons/rBo9q/src/pt/checks.jl:12
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2       7.59   0.000293   1.28e+04       -586   1.31e-48      0.157
        4       8.05   0.000282   1.16e+04       -591   7.42e-36      0.106
        8       8.59   0.000482   1.16e+04       -586   2.12e-32      0.046
       16       8.49   0.000855   1.27e+04       -583   7.84e-20      0.057
       32       8.71    0.00157   1.46e+04       -586   6.39e-25     0.0326
       64       8.58    0.00347   1.62e+04       -580   3.93e-17     0.0466
      128       8.75    0.00584   1.62e+04       -575   2.85e-18     0.0281
      256       8.76     0.0119   2.71e+04       -575   1.33e-17     0.0263
      512       8.86     0.0196   2.18e+04       -575   8.15e-15     0.0152
 1.02e+03       8.83     0.0182   3.43e+04       -575   2.57e-14     0.0192
 2.05e+03       8.88     0.0367   3.95e+04       -576   1.69e-12     0.0135
  4.1e+03       8.88     0.0725   5.35e+04       -574      3e-13     0.0138
 8.19e+03       8.89      0.144    5.4e+04       -576   1.75e-10     0.0121
 1.64e+04       8.89      0.287   6.08e+04       -573   1.05e-08     0.0119
 3.28e+04        8.9      0.575   6.24e+04       -574   2.01e-09     0.0115
 6.55e+04        8.9       1.15    6.4e+04       -574   7.14e-08      0.011
 1.31e+05       8.91        2.3    6.4e+04       -574   1.53e-07     0.0104
────────────────────────────────────────────────────────────────────────────
PT(checkpoint = false, ...)

julia>

Interestingly, at least in the second case the threads do seem to be busy.
It also seems that the multithreaded version is allocating significantly more memory.
Am I missing something?

@alexandrebouchard
Copy link
Member

Thanks for pointing this out. The reason for this is that the toy_mvn_target() has a special explorer performing iid sampling. We use this when we need very fast running tests (e.g. in CI) but in terms of performance profile this is atypical. Since the explorer is so quick the overhead of multithreading is not worth it. Typical explorers use algorithms such as slice sampling or gradient informed samplers, and for those algorithms we see improvements once the problem is large enough. I will post here in a few minutes a preliminary benchmark on this...

@alexandrebouchard
Copy link
Member

speedup-parallel.pdf

@alexandrebouchard
Copy link
Member

Info on the above plot:

  • targets: same as you used (mvn_toy_normal) of but with the key difference than I force the more realistic explorers using explorer = AutoMALA(). I show models of increasing dimensionality, where one would increase the number of chains prescribed by the theory (N = sqrt{d}) (x-axis) and hence the number of threads.
  • 10 repeats for each regime (confidence intervals shown)
  • blue dashed line is the theoretical best case
  • red dashed line is the threshold where there are improvements to use multithreading

@alexandrebouchard
Copy link
Member

alexandrebouchard commented Dec 11, 2023

For completeness, similar results for MPI/distributed mode instead

@nsailor
Copy link
Author

nsailor commented Dec 17, 2023 via email

@nsailor nsailor closed this as completed Dec 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants