With @clef-men I am trying to write benchmarks to compare concurrent schedulers, and we notice a "one-domain shift" between Domainslib and Moonpool: if D(N) is the performance of Domainslib with N domains, and M(N) is the performance of Moonpool with N "threads", then generally M(N) is noticeably worse than D(N), but in fact it is very close to D(N-1).
My understanding is that one of the following two hypotheses holds:
1. There is an unintended implementation bug in Moonpool where it uses "one less domain than expected"; for example, maybe the main domain sits idle instead of participating in task completion, as one might hope it would in CPU-bound workloads.
2. There is an intended difference in semantics between `Domainslib.Task.setup_pool ~num_domains:n` and `Ws_pool.create ~num_threads:n`, where the Domainslib parameter has to be understood as "number of extra domains, in addition to the main domain", and the Moonpool parameter has to be understood as "total number of domains that will participate in computation".
(2) sounds more likely, but I still consider it an issue, because this is not clearly documented and it results in confusing benchmark results. Given that Domainslib is the dominant scheduler for CPU-bound tasks, I think it would be nice, if Moonpool interprets its own parameter subtly differently, to document it clearly. Hence the present issue.
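If hypothesis (2) is right, a benchmark driver can compensate by converting a desired worker count into each library's parameter. The following is a minimal sketch under that assumption only; the `scheduler` type and `pool_parameter` function are hypothetical names for illustration and are not part of Domainslib or Moonpool, and the Moonpool reading is unconfirmed.

```ocaml
(* Hypothetical helper, not part of Domainslib or Moonpool. *)
type scheduler = Domainslib_sched | Moonpool_sched

(* [pool_parameter ~workers s] is the value to pass as [~num_domains]
   (Domainslib) or [~num_threads] (Moonpool) so that [workers] domains
   take part in the computation, ASSUMING hypothesis (2) holds. *)
let pool_parameter ~workers scheduler =
  if workers < 1 then invalid_arg "pool_parameter: workers must be >= 1";
  match scheduler with
  | Domainslib_sched -> workers - 1 (* main domain is counted separately *)
  | Moonpool_sched -> workers (* parameter is assumed to be the total *)

let () =
  List.iter
    (fun workers ->
      Printf.printf "workers=%d: ~num_domains:%d, ~num_threads:%d\n" workers
        (pool_parameter ~workers Domainslib_sched)
        (pool_parameter ~workers Moonpool_sched))
    [ 4; 5 ]
```

Under this model, 5 workers correspond to `~num_domains:4` for Domainslib and `~num_threads:5` for Moonpool, which is the pairing D(N-1) and M(N) described above.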
Two minor remarks:
- Even assuming that (2) holds, I remain uncertain and confused about a sub-question: does this mean that the main domain intentionally does not participate in computations, or that it is included in `num_threads` and Moonpool will only spawn (n-1) domains? Given the sensitivity of the Multicore OCaml runtime to extra domains above the number of cores, I think it's important that Moonpool users know for sure how many domains in total are going to run when they pass a given `~num_threads` parameter.
- I tried to find a clear answer to the question of whether (1) or (2) holds by looking at the Moonpool codebase, and I failed to do so. There are many layers of stuff, with indirections via Picos. I don't know if there is any actionable feedback to extract from this remark, but maybe: if you add more complexity to the implementation, I think it would be nice to also make the documentation clearer and more complete.
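To make the sub-question in the first remark concrete, here is a small model that counts domains under each candidate reading of a pool-size parameter `n`. It is only a sketch of the possible semantics: the `reading` type and its constructors are hypothetical names, and which reading matches Moonpool is exactly what this issue asks. The core count is passed explicitly; in a real program it could come from `Domain.recommended_domain_count ()`.

```ocaml
(* Candidate readings of a pool-size parameter [n] (hypothetical model). *)
type reading =
  | Extra_main_computes (* n domains spawned, main domain also computes *)
  | Extra_main_idle (* n domains spawned, main domain sits idle *)
  | Total_including_main (* main computes, only n-1 domains spawned *)

(* [counts reading n] is (live domains, domains doing useful work). *)
let counts reading n =
  match reading with
  | Extra_main_computes -> (n + 1, n + 1)
  | Extra_main_idle -> (n + 1, n)
  | Total_including_main -> (n, n)

(* More live domains than cores is the situation the runtime is sensitive to. *)
let oversubscribed ~cores reading n =
  let live, _computing = counts reading n in
  live > cores

let () =
  let cores = 4 and n = 4 in
  List.iter
    (fun (name, reading) ->
      let live, computing = counts reading n in
      Printf.printf "%-22s live=%d computing=%d oversubscribed=%b\n" name live
        computing
        (oversubscribed ~cores reading n))
    [ ("extra, main computes", Extra_main_computes);
      ("extra, main idle", Extra_main_idle);
      ("total, main included", Total_including_main) ]
```

With `n` equal to the number of cores, only the last reading stays within the core count, which is why the distinction matters to users choosing `~num_threads`.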
A simple repro case
```ocaml
(* fibo.ml *)
let cutoff = 25
let input = 40

(* Sequential baseline, also used below the cutoff. *)
let rec fibo_seq n =
  if n <= 1 then
    n
  else
    fibo_seq (n - 1) + fibo_seq (n - 2)

let rec fibo_domainslib ctx n =
  if n <= cutoff then
    fibo_seq n
  else
    let open Domainslib in
    let fut1 = Task.async ctx (fun () -> fibo_domainslib ctx (n - 1)) in
    let fut2 = Task.async ctx (fun () -> fibo_domainslib ctx (n - 2)) in
    Task.await ctx fut1 + Task.await ctx fut2

let rec fibo_moonpool ctx n =
  if n <= cutoff then
    fibo_seq n
  else
    let open Moonpool in
    let fut1 = Fut.spawn ~on:ctx (fun () -> fibo_moonpool ctx (n - 1)) in
    let fut2 = Fut.spawn ~on:ctx (fun () -> fibo_moonpool ctx (n - 2)) in
    (* Fut.await takes only the future, not the pool. *)
    Fut.await fut1 + Fut.await fut2

let usage =
  "fibo.exe <num_domains> [ domainslib | moonpool | seq ]"

let num_domains =
  try int_of_string Sys.argv.(1)
  with _ -> failwith usage

let implem =
  try Sys.argv.(2)
  with _ -> failwith usage

let () =
  let output =
    match implem with
    | "domainslib" ->
      let open Domainslib in
      let pool = Task.setup_pool ~num_domains () in
      Task.run pool (fun () ->
        fibo_domainslib pool input
      )
    | "moonpool" ->
      let open Moonpool in
      let ctx = Ws_pool.create ~num_threads:num_domains () in
      Ws_pool.run_wait_block ctx (fun () ->
        fibo_moonpool ctx input
      )
    | "seq" ->
      fibo_seq input
    | _ -> failwith usage
  in
  print_int output;
  print_newline ()
```

```
$ ocamlfind ocamlopt -package domainslib,moonpool -linkpkg fibo.ml -o fibo.exe
```
```
$ hyperfine "./fibo.exe 4 domainslib"
Benchmark 1: ./fibo.exe 4 domainslib
  Time (mean ± σ):     207.9 ms ±   3.2 ms    [User: 999.8 ms, System: 8.3 ms]
  Range (min … max):   199.8 ms … 214.5 ms    14 runs

$ hyperfine "./fibo.exe 4 moonpool"
Benchmark 1: ./fibo.exe 4 moonpool
  Time (mean ± σ):     262.2 ms ±   3.3 ms    [User: 1003.3 ms, System: 14.8 ms]
  Range (min … max):   258.2 ms … 267.7 ms    11 runs

$ hyperfine "./fibo.exe 5 moonpool"
Benchmark 1: ./fibo.exe 5 moonpool
  Time (mean ± σ):     211.1 ms ±   4.0 ms    [User: 1002.0 ms, System: 16.6 ms]
  Range (min … max):   204.9 ms … 216.7 ms    14 runs
```

Note: this repro case is pretty close to your own `benchs/fib_rec.ml` benchmark, but unfortunately in that benchmark you did not make the number of Domainslib domains a parameter (it only takes `recommended_domain_count`), and so you could not observe the difference at equal parameters.
Lines 75 to 79 in d957f7b
```ocaml
let dl_pool =
  lazy
    (let n = Domain.recommended_domain_count () in
     Printf.printf "use %d domains\n%!" n;
     Domainslib.Task.setup_pool ~num_domains:n ())
```
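As a quick sanity check on the "one-domain shift", the three hyperfine means reported in this issue can be compared directly. The sketch below hard-codes those measurements; `rel_gap` is a hypothetical helper name, and the comparison says nothing beyond these three data points.

```ocaml
(* Mean wall-clock times in ms, copied from the hyperfine runs in this issue. *)
let d4 = 207.9 (* ./fibo.exe 4 domainslib *)
let m4 = 262.2 (* ./fibo.exe 4 moonpool *)
let m5 = 211.1 (* ./fibo.exe 5 moonpool *)

(* Relative slowdown of [t] with respect to [reference], in percent. *)
let rel_gap ~reference t = 100. *. (t -. reference) /. reference

let () =
  (* Same parameter on both sides: M(4) against D(4). *)
  Printf.printf "M(4) vs D(4): %+.1f%%\n" (rel_gap ~reference:d4 m4);
  (* Shifted by one: M(5) against D(4). *)
  Printf.printf "M(5) vs D(4): %+.1f%%\n" (rel_gap ~reference:d4 m5)
```

At equal parameters Moonpool is roughly 26% slower, while M(5) is within about 1.5% of D(4), a difference of 3.2 ms that is comparable to the reported standard deviations of 3.2 ms and 4.0 ms.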