document the difference between `num_threads` on Moonpool and `num_domains` on Domainslib

With @clef-men I am trying to write benchmarks to compare concurrent schedulers, and we notice a "one-domain shift" between Domainslib and Moonpool: if D(N) is the performance of Domainslib with N domains, and M(N) is the performance of Moonpool with N "threads", then generally M(N) is noticeably worse than D(N), but in fact it is very close to D(N-1).

My understanding is that one of the following two hypotheses holds:

1. There is an unintended implementation bug in Moonpool where it uses "one less domain than expected"; for example, maybe the main domain sits idle instead of participating in task completion as one might hope in CPU-bound workloads.

2. There is an intended difference in semantics between `Domainslib.Task.setup_pool ~num_domains:n` and `Ws_pool.create ~num_threads:n`, where the Domainslib parameter has to be understood as "number of extra domains, in addition to the main domain", and the Moonpool parameter has to be understood as "total number of domains that will participate in computation".

(2) sounds more likely, but I still consider it an issue, because this is not clearly documented and it results in confusing benchmark results. Given that Domainslib is the dominant scheduler for CPU-bound tasks, I think it would be nice, if Moonpool interprets its own parameter subtly differently, to document it clearly. Hence the present issue.
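
To make the comparison concrete, here is a minimal sketch of how one would translate a target "total number of domains doing work" into each library's parameter, assuming hypothesis (2) holds. The helper names are hypothetical and belong to neither library; the Moonpool side in particular encodes the hypothesis, not a documented guarantee.

```ocaml
(* Hypothetical helpers: they only encode hypothesis (2) as arithmetic. *)

(* Domainslib: [~num_domains] counts the extra domains; the domain that
   calls [Task.run] is assumed to work alongside them. *)
let domainslib_num_domains ~total =
  if total < 1 then invalid_arg "total must be at least 1";
  total - 1

(* Moonpool: under hypothesis (2), [~num_threads] is the total number of
   workers. This is an assumption, not something the docs state. *)
let moonpool_num_threads ~total =
  if total < 1 then invalid_arg "total must be at least 1";
  total

let () =
  let total = 5 in
  Printf.printf "Domainslib ~num_domains:%d, Moonpool ~num_threads:%d\n"
    (domainslib_num_domains ~total)
    (moonpool_num_threads ~total)
```

With `total = 5` this pairs `~num_domains:4` with `~num_threads:5`, which are exactly the two runs that land close to each other in the repro case.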

Two minor remarks:

- Even assuming that (2) holds, I remain uncertain and confused about a sub-question: does this mean that the main domain intentionally does not participate in computations, or that it is included in `num_threads` and Moonpool will only spawn (n-1) domains? Given the sensitivity of the Multicore OCaml runtime to extra domains above the number of cores, I think it's important that Moonpool users know for sure how many domains in total are going to run when they pass a given `~num_threads` parameter.

- I tried to find a clear answer to the question of whether (1) or (2) holds by looking at the Moonpool codebase, and I failed to do so. There are many layers of stuff, with indirections via Picos. I don't know if there is any actionable feedback to extract from this remark, but maybe: if you add more complexity to the implementation, I think it would be nice to also make the documentation clearer and more complete.
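
The sub-question in the first remark matters because the two readings give different total domain counts for the same parameter. A rough sketch of the check a user would want to make, assuming one Moonpool worker per domain (I don't know that this holds); the type and function names are hypothetical:

```ocaml
(* The two readings of the sub-question, as a hypothetical type. *)
type moonpool_reading =
  | Main_idle      (* n workers are spawned, the main domain waits *)
  | Main_included  (* the main domain is one of the n workers *)

(* Domainslib: n extra domains plus the main one. *)
let domainslib_total ~num_domains = num_domains + 1

(* Moonpool, assuming one worker per domain (an assumption). *)
let moonpool_total reading ~num_threads =
  match reading with
  | Main_idle -> num_threads + 1
  | Main_included -> num_threads

let oversubscribed ~cores total = total > cores

let () =
  let cores = Domain.recommended_domain_count () in
  let n = 4 in
  List.iter
    (fun (label, total) ->
      Printf.printf "%-24s total=%d%s\n" label total
        (if oversubscribed ~cores total then " (more domains than cores)"
         else ""))
    [ ("Domainslib", domainslib_total ~num_domains:n);
      ("Moonpool, main idle", moonpool_total Main_idle ~num_threads:n);
      ("Moonpool, main included", moonpool_total Main_included ~num_threads:n) ]
```

On a machine with exactly `n` cores, only the "main included" reading stays within the core count, which is why I would like the documentation to say which one applies.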


### A simple repro case

```ocaml
(* fibo.ml *)
let cutoff = 25
let input = 40

let rec fibo_seq n =
  if n <= 1 then
    n
  else
    fibo_seq (n - 1) + fibo_seq (n - 2)

let rec fibo_domainslib ctx n =
  if n <= cutoff then
    fibo_seq n
  else
    let open Domainslib in
    let fut1 = Task.async ctx (fun () -> fibo_domainslib ctx (n - 1)) in
    let fut2 = Task.async ctx (fun () -> fibo_domainslib ctx (n - 2)) in
    Task.await ctx fut1 + Task.await ctx fut2

let rec fibo_moonpool ctx n =
  if n <= cutoff then
    fibo_seq n
  else
    let open Moonpool in
    let fut1 = Fut.spawn ~on:ctx (fun () -> fibo_moonpool ctx (n - 1)) in
    let fut2 = Fut.spawn ~on:ctx (fun () -> fibo_moonpool ctx (n - 2)) in
    Fut.await fut1 + Fut.await fut2

let usage =
  "fibo.exe <num_domains> [ domainslib | moonpool | seq ]"

let num_domains =
  try int_of_string Sys.argv.(1)
  with _ -> failwith usage
   

let implem =
  try Sys.argv.(2)
  with _ -> failwith usage

let () =
  let output =
    match implem with
      | "domainslib" ->
         let open Domainslib in
         let pool = Task.setup_pool ~num_domains () in
         Task.run pool (fun () ->
           fibo_domainslib pool input
         )
      | "moonpool" ->
         let open Moonpool in
         let ctx = Ws_pool.create ~num_threads:num_domains () in
         Ws_pool.run_wait_block ctx (fun () ->
           fibo_moonpool ctx input
         )
      | "seq" ->
         fibo_seq input
      | _ -> failwith usage
  in
  print_int output;
  print_newline ()
```

```sh
$ ocamlfind ocamlopt -package domainslib,moonpool -linkpkg fibo.ml -o fibo.exe

$ hyperfine "./fibo.exe 4 domainslib"
Benchmark 1: ./fibo.exe 4 domainslib
  Time (mean ± σ):     207.9 ms ±   3.2 ms    [User: 999.8 ms, System: 8.3 ms]
  Range (min … max):   199.8 ms … 214.5 ms    14 runs

$ hyperfine "./fibo.exe 4 moonpool"
Benchmark 1: ./fibo.exe 4 moonpool
  Time (mean ± σ):     262.2 ms ±   3.3 ms    [User: 1003.3 ms, System: 14.8 ms]
  Range (min … max):   258.2 ms … 267.7 ms    11 runs

$ hyperfine "./fibo.exe 5 moonpool"
Benchmark 1: ./fibo.exe 5 moonpool
  Time (mean ± σ):     211.1 ms ±   4.0 ms    [User: 1002.0 ms, System: 16.6 ms]
  Range (min … max):   204.9 ms … 216.7 ms    14 runs
```
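
As a sanity check on these numbers, dividing user CPU time by wall-clock time gives a rough estimate of how many cores were kept busy in each run. This is only a back-of-the-envelope sketch (it ignores system time and startup cost), using the figures copied from the hyperfine output above:

```ocaml
(* (label, mean wall-clock time in ms, user CPU time in ms),
   copied from the hyperfine runs above. *)
let runs = [
  ("domainslib 4", 207.9, 999.8);
  ("moonpool 4", 262.2, 1003.3);
  ("moonpool 5", 211.1, 1002.0);
]

(* Rough estimate of the number of cores kept busy. *)
let effective_parallelism ~wall_ms ~user_ms = user_ms /. wall_ms

let () =
  List.iter
    (fun (label, wall_ms, user_ms) ->
      Printf.printf "%-14s %.1f\n" label
        (effective_parallelism ~wall_ms ~user_ms))
    runs
```

This gives roughly 4.8 for Domainslib with 4, 3.8 for Moonpool with 4, and 4.7 for Moonpool with 5, which is consistent with the "one-domain shift": Domainslib with `~num_domains:4` behaves like five working domains.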

Note: this repro case is pretty close to your own benchs/fib_rec.ml benchmark, but unfortunately in that benchmark you did not make the number of Domainslib domains a parameter (it only takes `recommended_domain_count`), and so you could not observe the difference at equal parameters.

https://github.com/c-cube/moonpool/blob/d957f7b54e7034f180d4a0921ccbce828e50f574/benchs/fib_rec.ml#L75-L79

	let dl_pool =
	  lazy
	    (let n = Domain.recommended_domain_count () in
	     Printf.printf "use %d domains\n%!" n;
	     Domainslib.Task.setup_pool ~num_domains:n ())
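
One possible way to make that count a parameter is sketched below. The `DL_NUM_DOMAINS` environment variable and the helper names are hypothetical (the benchmark does not read them today); the sketch falls back to `Domain.recommended_domain_count ()` when the variable is absent or invalid.

```ocaml
(* Parse an optional override; keep [default] on missing or invalid input. *)
let parse_num_domains ~default = function
  | None -> default
  | Some s ->
    (match int_of_string_opt (String.trim s) with
     | Some n when n >= 0 -> n
     | _ -> default)

(* DL_NUM_DOMAINS is a hypothetical variable name used for illustration. *)
let dl_num_domains () =
  parse_num_domains
    ~default:(Domain.recommended_domain_count ())
    (Sys.getenv_opt "DL_NUM_DOMAINS")

let () = Printf.printf "use %d domains\n%!" (dl_num_domains ())
```

The resulting value could then be passed as `~num_domains` to `Domainslib.Task.setup_pool`, so that both schedulers can be run at equal parameters.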
