Non-reproducibility of @threads when varying number of threads #49064

miguelbiron · 2023-03-20T15:44:13Z

Hi -- first of all, thank you for an amazing piece of software!

The documentation for TaskLocalRNG reads

As long as the number of threads is not used to make decisions on task creation, simulation results are also independent of the number of available threads / CPUs.

However, when using the @threads macro in a way that does not explicitly make use of nthreads(), we (cc: @alexandrebouchard) still get different results when changing the number of threads (see MWE at the bottom).

We realized that the reason is that @threads creates nthreads() tasks -- and therefore nthreads() RNG streams -- so it is no surprise that the results change with the number of threads.

Since @threads is perhaps the most widely used approach to multithreading in Julia, we think that there should be a !!! warning in the documentation for @threads. It should state that the nthreads()--invariance guarantee from TaskLocalRNG is voided because of the reason outlined above.

Looking forward to reading your opinion on this issue.

Best,
Miguel

The output of versioninfo()

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

How you installed Julia: compiled from source.
A minimal working example (MWE)

using Random
import Base.Threads.@threads

println("Number of threads: $(Threads.nthreads())")

const n_iters = 10000;
result = zeros(n_iters);
Random.seed!(1);
@threads for i in 1:n_iters
    # in a real problem, do some expensive calculation here...
    result[i] = rand();
end
println("Result: $(last(result))")

For 8 threads this gives

Number of threads: 8
Result: 0.25679999169092793

whereas for 1 thread the result is

Number of threads: 1
Result: 0.8785201210435906

The text was updated successfully, but these errors were encountered:

vtjnash · 2023-03-20T18:08:04Z

There is nothing entirely notable here, since the order of computation (with nthreads>1) is not defined, so the value of the last computation is not defined as part of the @threads API guarantees. This is stated in the "semantics" section of the extended help for @threads.

But this also seems to be a curious case, since the number of Tasks spawned is dependent on a configuration parameter of nthreads. And since we split the default RNG at each task spawn, so the result is currently predictable (even with dynamic scheduling for the nth call to rand of the mth call to Task), but that value is not necessarily stable, but is based on how many Tasks are used to split up the work.

miguelbiron · 2023-03-20T18:44:15Z

since the order of computation (with nthreads>1) is not defined,

This is slightly beside the point, because if I manage my own collection of RNGs -- and in fact this is how we solve the reproducibility issue in our application -- I can make the MWE work

using Random
import Base.Threads.@threads

println("Number of threads: $(Threads.nthreads())")

const n_iters = 10000;
result = zeros(n_iters);
rngs = [MersenneTwister(i) for i in 1:n_iters]
@threads for i in 1:n_iters
    # in a real problem, do some expensive calculation here...
    result[i] = rand(rngs[i]);
end
println("Result: $(last(result))")

Now the result is invariant to nthreads()

Number of threads: 1
Result: 0.17467878179018026

Number of threads: 8
Result: 0.17467878179018026

vtjnash · 2023-03-20T20:34:52Z

MersenneTwister will probably fail tests for statistical randomness if you seed it with consecutive numbers like that, and has extremely high memory overhead for that usage. The new Xoshiro type may be better on both counts. We could potentially have @threads re-seed itself on each iteration (like you did), but it is potentially still going to behave unexpectedly (though possibly more predictably).

miguelbiron · 2023-03-20T20:47:16Z

Using MersenneTwister in this example is purely a matter of simplicity. In our actual experiments we use SplitMix64 as implemented in SplittableRandoms.jl. This family of PRNGs offers a principled split method. The modified example below still gives a reproducible answer.

using Random
using SplittableRandoms: SplittableRandom, split
import Base.Threads.@threads

println("Number of threads: $(Threads.nthreads())")

const n_iters = 10000;
const master_rng = SplittableRandom(1)
result = zeros(n_iters);
rngs = [split(master_rng) for _ in 1:n_iters]
@threads for i in 1:n_iters
    # in a real problem, do some expensive calculation here...
    result[i] = rand(rngs[i]);
end
println("Result: $(last(result))")

Number of threads: 1
Result: 0.4394633333251359

Number of threads: 8
Result: 0.4394633333251359

StefanKarpinski · 2023-03-21T14:56:06Z

I'm working on a PR that would leave the parent task's RNG unchanged by the number of child tasks it spawns. This requires keeping a "child counter" in the parent task and seeding the child task's RNG with a combination of the parent task's RNG state and the counter. Splitting off a child only advances the parent's counter, it does not affect the parent's RNG state.

miguelbiron · 2023-03-23T18:13:01Z

That's interesting! However, I don't think that particular fix in and of itself will make @threads invariant to the number of threads. Because the underlying problem is that @threads spawns a number of PRNGs == number of tasks == number of threads.

I want to re-iterate that we're not in any way advocating for @threads to become reproducible when changing the number of threads. That is a design choice that has pros and cons. What we're suggesting, instead, is that the documentation should clarify that @threads voids the reproducibility guarantee that is highlighted in the docs for TaskLocalRNG.

andrewjradcliffe · 2023-03-24T00:53:15Z

Xoshiro with jump functions seems to suit this use case. If one is needs deterministic provision of non-overlapping RNGs (e.g. testing purposes), one can choose a parent, advance the parent state by 2^64 or 2^128, create a new Xoshiro instance by copying the parent, advance the parent, etc. (see code below).

parent = Xoshiro(); # note that period is 2^256
xos = Vector{Xoshiro}(undef, 8);
for i = 1:8
    xos[i] = copy(parent)
    long_jump!(parent) # advance by 2^128
end

Assuming that each thread uses a single RNG instance, it's virtually impossible for their states to overlap. In general, if one needs to control RNG state and use threads, the only option is to deterministically equip each task (thread) with its own RNG in the manner that one desires.

As an aside, it's probably worth taking a look at Vigna's papers so that one can make an informed decision.

miguelbiron · 2023-03-26T21:47:08Z

Having jump polynomials for Xoshiro instead of the current ad-hoc approach to split would certainly be an improvement in terms of the statistical dependence of the split stream.

However, that would not by itself fix the varying-nthreads-non-reproducibility of @threads---which, again, we don't argue has to be fixed, it should only be clarified. Indeed, by replacing @threads with a @sync-@spawn construct, I can achieve the desired reproducibility using the current implementation of Xoshiro.

using Random, Base.Threads

println("Number of threads: $(Threads.nthreads())")

const n_iters = 10000;
result = zeros(n_iters);
Random.seed!(1);
@elapsed @sync for i in 1:n_iters
    # in a real problem, do some expensive calculation here...
    @spawn result[i] = rand();
end
println("Result: $(last(result))")

Number of threads: 1
Result: 0.5404193718328161

Number of threads: 8
Result: 0.5404193718328161

The reason that this works is simply that the @sync-@spawn approach uses

number of tasks == number of PRNG stream == n_iters

and thus it is truly independent of nthreads(). Incidentally, it is only slighter slower than @threads, so this approach could be suggested in the docs as an alternative @threads that guarantees reproducibiliy.

vtjnash added the domain:docs This change adds or pertains to documentation label Mar 20, 2023

giordano added the domain:multithreading Base.Threads and related functionality label Mar 20, 2023

jmkuhn mentioned this issue Apr 26, 2023

Random.seed! does not yield reproducible numbers #49522

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-reproducibility of @threads when varying number of threads #49064

Non-reproducibility of @threads when varying number of threads #49064

miguelbiron commented Mar 20, 2023

vtjnash commented Mar 20, 2023

miguelbiron commented Mar 20, 2023

vtjnash commented Mar 20, 2023

miguelbiron commented Mar 20, 2023

StefanKarpinski commented Mar 21, 2023

miguelbiron commented Mar 23, 2023 •

edited

Loading

andrewjradcliffe commented Mar 24, 2023

miguelbiron commented Mar 26, 2023

Non-reproducibility of @threads when varying number of threads #49064

Non-reproducibility of @threads when varying number of threads #49064

Comments

miguelbiron commented Mar 20, 2023

vtjnash commented Mar 20, 2023

miguelbiron commented Mar 20, 2023

vtjnash commented Mar 20, 2023

miguelbiron commented Mar 20, 2023

StefanKarpinski commented Mar 21, 2023

miguelbiron commented Mar 23, 2023 • edited Loading

andrewjradcliffe commented Mar 24, 2023

miguelbiron commented Mar 26, 2023

miguelbiron commented Mar 23, 2023 •

edited

Loading