Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-reproducibility of @threads when varying number of threads #49064

Open
miguelbiron opened this issue Mar 20, 2023 · 8 comments
Open

Non-reproducibility of @threads when varying number of threads #49064

miguelbiron opened this issue Mar 20, 2023 · 8 comments
Labels
domain:docs This change adds or pertains to documentation domain:multithreading Base.Threads and related functionality

Comments

@miguelbiron
Copy link

Hi -- first of all, thank you for an amazing piece of software!

The documentation for TaskLocalRNG reads

As long as the number of threads is not used to make decisions on task creation, simulation results are also independent of the number of available threads / CPUs.

However, when using the @threads macro in a way that does not explicitly make use of nthreads(), we (cc: @alexandrebouchard) still get different results when changing the number of threads (see MWE at the bottom).

We realized that the reason is that @threads creates nthreads() tasks -- and therefore nthreads() RNG streams -- so it is no surprise that the results change with the number of threads.

Since @threads is perhaps the most widely used approach to multithreading in Julia, we think that there should be a !!! warning in the documentation for @threads. It should state that the nthreads()--invariance guarantee from TaskLocalRNG is voided because of the reason outlined above.

Looking forward to reading your opinion on this issue.

Best,
Miguel


  1. The output of versioninfo()
julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
  1. How you installed Julia: compiled from source.
  2. A minimal working example (MWE)
using Random
import Base.Threads.@threads

println("Number of threads: $(Threads.nthreads())")

const n_iters = 10000;
result = zeros(n_iters);
Random.seed!(1);
@threads for i in 1:n_iters
    # in a real problem, do some expensive calculation here...
    result[i] = rand();
end
println("Result: $(last(result))")

For 8 threads this gives

Number of threads: 8
Result: 0.25679999169092793

whereas for 1 thread the result is

Number of threads: 1
Result: 0.8785201210435906
@vtjnash vtjnash added the domain:docs This change adds or pertains to documentation label Mar 20, 2023
@giordano giordano added the domain:multithreading Base.Threads and related functionality label Mar 20, 2023
@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 20, 2023

There is nothing entirely notable here, since the order of computation (with nthreads>1) is not defined, so the value of the last computation is not defined as part of the @threads API guarantees. This is stated in the "semantics" section of the extended help for @threads.

But this also seems to be a curious case, since the number of Tasks spawned is dependent on a configuration parameter of nthreads. And since we split the default RNG at each task spawn, so the result is currently predictable (even with dynamic scheduling for the nth call to rand of the mth call to Task), but that value is not necessarily stable, but is based on how many Tasks are used to split up the work.

@miguelbiron
Copy link
Author

since the order of computation (with nthreads>1) is not defined,

This is slightly beside the point, because if I manage my own collection of RNGs -- and in fact this is how we solve the reproducibility issue in our application -- I can make the MWE work

using Random
import Base.Threads.@threads

println("Number of threads: $(Threads.nthreads())")

const n_iters = 10000;
result = zeros(n_iters);
rngs = [MersenneTwister(i) for i in 1:n_iters]
@threads for i in 1:n_iters
    # in a real problem, do some expensive calculation here...
    result[i] = rand(rngs[i]);
end
println("Result: $(last(result))")

Now the result is invariant to nthreads()

Number of threads: 1
Result: 0.17467878179018026
Number of threads: 8
Result: 0.17467878179018026

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 20, 2023

MersenneTwister will probably fail tests for statistical randomness if you seed it with consecutive numbers like that, and has extremely high memory overhead for that usage. The new Xoshiro type may be better on both counts. We could potentially have @threads re-seed itself on each iteration (like you did), but it is potentially still going to behave unexpectedly (though possibly more predictably).

@miguelbiron
Copy link
Author

Using MersenneTwister in this example is purely a matter of simplicity. In our actual experiments we use SplitMix64 as implemented in SplittableRandoms.jl. This family of PRNGs offers a principled split method. The modified example below still gives a reproducible answer.

using Random
using SplittableRandoms: SplittableRandom, split
import Base.Threads.@threads

println("Number of threads: $(Threads.nthreads())")

const n_iters = 10000;
const master_rng = SplittableRandom(1)
result = zeros(n_iters);
rngs = [split(master_rng) for _ in 1:n_iters]
@threads for i in 1:n_iters
    # in a real problem, do some expensive calculation here...
    result[i] = rand(rngs[i]);
end
println("Result: $(last(result))")
Number of threads: 1
Result: 0.4394633333251359
Number of threads: 8
Result: 0.4394633333251359

@StefanKarpinski
Copy link
Sponsor Member

I'm working on a PR that would leave the parent task's RNG unchanged by the number of child tasks it spawns. This requires keeping a "child counter" in the parent task and seeding the child task's RNG with a combination of the parent task's RNG state and the counter. Splitting off a child only advances the parent's counter, it does not affect the parent's RNG state.

@miguelbiron
Copy link
Author

miguelbiron commented Mar 23, 2023

That's interesting! However, I don't think that particular fix in and of itself will make @threads invariant to the number of threads. Because the underlying problem is that @threads spawns a number of PRNGs == number of tasks == number of threads.

I want to re-iterate that we're not in any way advocating for @threads to become reproducible when changing the number of threads. That is a design choice that has pros and cons. What we're suggesting, instead, is that the documentation should clarify that @threads voids the reproducibility guarantee that is highlighted in the docs for TaskLocalRNG.

@andrewjradcliffe
Copy link
Contributor

Xoshiro with jump functions seems to suit this use case. If one is needs deterministic provision of non-overlapping RNGs (e.g. testing purposes), one can choose a parent, advance the parent state by 2^64 or 2^128, create a new Xoshiro instance by copying the parent, advance the parent, etc. (see code below).

parent = Xoshiro(); # note that period is 2^256
xos = Vector{Xoshiro}(undef, 8);
for i = 1:8
    xos[i] = copy(parent)
    long_jump!(parent) # advance by 2^128
end

Assuming that each thread uses a single RNG instance, it's virtually impossible for their states to overlap. In general, if one needs to control RNG state and use threads, the only option is to deterministically equip each task (thread) with its own RNG in the manner that one desires.

As an aside, it's probably worth taking a look at Vigna's papers so that one can make an informed decision.

@miguelbiron
Copy link
Author

Having jump polynomials for Xoshiro instead of the current ad-hoc approach to split would certainly be an improvement in terms of the statistical dependence of the split stream.

However, that would not by itself fix the varying-nthreads-non-reproducibility of @threads---which, again, we don't argue has to be fixed, it should only be clarified. Indeed, by replacing @threads with a @sync-@spawn construct, I can achieve the desired reproducibility using the current implementation of Xoshiro.

using Random, Base.Threads

println("Number of threads: $(Threads.nthreads())")

const n_iters = 10000;
result = zeros(n_iters);
Random.seed!(1);
@elapsed @sync for i in 1:n_iters
    # in a real problem, do some expensive calculation here...
    @spawn result[i] = rand();
end
println("Result: $(last(result))")
Number of threads: 1
Result: 0.5404193718328161
Number of threads: 8
Result: 0.5404193718328161

The reason that this works is simply that the @sync-@spawn approach uses

number of tasks == number of PRNG stream == n_iters

and thus it is truly independent of nthreads(). Incidentally, it is only slighter slower than @threads, so this approach could be suggested in the docs as an alternative @threads that guarantees reproducibiliy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:docs This change adds or pertains to documentation domain:multithreading Base.Threads and related functionality
Projects
None yet
Development

No branches or pull requests

5 participants