Tasks: don't advance task RNG on task spawn #49110

StefanKarpinski · 2023-03-22T20:50:48Z

Previously we had this unfortunate behavior:

julia> Random.seed!(123)
TaskLocalRNG()

julia> randn()
-0.6457306721039767

julia> Random.seed!(123)
TaskLocalRNG()

julia> fetch(@async nothing)

julia> randn()
0.4922456865251828

In other words: the mere act of spawning a child task affects the parent task's RNG (by advancing it four times). This PR preserves the desirable parts of the previous situation: when seeded, the parent and child RNG streams are reproducible. Moreover, it fixes the undesirable behavior:

julia> Random.seed!(123)
TaskLocalRNG()

julia> randn()
-0.6457306721039767

julia> Random.seed!(123)
TaskLocalRNG()

julia> fetch(@async nothing)

julia> randn()
-0.6457306721039767

In other words: the parent RNG is unaffected by spawning a child.

The design is based on the SplitMix [1] and DotMix [2] RNGs, but with some simplifications based on observing that the task tree is always binary and therefore we only need binary pedigree coefficients: when a task forks a child, zero is appended to its pedigree, and the child's pedigree is the same with a one in the last place. Thus all the pedigree coefficients are binary. How does this help matters? In the proof of collision resistance in the DotMix paper, working in a prime modulus is only necessary to guarantee that the difference between pedigree coordinates has a multiplicative inverse. If the coefficients are binary, then the difference between two distinct coordinates is always ±1, which is invertible in any modulus, so we can work in Z/2^64 and use native integer arithmetic.
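To spell out the step that binary coefficients buy us, here is a rough version of the collision argument, adapted to this setting. The symbols x, y, w and d are notation introduced here for illustration, and the weights are assumed to be uniform and independent, which the actual PCG-generated weights only approximate.

```latex
Let $x, y \in \{0,1\}^d$ be two distinct pedigrees (padded with zeros to a
common length $d$) and let $w \in (\mathbb{Z}/2^{64})^d$ be the weights. Then
\[
  x \cdot w - y \cdot w \;=\; \sum_{i=1}^{d} (x_i - y_i)\, w_i \pmod{2^{64}} .
\]
Pick an index $j$ with $x_j \neq y_j$, so that $x_j - y_j = \pm 1$. The two dot
products collide exactly when
\[
  w_j \;\equiv\; \mp \sum_{i \neq j} (x_i - y_i)\, w_i \pmod{2^{64}} ,
\]
which, for any fixed choice of the other weights, holds for exactly one of the
$2^{64}$ possible values of $w_j$. Under the uniformity assumption,
\[
  \Pr[\, x \cdot w = y \cdot w \,] \;=\; 2^{-64} .
\]
```

No division is needed to isolate w_j, which is why a prime modulus is not required here.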

Similar to SplitMix, instead of explicitly storing pedigree coordinates, we store each task's dot product and derive each child task's dot product from the parent's. No multiplication is required: we just add the random weight coefficient for the current tree depth to the parent's dot product to get the child's dot product.
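A minimal sketch of that bookkeeping, for illustration only. The names `task_sketch`, `fork_child`, `explicit_dot` and the fixed `WEIGHTS` table are hypothetical; the PR draws its weights from an internal PCG instead of a table. The pedigree array is kept only so the incremental sum can be checked against the explicit dot product.

```c
#include <stdint.h>

#define MAX_DEPTH 4

// Hypothetical per-depth weights (arbitrary odd constants for illustration).
static const uint64_t WEIGHTS[MAX_DEPTH] = {
    0x9e3779b97f4a7c15ULL, 0xbf58476d1ce4e5b9ULL,
    0x94d049bb133111ebULL, 0xd6e8feb86659fd93ULL
};

typedef struct {
    uint64_t dot;                 // running dot product in Z/2^64
    unsigned depth;               // number of pedigree bits so far
    uint8_t  pedigree[MAX_DEPTH]; // kept only to check the shortcut
} task_sketch;

// Fork a child: the parent appends 0, the child appends 1, at the current depth.
// The caller must keep parent->depth < MAX_DEPTH.
static task_sketch fork_child(task_sketch *parent)
{
    unsigned d = parent->depth;
    task_sketch child = *parent;
    child.pedigree[d] = 1;
    child.dot = parent->dot + WEIGHTS[d]; // one addition, no multiplication
    child.depth = d + 1;
    parent->pedigree[d] = 0;              // parent's dot product is unchanged
    parent->depth = d + 1;
    return child;
}

// Reference computation: explicit dot product of pedigree bits and weights.
static uint64_t explicit_dot(const task_sketch *t)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < t->depth; i++)
        sum += (uint64_t)t->pedigree[i] * WEIGHTS[i];
    return sum;
}
```

Starting from a zero-initialized root and forking a few times, `t.dot` should equal `explicit_dot(&t)` for every task, while the parent's `dot` never changes.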

Pseudorandom weights for the SplitMix dot product are generated using an internal PCG RNG, specifically the PCG-RXS-M-XS variant. We chose this RNG because we need something small and fast but unlikely to have artifacts which might sabotage the collision resistance of SplitMix. Since PCG-RXS-M-XS passes BigCrush with only 36 bits of state and we use 64 bits of state, it fits the bill quite well. The major caveat of this RNG is that it produces each value exactly once per period, which makes it insecure, but this is actually beneficial here: we don't want any repeated weights. We also don't care that this RNG is insecure and invertible since it's only used internally to seed forked child tasks. Rather than the classic LCG multiplier inherited from Knuth, we use the best full-width multiplier found by Steele & Vigna [3] in their search for constants with high spectral scores.
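For readers who haven't seen it, a 64-bit PCG-RXS-M-XS generator can be sketched roughly as below. The function names are made up for this example, and the two constants are assumptions: `0xd1342543de82ef95` as the Steele & Vigna multiplier and `0xaef17502108ef2d9` as the usual PCG output multiplier. This is not a copy of the code in the PR.

```c
#include <stdint.h>

// One step of the underlying LCG (multiplier assumed, increment 1).
static uint64_t lcg_step(uint64_t state)
{
    return state * 0xd1342543de82ef95ULL + 1;
}

// PCG-RXS-M-XS output permutation: random xorshift, multiply, xorshift.
// Every step is invertible, so distinct states give distinct outputs.
static uint64_t pcg_rxs_m_xs(uint64_t x)
{
    x ^= x >> ((x >> 59) + 5); // shift amount chosen by the top 5 bits
    x *= 0xaef17502108ef2d9ULL;
    x ^= x >> 43;
    return x;
}

// Return the output for the current state, then advance the state.
static uint64_t pcg_next(uint64_t *state)
{
    uint64_t out = pcg_rxs_m_xs(*state);
    *state = lcg_step(*state);
    return out;
}
```

Because the LCG visits every 64-bit state once per period and the output function is a bijection, each 64-bit value appears exactly once per period, including zero (for state 0).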

We reuse the PCG-RXS-M-XS output function for mixing the bits of the SplitMix dot product, instead of the MurmurHash3 mixing function that SplitMix normally uses, since we're already using it for the internal PCG generator. The choice of mixing output function for SplitMix is essentially arbitrary and only serves to cascade bit differences in dot products.

vtjnash

Amazing!

stdlib/Random/src/Xoshiro.jl

StefanKarpinski · 2023-03-23T17:43:45Z

Just pushed a couple of new commits.

The first one increases instruction-level parallelism in the task spawn by using the previous LCG state, which is perfectly good, to update the SplitMix dot product and do the LCG update in parallel, writing the new value back to both the parent and child tasks. This is a straight improvement, no reason not to do this.

The second commit is speculative: it makes the seeding of child tasks independent of the usage of the parent's main xoshiro256++ RNG; instead it only depends on the rngState4 and rngState5 seed values that are also set when the main RNG is seeded. In practice what this means is the following:

julia> Random.seed!(123)
TaskLocalRNG()

julia> randn()
-0.6457306721039767

julia> fetch(@async randn())
-0.8761314978436953

julia> Random.seed!(123)
TaskLocalRNG()

julia> fetch(@async randn())
-0.8761314978436953

julia> randn()
-0.6457306721039767

In other words, using the RNG in the parent doesn't affect the RNG stream of the child: the only thing that determines the child's RNG stream is how the root task was seeded and the task structure, i.e. where in the binary task tree the child is.

I've discussed a bit with @vtjnash and we're not sure about this change. It makes RNG streams and child tasks easier to reason about and less brittle—using the RNG in the parent won't cascade changes into all the children—which seems good. The potential downside is that if a SplitMix dot product collision does happen, then those two tasks will have identical RNG output, whereas if the SplitMix dot product is mixed with the xoshiro256 state, then that will only happen if there is a dot product collision and the xoshiro256 state started out the same, which is extremely unlikely.

StefanKarpinski · 2023-03-23T20:33:33Z

Spurred, somewhat ironically, by thinking about the version of this where the child's main RNG seeding is independent of the parent's main RNG state, I've come up with a further improvement to the original version where the parent's RNG does affect the child's RNG.

One observation is that SplitMix collision avoidance actually only helps in cases where the xoshiro state is identical in two tasks to start with. In cases where the xoshiro state is not identical, different SplitMix hashes can theoretically put you into identical xoshiro states and nothing about the SplitMix construction helps to prevent that, so SplitMix really only helps when the xoshiro states start out the same.

I also noted that if we focus on consecutive children of the same task with no intervening samples from the main RNG perturbing the xoshiro state, then we would be better off using PCG output to perturb the xoshiro state than using the SplitMix dot product: PCG output is guaranteed to be collision free since it produces each possible output once per full RNG cycle (2^64), whereas the SplitMix dot product merely makes collisions unlikely. The birthday paradox tells us that with a 64-bit hash, if we want the chance of a collision to be less than 1 in a billion, then we're in trouble once we start about 200k tasks. If we perturb xoshiro's state with the output of PCG, then we could start 2^64 (> 10^19) immediate child tasks without collision.
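The 200k figure can be checked with the standard birthday approximation, p ≈ n(n-1)/2 / 2^bits. The helper below is a throwaway sketch written for this check (`collision_bound` is a made-up name), and the approximation is only meaningful while p is small.

```c
// Approximate probability that n uniformly random hashes of the given
// width contain at least one collision: n(n-1)/2 pairs, each colliding
// with probability 2^-bits.
static double collision_bound(double n, int bits)
{
    double p = n * (n - 1.0) / 2.0;
    for (int i = 0; i < bits; i++)
        p /= 2.0; // divide by 2^bits without needing <math.h>
    return p;
}
```

With 64 bits, `collision_bound(190000, 64)` stays just under 1e-9 while `collision_bound(200000, 64)` is just over it, which matches the rough threshold quoted above.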

Looking at my original code, the SplitMix dot product is perturbed via addition by the PCG output, but what the above suggests is: What if we were to perturb the xoshiro state directly with the PCG output? That guarantees that each immediate child gets a unique state. Which leads to: Why not use each xoshiro state register as a SplitMix dot product? Why not, indeed. Assuming no one advances their xoshiro RNG, parent and sibling tasks are guaranteed not to collide (by PCG invertibility), and cousin tasks differ from their greatest common ancestor by SplitMix dot product, which gives the same collision resistance as before, but now we have 256 bits of dot product hash instead of just 64! In other words, we gain multiple benefits:

Parent and sibling tasks are guaranteed not to collide.
We don't have to allocate space in each task for a SplitMix dot product—we save 64 bits per task.
Instead we get to use all 256 bits of xoshiro256 state as a SplitMix dot product, which gives astronomically better collision avoidance for tasks that are not directly related.

There are some subtleties:

The weights you perturb each xoshiro register with have to be effectively independent, or having four of them isn't actually helping your collision avoidance; i.e. you have to make sure a collision in one register doesn't make a collision in another register more likely.
PCG generates zero sometimes, which would make the child identical to the parent. That's fine for one register, but you want to make sure that you don't perturb all the registers by zero at the same time.

One obvious option is to draw each of the four weights from the same PCG, which guarantees that they are all distinct and should be unrelated enough to pass statistical muster with BigCrush and the like. There's an argument to be made that the set of possible weights that gives the collision probability isn't really 2^256, but rather 2^64 since that's the size of the PCG state. I'll have to think on that.

Without further ado, here's the improved task split function:

void jl_rng_split(uint64_t dst[5], uint64_t src[5]) JL_NOTSAFEPOINT
{
    uint64_t lcg = src[4]; // load internal PCG's LCG state
    dst[0] = src[0] + pcg_out(lcg); lcg = lcg * LCG_MUL + 1;
    dst[1] = src[1] + pcg_out(lcg); lcg = lcg * LCG_MUL + 1;
    dst[2] = src[2] + pcg_out(lcg); lcg = lcg * LCG_MUL + 1;
    dst[3] = src[3] + pcg_out(lcg); lcg = lcg * LCG_MUL + 1;
    dst[4] = lcg;
}

Yep, that's it. Note that we do away with one word of RNG state (the dot product).
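For anyone who wants to poke at this outside the Julia tree, here is a standalone sketch of the same idea. Everything not in the snippet above is an assumption: the definition of `pcg_out`, the value of `LCG_MUL`, the name `rng_split_sketch`, and in particular the write-back of the advanced LCG state to the parent (`src[4]`), which is added here so that consecutive children see different weights.

```c
#include <stdint.h>

// Assumed LCG multiplier (full-width constant attributed to Steele & Vigna).
#define LCG_MUL 0xd1342543de82ef95ULL

// Assumed PCG-RXS-M-XS output permutation; a bijection on 64-bit words.
static uint64_t pcg_out(uint64_t x)
{
    x ^= x >> ((x >> 59) + 5);
    x *= 0xaef17502108ef2d9ULL;
    x ^= x >> 43;
    return x;
}

// Words 0..3 are the xoshiro256 registers, word 4 is the internal LCG state.
static void rng_split_sketch(uint64_t dst[5], uint64_t src[5])
{
    uint64_t lcg = src[4];
    for (int i = 0; i < 4; i++) {
        dst[i] = src[i] + pcg_out(lcg); // perturb each register by a fresh weight
        lcg = lcg * LCG_MUL + 1;        // advance the LCG for the next register
    }
    dst[4] = lcg;
    src[4] = lcg; // assumption: write back so the next sibling gets new weights
}
```

Splitting the same parent twice leaves the parent's four xoshiro registers untouched and gives the two children different register values, since the LCG state differs between the two calls and `pcg_out` is invertible.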

oscardssmith · 2023-03-23T20:44:33Z

That's awesome!

StefanKarpinski · 2023-03-25T18:56:56Z

Ok, I pushed that as a commit, but then replaced it with an approach that's easier to vectorize and that seems better. We're back to the child RNG being affected by the parent RNG because it lets us use the main RNG's state for SplitMix dot products, which gives really excellent collision resistance.

StefanKarpinski · 2023-03-27T18:42:48Z

I'm going to squash this and add a comment explaining how it works, but I want to keep history around for posterity, so here's a gist of the commit log with diffs.

StefanKarpinski · 2023-03-28T16:00:13Z

This is ready to go from my POV, it has tests and a long comment explaining how (and why) it works. The CI situation seems like a trash fire but I don't think that's due to this PR.

oscardssmith · 2023-03-30T03:06:57Z

What are the odds we can combine this with your idea for a 192 bit RNG so we don't have to increase the task size?

StefanKarpinski · 2023-03-30T12:50:37Z

I'd prefer to make those changes independently; they're quite orthogonal. I'm also not fully sold on that whereas this is definitely better. Also, this rounds the task size up from 15 words to 16, which seems like a nicer size.

JeffBezanson · 2023-03-30T19:06:22Z

Plus the type tag though...

StefanKarpinski · 2023-03-30T20:25:56Z

Woo! Got the check tests passing.

The design is documented in detail in a comment preceding the jl_rng_split function.

StefanKarpinski · 2023-03-31T21:41:52Z

Ok, fixed the build on 32-bit platforms! The issue was that I was using the hash function to generate the fifth seed value in setseed! which produces a UInt32 on 32-bit platforms, so calling the function fails. This causes a crash at build time since we call setseed! during the build. I'm using a simple ad hoc mixing of the seed now.

KristofferC · 2023-04-01T07:17:54Z

Would be good with a news entry for this.

StefanKarpinski · 2023-04-01T13:53:27Z

Will do.

This PR adds an optional field to the existing `Xoshiro` struct to be able to faithfully copy the task-local RNG state.

Fixes #51255
Redo of #51271

Background context: #49110 added an additional state to the task-local RNG. However, before this PR `copy(default_rng())` did not include this extra state, causing subtle errors in `Test` where `copy(default_rng())` is assumed to contain the full task-local RNG state.

(cherry picked from commit 41b41ab)

vtjnash approved these changes Mar 23, 2023

StefanKarpinski force-pushed the sk/rng_split branch 2 times, most recently from 0b075f8 to 8d3d550 Compare March 27, 2023 18:33

StefanKarpinski marked this pull request as ready for review March 27, 2023 18:34

StefanKarpinski force-pushed the sk/rng_split branch 2 times, most recently from 66147d0 to ca53644 Compare March 28, 2023 15:31

StefanKarpinski added the backport 1.9 Change should be backported to release-1.9 label Mar 28, 2023

StefanKarpinski force-pushed the sk/rng_split branch from ca53644 to b8e2146 Compare March 28, 2023 15:56

StefanKarpinski removed the backport 1.9 Change should be backported to release-1.9 label Mar 28, 2023

StefanKarpinski force-pushed the sk/rng_split branch from b8e2146 to b27cb44 Compare March 30, 2023 02:39

StefanKarpinski force-pushed the sk/rng_split branch from b27cb44 to 7fab4b4 Compare March 30, 2023 19:37

StefanKarpinski force-pushed the sk/rng_split branch from 7fab4b4 to b5843a6 Compare March 31, 2023 19:47

StefanKarpinski merged commit 7618e64 into master Mar 31, 2023

StefanKarpinski deleted the sk/rng_split branch March 31, 2023 23:22

giordano added the domain:randomness Random number generation and the Random stdlib label Apr 1, 2023

StefanKarpinski mentioned this pull request Apr 1, 2023

NEWS: add news for task-local RNG split change #49217

Merged

Xnartharax pushed a commit to Xnartharax/julia that referenced this pull request Apr 19, 2023

Tasks: don't advance task RNG on task spawn (JuliaLang#49110)

5476a70

JeffBezanson mentioned this pull request Apr 26, 2023

Random.seed! does not yield reproducible numbers #49522

Closed

This was referenced Sep 11, 2023

Add XoshiroSplit type to copy added task RNG state #51271

Closed

Add s4 field to Xoshiro #51332

Merged

veddox mentioned this pull request Oct 4, 2023

ArgParse modifies global random state? carlobaldassi/ArgParse.jl#121

Closed

JonasIsensee mentioned this pull request Nov 11, 2023

File saved in Julia 1.10.0-beta3 cannot be loaded in Julia 1.10.0-rc1 with Random.Xoshiro JuliaIO/JLD2.jl#503

Closed
