
MersenneTwister: more efficient integer generation with caching #25058

Merged
merged 1 commit into master from rf/rand/cache-int on Dec 23, 2017

Conversation

rfourquet
Member

@rfourquet rfourquet commented Dec 13, 2017

This is the last "performance" PR I wanted in 1.0. It depends on the PR it is based on to get better performance, but the gains are less significant than when I first implemented it some 3 years ago. Hence the RFC, to discuss the trade-off.
The idea is that, as MersenneTwister is natively efficient at filling an array with random values, we can store an array of integers as a MersenneTwister field and fill it in bulk; each time a random integer is requested, we hand out one of the cached values. This is exactly what we already do for Float64 values. So do we want MersenneTwister to be a few kB heavier in exchange for better performance? I don't see any drawback, but I'm not sure. I played a bit with the size of the cache; the "best" one so far is 10kB, with speed-ups as follows on my machine (Int == Int64):

  • Int8 -> +20%
  • Int16 -> +15%
  • Int32 -> +20%
  • Int64 -> +60%
  • Int128 -> +30%

This is WIP: MersenneTwister is untouched, and the new algorithm is used when instantiating a helper RNG like r = IntCached{625}(MersenneTwister()); rand(r, Int), where 625 is the size of the cache in terms of UInt128 integers. It will be quick to remove the WIP status if we decide to go ahead.
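For concreteness, the caching scheme can be sketched as follows. This is a simplified illustration with hypothetical names (`IntCacheSketch`, `cached_rand`), not the PR's actual code: the PR stores the cache as a field of MersenneTwister itself and serves all integer widths from one buffer.

```julia
using Random

# Sketch of the caching idea: keep a buffer of pre-generated random
# integers and refill it in bulk, since bulk array generation is the
# path where MersenneTwister is fastest.
mutable struct IntCacheSketch
    rng::MersenneTwister
    cache::Vector{UInt64}  # pre-generated random values
    idx::Int               # index of the next unused cached value
end

IntCacheSketch(rng::MersenneTwister, n::Integer = 1250) =
    IntCacheSketch(rng, rand(rng, UInt64, n), 1)

function cached_rand(c::IntCacheSketch)
    if c.idx > length(c.cache)
        rand!(c.rng, c.cache)  # bulk refill amortizes per-call overhead
        c.idx = 1
    end
    v = c.cache[c.idx]
    c.idx += 1
    return v
end
```

The point is that the common path of `cached_rand` is just an array read and an index bump; the per-call cost of the generator is paid only once per buffer refill.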

@rfourquet rfourquet added decision A decision on this change is needed performance Must go faster RNG Random number generation and the Random stdlib triage This should be discussed on a triage call labels Dec 13, 2017
@StefanKarpinski StefanKarpinski removed triage This should be discussed on a triage call decision A decision on this change is needed labels Dec 14, 2017
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Dec 14, 2017
@rfourquet
Member Author

@nanosoldier runbenchmarks("random", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@rfourquet
Member Author

rfourquet commented Dec 19, 2017

I systematically tested all reasonable values for the size of the cache, and it appears that there is a threshold (501 UInt128) beyond which integer generation becomes much faster. I tested on two different machines, with different cache sizes etc. Still, if someone wants to test, you can just compare 501 and 500, for example, by changing this constant in "random/RNGs.jl" (and tell me your findings; if someone wants to test exhaustively, just ask me to push a branch where this is easier to do, without recompiling Julia each time). The improvements are very similar but not identical on both machines (e.g. my other machine shows no improvement for Int8, Int16, Int32, but a 50% improvement for Int128).

Now, this change had a consequence I hadn't thought about: we test that rand(97:122) gives a reproducible number whatever Int is (Int32 or Int64). But with this PR, rand(Int32) no longer uses the same algorithm as before, so this test is broken. Do we really want this feature? (I guess yes, as it is covered by a test.) So I temporarily changed our generator for ranges again so that the test passes, but with some regression (shown in the Nanosoldier report). Temporarily, that is, until I find a better fix, which I think I have.

Now, I have a big question mark in my head: if I use @btime or @benchmarkable in my REPL, I get the improvements shown in the OP. But if I run the BaseBenchmarks suite for "random", I get the same as Nanosoldier shows, i.e. no improvement. But this benchmark suite should benchmark rand(Int) etc., so I'm at a loss. Does anyone have an idea of what could be happening?

@StefanKarpinski
Sponsor Member

@jrevels, @andreasnoack ^ any ideas about the nanosoldier results?

@jrevels
Member

jrevels commented Dec 19, 2017

Now, I have a big question mark in my head: if I use @btime or @benchmarkable in my REPL, I get the improvements shown in the OP. But if I run the BaseBenchmarks suite for "random", I get the same as Nanosoldier shows, i.e. no improvement. But this benchmark suite should benchmark rand(Int) etc., so I'm at a loss. Does anyone have an idea of what could be happening?

Keep in mind that if you run the BaseBenchmarks suite yourself, you technically should be tuning the benchmarks for your machine and Julia build rather than use the tuning parameters cached in the repo. Those parameters were tuned specifically with Nanosoldier in mind. It's not a problem for all benchmarks, but can be for some.

Also, unless one employs low-level randomization schemes between experiments, benchmark performance will theoretically depend on prior machine state, such that running a set of benchmarks in one sequence can produce different results than running the set in a different sequence. Again, some benchmarks are more vulnerable to these effects than others.

Can you show an example where the manual @benchmark run differs from the BaseBenchmarks version for a specific benchmark on your machine? Might help narrow this down.
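For reference, the local retuning workflow described above looks roughly like this, assuming the usual BenchmarkTools/BaseBenchmarks setup (exact invocations may differ for your environment):

```julia
using BenchmarkTools, BaseBenchmarks

BaseBenchmarks.load!("random")           # load only the "random" benchmark group
suite = BaseBenchmarks.SUITE["random"]
tune!(suite)                             # retune evaluation parameters for this
                                         # machine/build, instead of using the
                                         # parameters cached for Nanosoldier
results = run(suite; verbose = true)
```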

@rfourquet
Member Author

Keep in mind that if you run the BaseBenchmarks suite yourself, you technically should be tuning the benchmarks for your machine and Julia build rather than use the tuning parameters cached in the repo.

Ok great info thanks!

Can you show an example where the manual @benchmark run differs from the BaseBenchmarks version for a specific benchmark on your machine? Might help narrow this down.

For example, @btime rand($(MersenneTwister()), UInt) consistently gives at least a 40% improvement, while the following line in BaseBenchmarks consistently shows no improvement:

    g["rand",     "MersenneTwister", tstr] = @benchmarkable _rand($MT, $(Val{T}()))

I even deleted all the lines in this file and added a simplified version:

    g["rand",     "MersenneTwister", "UInt"] = @benchmarkable rand($MT, UInt)

But this doesn't change anything.

@oscardssmith
Member

Could we create fixed fake urandom information so nanosoldier can have more consistent outcomes between tests?

@rfourquet
Member Author

Could we create fixed fake urandom information so nanosoldier can have more consistent outcomes between tests?

I don't really understand, but I guess you are talking more to Jarrett ;-)

@rfourquet
Member Author

I pushed what is hopefully a fix for the regressions, except for Bool ranges, which is addressed separately in #25190. I don't expect to see improvements with Nanosoldier on the next run, but at least showing no regressions is somewhat necessary for a PR claiming performance improvements :)

@nanosoldier runbenchmarks("random", vs=":master")

@ararslan
Member

Looks like Nanosoldier gave a 400 for your request. @nanosoldier runbenchmarks("random", vs=":master")

@ararslan
Member

...and mine too. @nanosoldier runbenchmarks("random", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@rfourquet
Member Author

At least the regressions affecting UInt32 are gone now. But the remaining ones for rand!(::Array{Int}, 1:1000) are weird, as the corresponding code is not even touched... I would like to ask whether one or two good souls are willing to test this branch and compare with master:

  • a = zeros(Int, 1000); @btime rand!($a, 1:1000) (should show neither regression nor improvement)
  • @btime rand($(MersenneTwister()), UInt64) (should show improvement)
  • @btime rand($(MersenneTwister()), UInt128) (should show improvement)

And if you have even more time to spend, you can instead check out the branch at #25197 ("rf/rand/cache-float"), which contains this PR plus more Float caching, so running @btime rand(Float64) would help!

@maleadt
Member

maleadt commented Dec 20, 2017

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.3131 (2017-12-20 11:49 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 564ecb0fd4* (0 days old master)
|__/                   |  x86_64-unknown-linux-gnu


julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.283 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.235 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.260 μs (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), UInt64);
  4.144 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt64);
  4.122 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt64);
  4.042 ns (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), UInt128);
  5.437 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt128);
  5.489 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt128);
  5.760 ns (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), Float64);
  3.691 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), Float64);
  3.707 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), Float64);
  3.683 ns (0 allocations: 0 bytes)
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.3128 (2017-12-20 10:16 UTC)
 _/ |\__'_|_|_|\__'_|  |  rf/rand/cache-int/d32e2671b8* (fork: 1 commits, 0 days)
|__/                   |  x86_64-unknown-linux-gnu


julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.515 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.405 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.322 μs (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), UInt64);
  3.022 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt64);
  3.027 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt64);
  3.030 ns (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), UInt128);
  4.097 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt128);
  4.215 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt128);
  4.102 ns (0 allocations: 0 bytes)
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.3129 (2017-12-20 10:41 UTC)
 _/ |\__'_|_|_|\__'_|  |  rf/rand/cache-float/a786746c6d* (fork: 2 commits, 0 days)
|__/                   |  x86_64-unknown-linux-gnu


julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  10.830 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  10.837 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  10.832 μs (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), Float64);
  2.535 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), Float64);
  2.728 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), Float64);
  2.540 ns (0 allocations: 0 bytes)

@rfourquet
Member Author

Thanks so much @maleadt! This at least confirms that I'm not hallucinating the performance improvements :)
I will wait until tomorrow in case someone else runs the benchmarks, and I will myself have access to another computer to see if the improvements are confirmed. Then, I would favor merging this.

@rfourquet
Member Author

I just checked on yet another computer, which also shows similar speed improvements, so I will merge when this turns green.

@rfourquet rfourquet changed the title from "RFC/WIP: MersenneTwister: more efficient integer generation with caching" to "MersenneTwister: more efficient integer generation with caching" Dec 23, 2017
@rfourquet rfourquet merged commit b385693 into master Dec 23, 2017
@rfourquet rfourquet deleted the rf/rand/cache-int branch December 23, 2017 16:51
@andreasnoack
Member

@rfourquet Is it correct that this PR changed the values of rand(Int) such that they are now different from Julia 0.6 when given the same seed? I get

0.6

julia> srand(1); rand(Int)
-2795697449271335806

0.7-

julia> srand(1); rand(Int)
6592875438459116351

Just wanted to confirm since some package tests rely on reproducibility here so we'd need to find a workaround in the packages if the change is intended.

@ararslan
Member

ararslan commented Feb 2, 2018

See also #17426, in particular Simon's comment:

In general, we've not preserved the behaviour of random functions and seeds between versions

@rfourquet
Member Author

@andreasnoack yes the change is intended, sorry for the inconvenience! This should not change anymore before 1.0 (but I may propose a last change for generation from a range).

@bjarthur
Contributor

@rfourquet is there a way to seed 0.7 such that it will reproduce the results from 0.6?

@rfourquet
Member Author

The only way I can imagine to get the same numbers would be to create a dummy MyCompatInt type and define rand for it, like rand for Int was defined in 0.6. Or to overwrite this method in 0.7...
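A sketch of that workaround, using the current Random extension hook (`Random.SamplerType`) and the hypothetical `MyCompatInt` marker type; the 0.6 algorithm itself is elided, since reproducing it is exactly the hard part:

```julia
using Random

# Hypothetical compatibility type: `rand(rng, MyCompatInt)` would run a
# port of the 0.6 integer-generation algorithm, so code needing the old
# streams could opt into it explicitly.
struct MyCompatInt end

function Random.rand(rng::AbstractRNG, ::Random.SamplerType{MyCompatInt})
    # ... the exact 0.6 algorithm for `rand(rng, Int)` would be ported here ...
end
```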

@mschauer
Contributor

This is why we need a way to dispatch on both a random number generator providing the entropy and a Sampler turning that entropy into random variates with a specified distribution :-(
