Improve Mersenne Twister performance #81

etam · 2014-04-07T19:16:07Z

On my machine generating 1024*1024*32 random numbers takes about 10 seconds.

Here: http://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/mersenne-twister/ (complete downloads are at http://www.fixstars.com/en/opencl/book/sample/) you can find an implementation that does the same amount of work in about 0.35s.

kylelutz · 2014-04-08T16:13:35Z

Thanks for the report and the links. I'll try to find some time to take a look and update the code.

etam · 2014-04-13T13:50:49Z

Well. After digging deeper, I found that the implementation there is not the best solution. The "official" one http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/index.html should be used instead.

kylelutz · 2014-04-17T02:19:39Z

Interesting. We should be able to integrate their algorithm (though perhaps as another engine, mtgp_engine).

dacmot · 2017-09-22T19:58:00Z

I'm also very interested in this. On my machine (12 CPU cores + GTX 680) 10MB of uniform_real_distribution with mersenne_twister_engine takes about 1.6 seconds to generate, which is better than for @etam, but still very slow. I'm wondering if the fact that the mersenne_twister_engine code creates a second temporary vector and does two transforms instead of composing the scaling kernel could have something to do with it. Is it even possible to compose kernels with boost::compute?
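To make the two-pass versus fused question concrete, here is a minimal host-side sketch in plain C++. It is not the Boost.Compute code: `std::mt19937` stands in for the device engine, and the names `scale_to_range`, `generate_two_pass` and `generate_fused` are hypothetical. The two-pass version fills a temporary buffer of raw integers and then scales it in a second pass; the fused version scales each value as it is produced, so the temporary buffer and the second sweep over memory go away.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: map a raw 32-bit value onto the range between lo and hi.
inline double scale_to_range(std::uint32_t raw, double lo, double hi) {
    const double unit = static_cast<double>(raw) * (1.0 / 4294967296.0); // raw / 2^32
    return lo + unit * (hi - lo);
}

// Two passes: fill a temporary buffer with raw integers, then scale it.
inline std::vector<double> generate_two_pass(std::uint32_t seed, std::size_t n,
                                             double lo, double hi) {
    std::mt19937 engine(seed);
    std::vector<std::uint32_t> raw(n); // the extra temporary buffer
    for (auto& r : raw) {
        r = static_cast<std::uint32_t>(engine());
    }
    std::vector<double> out(n);
    std::transform(raw.begin(), raw.end(), out.begin(),
                   [lo, hi](std::uint32_t r) { return scale_to_range(r, lo, hi); });
    return out;
}

// Fused: scale each value as soon as it is generated, with no temporary buffer.
inline std::vector<double> generate_fused(std::uint32_t seed, std::size_t n,
                                          double lo, double hi) {
    std::mt19937 engine(seed);
    std::vector<double> out(n);
    for (auto& x : out) {
        x = scale_to_range(static_cast<std::uint32_t>(engine()), lo, hi);
    }
    return out;
}
```

Both functions produce the same values for the same seed; the difference is only in memory traffic. Whether that difference explains the timings reported here would need profiling on the device.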

Also, I was wondering what the developers' thoughts are on adding more engines. While searching for GPU random number generators I came across a few, including https://github.com/clMathLibraries/clRNG and http://cas.ee.ic.ac.uk/people/dt10/research/rngs-gpu-uniform.html. The MWC64X one is of particular interest to me, as I don't need an extremely long period and performance is much more important. The licenses are BSD.
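For reference, a single MWC64X step is tiny. The sketch below is a host-side C++ rendering of the multiply-with-carry update as I understand it from the page linked above. The struct and function names are hypothetical, and the multiplier is quoted from memory, so it should be checked against the reference implementation before use. Per-work-item seeding and skip-ahead are not shown.

```cpp
#include <cstdint>

// Hypothetical state layout: 32-bit value x and 32-bit carry c.
struct Mwc64xState {
    std::uint32_t x;
    std::uint32_t c;
};

// One multiply-with-carry step. The output is taken from the state
// before it is advanced, then (x, c) is replaced by the low and high
// halves of A * x + c.
inline std::uint32_t mwc64x_next(Mwc64xState& s) {
    // Assumption: multiplier as recalled from the MWC64X reference page.
    const std::uint64_t A = 4294883355ULL;
    const std::uint32_t result = s.x ^ s.c;
    // Cannot overflow: (2^32 - 1)^2 + (2^32 - 1) < 2^64.
    const std::uint64_t t = A * s.x + s.c;
    s.x = static_cast<std::uint32_t>(t);       // low 32 bits
    s.c = static_cast<std::uint32_t>(t >> 32); // high 32 bits become the carry
    return result;
}
```

The whole state fits in two 32-bit registers, which is what makes this kind of generator attractive when period length matters less than throughput.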

jszuppe · 2017-09-22T20:44:21Z

I'd recommend improving the current Philox implementation. Right now in Boost.Compute it's designed badly and has poor performance. It can be improved to achieve 200 - 350 GB/s (50 - 90 GSamples/s) on modern top GPUs (depending on the GPU and its architecture). It's a really simple RNG. You can also implement XORWOW, which should achieve similar or higher performance.
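To show how little work XORWOW needs per sample, here is a host-side C++ sketch of one step, following Marsaglia's xorwow from the "Xorshift RNGs" paper. The struct and function names are hypothetical, and a GPU version would additionally need a distinct, well-separated state per work-item, which is not shown.

```cpp
#include <cstdint>

// Hypothetical state layout: five xorshift words plus a Weyl counter d.
struct XorwowState {
    std::uint32_t x, y, z, w, v, d;
};

// One xorwow step: a five-word xorshift update, with the counter d
// advanced by a fixed odd constant and added to the output.
inline std::uint32_t xorwow_next(XorwowState& s) {
    const std::uint32_t t = s.x ^ (s.x >> 2);
    s.x = s.y;
    s.y = s.z;
    s.z = s.w;
    s.w = s.v;
    s.v = (s.v ^ (s.v << 4)) ^ (t ^ (t << 1));
    s.d += 362437u; // Weyl sequence increment from Marsaglia's paper
    return s.d + s.v;
}
```

Each sample costs a handful of shifts, XORs and one add, with no multiplications and only six words of state.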

dacmot · 2017-09-25T16:17:31Z

There's a Philox implementation? I only see a ThreeFry and a linear congruential engine along with the MT engine. Also, ThreeFry is not in the API overview and doesn't compile when used in conjunction with uniform_real_distribution, since its generate() method doesn't take a scaling kernel.

jszuppe · 2017-09-25T16:25:05Z

Oh, sorry, my mistake, it is indeed ThreeFry. Nonetheless, it's fast. I think that adding Philox to Boost.Compute is the best option for getting a fast random number generator, and it should not be that hard. Unfortunately, I haven't had enough free time recently to do it.

kylelutz added the performance label Apr 8, 2014
