Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Mersenne Twister performance #81

Open
etam opened this issue Apr 7, 2014 · 7 comments
Open

Improve Mersenne Twister performance #81

etam opened this issue Apr 7, 2014 · 7 comments

Comments

@etam
Copy link

etam commented Apr 7, 2014

On my machine generating 1024*1024*32 random numbers takes about 10 seconds.

Here: http://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/mersenne-twister/ (and complete downloads here http://www.fixstars.com/en/opencl/book/sample/) you can find implementation that does the same amount of work in about 0.35s.

@kylelutz
Copy link
Collaborator

kylelutz commented Apr 8, 2014

Thanks for the report and the links. I'll try to find some time to take a look and update the code.

@etam
Copy link
Author

etam commented Apr 13, 2014

Well. After digging deeper, I found that the implementation there is not the best solution. The "official" one http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/index.html should be used instead.

@kylelutz
Copy link
Collaborator

Interesting. We should be able to integrate their algorithm (though perhaps as another engine, mtgp_engine).

@dacmot
Copy link

dacmot commented Sep 22, 2017

I'm also very interested in this. On my machine (12 CPU cores + GTX 680) 10MB of uniform_real_distribution with mersenne_twister_engine takes about 1.6 seconds to generate, which is better than for @etam, but still very slow. I'm wondering if the fact that the mersenne_twister_engine code creates a second temporary vector and does two transforms instead of composing the scaling kernel could have something to do with it. Is it even possible to compose kernels with boost::compute?

Also, I was wondering what were the developers' thoughts on adding more engines. Doing a search on GPU random number generations I stumbled on a few including https://github.com/clMathLibraries/clRNG and http://cas.ee.ic.ac.uk/people/dt10/research/rngs-gpu-uniform.html. The MWC64X one is of particular interest for me as I don't need an extremely long period but performance is much more important. Licenses are BSD.

@jszuppe
Copy link
Contributor

jszuppe commented Sep 22, 2017

I'd recommend improving current Philox implementation. Right now in Boost.Compute it's designed badly and has poor performance, It can be improved to achieve 200 - 350 GB/s (50 - 90 GSamples/s) on modern top GPUs (depending on GPU and it's architecture). It's really simple RNG. You can also implement XORWOW, which should achieve similar or higher performance.

@dacmot
Copy link

dacmot commented Sep 25, 2017

There's a Philox implementation? I only see a ThreeFry and a linear congruential along with the MT engine. Also the ThreeFry is not in the API overview and doesn't compile when used in conjunction with uniform_real_distribution since its generate() method doesn't take an scaling kernel.

@jszuppe
Copy link
Contributor

jszuppe commented Sep 25, 2017

Oh, sorry, my mistake, indeed it's ThreeFry. Nonetheless, it's fast. I think that adding Philox to Boost.Compute is the best option to have fast random number generator, and should not be so hard. Unfortunately, recently I don't have enough free time to do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants