Improve randomness #222
Continuing from #219:
Yes, it's not good at all (though better on 64-bit; on my computer RAND_MAX is 2^31). Some even consider it harmful :) http://sdrv.ms/1e11LXl It's all nicer with C++11, but I guess we can't build on that yet. Oh, and the linked presentation also advises against
At the moment we are only doing this for normally distributed numbers, not for uniformly distributed ones. I think we decided this because it is relatively costly and uniform random numbers are used in quite a few "sensitive areas" like PoissonGroup.
I feel like if we're going to have something better than
Hey guys, Dan asked me to contribute. I'm in a bit of a hurry tonight but I can provide a bit of instant advice.
Please, under no circumstances, use rand() or any of the built-in functions. The problem is not the number of 'random bits' you were discussing above but the actual randomness of the function. Basically, it's a piece of junk.
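To make the "piece of junk" point concrete, here is a toy sketch (not any particular libc's rand(), which vary between platforms): many historical rand() implementations are plain linear congruential generators, and with the classic ANSI-example constants the lowest output bit simply alternates.

```python
# Toy LCG using the classic example constants (1103515245, 12345).
# Real libc rand() implementations differ; this is an illustration only.
def lcg_stream(seed, n):
    out = []
    for _ in range(n):
        seed = (seed * 1103515245 + 12345) & 0x7FFFFFFF
        out.append(seed)
    return out

# The lowest bit of successive outputs alternates perfectly: 1,0,1,0,...
low_bits = [x & 1 for x in lcg_stream(42, 8)]
```

Any code that looks at low-order bits (e.g. taking the value modulo 2) sees a fully deterministic pattern, which is exactly the kind of structural failure meant above.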
If you can get Mersenne Twister to work then it's the best you'll find at the moment. I have no idea if the claims by that guy at bedaux.net are true (you do need to verify that it's a correct implementation) but it could be an option.
I've been using Random123 recently. I use it because I can generate parallel random streams but you can also use it in a single series operation. http://www.deshawresearch.com/resources_random123.html It's a bunch of C header files (no C code files) so would be easy to incorporate into code generation.
The other one I use is the suite of random number generators in Numerical Recipes in C. They seem to have removed the code from their site some time in the past year, but I have a customised version I can provide (not sure about copyright for Brian though), or you can easily find a pdf of the book online. There are, I think, 4 basic types of random number generators provided in the book, each with both strengths and shortcomings. One quick point I'll make now (you can come back to me for details): it has been claimed in the RND literature that the guys writing the book clearly knew nothing about random number generation, so while their generators work ok they're not brilliant, and the periods are generally shorter than claimed in the book.
A very simple generator, which I prefer over Numerical Recipes, is the Marsaglia KISS generator. There are a bunch of versions online. He was considered a genius in the RND community (died a few years ago). No holes have been found in this generator yet and it's about 5-6 lines of code. Just be careful as it is sensitive to some of the inbuilt constants, and unfortunately in his original posting of the code he mixed up two of the numbers. I have a version that seems to be correct that I can upload if you need it.
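For reference, this is a sketch of the KISS99 variant as commonly circulated (a Python transcription with explicit 32-bit masking). Given the mixed-up-constants caveat above, the seed and multiplier constants here should be checked against a trusted posting before any serious use:

```python
def kiss99(x=123456789, y=362436000, z=521288629, c=7654321):
    """Generator yielding 32-bit integers from Marsaglia's KISS99:
    an LCG, a 32-bit xorshift and a multiply-with-carry, summed.
    Seed constants are the commonly circulated defaults (unverified)."""
    M = 0xFFFFFFFF
    while True:
        x = (69069 * x + 12345) & M      # linear congruential component
        y ^= (y << 13) & M               # xorshift component
        y ^= y >> 17
        y ^= (y << 5) & M
        t = 698769069 * z + c            # multiply-with-carry component
        c = t >> 32
        z = t & M
        yield (x + y + z) & M
```

Usage is just `g = kiss99()` then `next(g)` per number; the whole state is four 32-bit words, which is why it is so easy to embed in generated code.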
Finally, have you an ability to link with / include the GNU scientific library? https://www.gnu.org/software/gsl/ That for me would be the ideal solution. Then you have some dedicated eyes on the code who have knowledge of how it should really work and a vested interest in improving it.
The two biggest issues with random number generators are the actual randomness of the numbers, and the period length before the internal state variables return to values they held before, at which point the sequence repeats (a loop). The original test suite for random number generators was DIEHARD (by Marsaglia); more recently there is TestU01: http://www.iro.umontreal.ca/~simardr/testu01/tu01.html The only other thing you need to know is that there is a reasonable expectation that scientific simulation (in biology anyway) needs a far less stringent random number generator than cryptography does (that's the gold standard). In cryptography they also need reverse engineering of the generation protocol (via estimation of the state parameters) to be impossible.
What were you guys referring to above? "At the moment we are only doing this for normal distributed numbers, not for uniformly distributed ones. I think we decided this because it is relatively costly and uniform random numbers are used in quite a few "sensitive areas" like PoissonGroup"
Let me know what I haven't explained properly.
Thanks Dave, that's incredibly helpful!
I like the idea of the headers-only library and one that can be used for parallel implementations too, but is Random123 doing Mersenne Twister or something else? One thing we would like for Brian is for it to be an easily referenced library. It's easy to reference MT as a 'good' RNG, which might be important for users (so they can say 'uses such-and-such library for RNGs (citation)').
How big is GSL? We would like to redistribute any files we need with it, and ideally the fewer dependencies the better. We'd also like it to compile on as wide a variety of platforms as possible, which is another reason for avoiding big, complicated libraries.
So Random123 is one of a new type of random number generator, called a counter-based random number generator. It works more like a cryptographic hash function: give it a fixed input and it always computes the same answer, but give it an input one digit away from the previous one and you could get anything out relative to the previous number. The outputs are completely uncorrelated, etc. It passes all of the most difficult tests of randomness out there (according to the authors' peer-reviewed paper).
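The counter-based idea can be sketched in a few lines. This toy version just applies a fixed invertible 64-bit mixing function (the finalizer from splitmix64) to key XOR counter; it is emphatically not Random123's Philox or Threefry, only an illustration of the 'hash a counter' principle:

```python
def counter_rng(key, counter):
    """Toy counter-based RNG: a fixed invertible 64-bit mix applied to
    key ^ counter. Same inputs -> same output; adjacent counters ->
    unrelated-looking outputs. Illustration only, not Random123."""
    M = 0xFFFFFFFFFFFFFFFF
    x = (key ^ counter) & M
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & M
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & M
    return x ^ (x >> 31)

# Parallel streams fall out for free: worker i hashes (key, base + i);
# no shared, mutable generator state is needed.
stream = [counter_rng(2023, i) for i in range(4)]
```

Because there is no sequential state, any element of the stream can be computed independently and in any order, which is what makes this family attractive for parallel code generation.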
I was initially sceptical of the code at http://www.bedaux.net/mtrand/ since it's hard to believe that it's a full Mersenne Twister implementation. But I've compared it with the original code and now I'm more inclined to say that if you're going for MT you might as well use the original; there's not a lot of difference between them.
I'm not sure how big GSL is but couldn't you include it via a python requirement for PyGSL? The nice thing then would be that it would be updated and is a very highly regarded library. GSL provides mersenne-twister amongst others.
Do you have any requirement for parallelising the generation of random numbers? That's a whole other story and will introduce quite a degree of complication, but mainly it will rule out some generators as unreliable.
Hi Dave, many thanks for all the info! IMHO, the best approach for us is 1) to go for a Mersenne Twister as the default, since this is a well-established algorithm and also what we are using when generating numpy code, and 2) to make it easy for the user to switch to a different random number generator, if they prefer to use something from GSL, for example.
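A minimal sketch of what 'easy to switch' could look like (hypothetical class names, not Brian's actual API), with numpy's Mersenne Twister as the default source:

```python
import numpy

class RandomSource(object):
    """Hypothetical pluggable interface: a GSL-backed or C-generated
    source would be swapped in by implementing uniform()."""
    def uniform(self, n):
        raise NotImplementedError

class NumpyMTSource(RandomSource):
    """Default source backed by numpy's Mersenne Twister (MT19937)."""
    def __init__(self, seed=None):
        self._state = numpy.random.RandomState(seed)
    def uniform(self, n):
        return self._state.rand(n)  # n uniform doubles in [0, 1)
```

Code generation would then ask the configured source for numbers rather than calling any particular generator directly.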
In the long term we certainly want to be able to do this, but as you say, this opens a whole new can of worms. I think our short-term solution will be to have a single-thread random number source.
Oh and we forgot to reply to something that you asked a while ago:
This was a technical detail about our random number generation in weave (where we can use all the numpy functions): Brian code allows to use
That sounds like the right call to me too. MT is highly respected. As a second option I'd suggest importing PyGSL.
For the basic MT the original code is here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
It's not long. I'm not sure how you like to include C code in your project. Also, that guy at bedaux.net claims his version is great because it has no external requirements, but I can't see any in the original code either. Don't forget that, unlike your Python parts, you're going to hardcode the generator into Brian; so if the authors discover a bug you need to watch for updates and transfer them to Brian. The best version so far looks to be here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html
Don't take TinyMT as it has far too short a period length for network simulations.
Also, re the stdlib rand(): you really need to stop using it. Just switch to numpy.random.random() anywhere you currently use it. You can accept minor inconsistencies in the other generators, but rand() is just abysmal.
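The swap is mechanical, and numpy's global generator is itself backed by MT19937, so it upgrades quality as well as interface. An illustrative snippet:

```python
import numpy as np

np.random.seed(1234)            # seed once for a reproducible stream
samples = np.random.random(5)   # 5 uniform doubles in [0, 1),
                                # drawn from numpy's Mersenne Twister
```

Reseeding with the same value reproduces the same stream, which also makes simulations repeatable for debugging.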