Improve randomness #222

Closed
thesamovar opened this Issue Feb 21, 2014 · 10 comments

thesamovar commented Feb 21, 2014

At the moment in weave and C++ standalone we use rand()/RAND_MAX to generate uniform random numbers. This is extremely poor, giving only 15 bits of randomness on 32-bit gcc. We should improve this. See discussion in #219.
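For context, a minimal sketch of the pattern being criticised (illustration only, not Brian's actual generated code): with RAND_MAX == 32767, the quotient can take at most 2^15 distinct values, far short of the 53-bit mantissa a double can represent in [0, 1).

```cpp
// Sketch of the rand()/RAND_MAX pattern discussed in this issue -- for
// illustration only, not Brian's actual generated code.
#include <cstdio>
#include <cstdlib>

double uniform_via_rand()
{
    // With RAND_MAX == 32767 (32-bit gcc) this yields only 2^15 distinct values.
    return std::rand() / (double)RAND_MAX;
}

int main()
{
    std::srand(42);
    std::printf("RAND_MAX = %d\n", RAND_MAX);
    for (int i = 0; i < 5; i++)
        std::printf("%.17g\n", uniform_via_rand());
    return 0;
}
```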

@thesamovar thesamovar added this to the 2.0alpha3 milestone Feb 21, 2014

mstimberg commented Feb 21, 2014

Continuing from #219:

> Now that I look into it I think we should maybe do better than drand() (which still uses rand()).

drand() calls rand() 3 times for each number though.

> I hadn't realised but RAND_MAX on a 32-bit gcc is only 2^15, meaning rand()/RAND_MAX has only 15 bits of real randomness, whereas there are something like 53 bits of precision between 0 and 1 for a double precision float. I hadn't realised how incredibly lame rand() is.

Yes, it's not good at all (but better on 64-bit; on my computer RAND_MAX is 2^31). Some even consider it harmful :) http://sdrv.ms/1e11LXl It's all nicer with C++11, but we can't build on that yet, I guess. Oh, and the linked presentation also advises against srand(time(NULL)).

> A thought occurs to me though: aren't we using numpy to generate random numbers in weave? (Generating pools of them and then regenerating by calling back to numpy when it runs out.)

At the moment we are only doing this for normally distributed numbers, not for uniformly distributed ones. I think we decided this because it is relatively costly and uniform random numbers are used in quite a few "sensitive areas" like PoissonGroup.
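For reference, a minimal sketch of the C++11 <random> approach mentioned above (not an option for Brian at the time, since C++11 could not yet be required):

```cpp
// Sketch of the C++11 <random> approach: a Mersenne Twister engine plus a
// uniform distribution, avoiding rand() and RAND_MAX entirely.
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 gen(42);  // explicitly seeded; avoids srand(time(NULL))-style seeding
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    for (int i = 0; i < 5; i++)
        std::printf("%f\n", uniform(gen));
    return 0;
}
```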

thesamovar commented Feb 21, 2014

I feel like if we're going to have something better than rand() we should do it really properly rather than a halfway solution like drand(). I wonder if there is some simple-ish open source code we can just download and include for our random number generation? Perhaps we should consult with someone who really knows about RNGs? I have a friend who does this, I'll ask him.

thesamovar commented Feb 21, 2014

This one looks good, one smaller header and implementation file: http://www.bedaux.net/mtrand/

daveh19 commented Feb 22, 2014

Hey guys, Dan asked me to contribute. I'm in a bit of a hurry tonight but I can provide a bit of instant advice.

Please under no circumstances use rand() or any of the built-in functions. The problem is not in the number of 'random bits' as you were discussing above but in the actual randomness of the function. Basically it's a piece of junk.

If you can get Mersenne Twister to work then it's the best you'll find at the moment. I have no idea if the claims by that guy at bedaux.net are true (you do need to verify that it's a correct implementation) but it could be an option.

I've been using Random123 recently (http://www.deshawresearch.com/resources_random123.html). I use it because I can generate parallel random streams, but you can also use it in single-series operation. It's a bunch of C header files (no C code files), so it would be easy to incorporate into code generation.

The other one I use is the suite of random number generators in Numerical Recipes in C. They seem to have removed the code from their site some time in the past year, but I have a customised version I can provide (not sure about copyright for Brian though), or you can easily find a PDF of the book online. There are, I think, 4 basic types of random number generators provided in the book, each with both strengths and shortcomings. One quick point I'll make now (you can come back to me for details): it has been claimed in the RNG literature that the guys writing the book clearly knew nothing about random number generation, so while their generators work OK they're not brilliant, and the periods are generally shorter than claimed in the book.

A very simple generator, which I prefer over Numerical Recipes, is the Marsaglia KISS generator. There are a bunch of versions online. He was considered a genius in the RNG community (he died a few years ago). No holes have been found in this generator yet and it's about 5-6 lines of code. Just be careful, as it is sensitive to some of the inbuilt constants, and unfortunately in his original posting of the code he mixed up two of the numbers. I have a version that seems to be correct that I can upload if you need it.

Finally, have you an ability to link with / include the GNU scientific library? https://www.gnu.org/software/gsl/ That for me would be the ideal solution. Then you have some dedicated eyes on the code who have knowledge of how it should really work and a vested interest in improving it.

The two biggest issues with random number generators are the actual randomness of the numbers and then the period length before you see a number which was already produced and your internal state variables are as they were the previous time (leading to a loop). The original test of a random number generator was DIEHARD (by Marsaglia); more recently there is TestU01 (http://www.iro.umontreal.ca/~simardr/testu01/tu01.html). The only other thing you need to know is that there is a reasonable expectation that we need a far less stringent random number generator for scientific simulation (in biology anyway) than is needed in cryptography (that's the gold standard). In cryptography they also need reverse engineering of the generation protocol (via estimation of the state parameters) to be impossible.

What were you guys referring to above? "At the moment we are only doing this for normal distributed numbers, not for uniformly distributed ones. I think we decided this because it is relatively costly and uniform random numbers are used in quite a few "sensitive areas" like PoissonGroup"
You know that the only random number you need to generate is a uniform (0,1) one right? From that you generate the others.

Let me know what I haven't explained properly.
Dave.
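For illustration, one commonly circulated form of Marsaglia's KISS99 generator (a sketch only; as the comment above warns, the constants are easy to get wrong, so verify them against a trusted source before relying on this):

```cpp
// One commonly circulated version of Marsaglia's KISS99 generator: a linear
// congruential step, a xorshift step and a multiply-with-carry step combined.
// Sketch only -- the seed and multiplier constants below must be verified
// against a trusted source, as noted in the comment above.
#include <cstdint>
#include <cstdio>

static uint32_t x = 123456789, y = 362436000, z = 521288629, c = 7654321;  // seeds

uint32_t kiss()
{
    x = 69069 * x + 12345;                    // linear congruential part
    y ^= y << 13; y ^= y >> 17; y ^= y << 5;  // xorshift part
    uint64_t t = 698769069ULL * z + c;        // multiply-with-carry part
    c = (uint32_t)(t >> 32);
    z = (uint32_t)t;
    return x + y + z;
}

int main()
{
    for (int i = 0; i < 5; i++)
        std::printf("%u\n", (unsigned)kiss());
    return 0;
}
```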

thesamovar commented Feb 22, 2014

Thanks Dave, that's incredibly helpful!

I like the idea of the headers-only library and one that can be used for parallel implementations too, but is that Random123 doing Mersenne Twister or something else? One thing we would like for Brian is that it be an easily referenced library. It's easy to reference MT as being a 'good' RNG, which might be important for users (so they can say 'uses such-and-such library for RNGs (citation)').

How big is GSL? We would like to redistribute any files we need with it, and ideally the fewer dependencies the better. We'd also like it to compile on as wide a variety of platforms as possible, which is another reason for avoiding big, complicated libraries.

daveh19 commented Mar 9, 2014

So Random123 is one of a new type of random number generator, called a counter-based random number generator. It works more like a cryptographic hash function. You give it a fixed input and it always computes the same answer. But you give it an input one digit away from the previous one and you could get anything out relative to the previous number. They are completely uncorrelated, etc. It passes all of the most difficult tests of randomness out there (according to the authors' peer-reviewed paper).
In order to use it you'd need to contact the authors for permission. But they're academics so I suspect they'd be quite open to the idea. They seem to just want to be cited: Salmon, J.K., M.A. Moraes, R.O. Dror, and D.E. Shaw. “Parallel Random Numbers: As Easy as 1, 2, 3.” In High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for, 1–12, 2011.

I was initially sceptical of the code at http://www.bedaux.net/mtrand/ since it's hard to believe that it's a full Mersenne Twister implementation. But I've compared it with the original code and now I'm more inclined to say that if you're going for MT you might as well use the original... there's not a lot of difference between them.

I'm not sure how big GSL is, but couldn't you include it via a Python requirement for PyGSL? The nice thing then would be that it would be kept updated, and it is a very highly regarded library. GSL provides Mersenne Twister, amongst others.

Do you have any requirement for parallelising the generation of random numbers? That's a whole other story and will introduce quite a degree of complication, but mainly it will rule out some generators as unreliable.
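To illustrate the counter-based idea without reproducing Random123 itself, here is a toy sketch using a SplitMix64-style mixing function as a stand-in (this is not Random123's API and makes no claim to its statistical quality): the only state is an integer counter, and the output for counter n is a fixed scramble of n, so numbers can be generated out of order or as independent parallel streams.

```cpp
// Toy illustration of a counter-based generator, in the spirit described above:
// the output for index n is a fixed bijective scramble of n, so nearby counters
// give uncorrelated-looking outputs and work can be split across threads.
// SplitMix64-style mixing is used as a stand-in; this is NOT Random123's API.
#include <cstdint>
#include <cstdio>

uint64_t mix(uint64_t n)
{
    n += 0x9E3779B97F4A7C15ULL;
    n = (n ^ (n >> 30)) * 0xBF58476D1CE4E5B9ULL;
    n = (n ^ (n >> 27)) * 0x94D049BB133111EBULL;
    return n ^ (n >> 31);
}

double uniform_from_counter(uint64_t counter)
{
    // take the top 53 bits and scale into [0, 1)
    return (mix(counter) >> 11) * (1.0 / 9007199254740992.0);  // 2^53
}

int main()
{
    for (uint64_t n = 0; n < 5; n++)
        std::printf("x[%llu] = %f\n", (unsigned long long)n, uniform_from_counter(n));
    return 0;
}
```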

mstimberg commented Mar 10, 2014

Hi Dave, many thanks for all the info! IMHO, the best approach for us is 1) to go for a Mersenne Twister as the default, since this is a well-established algorithm and also what we are using when generating numpy code, and 2) to make it easy for the user to switch to a different random number generator, if they prefer to use something from GSL, for example.

> Do you have any requirement for parallelising the generation of random numbers?

In the long term we certainly want to be able to do this, but as you say, this opens a whole new can of worms. I think our short-term solution will be to have a single-thread random number source.

Oh, and we forgot to reply to something that you asked a while ago:

> What were you guys referring to above? "At the moment we are only doing this for normal distributed numbers, not for uniformly distributed ones. I think we decided this because it is relatively costly and uniform random numbers are used in quite a few "sensitive areas" like PoissonGroup"
> You know that the only random number you need to generate is a uniform (0,1) one right? From that you generate the others.

This was a technical detail about our random number generation in weave (where we can use all the numpy functions): Brian code allows the use of rand() and randn() for uniformly/normally distributed numbers. ATM we use the shabby stdlib rand() approach to generate the former and numpy's randn function for the latter -- mostly out of laziness to write the code to generate it from the uniform numbers, I guess. This is kind of peripheral to the discussion in this issue, though. I think in weave we should always use numpy's implementation for consistency; here we are mostly discussing our "standalone" mode, where we don't want to depend on a big library like numpy in our generated code.
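As an aside on the uniform-to-normal step mentioned here, a standard option is the Box-Muller transform, sketched below on top of a placeholder uniform source (any of the generators discussed in this thread could be substituted):

```cpp
// Sketch of the Box-Muller transform: two independent uniform(0,1] numbers
// give two independent standard normal numbers. The uniform source here is a
// placeholder; any generator discussed in this thread could be plugged in.
#include <cmath>
#include <cstdio>
#include <cstdlib>

double uniform01()
{
    // placeholder uniform source, strictly in (0, 1] to avoid log(0)
    return (std::rand() + 1.0) / (RAND_MAX + 1.0);
}

void box_muller(double &n1, double &n2)
{
    const double pi = 3.141592653589793;
    const double u1 = uniform01();
    const double u2 = uniform01();
    const double r = std::sqrt(-2.0 * std::log(u1));
    n1 = r * std::cos(2.0 * pi * u2);
    n2 = r * std::sin(2.0 * pi * u2);
}

int main()
{
    std::srand(42);
    double a, b;
    for (int i = 0; i < 3; i++) {
        box_muller(a, b);
        std::printf("%f %f\n", a, b);
    }
    return 0;
}
```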

thesamovar commented Mar 10, 2014

Agreed on single-threaded MT as a default, but do we have code for it that isn't too complicated and that is reliable?

daveh19 commented Mar 10, 2014

That sounds like the right call to me too. MT is highly respected. As a second option I'd suggest importing PyGSL.

For the basic MT the original code is here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

It's not long. I'm not sure how you like to include C code in your project. Also, that guy at bedaux.net claims his version is great because it has no external requirements, but I can't see any in the original code either. Don't forget, unlike your Python parts, you're going to hardcode the generator into Brian; so if the authors discover a bug you need to be aware of updates and transfer them to Brian. Looks like the best version so far is here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html

Don't take TinyMT as it has far too short a period length for network simulations.

Also, re the stdlib rand(): you really need to stop using it. Just switch to numpy.random.random() anywhere you use it currently. You can accept minor inconsistencies in the other generators, but rand() is just abysmal.
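Assuming the mt19937ar.c reference implementation from the page linked above is compiled into the project, a drop-in replacement for the rand()/RAND_MAX pattern might look like the sketch below (the function names are those of the reference C file; double-check them against whichever version is actually vendored):

```cpp
// Sketch of using the reference mt19937ar.c in place of rand()/RAND_MAX.
// The declarations below assume the reference C file is compiled and linked
// alongside this code; verify the names against the vendored version.
#include <cstdio>

extern "C" {
    void init_genrand(unsigned long s);  // seed the generator
    double genrand_res53(void);          // uniform on [0, 1) with 53-bit resolution
}

int main()
{
    init_genrand(42UL);
    for (int i = 0; i < 5; i++)
        std::printf("%.17g\n", genrand_res53());
    return 0;
}
```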

mstimberg commented Jun 9, 2016

Closed via #706

@mstimberg mstimberg closed this Jun 9, 2016
