Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normal/Gaussian random number generation #1029

Closed
wants to merge 20 commits into from

Conversation

WebDrake
Copy link
Contributor

A little Christmas present for D :-) This code offers both function and struct interfaces for generating random numbers drawn from a normal/Gaussian distribution, with a choice of algorithms: for now, the Box-Muller algorithm (adapted from Boost's implementation), and Jerro's adaptation of his own Ziggurat implementation. Cf. the following discussions for some background and details:
http://www.digitalmars.com/d/archives/digitalmars/D/Normal_Gaussian_random_number_generation_for_D_179010.html
WebDrake#1
WebDrake#2
WebDrake#3

The different algorithms are implemented as different internal "engines" which can be chosen by the user as template parameters. This approach is proposed as a general solution for incorporating multiple different random number algorithms into Phobos, and can be used in future for other random number generating functions (e.g. for exponentially-distributed random numbers).

The struct implementations are designed to pave the way for functionality akin to Boost/C++11's variate_generator which couples a distribution and a random number generator.

Several issues for consideration when reviewing this code:

  1. The floating-point type used by the internal engines. The current approach infers the type based on the CommonType of mean and standard deviation, defaulting to double if the CommonType is not floating point. However, this means that a different static instance of the engine is created for each floating-point type (float, double, real). If this is undesirable it may be better to assume highest precision (=real) for the engine's internal operations.
  2. Private functions and templates introduced in Jerro's Ziggurat algorithm. Some of these may be useful beyond Ziggurat and if so might be better placed elsewhere: e.g. hasCompileTimeMinMax, isPowerOfTwo.
  3. Jerro's fastUniformFloat is virtually identical in practice to Boost/C++11's uniform_01 random number distribution, which returns a uniformly-distributed random number in the half-open interval [0, 1). It may therefore be worth making it public and naming it accordingly. A difference to Boost is that this version assumes that the underlying RNG will have an integral base type, so if float-based RNGs are planned for the future, it may be worth tweaking the code accordingly.
  4. Likewise, Jerro's fastUniformInt provides a speedy way of generating a uniformly-distributed integer in [0, n) and therefore may be worth making public, possibly with a name change.
  5. General naming conventions used in this code, e.g. for the engines.

Jerro has also generated some interesting results based on tests of randomness (see discussion WebDrake#1) which may be worth taking into account in the bigger picture of std.random development.

.... oh, and -- should we include a partridge in a pear tree? :-)

// this can happen if the function is completelly flat on the interval
// this happens when called from zigguratInitialize using 256 layers
// and single precision.
assert(x0 == x1, "fderiv has the same sign on the entire interval");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enforce is better in this case because this is not an error of logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

@andralex
Copy link
Member

Terrific, thanks. I see Mashiro is already at it, let's review this in good time.

unittest
{
// Check the type rules for normal()
assert(is(typeof(normal(0, 1)) == double));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, such type test should be static assert.
See other Phobos modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

@WebDrake
Copy link
Contributor Author

Thanks to everyone for the great feedback -- the latest couple of patches should fix most of those.

A few extra things to add to the mix:

  • Bearophile has pointed me to a feature request for a fast uniform01: http://d.puremagic.com/issues/show_bug.cgi?id=5240 so I think we should rename fastUniformFloat accordingly -- any other opinions on this?
  • Likewise Bearophile has indicated that isPowerOfTwo is a desirable Phobos feature and has suggested renaming it to the shorter isPow2. I guess an appropriate location for this (as a public feature) would be std.math rather than std.random ... ?

is(typeof(ct!(a.max))) && is(typeof(ct!(a.min)));
}

private bool isPowerOfTwo(I)(I i){ return (i & (i - 1)) == 0; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation will wrongly return true if i == 0 -- should be corrected to i && !(i & (i - 1)) to avoid this. See: http://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2 (thanks to Bearophile for pointing me to this page).

Also suggest renaming to isPow2 and moving to std.math.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by Jerro's latest commit.

jkrempus and others added 3 commits December 25, 2012 20:23
* moved uniform01 so that it is documented right after the standard
  uniform() function

* separated out documentation of normal(), Normal and normalRNG()

* documented member functions for Normal and NormalBoxMullerEngine

* extended/clarified some of the examples

* a few extra unittests based on examples
@WebDrake
Copy link
Contributor Author

One design decision query.

Currently Normal.opCall expects to receive a uniform random number generator as input. It would be nice to be able to have the alternative to pass nothing, and have it use rndGen by default, i.e. to be able to do:

auto nrng = Normal!real(3, 7);

// Generate a number using default RNG (i.e. rndGen)
auto x = nrng();

// Generate number using specified RNG
auto r = Xorshift(1);
auto y = nrng(r);

However, I've been unable to come up with a solution that allows calling Normal.opCall without an RNG parameter. Can anyone suggest a way to do this?

@jkrempus
Copy link
Contributor

I'm not sure this would be a good thing to have. It would make it easy to accidentally call nrng() where you meant to call nrng(r) and they you could get nondeterministic results where you expected deterministic results.

@WebDrake
Copy link
Contributor Author

WebDrake commented Jan 7, 2013

Happy New Year to all, and apologies for the delay in following up here.

First, I hope the design/doc changes already made haven't caused difficulty in proceeding with the review process. Given the design questions I've raised here I wonder if it might have been better to ask for review via another channel than a pull request. However, since we're here, I'll raise the questions here too, and will happily move them to another discussion space if it's better.

The current design follows Boost.Random in having the Normal struct returning variates via opCall, which takes a uniform random number generator as input. In Boost.Random the corresponding classes are called "distributions", and these are coupled with uniform random number generators via another class, variate_generator.

However, there is an alternative, which is to have D's struct implementations of non-uniform random number generation follow the example of e.g. RandomSample, which couples with a uniform random number generator at construction time and then operates as a lazily-evaluated range. For this to really work (or indeed for a VariateGenerator struct akin to Boost's to work), it would be necessary for random number generators to be reworked as reference types (@monarchdodra did work on this, but I believe it was rejected in the short term as a breaking change: #893).

My own inclination, on careful thought, is that it's preferable to keep things as-is and to develop a VariateGenerator in future, but I'd appreciate others' thoughts here.

@jerro: re deterministic vs. nondeterministic results, isn't the same true of uniform() and normal()? I'm not sure I disagree with your thoughts on the inadvisability of allowing an opCall without URNG as parameter, but I'm not sure I see why it's acceptable for the function implementations but not for struct.

@jkrempus
Copy link
Contributor

jkrempus commented Jan 8, 2013

The same is true for uniform() and normal(), I haven't thought about that. So I guess that if we have it for functions, we may as well have it for structs, too.

@monarchdodra
Copy link
Collaborator

For this to really work (or indeed for a VariateGenerator struct akin to Boost's to work), it would be necessary for random number generators to be reworked as reference types (@monarchdodra did work on this, but I believe it was rejected in the short term as a breaking change: #893).

My own inclination, on careful thought, is that it's preferable to keep things as-is and to develop a VariateGenerator in future, but I'd appreciate others' thoughts here.

While I realize no-one likes a breaking change, I really think something needs to be done with PRNGs. The first problem is that a PRNG is both range and container. This means the sizeof of the basic PRNG (for, say, MT19997) is somewhere in the kilobytes (!!!)

One of the PRNG's I was working on: Lagged Fibonacci, could have a payload in mega byte sizes (!!!!!).

Passing these to any algorithm is enough to make any program grind to a halt, if not simply stack overflow.

And still, that's without taking into account the problems of PRNGs generating the same sequences multiple times, at the coder's knowledge...

This leaves us with (IMO), 1 of 2 options:

  1. Make the PRNG a stack based a container, that exposes a range that simply references the container
  2. Make PRNGs reference ranges.

Both are actually essentially the same thing though, and, if anything, 1 introduces even more problems...

I had gotten around to implementing a random2 module as suggested by @andralex (https://github.com/monarchdodra/phobos/blob/random2/std/random2.d) but never pushed it: It requires more work than just code change that (such as DIPs) I haven't done yet. It also requires a lot of effort to support the project. It also brings into question how to migrate modules, and I don't really want to be the one to push this subject...

Well, that's for my 0.02$ here.

@andralex : Do you think it is worth putting more effort into random2 ? I mean, am I putting my effort in the right direction, or should we look for other solutions?

@WebDrake
Copy link
Contributor Author

WebDrake commented Jan 8, 2013

@monarchdodra - let's open this up as a discussion on the mailing list/forum so we can keep the focus here on the current pull request. But broadly I agree with you that we need a better design for PRNGs since as-is it results in a variety of unpleasant effects. As the changes required will potentially break downstream code that surely mandates a random2, but you shouldn't have to be alone in supporting it.

@WebDrake
Copy link
Contributor Author

WebDrake commented Jan 8, 2013

Regarding a prospective VariateGenerator, it seems C++11 decided not to preserve Boost.Random's variate_generator class for reasons outlined here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1933.pdf

@WebDrake
Copy link
Contributor Author

WebDrake commented Jan 8, 2013

One of the PRNG's I was working on: Lagged Fibonacci, could have a payload in mega byte sizes (!!!!!).

Passing these to any algorithm is enough to make any program grind to a halt, if not simply stack overflow.

This is very relevant to the current code: since the struct Normal and the normal random number engines are all currently value types, this means they will suffer from the same problem.

This seems to me a fairly severe design flaw and I'm minded to withdraw the pull request until we have a clear spec for RNG design.

@WebDrake
Copy link
Contributor Author

I just realized this hasn't been closed and has been taking up time in the testing queue. My apologies. I'll resubmit a new normal distribution when std.random2 is done.

@WebDrake WebDrake closed this Aug 26, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants