Skip to content

boost::compute::normal_distribution variates have wrong standard deviation, and can produce infinite/nan values #587

@jagerman

Description

@jagerman

The normal_distribution implementation in boost::compute has some problems.

First, it generates values with standard deviation about 1.2 times what was requested. The following code demonstrates:

#include <boost/compute/system.hpp>
#include <boost/compute/command_queue.hpp>
#include <boost/compute/algorithm/accumulate.hpp>
#include <boost/compute/algorithm/inner_product.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/random/default_random_engine.hpp>
#include <boost/compute/random/normal_distribution.hpp>
#include <boost/compute/lambda.hpp>

int main() {
    boost::compute::device device(boost::compute::system::default_device());
    boost::compute::context context(boost::compute::system::default_context());
    boost::compute::command_queue queue(boost::compute::system::default_queue());
    boost::compute::vector<float> vec(1000000, context);
    boost::compute::default_random_engine engine(queue);
    boost::compute::normal_distribution<float> distribution(5.0f, 1.0f);
    distribution.generate(vec.begin(), vec.end(), engine, queue);
    float mean = boost::compute::accumulate(vec.begin(), vec.end(), 0.f) / vec.size();
    float mean_sq = boost::compute::inner_product(vec.begin(), vec.end(), vec.begin(), 0.f) / vec.size();
    float stdev = sqrt(mean_sq - mean*mean);
    std::cout << "mean: " << mean << " (expect 5); stdev: " << stdev << " (expect 1)\n";
}

which outputs:

mean: 5.00037 (expect 5); stdev: 1.20036 (expect 1)

This is caused by these lines in boost/compute/random/normal_distribution.hpp:

            const RealType z1 = sqrt(-2.f * log2(x1)) * cos(2.f * M_PI_F * x2);
            const RealType z2 = sqrt(-2.f * log2(x1)) * sin(2.f * M_PI_F * x2);

both of those log2 calls have to be log (i.e. using the natural logarithm). The ~1.2 scale in the standard deviation comes out of this mathematically: log2(x) = ln(x) / ln(2), and the square root of 1/ln(2) equals 1.2011224. i.e. the standardized variates are too large by a factor of sqrt(1/ln(2)).

I think (but have not verified) that there is another problem here: the generated x1 and x2 values can take on values of either 0 or a value larger than 1:

            const RealType x1 = x.x / (RealType) (UINT_MAX - 1);
            const RealType x2 = x.y / (RealType) (UINT_MAX - 1);

When x.x equals 0, x1 = 0, which results in both z1 and z2 being infinite, which is bad.

When x.x equals UINT_MAX, you can get a value larger than 1 for x1, which gives a positive value for the log, which then results in taking the sqrt of a negative, which is nan. Note, however, that this will only shows up when RealType has sufficient precision: when RealType is float, UINT_MAX-1 (=4294967294) actually rounds to UINT_MAX+1 (=4294967296). If RealType is double, however, you can run into the problem.

Basically, x1 needs to be in (0,1], and it currently violates the strictness of the left of that range, and can, depending on RealType, violate the right end of the range.

See boost/random/normal_distribution.hpp from a pre-1.56.0 version of boost (https://github.com/boostorg/random/blob/8c30753bf49504b5f677b8b410a3787907c86bd0/include/boost/random/normal_distribution.hpp) for a properly-working Box-Muller implementation (pre-1.56 because, in 1.56.0, the Box-Muller implementation was replaced with a ziggurat implementation).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions