-
Notifications
You must be signed in to change notification settings - Fork 338
Description
The normal_distribution implementation in boost::compute has some problems.
First, it generates values with standard deviation about 1.2 times what was requested. The following code demonstrates:
#include <boost/compute/system.hpp>
#include <boost/compute/command_queue.hpp>
#include <boost/compute/algorithm/accumulate.hpp>
#include <boost/compute/algorithm/inner_product.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/random/default_random_engine.hpp>
#include <boost/compute/random/normal_distribution.hpp>
#include <boost/compute/lambda.hpp>
int main() {
boost::compute::device device(boost::compute::system::default_device());
boost::compute::context context(boost::compute::system::default_context());
boost::compute::command_queue queue(boost::compute::system::default_queue());
boost::compute::vector<float> vec(1000000, context);
boost::compute::default_random_engine engine(queue);
boost::compute::normal_distribution<float> distribution(5.0f, 1.0f);
distribution.generate(vec.begin(), vec.end(), engine, queue);
float mean = boost::compute::accumulate(vec.begin(), vec.end(), 0.f) / vec.size();
float mean_sq = boost::compute::inner_product(vec.begin(), vec.end(), vec.begin(), 0.f) / vec.size();
float stdev = sqrt(mean_sq - mean*mean);
std::cout << "mean: " << mean << " (expect 5); stdev: " << stdev << " (expect 1)\n";
}
which outputs:
mean: 5.00037 (expect 5); stdev: 1.20036 (expect 1)
This is caused by these lines in boost/compute/random/normal_distribution.hpp:
const RealType z1 = sqrt(-2.f * log2(x1)) * cos(2.f * M_PI_F * x2);
const RealType z2 = sqrt(-2.f * log2(x1)) * sin(2.f * M_PI_F * x2);
both of those log2 calls have to be log (i.e. using the natural logarithm). The ~1.2 scale in the standard deviation comes out of this mathematically: log2(x) = ln(x) / ln(2), and the square root of 1/ln(2) equals 1.2011224. i.e. the standardized variates are too large by a factor of sqrt(1/ln(2)).
I think (but have not verified) that there is another problem here: the generated x1 and x2 values can take on values of either 0 or a value larger than 1:
const RealType x1 = x.x / (RealType) (UINT_MAX - 1);
const RealType x2 = x.y / (RealType) (UINT_MAX - 1);
When x.x equals 0, x1 = 0, which results in both z1 and z2 being infinite, which is bad.
When x.x equals UINT_MAX, you can get a value larger than 1 for x1, which gives a positive value for the log, which then results in taking the sqrt of a negative, which is nan. Note, however, that this will only shows up when RealType has sufficient precision: when RealType is float, UINT_MAX-1 (=4294967294) actually rounds to UINT_MAX+1 (=4294967296). If RealType is double, however, you can run into the problem.
Basically, x1 needs to be in (0,1], and it currently violates the strictness of the left of that range, and can, depending on RealType, violate the right end of the range.
See boost/random/normal_distribution.hpp from a pre-1.56.0 version of boost (https://github.com/boostorg/random/blob/8c30753bf49504b5f677b8b410a3787907c86bd0/include/boost/random/normal_distribution.hpp) for a properly-working Box-Muller implementation (pre-1.56 because, in 1.56.0, the Box-Muller implementation was replaced with a ziggurat implementation).