PI example bug? #1

Closed
simendsjo opened this Issue Apr 15, 2011 · 9 comments


I tried the first PI example in your documentation, and I get different results each time I run it. It seems the first 12 digits are the same each time. I added the following at the end:
writefln("%1.19f", pi);

I've tried without -O and -inline, but the behavior is the same. This is using dmd 2.052 on Windows on an Intel Xeon with 4 cores and HT.

FYI, running sequentially returns the same number each time.

@simendsjo simendsjo closed this Apr 15, 2011

@simendsjo simendsjo reopened this Apr 15, 2011

Owner

dsimcha commented Apr 15, 2011

Thanks for your report. These results are probably correct. As I note in the documentation for parallel reduce, the reduction operator must be associative. Addition in exact arithmetic is associative. Floating point arithmetic is not associative, but it is approximately associative in the well-behaved cases. Therefore, there will always be some non-determinism in the low-order bits, at least when work unit size, etc. is varied. As for non-determinism across runs on the same hardware with the same settings, I'll look into it tonight, but I suspect it's still some weird floating point issue, not a bug in std.parallelism.
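For a concrete picture of that non-associativity, here is a minimal D sketch (the constants are arbitrary, chosen only to make the grouping difference visible):

import std.stdio;

void main()
{
    double a = 1.0e16, b = -1.0e16, c = 1.0;

    // Exact arithmetic gives 1 for both groupings; double rounding does not.
    writeln((a + b) + c); // prints 1
    writeln(a + (b + c)); // prints 0, because b + c rounds back to -1e16
}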


@dsimcha dsimcha closed this Apr 15, 2011

@dsimcha dsimcha reopened this Apr 15, 2011

Owner

dsimcha commented Apr 15, 2011

I can't reproduce this on my dual core Athlon 64 X2. The results are slightly different depending on how many threads I use, which is expected since floating point addition is only approximately associative. However, across identical runs the results are consistent.
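A minimal sketch of why the grouping matters (this is not the std.parallelism implementation; chunkedSum and the stand-in terms below are illustrative only): summing the same terms in differently sized chunks and then combining the partial sums, the way a parallel reduce splits its input into work units, re-associates the additions, so the rounding errors accumulate differently and the low-order bits move.

import std.algorithm : min;
import std.stdio;

// Sum terms in chunks of chunkSize, then combine the partial sums,
// mimicking how a parallel reduction splits its input into work units.
real chunkedSum(const real[] terms, size_t chunkSize)
{
    real total = 0.0;
    for (size_t i = 0; i < terms.length; i += chunkSize)
    {
        real partial = 0.0;
        foreach (t; terms[i .. min(i + chunkSize, terms.length)])
            partial += t;
        total += partial;
    }
    return total;
}

void main()
{
    real[] terms;
    foreach (i; 1 .. 1_000_001)
        terms ~= 1.0L / i;

    writefln("%1.19f", chunkedSum(terms, terms.length)); // one chunk: sequential order
    writefln("%1.19f", chunkedSum(terms, 250_000));      // 4 chunks
    writefln("%1.19f", chunkedSum(terms, 125_000));      // 8 chunks
}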

If you can still reproduce this bug, it may be that different cores are in different floating point rounding modes or something, and the answer depends on which core things get scheduled on. Rereading the code, I can't see how a concurrency bug could affect only the low-order bits.

@dsimcha dsimcha closed this Apr 15, 2011

simendsjo commented Apr 15, 2011

What do you mean by "identical runs"?
I'm at another computer now, but the results remain the same. I changed the example a bit:

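// getTerm and n are as in the documentation's PI example referenced above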
void calcPi() {
    immutable pi = 4.0 * taskPool.reduce!"a + b"(
        std.algorithm.map!getTerm(iota(n))
    );
    writefln("%1.19f", pi);
}

calcPi();
calcPi();
calcPi();

C:\temp>dmd -inline -O -release -ofpi pi

C:\temp>pi
3.1415926555897901559
3.1415926555897906796
3.1415926555897890834

C:\temp>pi
3.1415926555897859763
3.1415926555897986311
3.1415926555897861944

Not sure that's what you meant by identical, though...

Oh, and by using --nCpu=2 I get 13/14 decimals the same each run (more than when running with 4/8)... I don't know much about floating point, so I don't understand much of what you are saying.
Running on multiple cores does what? "In different floating point rounding modes"? Is parallel computation of floating point somehow dangerous?

Owner

dsimcha commented Apr 15, 2011

My apologies. I can reproduce this. I actually have two different pi examples and for some reason I thought you meant the other one. Reopening. I still think this is probably related to some obscure floating point minutiae and not a "real" bug, but I'd like to find out for sure before closing it.

As far as rounding modes go, Wikipedia has a good description (http://en.wikipedia.org/wiki/Floating_point#Rounding_modes); I don't understand much more than that myself. All I know is that rounding modes are set per-CPU, and I can't think of any other reason for such strange behavior.

Generally I don't even look at the low order bits of my floating point results because there are so many details (such as compiler optimizations and rounding modes) that can change them and the answer will still be right for all practical purposes. In this case I think looking into it further is justified because I can't think of any reason why the answers wouldn't be exactly the same.
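As a concrete (if hedged) illustration of the rounding-mode idea, the sketch below flips the FPU rounding mode and re-runs the same summation; only the low-order bits move. It uses std.math.FloatingPointControl, which may not exist in a Phobos as old as 2.052, and the partialHarmonic sum is just a stand-in for the pi terms.

import std.math, std.stdio;

// A stand-in summation; any long chain of additions shows the effect.
real partialHarmonic(int n)
{
    real s = 0.0;
    foreach (i; 1 .. n + 1)
        s += 1.0L / i;
    return s;
}

void main()
{
    writefln("default rounding: %1.19f", partialHarmonic(1_000_000));

    FloatingPointControl fpctrl;
    fpctrl.rounding = FloatingPointControl.roundDown; // restored when fpctrl goes out of scope
    writefln("round down:       %1.19f", partialHarmonic(1_000_000));
}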

@dsimcha dsimcha reopened this Apr 15, 2011

Owner

dsimcha commented Apr 16, 2011

Another weird observation: the smallest element of the range we're summing is about 5e-10, so differences that only show up around the 12th decimal place are far smaller than any single term and have to be rounding-related. We're definitely not skipping any terms.
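A quick sanity check of that 5e-10 figure (assuming the documentation example's n = 1_000_000_000 and getTerm(i) = delta / (1 + x * x) with x = (i - 0.5) * delta; treat those exact constants as my reconstruction of the docs example):

import std.stdio;

void main()
{
    // Assumed values from the documentation example, not from this thread.
    immutable n = 1_000_000_000;
    immutable delta = 1.0 / n;
    // The last index, i = n - 1, gives x close to 1 and thus the smallest term.
    immutable x = (n - 1 - 0.5) * delta;
    writeln(delta / (1.0 + x * x)); // prints roughly 5e-10
}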

Owner

dsimcha commented Apr 16, 2011

Thanks for your report. It was extremely interesting to track down. It turns out that it's not a bug in std.parallelism; it's a bug in the way druntime and Windows handle floating point state when creating new threads. See http://d.puremagic.com/issues/show_bug.cgi?id=5847.

@dsimcha dsimcha closed this Apr 16, 2011
