stackplot_test_baseline has different results on 32-bit and 64-bit platforms #1726

mdboom · 2013-02-04T18:39:17Z

I haven't quite gotten down to the bottom of this one yet, but this is the cause of the current failures on Travis.

On a 64-bit Linux, I get the following for stackplot_test_baseline (and I suspect this is correct):

And on 32-bit:

The problem appears to be somewhere in the calculation of the data limits. On 64-bit, the xmin is exactly 0, on 32-bit it's a very small negative number which causes the locator to add an extra step on the left.

dmcdougall · 2013-01-31T02:43:46Z

I feel like we've had an issue exactly like this in the past with bounding boxes. A cheap-and-dirty hack would be to update the the [xy]lim in the test so that no bad rounding happens in the 32-bit case.

mdboom · 2013-01-31T16:14:24Z

That's not a bad idea for a quick fix to get the tests passing -- but I'd still like to get to the bottom of this, of course.

mdboom · 2013-02-01T20:36:49Z

Okay -- I'm posting this mostly out of desperation to see if anyone else has any ideas.

The difference in these platforms comes down to the affine transformation. At one point, the collection is mapped through a transformation to determine the bounds of the data. While the inputs to _path.cpp::affine_transform are bit-for-bit exactly the same, the outputs are not. It is all double arithmetic throughout, so I am surprised there is a platform difference (as one might expect from long doubles which are different on these platforms).

I've boiled it down to a simple C program:

#include <stdio.h>

int main() {
  double b = 0.0d;
  double x = 80.0d;
  double d = 0x1.5555555555555p-9d;
  double y = 48.0d;
  double f = -0x1.ffffffffffffdp-4d;

  double t = b * x + d * y + f;

  printf("Output %g %a\n", t, t);
}

In the Travis environment (which is a recent i686 Ubuntu with gcc 4.6), I get:

Output 3.46945e-17 0x1.4p-55

On my personal machine, F18 x86_64 with gcc 4.7, I get:

Output 4.16334e-17 0x1.8p-55

It is this small difference that is causing the limit detection to fall over on this test.

We have some ways to solve this: 1) find out why there is this inconsistency and work around it if possible. 2) make the locators to ignore very small crossings of the boundaries -- this of course needs to be some dynamically determined fraction of the overall limits. (That is, if the true range is (-1e-45 to 100), it should just round to (0, 100)).

In the long run, we're probably going to have to do something like 2) in any event to be more robust against floating-point error anyway. Any other thoughts?

WeatherGod · 2013-02-01T20:53:12Z

Additional datapoint: CentOS6, 64-bit, gcc version 4.4.6
Output 4.16334e-17 0x1.8p-55

So, it seems to be dependent on 32/64 bits and not gcc versions. If I can
get my old laptop running tonight, I can confirm with a 32-bit data point.

mdboom · 2013-02-04T18:44:58Z

Alright. This was a fun one.

This article was helpful in decoding some of the quirks of x86 floating-point. It may actually be the case that to trigger this you have to be both x86 (not x86-64) and non-SSE (which is true for the Travis VMs).

http://www.yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html

A fix is attached, which (hopefully) should pass on Travis.

mdboom · 2013-02-04T19:06:10Z

The Travis tests are better than master (only one failure now), so I'm ready to count this as a success ;)

stackplot_test_baseline has different results on 32-bit and 64-bit platforms

juliantaylor · 2013-02-19T20:53:08Z

the code kind of looks like its relevant for performance, would it make sense to only use the unoptimized version on the platform its needed via ifdefs?
Also doesn't the line right below the fixed one have the same problem?

maybe one could also use long double instead of double, those still have decent performance on i386

mdboom · 2013-02-20T00:58:13Z

@juliantaylor: This fix isn't really about optimization -- it's able consistent floating point results across platforms. Can you clarify what you mean by "Also doesn't the line right below the fixed one have the same problem?"

Using long double instead of double would actually exacerbate the problem, since they are of different sizes on i386 and x86_64.

juliantaylor · 2013-02-20T18:03:42Z

the else clause below the fixed one has the same expression with lots of temporaries which are not spilled to the stack explicitly, it may have the same inconsistency problem.

long double has the same size on i386 and amd64, 80 bit the size of the fpu stack (with gcc aligned to 16 byte). Using them means you don't lose your excess precision when the variable spills on the stack making it more consistent (but you lose vectorization on amd64)
It may not exist on some non x86 platforms, but this issue is x86 specific.

a drastic alternative would be to ditch support for non SSE x86 cpu's and change the fpu mode to sse globally (-mfpmath=sse with gcc), but it would make all float calculations in matplotlib more consistent.

Fix floating-point in affine transformation on x86 vs v86-64

488e6a2

mdboom added a commit that referenced this pull request Feb 4, 2013

Merge pull request #1726 from mdboom/affine_transform_math

a4e4c68

stackplot_test_baseline has different results on 32-bit and 64-bit platforms

mdboom merged commit a4e4c68 into matplotlib:master Feb 4, 2013

mdboom deleted the affine_transform_math branch August 7, 2014 13:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stackplot_test_baseline has different results on 32-bit and 64-bit platforms #1726

stackplot_test_baseline has different results on 32-bit and 64-bit platforms #1726

mdboom commented Feb 4, 2013

dmcdougall commented Jan 31, 2013

mdboom commented Jan 31, 2013

mdboom commented Feb 1, 2013

WeatherGod commented Feb 1, 2013

mdboom commented Feb 4, 2013

mdboom commented Feb 4, 2013

juliantaylor commented Feb 19, 2013

mdboom commented Feb 20, 2013

juliantaylor commented Feb 20, 2013

stackplot_test_baseline has different results on 32-bit and 64-bit platforms #1726

stackplot_test_baseline has different results on 32-bit and 64-bit platforms #1726

Conversation

mdboom commented Feb 4, 2013

dmcdougall commented Jan 31, 2013

mdboom commented Jan 31, 2013

mdboom commented Feb 1, 2013

WeatherGod commented Feb 1, 2013

mdboom commented Feb 4, 2013

mdboom commented Feb 4, 2013

juliantaylor commented Feb 19, 2013

mdboom commented Feb 20, 2013

juliantaylor commented Feb 20, 2013