New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stackplot_test_baseline has different results on 32-bit and 64-bit platforms #1726
Conversation
I feel like we've had an issue exactly like this in the past with bounding boxes. A cheap-and-dirty hack would be to update the the |
That's not a bad idea for a quick fix to get the tests passing -- but I'd still like to get to the bottom of this, of course. |
Okay -- I'm posting this mostly out of desperation to see if anyone else has any ideas. The difference in these platforms comes down to the affine transformation. At one point, the collection is mapped through a transformation to determine the bounds of the data. While the inputs to I've boiled it down to a simple C program:
In the Travis environment (which is a recent i686 Ubuntu with gcc 4.6), I get:
On my personal machine, F18 x86_64 with gcc 4.7, I get:
It is this small difference that is causing the limit detection to fall over on this test. We have some ways to solve this: 1) find out why there is this inconsistency and work around it if possible. 2) make the locators to ignore very small crossings of the boundaries -- this of course needs to be some dynamically determined fraction of the overall limits. (That is, if the true range is (-1e-45 to 100), it should just round to (0, 100)). In the long run, we're probably going to have to do something like 2) in any event to be more robust against floating-point error anyway. Any other thoughts? |
Additional datapoint: CentOS6, 64-bit, gcc version 4.4.6 So, it seems to be dependent on 32/64 bits and not gcc versions. If I can |
Alright. This was a fun one. This article was helpful in decoding some of the quirks of x86 floating-point. It may actually be the case that to trigger this you have to be both x86 (not x86-64) and non-SSE (which is true for the Travis VMs). http://www.yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html A fix is attached, which (hopefully) should pass on Travis. |
The Travis tests are better than master (only one failure now), so I'm ready to count this as a success ;) |
stackplot_test_baseline has different results on 32-bit and 64-bit platforms
the code kind of looks like its relevant for performance, would it make sense to only use the unoptimized version on the platform its needed via ifdefs? maybe one could also use long double instead of double, those still have decent performance on i386 |
@juliantaylor: This fix isn't really about optimization -- it's able consistent floating point results across platforms. Can you clarify what you mean by "Also doesn't the line right below the fixed one have the same problem?" Using long double instead of double would actually exacerbate the problem, since they are of different sizes on i386 and x86_64. |
the else clause below the fixed one has the same expression with lots of temporaries which are not spilled to the stack explicitly, it may have the same inconsistency problem. long double has the same size on i386 and amd64, 80 bit the size of the fpu stack (with gcc aligned to 16 byte). Using them means you don't lose your excess precision when the variable spills on the stack making it more consistent (but you lose vectorization on amd64) a drastic alternative would be to ditch support for non SSE x86 cpu's and change the fpu mode to sse globally (-mfpmath=sse with gcc), but it would make all float calculations in matplotlib more consistent. |
I haven't quite gotten down to the bottom of this one yet, but this is the cause of the current failures on Travis.
On a 64-bit Linux, I get the following for
stackplot_test_baseline
(and I suspect this is correct):And on 32-bit:
The problem appears to be somewhere in the calculation of the data limits. On 64-bit, the xmin is exactly 0, on 32-bit it's a very small negative number which causes the locator to add an extra step on the left.