openlibm acos() gives garbage results on i686 linux with GCC 7.1.0 #21742
The mystery deepens. I was noticing that I could call
So perhaps there are multiple things going on here. I should note that the
|
Can you post
|
If you can reproduce the issue with |
Native code output:
Disassembly of GCC 7.1 |
Simple C function:
// simple.c
double foo(double x) {
    return x * x;
}
Compiled into shared library with GCC 7.1.0:
Works just fine:
julia> ccall(Libdl.dlsym(Libdl.dlopen("libsimple"), "foo"), Float64, (Float64,), 2.5)
6.25
julia> ccall(("foo", "libsimple"), Float64, (Float64,), 0.8)
0.6400000000000001
Disassembly:
|
I can:
julia> acos_func = Libdl.dlsym(Libdl.dlopen(Base.libm_name), "acos")
Ptr{Void} @0xe649dd27
julia> f(p, x) = ccall(p, Float64, (Float64,), x)
f (generic function with 1 method)
julia> f(acos_func, 0.8)
-9.486798276379531e184
I cannot:
|
Can you compile the simple examples with |
If by "simple examples" you mean the |
What about compiling openlibm and/or the sample code with |
If that fixes the issue, there might still be something wrong with GCC 7's x87 codegen. If it doesn't work, or if you want to figure out what's wrong with GCC 7, maybe set a breakpoint on acos and run it in a way that reproduces the issue. Dump all registers |
Also, have you tried if |
Some of the asm difference seems to come from https://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg524824.html. Not sure if it's related or if we care... |
Add to |
Alright, now we're getting somewhere! Compiling
But I can't run
|
Greater than 1 gives |
Alright, I Before:
After:
For those following along in the disassembly, we returned from offset |
Well this was a learning experience. Our troubles begin on this line. We To test this, I wrote a new C program with the intent to set up the stack such that it will give very large numbers if interpreted as a
// acos_test.c
#include <stdio.h>
#include <openlibm.h>

// Fill the stack with 0x55, 0x55, 0x55...., which is a big double no matter where you start
void prepare_stack() {
    volatile unsigned char stack[256];
    for( unsigned int i=0; i<256; ++i )
        stack[i] = 0x55;
}

int main(void) {
    // Do this once so that we don't have to bother with the dl loading stuff
    volatile double x = 0.8;
    acos(x);
    prepare_stack();
    double y = acos(x);
    printf("%.16f\n", y);
    return 0;
}
And this works, when linked against a bad copy of
So something is blowing up within the |
Alright, I think I've managed to get a MWE here: https://gist.github.com/staticfloat/d357b985eab757f393fa7e5ff1ee4101. I used |
@tkelman do you have ideas on how to meaningfully reduce this further, or should we file a GCC bug as-is? I'm hesitant to allow |
that seems reportable to me |
Reported here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706
To work around it, I am setting |
Didn't |
Oh, I totally forgot about that. That's a better fix, thanks. |
How did you find that it was peephole2 that caused the problem here? Looks like gcc devs are looking into it, and have a potential patch already. |
I noticed that
for flag in $list_of_o2_optimization_flags; do
    gcc -o test -O1 $flag test.c && ./test
done
And I found that
I am absolutely astounded by the speed with which they have responded. That is really impressive. |
makes sense. where's the list of flags from? sometimes they're fast, especially when given a good test case - the suitesparse -O3 internal compiler error on alpine linux only took about a day to fix IIRC |
Usually gcc bug reports get responses faster and llvm/clang bug reports get subscribers faster.... |
This was fixed upstream, but hasn't made it into a gcc release (7.2?) yet. Should we be warning distros not to use gcc 7.1 without patching this? |
We could also probably move to applying the end-result patch to the test buildbots' gcc build, as long as they're using 7.1, instead of turning off an optimization |
Yeah, probably. I'm not sure where we'd make this warning though.
@yuyichao had a better idea, which was tell GCC
This still passes our test suite of course, and I remember us going through some troubles a while back to try and ensure that 80-bit temporaries didn't screw up our floating-point tests. Despite not having done any performance tests to see exactly how much this changes things, I think this is a change we can live with. |
I thought that we solved that one with |
I think that did indeed solve our 80-bit precision problems by enabling |
I think we solved the issue in JIT code by requiring pentium4. If I understand the GCC doc correctly, since Unless we expect external users of openlibm on sse-less x86 hardware, I think we should just set |
I believe JIT code is influenced by us setting
Nope, we explicitly decided to not support anything older than
👍
I don't think so, I think it sets |
Is this still relevant to Julia given that #23283 has been merged? |
Would still be worth reporting upstream to openlibm if you haven't already. |
This was a compiler bug; it's been fixed in GCC 7.2.0 (I just verified that is indeed the case), so I think we can reasonably lay this to rest. |
While testing out GCC 7.1.0 on the testing buildbots, I have discovered some weird behavior at the intersection of ccall, openlibm and GCC 7.1.0.
If I compile libopenlibm with GCC 7.1.0, then run the following code, I get unrealistic answers (and they change after every invocation, making me think that something is returning garbage):
However, if I compile libopenlibm with GCC 6.3.0, I get reasonable answers:
Attempting to reduce this by writing a C test program fails. It always gives the right result, even when mixing and matching the GCC versions used to compile the executable and the library, or changing optimization levels. Here's the C test file:
So it appears that this bug is triggered only when:
Openlibm is compiled with GCC 7.1.0
The acos() function is called via ccall within Julia
The platform is i686 linux
Does anyone have any ideas on how to debug this further?