[BUG] std::pow returns NaN instead of the correct result with NDK r21 #1198
Comments
given that you're seeing this across multiple OS releases, it's not the bionic change :-) also:
so clang's basically just written the constant to .rodata. specifically:
which is 0x400dd9fe1a401fa0 which is 3.7314416933840704.
which is also what's showing up on my screen when i build and run your example...
the armv7 code seems reasonable too?
and i see the right answer on my screen when i run the armv7 code on a Galaxy Nexus too. so "can't reproduce" atm, unless you have more...
Interesting, this is the build result I got with "-O1":
0000000000000000 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>:
0: a9bf7bfd stp x29, x30, [sp,#-16]!
4: 910003fd mov x29, sp
8: d2ffff08 mov x8, #0xfff8000000000000 // #-2251799813685248
c: 9e670100 fmov d0, x8
10: 94000000 bl 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
14: 360000a0 tbz w0, #0, 28 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI+0x28>
18: d2ffff08 mov x8, #0xfff8000000000000 // #-2251799813685248
1c: 9e670100 fmov d0, x8
20: a8c17bfd ldp x29, x30, [sp],#16
24: d65f03c0 ret
28: 90000000 adrp x0, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
2c: 90000002 adrp x2, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
30: 90000003 adrp x3, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
34: 91000000 add x0, x0, #0x0
38: 91000042 add x2, x2, #0x0
3c: 91000063 add x3, x3, #0x0
40: 52800221 mov w1, #0x11 // #17
44: 94000000 bl 0 <__assert2>
This is the compile_commands.json:
[
{
"directory": "C:/Users/bruno/devel/test_ndkr21/app/.cxx/cmake/debug/arm64-v8a",
"command": "C:\\Users\\bruno\\AppData\\Local\\Android\\Sdk\\ndk\\21.0.6113669\\toolchains\\llvm\\prebuilt\\windows-x86_64\\bin\\clang++.exe --target=aarch64-none-linux-android21 --gcc-toolchain=C:/Users/bruno/AppData/Local/Android/Sdk/ndk/21.0.6113669/toolchains/llvm/prebuilt/windows-x86_64 --sysroot=C:/Users/bruno/AppData/Local/Android/Sdk/ndk/21.0.6113669/toolchains/llvm/prebuilt/windows-x86_64/sysroot -Dnative_lib_EXPORTS -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -O1 -fPIC -o CMakeFiles\\native-lib.dir\\native-lib.cpp.o -c C:\\Users\\bruno\\devel\\test_ndkr21\\app\\src\\main\\cpp\\native-lib.cpp",
"file": "C:\\Users\\bruno\\devel\\test_ndkr21\\app\\src\\main\\cpp\\native-lib.cpp"
}
]
By removing the assert, the result is more obvious. With NDK 20.1.5948944:
0000000000000000 <.rodata.cst8>:
0: 1a401fa0 .word 0x1a401fa0
4: 400dd9fe .word 0x400dd9fe
Disassembly of section .text.Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI:
0000000000000000 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>:
0: 90000008 adrp x8, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
4: fd400100 ldr d0, [x8]
8: d65f03c0 ret
With NDK 21.0.6113669:
Disassembly of section .text.Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI:
0000000000000000 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>:
0: d2ffff08 mov x8, #0xfff8000000000000 // #-2251799813685248
4: 9e670100 fmov d0, x8
8: d65f03c0 ret
I looked at this issue a little on Friday, and it looked like a Windows-host-only bug where Clang's constant folding code called
E.g., here's the Clang miscompilation on Windows:
test.c:
double get() { return __builtin_pow(2.0, 2.1); }
r20:
r21:
On Linux, both r20 and r21 use the
Here's a test program using Soong on aosp/master that demonstrates the bug.
Android.bp:
pow_test.cpp:
#include <errno.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
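int main(int argc, char* argv[]) {
  if (argc != 3) {
    fprintf(stderr, "usage: %s X Y\n", argv[0]);
    exit(1);
  }
  // X and Y are parsed at run time, so the compiler cannot constant-fold pow().
  char* endptr;
  double x = strtod(argv[1], &endptr);
  if (*endptr != '\0') {
    fprintf(stderr, "invalid X: %s\n", argv[1]);
    exit(1);
  }
  double y = strtod(argv[2], &endptr);
  if (*endptr != '\0') {
    fprintf(stderr, "invalid Y: %s\n", argv[2]);
    exit(1);
  }
  // Clear errno so any error reported afterwards comes from pow() itself.
  errno = 0;
  double result = pow(x, y);
  int err = errno;
  printf("%f errno=%d[%s]\n", result, err, strerror(err));
  return 0;
}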
The toolchain in goog/qt-dev works as expected:
I think Clang is calling
Thanks for the reduced test case @rprichard. This is most likely a bug in MinGW's math library. I'll try to look later this week. In the meantime, can you check whether this reproduces with MinGW-7 from http://mingw-w64.org/doku.php?
The test passes with the mingw-w64 gcc driver:
I believe this is using MinGW-6 with GCC. Maybe I need to figure out the difference between the GCC and Clang toolchains?
Both toolchains produce an executable with a MinGW pow function that calls exp2l and log2l (long double variants of exp2/log2). With the GCC driver, I see the MinGW exp2l/log2l linked into the binary, whereas with the Soong/Clang toolchain, exp2l/log2l are imported from the operating system (ucrt, I think). It looks like MSVC has an 8-byte long double, while MinGW uses something larger (sizeof(long double) is 12/16 for 32/64-bit.) printf for long double (
Maybe we need to add
The Clang driver has some MinGW support, and it knows about MinGW libraries like mingwex. I think it doesn't have explicit support for selecting ucrt. (I wonder if it tries to support both mingw-w64 and the original 32-bit-only MinGW project?)
Here's what I tested, which fixed this bug:
Thanks for the investigation @rprichard. The fix seems very similar to what was needed for b/115909626. Do you think we'd need some change like this for the platform as well?
Yes, I think the platform needs the same sort of change. The platform is explicitly linking against
I suspect the platform gets away with it because there's not much call for long double in adb or aapt 😀
On Tue, Mar 3, 2020, 15:13 Ryan Prichard wrote:
Yes, I think the platform needs the same sort of change. The platform is explicitly linking against -lmsvcrt -lucrt (instead of -lucrt -lucrtbase). I don't understand why they're different, but in both cases, we apparently need -lmingwex to come first.
FWIW, the issue here happens when Clang uses a double
It looks like adb/aapt/fastboot use double FP, but maybe they don't use many math.h routines. I think most double-precision math routines would still work, because we'd use the Microsoft version.
Also, this issue is apparently a mingw-w64 bug. In the newer versions of mingw-w64, the incompatible long double functions are hidden from libucrt[base].a. There are a lot of patches to ucrtbase.def.in that we don't have yet. Here are some recent FP-related patches that I think affect x86/x86_64 programs: mirror/mingw-w64@a84d22e
(I don't understand why it's still listing the symbol but marking it "DATA". I know that the functions marked DATA aren't showing up in the libucrt[base].a stubs. I also don't know why
@rprichard Should we cherry-pick and use these instead of https://android-review.googlesource.com/c/toolchain/llvm_android/+/1247113? The Clang MinGW driver assumes a
It looks like symbols annotated with
It's possible the
Maybe, but I saw a bunch of commits modifying ucrtbase.def.in (and also touching other parts of MinGW). I suppose if we can identify an isolated set of patches fixing up ucrt floating-point, maybe that'd work? OTOH, the Clang driver normally implicitly adds
It looks like the x86_64-w64-mingw32-gcc driver automatically uses ucrt. At least, I think the api-ms-win-crt-*.dll DLLs imply ucrt? There's definitely no msvcrt.dll usage:
Linking a simple C program passes these
It's not specifying ucrt/ucrtbase. The libmsvcrt.a and libucrt.a archives are almost byte-for-byte identical -- there is a timestamp in the header creating a 1-byte difference between them. In the platform, we use
This change also seems to fix the LLVM pow bug. Aside: It looks like libucrt.a and libmsvcrt.a became mostly-identical after we switched MinGW's default crt from msvcrt to ucrt, which happened last October. I think we switched LLVM and the platform over to ucrt earlier than that, so the explicit
pow(double, double) doesn't work in this version with the optimizer set to "-O3".
This reverts commit e523951.
Thanks for the investigation @rprichard. I take it this is the only change we need (and we don't need any cherry-picks to MinGW)? If so, let's do this. The MinGW patches to support -lucrt/-lmsvcrt before -lmingwex can be part of a future MinGW update. I also verified that removing this from the platform (here) works (go/android-llvm-windows-testing). Let's do that as well.
Yes, this definitely sounds right. I think the emulator (or another project) was still getting built with
Yeah, I think removing the explicit -lucrt/-lucrtbase is the only change we need for NDK r21b. We can upgrade mingw-w64 later.
Hi Folks, we recently hit this problem in our apps as well while upgrading to NDK r21. We could resolve the issue by calling std::pow through some other function. As we understand it, std::pow broke when used with double and constant folding. Could somebody please confirm whether std::pow is the only function that broke like this, or are there other math functions broken like this in NDK r21?
as far as we know, it's all compile-time floating-point constant evaluation (when built on Windows).
We are building our Android project on Windows with the optimization flag "-Os" and found that the sqrt, cbrt, exp, and log functions work fine when used with "const double". I guess it may not affect all functions. Could you please double check and confirm?
https://android-review.googlesource.com/q/topic:%22windows-mingw-fixes-r21b%22+(status:open%20OR%20status:merged) are the merged patches for the r21b toolchain. |
Fix is in build 6352462 on https://ci.android.com/builds/branches/aosp-ndk-release-r21/grid?
MinGW now defaults to ucrt (in particular, libmsvcrt.a is equivalent to libucrt.a). We don't need to explicitly include ucrt and ucrtbase, since the clang driver includes msvcrt by default. Moreover, the driver also adds -lmingwex before -lmsvcrt, which addresses the wrong symbol resolution discussed in android/ndk#1198.
Test: reproducer in above bug.
Change-Id: I542bb03d9e6999f9f6fb56eac34f0d567991b183
MinGW now defaults to ucrt (in particular, libmsvcrt.a is equivalent to libucrt.a). We don't need to explicitly include ucrt and ucrtbase, since the clang driver includes msvcrt by default. Moreover, the driver also adds -lmingwex before -lmsvcrt, which addresses the wrong symbol resolution discussed in android/ndk#1198. (This is a backport of https://r.android.com/1252846 because the original change does not cherry-pick cleanly.)
Test: reproducer in above bug.
Change-Id: If58b7ba2f045592319330a1b90f33735599b24c6
Description
Using NDK r21, std::pow returns 'NaN' instead of a finite floating-point value in release builds.
This is the simplest code to replicate the problem:
The expected result is '3.7314416933840704', but 'NaN' is returned.
It seems to be linked to this change in bionic/libm: https://android.googlesource.com/platform/bionic/+/f6b101d3ecfb2567834c6c439f1d1d3a4a7d844e
Test case project:
https://github.com/brunotl/test_ndkr21
Environment Details