Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] std::pow return NaN instead of correct result with ndk r21 #1198

Closed
brunotl opened this issue Mar 2, 2020 · 29 comments
Closed

[BUG] std::pow return NaN instead of correct result with ndk r21 #1198

brunotl opened this issue Mar 2, 2020 · 29 comments
Assignees
Labels
Milestone

Comments

@brunotl
Copy link

brunotl commented Mar 2, 2020

Description

using ndk r21, sd::pow return 'NaN' instead of finite floating point value on release build

  • same behavior using "-O1" "-O2" or "-O3" optimizer flag
  • result is correct when build with "-O0"
  • result is correct with ndk 20.1.5948944

this is the simpler code to replicate the problem :

    static constexpr double exp = 0.190263;
    static constexpr double base = 1013.25;

    double result = std::pow(base, exp);

expected result is '3.7314416933840704' but 'NaN' is returned.

seem's linked to this change in bionic/libm : https://android.googlesource.com/platform/bionic/+/f6b101d3ecfb2567834c6c439f1d1d3a4a7d844e

test case project :
https://github.com/brunotl/test_ndkr21

Environment Details

  • NDK Version: 21.0.6113669
  • Build system: CMake
  • Host OS: Windows 10, Debian Buster
  • ABI: arm64-v8a
  • Device API level: 23, 27, 28
@brunotl brunotl added the bug label Mar 2, 2020
@enh-google
Copy link
Collaborator

given that you're seeing this across multiple OS releases, it's not the bionic change :-)

also:

0000000000000000 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>:
   0:   fc1e0fe8        str     d8, [sp,#-32]!
   4:   a9017bfd        stp     x29, x30, [sp,#16]
   8:   910043fd        add     x29, sp, #0x10
   c:   90000008        adrp    x8, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
                        c: R_AARCH64_ADR_PREL_PG_HI21   .rodata.cst8
  10:   fd400108        ldr     d8, [x8]
                        10: R_AARCH64_LDST64_ABS_LO12_NC        .rodata.cst8
  14:   4ea81d00        mov     v0.16b, v8.16b
  18:   94000000        bl      0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
                        18: R_AARCH64_CALL26    _Z8isfiniteIdENSt6__ndk19enable_ifIXaasr3std13is_arithmeticIT_EE5valuesr3std14numeric_l
imitsIS2_EE12has_infinityEbE4typeES2_
  1c:   360000a0        tbz     w0, #0, 30 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI+0x30>
  20:   a9417bfd        ldp     x29, x30, [sp,#16]
  24:   4ea81d00        mov     v0.16b, v8.16b
  28:   fc4207e8        ldr     d8, [sp],#32
  2c:   d65f03c0        ret
  30:   90000000        adrp    x0, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
                        30: R_AARCH64_ADR_PREL_PG_HI21  .rodata.str1.1
  34:   90000002        adrp    x2, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
                        34: R_AARCH64_ADR_PREL_PG_HI21  .rodata.str1.1+0x31
  38:   90000003        adrp    x3, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
                        38: R_AARCH64_ADR_PREL_PG_HI21  .rodata.str1.1+0x83
  3c:   91000000        add     x0, x0, #0x0
                        3c: R_AARCH64_ADD_ABS_LO12_NC   .rodata.str1.1
  40:   91000042        add     x2, x2, #0x0
                        40: R_AARCH64_ADD_ABS_LO12_NC   .rodata.str1.1+0x31
  44:   91000063        add     x3, x3, #0x0
                        44: R_AARCH64_ADD_ABS_LO12_NC   .rodata.str1.1+0x83
  48:   52800221        mov     w1, #0x11                       // #17
  4c:   94000000        bl      0 <__assert2>
                        4c: R_AARCH64_CALL26    __assert2

so clang's basically just written the constant to .rodata. specifically:

0000000000000000 <.rodata.cst8>:
   0:   1a401fa0        .word   0x1a401fa0
   4:   400dd9fe        .word   0x400dd9fe

which is 0x400dd9fe1a401fa0 which is 3.7314416933840704.

@enh-google
Copy link
Collaborator

which is also what's showing up on my screen when i build and run your example...

@enh-google
Copy link
Collaborator

the armv7 code seems reasonable too?

   4:   eddf 0b0a       vldr    d16, [pc, #40]  ; 30 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI+0x30>
   8:   ec54 5b30       vmov    r5, r4, d16
   c:   4628            mov     r0, r5
   e:   4621            mov     r1, r4
...
  30:   1a401fa0        bne     1007eb8 <.L.str.1+0x1007e35>
  34:   400dd9fe        strdmi  sp, [sp], -lr

@enh-google
Copy link
Collaborator

and i see the right answer on my screen when i run the armv7 code on a Galaxy Nexus too.

so "can't reproduce" atm, unless you have more...

@brunotl
Copy link
Author

brunotl commented Mar 2, 2020

interesting, this is build result i got with "O1" :

0000000000000000 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>:
   0:	a9bf7bfd 	stp	x29, x30, [sp,#-16]!
   4:	910003fd 	mov	x29, sp
   8:	d2ffff08 	mov	x8, #0xfff8000000000000    	// #-2251799813685248
   c:	9e670100 	fmov	d0, x8
  10:	94000000 	bl	0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
  14:	360000a0 	tbz	w0, #0, 28 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI+0x28>
  18:	d2ffff08 	mov	x8, #0xfff8000000000000    	// #-2251799813685248
  1c:	9e670100 	fmov	d0, x8
  20:	a8c17bfd 	ldp	x29, x30, [sp],#16
  24:	d65f03c0 	ret
  28:	90000000 	adrp	x0, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
  2c:	90000002 	adrp	x2, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
  30:	90000003 	adrp	x3, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
  34:	91000000 	add	x0, x0, #0x0
  38:	91000042 	add	x2, x2, #0x0
  3c:	91000063 	add	x3, x3, #0x0
  40:	52800221 	mov	w1, #0x11                  	// #17
  44:	94000000 	bl	0 <__assert2>

this is the compile_commands.json :

[
{
  "directory": "C:/Users/bruno/devel/test_ndkr21/app/.cxx/cmake/debug/arm64-v8a",
  "command": "C:\\Users\\bruno\\AppData\\Local\\Android\\Sdk\\ndk\\21.0.6113669\\toolchains\\llvm\\prebuilt\\windows-x86_64\\bin\\clang++.exe --target=aarch64-none-linux-android21 --gcc-toolchain=C:/Users/bruno/AppData/Local/Android/Sdk/ndk/21.0.6113669/toolchains/llvm/prebuilt/windows-x86_64 --sysroot=C:/Users/bruno/AppData/Local/Android/Sdk/ndk/21.0.6113669/toolchains/llvm/prebuilt/windows-x86_64/sysroot  -Dnative_lib_EXPORTS  -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security   -O1 -fPIC   -o CMakeFiles\\native-lib.dir\\native-lib.cpp.o -c C:\\Users\\bruno\\devel\\test_ndkr21\\app\\src\\main\\cpp\\native-lib.cpp",
  "file": "C:\\Users\\bruno\\devel\\test_ndkr21\\app\\src\\main\\cpp\\native-lib.cpp"
}
]

@brunotl
Copy link
Author

brunotl commented Mar 2, 2020

by removing assert, result is more abvious :

with ndk 20.1.5948944

0000000000000000 <.rodata.cst8>:
   0:	1a401fa0 	.word	0x1a401fa0
   4:	400dd9fe 	.word	0x400dd9fe

Disassembly of section .text.Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI:

0000000000000000 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>:
   0:	90000008 	adrp	x8, 0 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>
   4:	fd400100 	ldr	d0, [x8]
   8:	d65f03c0 	ret

with ndk 21.0.6113669

Disassembly of section .text.Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI:

0000000000000000 <Java_org_bruno_test_1ndkr21_MainActivity_DoubleFromJNI>:
   0:	d2ffff08 	mov	x8, #0xfff8000000000000    	// #-2251799813685248
   4:	9e670100 	fmov	d0, x8
   8:	d65f03c0 	ret

@rprichard
Copy link
Collaborator

I looked at this issue a little on Friday, and it looked like a Windows-host-only bug where Clang's constant folding code called pow() and got the wrong result.

@rprichard
Copy link
Collaborator

e.g. Here's the Clang miscompilation on Windows:

test.c:

double get() { return __builtin_pow(2.0, 2.1); }

r20:

C:\src\mess>c:\src\android-ndk-r20\toolchains\llvm\prebuilt\windows-x86_64\bin\clang test.c -O2 -S & type test.s
...
        .quad   4616512856993891940     # double 4.2870938501451725
...

r21:

C:\src\mess>c:\src\android-ndk-r21\toolchains\llvm\prebuilt\windows-x86_64\bin\clang test.c -O2 -S & type test.s
...
        .quad   -2251799813685248       # double NaN
...

On Linux, both r20 and r21 use the .quad 4616512856993891940 # double 4.2870938501451725 constant.

@rprichard rprichard reopened this Mar 2, 2020
@DanAlbert DanAlbert added this to the r21b milestone Mar 2, 2020
@rprichard
Copy link
Collaborator

Here's a test program using Soong on aosp/master that demonstrates the pow() issue:

Android.bp:

cc_binary_host {
    name: "pow_test",
    srcs: ["pow_test.cpp"],
    stl: "libc++_static",
    compile_multilib: "both",
    multilib: {
        lib64: { suffix: "64" },
    },
    target: {
        windows: { enabled: true }
    },
}

pow_test.cpp:

#include <errno.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char* argv[]) {
  if (argc != 3) {
    fprintf(stderr, "usage: %s X Y\n", argv[0]);
    exit(1);
  }
  char* endptr;
  double x = strtod(argv[1], &endptr);
  double y = strtod(argv[2], &endptr);
  errno = 0;
  double result = pow(x, y);
  int err = errno;
  printf("%f errno=%d[%s]\n", result, err, strerror(err));
  return 0;
}
$ wine out/host/windows-x86/bin/pow_test.exe 2.0 2.1
1.000000 errno=34[Numerical result out of range]

$ wine out/host/windows-x86/bin/pow_test64.exe 2.0 2.1
nan errno=0[Success]

$ out/host/linux-x86/bin/pow_test64 2.0 2.1
4.287094 errno=0[Success]

The toolchain in goog/qt-dev works as expected:

$ wine out/host/windows-x86/bin/pow_test.exe 2.0 2.1
4.287094 errno=0[Success]

$ wine out/host/windows-x86/bin/pow_test64.exe 2.0 2.1
4.287094 errno=0[Success]

@rprichard
Copy link
Collaborator

I think Clang is calling pow via ConstantFoldBinaryFP in llvm/lib/Analysis/ConstantFolding.cpp. There are a few calls in ConstantFolding.cpp that pass pow to ConstantFoldBinaryFP.

@pirama-arumuga-nainar
Copy link
Collaborator

Thanks for the reduced test case @rprichard. This is most likely a bug in MinGW's math library. I'll try to look later this week. In the meantime, can you try if this reproduces with MinGW-7 from http://mingw-w64.org/doku.php?

@rprichard
Copy link
Collaborator

The test passes with the mingw-w64 gcc driver:

$ ./prebuilts/gcc/linux-x86/host/x86_64-w64-mingw32-4.8/bin/x86_64-w64-mingw32-g++ \
    ./pow_test/pow_test.cpp -O2
$ wine ./a.exe 2.0 2.1
4.287094 errno=0[Success]

I believe this is using MinGW-6 with GCC. objdump -x still shows the ucrt API set DLLs -- I think we made that the default.

Maybe I need to figure out the difference between the GCC and Clang toolchains?

@rprichard
Copy link
Collaborator

Both toolchains produce an executable with a MinGW pow function that calls exp2l and log2l (long double variants of exp2/log2). With the GCC driver, I see the MinGW exp2l/log2l linked into the binary, whereas with the Soong/Clang toolchain, exp2l/log2l are imported from the operating system (ucrt, I think).

It looks like MSVC has an 8-byte long double, while MinGW uses something larger (sizeof(long double) is 12/16 for 32/64-bit.)

printf for long double (%Lf) appears broken with both GCC and Clang.

@rprichard
Copy link
Collaborator

Maybe we need to add -lucrt (and -lmsvcrt if that's really needed too) to the end of the linker command-line, at least after -lmingwex). Maybe it's sufficient to repeat -lmingwex before -lucrt/-lmsvcrt. (I confirmed that this is sufficient to fix this particular bug.)

The Clang driver has some MinGW support, and it knows about MinGW libraries like mingwex. I think it doesn't have explicit support for selecting ucrt. (I wonder if it tries to support both mingw-w64 and the original 32-bit-only MinGW project?)

@rprichard
Copy link
Collaborator

Here's what I tested, which fixed this bug:
https://android-review.googlesource.com/c/toolchain/llvm_android/+/1247113

@pirama-arumuga-nainar
Copy link
Collaborator

Thanks for the investigation @rprichard. The fix seems very similar to what we was needed for b/115909626. Do you think we'd need some change like this for the platform as well?

@rprichard
Copy link
Collaborator

Yes, I think the platform needs the same sort of change. The platform is explicitly linking against -lmsvcrt -lucrt (instead of -lucrt -lucrtbase). I don't understand why they're different, but in both cases, we apparently need -lmingwex to come first.

@enh-google
Copy link
Collaborator

enh-google commented Mar 3, 2020 via email

@rprichard
Copy link
Collaborator

FWIW, the issue here happens when Clang uses a double pow function, not long double. The MinGW pow function delegates to long double functions. (Also, it seems that libucrt[base] doesn't expose the Microsoft pow function. That seems odd to me, but maybe there's a reason for it.)

It looks like adb/aapt/fastboot use double FP, but maybe they don't use many math.h routines. I think most double-precision math routines would still work, because we'd use the Microsoft version.

Also, this issue is apparently a mingw-w64 bug. In the newer versions of mingw-w64, the incompatible long double functions are hidden from libucrt[base].a. There are a lot of patches to ucrtbase.def.in that we don't have yet. Here are some recent FP-related patches that I think affect x86/x86_64 programs:

mirror/mingw-w64@a84d22e
mirror/mingw-w64@e5d1fe4
mirror/mingw-w64@d8ed328
mirror/mingw-w64@d8183e4

Normally their existence here isn't an issue, but if -lucrt/-lucrtbase
happens to be linked before -lmingwex, these functions would happen to
be used instead of the ones in libmingwex.

(I don't understand why it's still listing the symbol but marking it "DATA". I know that the functions marked DATA aren't showing up in the libucrt[base].a stubs. I also don't know why pow is hidden.)

@pirama-arumuga-nainar
Copy link
Collaborator

@rprichard Should we cherry-pick and use these instead of https://android-review.googlesource.com/c/toolchain/llvm_android/+/1247113? The Clang MinGW driver assumes a msvcrt or ucrt is passed to the linker and implicitly adds mingwex after that. So cherry-picking these seems like a safer solution to me.

(I don't understand why it's still listing the symbol but marking it "DATA". I know that the functions marked DATA aren't showing up in the libucrt[base].a stubs.

It looks like symbols annotated with DATA are not exported as code. Not sure why we'd rather list a function symbol as a DATA instead of just omitting it.

I also don't know why pow is hidden.)

It's possible the ucrt interface exported by MinGW is only partial and we could add these if we want to.

@rprichard
Copy link
Collaborator

The Clang MinGW driver assumes a msvcrt or ucrt is passed to the linker and implicitly adds mingwex after that. So cherry-picking these seems like a safer solution to me.

Maybe, but I saw a bunch of commits modifying ucrtbase.def.in (and also touching other parts of MinGW). I suppose if we can identify an isolated set of patches fixing up ucrt floating-point, maybe that'd work?

OTOH, the Clang driver normally implicitly adds -lmingwex, then -lmsvcrt. The mingw-w64 commit comments imply that having -lucrt/-lucrtbase before -lmingwex is an unexpected situation that should still work, nevertheless.

It looks like the x86_64-w64-mingw32-gcc driver automatically uses ucrt. At least, I think the api-ms-win-crt-*.dll DLLs imply ucrt? There's definitely no msvcrt.dll usage:

$ /x/aosp/prebuilts/gcc/linux-x86/host/x86_64-w64-mingw32-4.8/bin/x86_64-w64-mingw32-gcc test.c && objdump -x a.exe | grep DLL
 vma:            Hint    Time      Forward  DLL       First
	DLL Name: KERNEL32.dll
	DLL Name: api-ms-win-crt-time-l1-1-0.dll
	DLL Name: api-ms-win-crt-heap-l1-1-0.dll
	DLL Name: api-ms-win-crt-locale-l1-1-0.dll
	DLL Name: api-ms-win-crt-math-l1-1-0.dll
	DLL Name: api-ms-win-crt-private-l1-1-0.dll
	DLL Name: api-ms-win-crt-runtime-l1-1-0.dll
	DLL Name: api-ms-win-crt-environment-l1-1-0.dll
	DLL Name: api-ms-win-crt-stdio-l1-1-0.dll
	DLL Name: api-ms-win-crt-string-l1-1-0.dll

Linking a simple C program passes these -l arguments to the linker:

-lmingw32
-lgcc
-lgcc_eh
-lmoldname
-lmingwex
-lmsvcrt
-lpthread
-ladvapi32
-lshell32
-luser32
-lkernel32
-lmingw32
-lgcc
-lgcc_eh
-lmoldname
-lmingwex
-lmsvcrt

It's not specifying ucrt/ucrtbase.

The libmsvcrt.a and libucrt.a archives are almost byte-for-byte identical -- there is a timestamp in the header creating a 1-byte difference between them.

In the platform, we use -lmsvcrt -lucrt, which is presumably redundant. We don't use -lucrtbase. LLVM's build system uses -lucrt -lucrtbase. I don't know why we need -lucrtbase -- if we don't, then maybe a fix is to simply omit all of the ucrt/ucrtbase/msvcrt arguments.

It's possible the ucrt interface exported by MinGW is only partial and we could add these if we want to.

pow seems to be marked as DATA, only for x86. I'd assume there was some reason to hide it, so I'd leave it hidden.

@rprichard
Copy link
Collaborator

... then maybe a fix is to simply omit all of the ucrt/ucrtbase/msvcrt arguments.

This change also seems to fix the LLVM pow bug.

Aside: It looks like libucrt.a and libmsvcrt.a became mostly-identical after we switched MinGW's default crt from msvcrt to ucrt, which happened last October. I think we switched LLVM and the platform over to ucrt earlier than that, so the explicit -lucrt[base] arguments may have been more necessary initially.

brunotl referenced this issue in LK8000/LK8000 Mar 6, 2020
 pow(double, double) don't work in this version with optimizer set to "-O3"

This reverts commit e523951.
@pirama-arumuga-nainar
Copy link
Collaborator

... then maybe a fix is to simply omit all of the ucrt/ucrtbase/msvcrt arguments.

This change also seems to fix the LLVM pow bug.

Thanks for the investigation @rprichard. I take it this is the only change we need (and don't need any cherry-picks to MinGW)? If so, let's do this. The MinGW patches to support -lucrt/-lmsvcrt before -lmingwex can be part of a future MinGW update.

I also verified that removing this from the platfom (here) works (go/android-llvm-windows-testing). Let's do that as well.

Aside: It looks like libucrt.a and libmsvcrt.a became mostly-identical after we switched MinGW's default crt from msvcrt to ucrt, which happened last October. I think we switched LLVM and the platform over to ucrt earlier than that, so the explicit -lucrt[base] arguments may have been more necessary initially.

Yes, this definitely sounds right. I think the emulator (or another project) was still getting built with msvcrt after enabling ucrt. We switched to default ucrt to resolve this.

@rprichard
Copy link
Collaborator

Thanks for the investigation @rprichard. I take it this is the only change we need (and don't need any cherry-picks to MinGW)? If so, let's do this.

Yeah, I think removing the explicit -lucrt/-lucrtbase is the only change we need for NDK r21b. We can upgrade mingw-w64 later.

@pendyalasyam
Copy link

Hi Folks,

Recently we got this problem hit in our apps as well while upgrading in NDK R21. We could resolve the issue by calling std::pow through some other function.

As per our understanding, std::pow got broke when used with double and constant folding.

Could somebody please confirm, is std::pow the only function that got broke like this? Or are there any other math functions broke like this in NDK R21?

@enh-google
Copy link
Collaborator

as far as we know, it's all compile time floating point constant evaluation (when built on Windows).

@pendyalasyam
Copy link

pendyalasyam commented Mar 27, 2020

We are building our android project on Windows with optimization flag "-Os" and found that sqrt, cbrt, exp, log functions are working fine when used with "const double". I guess, may be it is not for all functions. Could you please double check and confirm.

@stephenhines
Copy link
Collaborator

@DanAlbert
Copy link
Member

Fix is in build 6352462 on https://ci.android.com/builds/branches/aosp-ndk-release-r21/grid?

wantguns pushed a commit to wantguns/toolchain_llvm_android that referenced this issue Jul 17, 2020
MinGW now defaults to use ucrt (in paricular libmsvcrt.a is equivalent
to libucrt.a).  We don't need to explicitly include ucrt, ucrtbase since
the clang driver includes msvcrt by default.  Moreover, the driver also
adds -lmingwex before -lmsvcrt, which addresses the wrong symbol
resolution discussed in android/ndk#1198.

Test: reproducer in above bug.
Change-Id: I542bb03d9e6999f9f6fb56eac34f0d567991b183
hjiayz pushed a commit to hjiayz/llvm_android that referenced this issue Feb 1, 2021
MinGW now defaults to use ucrt (in paricular libmsvcrt.a is equivalent
to libucrt.a).  We don't need to explicitly include ucrt, ucrtbase since
the clang driver includes msvcrt by default.  Moreover, the driver also
adds -lmingwex before -lmsvcrt, which addresses the wrong symbol
resolution discussed in android/ndk#1198.

(This is a backport of https://r.android.com/1252846 because the
original change does not cherry-pick cleanly).

Test: reproducer in above bug.
Change-Id: If58b7ba2f045592319330a1b90f33735599b24c6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants