Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fseek crashes with sigsegv #562

Closed
jaewoosong opened this issue Oct 31, 2017 · 31 comments
Closed

fseek crashes with sigsegv #562

jaewoosong opened this issue Oct 31, 2017 · 31 comments

Comments

@jaewoosong
Copy link

Description

Ndk version 11, 12, 13 and 14 can run the following jni code without problem. But Ndk version 15 (15, 15b, 15c) makes sigsegv error.

  • Code
long android_asset_get_size(const char *name) {
  FILE *fl = android_fopen(name, "r");
  if (fl == NULL)
    return -1;
  fseek(fl, 0, SEEK_END); # ERROR: sigsegv occurs here
  long len = ftell(fl);
  fclose(fl);
  return len;
}

FILE* android_fopen(const char* fname, const char* mode) {
  if(mode[0] == 'w') return NULL;
  AAsset* asset = AAssetManager_open(android_asset_manager, fname, 0);
  if(!asset) return NULL;
  return funopen(asset, android_read, android_write, android_seek, android_close);
}
  • Application.mk
APP_STL := gnustl_shared
APP_CPPFLAGS := -frtti -fexceptions -std=c++11
#APP_ABI := armeabi-v7a # armeabi-v7a can run without any problem with ndk r15c
APP_ABI := arm64-v8a
LOCAL_ARM_NEON := true
APP_LDFLAGS += -fopenmp
APP_PLATFORM := android-14
NDK_TOOLCHAIN_VERSION := 4.9
  • Error message
I/torchandroid.cpp: filename: init-only.lua, FILE: 0xffffffffd1813018
A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0xffffffd1813070 in tid 365 (AsyncTask #1)
        [ 10-31 16:12:56.012  3104: 3104 W/         ]
        debuggerd: handling request: pid=327 uid=10538 gid=10538 tid=365
Application terminated.

Environment Details

Not all of these will be relevant to every bug, but please provide as much
information as you can.

  • NDK Version: r10~r14c are okay. r15, r15b, r15c makes problems.
  • Build sytem: ndk-build with Android Studio
  • Host OS: Ubuntu 16.04, 64bit
  • Compiler: I didn't explicitly set the compiler. Sorry that I cannot provide more useful information about the compiler.
  • ABI: arm64-v8a is problematic
  • STL: gnustl-shared
  • NDK API level:
  • Device API level:
@alexcohn
Copy link

alexcohn commented Nov 2, 2017

Without knowing what is in your android_seek function, there is no way to understand the
scenario.

@alexcohn
Copy link

alexcohn commented Nov 2, 2017

NDK_TOOLCHAIN_VERSION := 4.9 means that you use gcc compiler.

@jaewoosong
Copy link
Author

More functions:

  • android_fopen.h
#ifndef ANDROID_FOPEN_H
#define ANDROID_FOPEN_H

#include <stdio.h>
#include <android/asset_manager.h>

#ifdef __cplusplus
extern "C" {
#endif
  void android_fopen_set_asset_manager(AAssetManager* manager);
  FILE* android_fopen(const char* fname, const char* mode);
#ifdef __cplusplus
}
#endif
#endif
  • android_fopen.c
#include <stdio.h>
#include "android_fopen.h"
#include <errno.h>

static int android_read(void* cookie, char* buf, int size) {
  return AAsset_read((AAsset*)cookie, buf, size);
}

static int android_write(void* cookie, const char* buf, int size) {
  return EACCES; // can't provide write access to the apk
}

static fpos_t android_seek(void* cookie, fpos_t offset, int whence) {
  return AAsset_seek((AAsset*)cookie, offset, whence);
}

static int android_close(void* cookie) {
  AAsset_close((AAsset*)cookie);
  return 0;
}

// must be established by someone else...
AAssetManager* android_asset_manager;
void android_fopen_set_asset_manager(AAssetManager* manager) {
  android_asset_manager = manager;
}

FILE* android_fopen(const char* fname, const char* mode) {
  if(mode[0] == 'w') return NULL;

  AAsset* asset = AAssetManager_open(android_asset_manager, fname, 0);
  if(!asset) return NULL;

  return funopen(asset, android_read, android_write, android_seek, android_close);
}

@enh
Copy link
Contributor

enh commented Nov 3, 2017

where's the actual tombstone from the crash? have you attached gdb?

this can't actually be true:

  fseek(fl, 0, SEEK_END); # ERROR: sigsegv occurs here

so there's some missing detail here.

my guess is that you've messed up and built some of your code with _FILE_OFFSET_BITS=64 and some of it without, so different pieces disagree about whether fpos_t is 32- or 64-bit. but there's not enough information here.

@jaewoosong
Copy link
Author

@enh Thank you for your comment. The full code is in https://github.com/soumith/torch-android

I know the situation does not make much sense. But I kept other things exactly same and just changed the NDK version from 14 to 15. (I downloaded them from Google's official NDK website.) When I built with NDK 14 or below, the program ran okay. But when I built it with NDK 15 (also NDK 16 beta), the apk was built but it crashed with SIGSEGV.

@enh
Copy link
Contributor

enh commented Nov 3, 2017

the _FILE_OFFSET_BITS stuff is new in r15 (unless you opted in to unified headers in r14):

Android support for _FILE_OFFSET_BITS=64 (which turns off_t into off64_t and replaces each off_t
function with its off64_t counterpart, such as lseek in the source becoming lseek64 at runtime) was
added late. Even when it became available for the platform, it wasn‘t available from the NDK until r15.
Before NDK r15, _FILE_OFFSET_BITS=64 silently did nothing: all code compiled with that was
actually using a 32-bit off_t. With a new enough NDK, the situation becomes complicated. If you’re
targeting an API before 21, almost all functions that take an off_t become unavailable. You‘ve asked for
their 64-bit equivalents, and none of them (except lseek/lseek64) exist. As you increase your target API
level, you’ll have more and more of the functions available. API 12 adds some of the <unistd.h>
functions, API 21 adds mmap, and by API 24 you have everything including <stdio.h>. See the linker
map for full details.
(https://android.googlesource.com/platform/bionic/+/master/docs/32-bit-abi.md)

do you have an actual full crash dump from logcat (preferably run through ndk-stack), or a tombstone from /data/tombstones?

@jaewoosong
Copy link
Author

I searched on the mentioned project's github and _FILE_OFFSET_BITS was not used in their code. I will post the full crash dump soon.

@jaewoosong
Copy link
Author

Here I attached the log ndk-stack.

********** Crash dump: **********
pid: 6988, tid: 7015, name: AsyncTask #1  >>> com.example <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x48613070
Stack frame #00 pc 0000000000074ad0  /system/lib64/libc.so (_Z10__fseeko64P7__sFILElii+20)
Stack frame #01 pc 0000000000018c40  /data/app/com.example-1/lib/arm64/libsample.so (android_asset_get_size+40)
Stack frame #02 pc 000000000001b44c  /data/app/com.example-1/lib/arm64/libsample.so (Java_com_example_util_Util_initAll+316)
Stack frame #03 pc 00000000000db490  /system/lib64/libart.so (art_quick_generic_jni_trampoline+144)
Stack frame #04 pc 00000000000d2168  /system/lib64/libart.so (art_quick_invoke_static_stub+600)
Stack frame #05 pc 00000000000dec20  /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+256)
Stack frame #06 pc 0000000000291020  /system/lib64/libart.so (_ZN3art11interpreter34ArtInterpreterToCompiledCodeBridgeEPNS_6ThreadEPNS_9ArtMethodEPKNS_7DexFile8CodeItemEPNS_11ShadowFrameEPNS_6JValueE+312)
Stack frame #07 pc 0000000000289fe0  /system/lib64/libart.so (_ZN3art11interpreter6DoCallILb0ELb0EEEbPNS_9ArtMethodEPNS_6ThreadERNS_11ShadowFrameEPKNS_11InstructionEtPNS_6JValueE+592)
Stack frame #08 pc 000000000055d2cc  /system/lib64/libart.so (MterpInvokeStatic+356)
Stack frame #09 pc 00000000000c4a14  /system/lib64/libart.so (ExecuteMterpImpl+14612)

@enh
Copy link
Contributor

enh commented Nov 3, 2017

oh, wait, until i saw those 64-bit pc values i hadn't realized

#APP_ABI := armeabi-v7a # armeabi-v7a can run without any problem with ndk r15c
APP_ABI := arm64-v8a

so it's definitely not the _FILE_OFFSET_BITS problem. the fault address looks like you have an invalid (but non-null) FILE*. i note that it's only 32 bits, so is it possible you've dropped the top 32 bits? (do you build with -Werror=pointer-to-int-cast and -Werror=int-to-pointer-cast to catch that class of error?)

if you can attach a prebuilt i'll have a look in the morning...

@jaewoosong
Copy link
Author

@enh Thanks for the comment. I also agree that

the fault address looks like you have an invalid (but non-null) FILE*

because the target file is located in the assets directory and it can be successfully read when I use armeabi-v7a. Only arm64-v8a crashes the program.

I will check the build options you mentioned (-Werror=pointer-to-int-cast and -Werror=int-to-pointer-cast). Also I will try to upload more information.

Thanks again. Good night!

@jmgao
Copy link
Contributor

jmgao commented Nov 3, 2017

I don't see any obvious mistakes in the code (aside from this needing to be errno = EACCES; return -1, but that shouldn't affect this).

It's failing super early in __fseeko64, which is probably the file locking. Maybe it's some multithreaded heap corruption that's trashing the pointer in __sbuf?

@alexcohn
Copy link

alexcohn commented Nov 4, 2017

Stack frame #00 pc 0000000000074ad0 /system/lib64/libc.so (_Z10__fseeko64P7__sFILElii+20)

I am surprised that the system C runtime library uses C++ mangled function name. @jaewoosong, which device/ROM do you use?

@jmgao
Copy link
Contributor

jmgao commented Nov 4, 2017

I am surprised that the system C runtime library uses C++ mangled function name. @jaewoosong, which device/ROM do you use?

It's an implementation detail: the implementations of both fseeko and fseeko64 are in terms of __fseeko64. Presumably, the outer function doesn't show up in the unwind because it tail called to __fseeko64.

@alexcohn
Copy link

alexcohn commented Nov 4, 2017

@jmgao sure, it's not wrong, but this was introduced into AOSP relatively recently, so it may indicate some breaking change in fseek implementation, couldn't it?

@alexcohn
Copy link

alexcohn commented Nov 5, 2017

I thought a bit more about this scenario, and I believe strongly that the trick of android_fopen() is inherently wrong.

One should not disguise AAsset* for FILE* even though both are oblique pointers while we wear NDK-colored glasses. It wasn't an oversight that android/asset_manager.h defined a separate pointer type, and did not reuse ones defined in stdio.h. Even if such substitution works sometimes, it will hit some implementation detail that will blow this substitution (essentially, reinterpret_cast<FILE*>(AAsset*)) up.

@jaewoosong provides an AAsset-specific handler for seekfn, but since API 24, ::fseek() begins with some generic prologue, even before it checks whether funopen defined an override for ** __sseek**. This universal prologue assumes that the first parameter, fp, is a pointer to FILE.

Actually, I believe that if this worked on Lollipop, that was by very good luck (see e.g. if (fp->_flags & __SOFF)).

@jaewoosong
Copy link
Author

@alexcohn

I am surprised that the system C runtime library uses C++ mangled function name. @jaewoosong, which device/ROM do you use?

I tested the app on two devices: Samsung Galaxy S7 (Android 7.0) and Huawei Mate 9 (Android 7.0). I used default ROMs pre-installed by the manufacturers.

@jmgao
Copy link
Contributor

jmgao commented Nov 6, 2017

I thought a bit more about this scenario, and I believe strongly that the trick of android_fopen() is inherently wrong.

One should not disguise AAsset* for FILE* even though both are oblique pointers while we wear NDK-colored glasses. It wasn't an oversight that android/asset_manager.h defined a separate pointer type, and did not reuse ones defined in stdio.h. Even if such substitution works sometimes, it will hit some implementation detail that will blow this substitution (essentially, reinterpret_cast<FILE*>(AAsset*)) up.

It's not casting an AAsset* to a FILE*. funopen creates a FILE* that calls the various function pointers that you give it, with the cookie argument you pass in as the first argument. AAsset* is passed in as the cookie argument here, and then cast back into AAsset* and used seemingly appropriately in the various callbacks.

The call to fseek happens directly on the return value of funopen: https://github.com/soumith/torch-android/blob/master/src/torchandroid.cpp#L57

@rprichard
Copy link
Collaborator

Possible explanation:

r15 turns on "unified headers" by default (https://android.googlesource.com/platform/ndk/+/master/docs/UnifiedHeaders.md). Unified headers were off by default in r14 and earlier. It appears that with unified headers, for C source files (rather than C++), the BSD-specific libc functions (like funopen) aren't declared by default. If your funopen call is implicitly declared as returning int rather than FILE*, that would explain the 32-bit segfault address. The funopen caller in android-torch is a C file, not C++.

The fix is to ensure that _BSD_SOURCE is defined before including any headers. (Defining _GNU_SOURCE would also work.) Adding it to APP_CFLAGS would work.

With this C file:

#include <stdio.h>
void foo() {
  FILE *fp = funopen(NULL, NULL, NULL, NULL, NULL);
  fseek(fp, 0, SEEK_END);
}

Here are the default GCC and Clang ndk-build diagnostics from r15:

GCC 4.9:

jni/testmod.c: In function 'foo':
jni/testmod.c:3:14: warning: initialization makes pointer from integer without a cast
   FILE *fp = funopen(NULL, NULL, NULL, NULL, NULL);

Clang:

jni/testmod.c:3:14: warning: implicit declaration of function 'funopen' is invalid in C99 [-Wimplicit-function-declaration]
  FILE *fp = funopen(NULL, NULL, NULL, NULL, NULL);
             ^
jni/testmod.c:3:9: warning: incompatible integer to pointer conversion initializing 'FILE *' (aka 'struct __sFILE *') with an expression of type
      'int' [-Wint-conversion]
  FILE *fp = funopen(NULL, NULL, NULL, NULL, NULL);
        ^    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

APP_CFLAGS := -Wall adds this GCC warning:

jni/testmod.c: In function 'foo':
jni/testmod.c:3:3: warning: implicit declaration of function 'funopen' [-Wimplicit-function-declaration]
   FILE *fp = funopen(NULL, NULL, NULL, NULL, NULL);

-Werror=pointer-to-int-cast and -Werror=int-to-pointer-cast didn't help with Clang, and GCC reports them as invalid command-line options. -Werror=implicit-function-declaration, on the other hand, works with both compilers and flags the problem. -Werror=int-conversion flags the problem with Clang. (e.g. NDK_TOOLCHAIN_VERSION := clang with ndk-build.)

@rprichard
Copy link
Collaborator

@jaewoosong Can you confirm that an implicitly declared funopen is the problem? e.g.:

  • Do you see an error with -Werror=implicit-function-declaration?
  • Does declaring _BSD_SOURCE everywhere fix the problem?

@jaewoosong
Copy link
Author

jaewoosong commented Nov 17, 2017

Hi @rprichard ,

Good morning. Thank you for your comments. I tried the two things you mentioned. I tested with NDK version 16.

First, I put -Werror=implicit-function-declaration at the Application.mk file in this way: APP_CFLAGS += -Werror=implicit-function-declaration. Then the following error occurred during the compilation: Error:error: implicit declaration of function 'funopen' [-Werror=implicit-function-declaration].

  • EDITED
    Second, I wrote #define _BSD_SOURCE at the very top of android_fopen.cpp file. And then it compiled! But when I wrote #define _BSD_SOURCE at the top of torch_android.cpp file, where the error-inducing function is located, the app didn't compile. Using #define _GNU_SOURCE resulted exactly same behaviour.

I hope this information be useful. Thanks again!

@rprichard
Copy link
Collaborator

Second, I wrote #define _BSD_SOURCE at the very top of android_fopen.cpp file. And then it compiled! But when I wrote #define _BSD_SOURCE at the top of torch_android.cpp file, where the error-inducing function is located, the app didn't compile. ...

When the app didn't compile, were you defining _BSD_SOURCE in just torchandroid.cpp or in both torchandroid.cpp and android_fopen.c? AFAICT, torchandroid.cpp doesn't need the symbol defined. Only android_fopen.c and THApkFile.c need it.

You could also define the symbol more globally using: APP_CFLAGS += -D_BSD_SOURCE

@jaewoosong
Copy link
Author

Hi @rprichard ,

Thank you for your helpful comments. When the app didn't compile, I defined _BSD_SOURCE only in torch_android.cpp. I only needed to define it in android_fopen.c to compile the app. Should I alsk define it in THApkFile.c to insure correct app behaviour?

@rprichard
Copy link
Collaborator

The general rule is: _BSD_SOURCE (or _GNU_SOURCE) needs to be declared at the top of any C or C++ source file (not header) that contains an funopen call, either directly or indirectly via #include. Calling some other function that then calls funopen doesn't matter.

AFAICT, torchandroid.cpp doesn't contain a call to funopen, so it doesn't need _BSD_SOURCE.

I wonder if THApkFile.c is being compiled. I looked for THApkFile.c in the project, and I noticed that it's included by ApkFile.c, which is then included by init.c. Otherwise, I didn't see any references to these files (THApkFile.c, ApkFile.c, and init.c), so maybe none of them are being compiled. If they are compiled, then it'd be a good idea to declare _BSD_SOURCE for all of them.

Aside: you can use either of:

  • #define _BSD_SOURCE
    or
  • #define _BSD_SOURCE 1

I suggest using the second one, which is equivalent to -D_BSD_SOURCE. The compiler won't complain if code uses both -D_BSD_SOURCE and #define _BSD_SOURCE 1, but it will complain about the combination of -D_BSD_SOURCE and #define _BSD_SOURCE.

@jaewoosong
Copy link
Author

@rprichard Wow, thank you so much! I really appreciate your kind explanation. It is really helpful. I also thank @alexcohn , @enh and @jmgao for help. I learned a lot more about NDK from all of you. Now I will close this issue. Thanks!

@jalola
Copy link

jalola commented Jul 24, 2019

The general rule is: _BSD_SOURCE (or _GNU_SOURCE) needs to be declared at the top of any C or C++ source file (not header) that contains an funopen call, either directly or indirectly via #include. Calling some other function that then calls funopen doesn't matter.

AFAICT, torchandroid.cpp doesn't contain a call to funopen, so it doesn't need _BSD_SOURCE.

I wonder if THApkFile.c is being compiled. I looked for THApkFile.c in the project, and I noticed that it's included by ApkFile.c, which is then included by init.c. Otherwise, I didn't see any references to these files (THApkFile.c, ApkFile.c, and init.c), so maybe none of them are being compiled. If they are compiled, then it'd be a good idea to declare _BSD_SOURCE for all of them.

Aside: you can use either of:

  • #define _BSD_SOURCE
    or
  • #define _BSD_SOURCE 1

I suggest using the second one, which is equivalent to -D_BSD_SOURCE. The compiler won't complain if code uses both -D_BSD_SOURCE and #define _BSD_SOURCE 1, but it will complain about the combination of -D_BSD_SOURCE and #define _BSD_SOURCE.

@jaewoosong @rprichard
Hi, thanks for the discussion.
I got in the same error I you did last time. However, I could not resolve it as you discussed above.
I did add #define _BSD_SOURCE 1 to everywhere even in the build.gradle (I build by cmake)

Another thing is I don't quite understand why @rprichard said: "AFAICT, torchandroid.cpp doesn't contain a call to funopen, so it doesn't need _BSD_SOURCE"
As I checked code, the torchandroid actually calls to funopen via this: https://github.com/soumith/torch-android/blob/master/src/torchandroid.cpp#L54

Hope you can help me with this.
Thanks a lot.
H

@rprichard
Copy link
Collaborator

I did add #define _BSD_SOURCE 1 to everywhere even in the build.gradle (I build by cmake)

If you have a reduced program showing the problem, we can look at that.

As I checked code, the torchandroid actually calls to funopen via this: https://github.com/soumith/torch-android/blob/master/src/torchandroid.cpp#L54

That's a call to android_fopen, which then calls funopen. The _BSD_SOURCE / _GNU_SOURCE macro is only required to be defined in the source file that directly calls funopen.

@jalola
Copy link

jalola commented Jul 28, 2019

Hi, sorry for my late reply.
I actually resolved my problem by implement another logic by reading by buffer instead of FILE*.
But I will put a reduced program here so we can discuss about this issue.

https://github.com/jalola/android_fmemopen

You can have a look at the repo above. I have 4 files.

I call funopen in fmemopen.c file (in fmemopen function) and common.cpp will use this fmemopen in file_to_memf function.

I've tried to put #define _BSD_SOURCE 1 at the top of fmemopen.c but I still got this error Error:error: implicit declaration of function 'funopen' [-Werror=implicit-function-declaration]

@rprichard
Copy link
Collaborator

That project doesn't have a makefile, and it doesn't compile when I add one:

  • common.cpp needs #include <stdio.h>
  • LOG isn't defined
  • the declaration and definition of aai::file_to_memf don't match

See https://github.com/DanAlbert/dynamic-cast-repro for an example of a repro with a makefile.

When I add a makefile, fix those errors, and define _BSD_SOURCE, then the project builds without a warning about funopen.

Maybe you're defining _FILE_OFFSET_BITS=64, which would redirect funopen to funopen64 on 32-bit targets, and funopen64 wasn't added until API 24?

@jalola
Copy link

jalola commented Jul 30, 2019

Thanks for your reply @rprichard
So sorry for the code, I just extracted them out of the project that is why it is missing makefile and not runnable.

You are right that I used the -D_FILE_OFFSET_BITS=64, and -D_LARGEFILE64_SOURCE=1.
The funny thing is the crash does not always happen (on arm v7 and arm v8 as well, but more reproducible in arm v8)

So if I use that flag and the API < 24, does it mean there is no way to use funopen?

@rprichard
Copy link
Collaborator

So if I use that flag and the API < 24, does it mean there is no way to use funopen?

Roughly speaking, yes. Prior to API 24, there is no funopen on 32-bit targets that works with a 64-bit off_t seek function. There is still a funopen function but it uses a 32-bit off_t. After API 24, when _FILE_OFFSET_BITS=64 is declared, funopen is remapped to funopen64.

@jalola
Copy link

jalola commented Jul 31, 2019

thanks a lot for your explanation @rprichard. It helps a lot and It is clear now.

Luckily, I changed the logic not to use funopen so I could run my program now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants