Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Exceptions thrown across shared objects cannot be caught. Unwind symbols reported as LOCAL instead of GLOBAL. #1091

Closed
alexander-jones opened this issue Oct 3, 2019 · 9 comments
Labels

Comments

@alexander-jones
Copy link

alexander-jones commented Oct 3, 2019

Description

I'm hopeful this is just a usage error but I've ran into a strange issue when running my android app where exceptions thrown across the boundary between two shared objects cannot be caught by surrounding try / catch blocks. Any exceptions that are thrown and caught in the same binary behave as expected.

After some googling I surmised this must be due to how the libunwind symbols are built into / referenced by my shared libraries. So to confirm this I ran readelf commands on all my prebuilt libraries and all of them gave output like the following:

readelf -sW libhps_core.so | grep _Unwind
149758: 00000000066096cc 60 FUNC LOCAL DEFAULT 10 _Unwind_SetSpColumn
149759: 0000000006609708 88 FUNC LOCAL DEFAULT 10 _Unwind_GetGR.localalias.0
149768: 000000000660abb4 192 FUNC LOCAL DEFAULT 10 _Unwind_RaiseException_Phase2
149769: 000000000660ac74 236 FUNC LOCAL DEFAULT 10 _Unwind_ForcedUnwind_Phase2
149770: 000000000660ae54 4 FUNC LOCAL DEFAULT 10 _Unwind_DebugHook
149791: 000000000660bab0 1052 FUNC LOCAL DEFAULT 10 _Unwind_IteratePhdrCallback
149906: 0000000006609930 8 FUNC LOCAL DEFAULT 10 _Unwind_GetTextRelBase
150109: 0000000006609708 88 FUNC LOCAL DEFAULT 10 _Unwind_GetGR
150486: 000000000660b2b8 36 FUNC LOCAL DEFAULT 10 _Unwind_DeleteException
150594: 000000000660b1c0 248 FUNC LOCAL DEFAULT 10 _Unwind_Resume_or_Rethrow
151274: 000000000660b2dc 208 FUNC LOCAL DEFAULT 10 _Unwind_Backtrace
151599: 00000000066098cc 8 FUNC LOCAL DEFAULT 10 _Unwind_GetIP
151645: 00000000066098f8 8 FUNC LOCAL DEFAULT 10 _Unwind_GetRegionStart
151651: 0000000006609874 88 FUNC LOCAL DEFAULT 10 _Unwind_SetGR
151943: 000000000660ca48 452 FUNC LOCAL DEFAULT 10 _Unwind_Find_FDE
151949: 00000000066098e8 8 FUNC LOCAL DEFAULT 10 _Unwind_SetIP
151953: 000000000660b0c4 252 FUNC LOCAL DEFAULT 10 _Unwind_Resume
152066: 0000000006609928 8 FUNC LOCAL DEFAULT 10 _Unwind_GetDataRelBase
152189: 00000000066098d4 20 FUNC LOCAL DEFAULT 10 _Unwind_GetIPInfo
152195: 000000000660ae58 376 FUNC LOCAL DEFAULT 10 _Unwind_RaiseException
152506: 00000000066098f0 8 FUNC LOCAL DEFAULT 10 _Unwind_GetLanguageSpecificData
152522: 000000000660afd0 244 FUNC LOCAL DEFAULT 10 _Unwind_ForcedUnwind
152810: 0000000006609900 40 FUNC LOCAL DEFAULT 10 _Unwind_FindEnclosingFunction
152824: 000000000660986c 8 FUNC LOCAL DEFAULT 10 _Unwind_GetCFA

This seems erroneous to me as I would expect these symbols to be GLOBAL. If I run the same command on my main app shared object all the symbols report being GLOBAL as I would expect, however the main application shared object is built via a different process (via ndk-build through Android Studio).

I assume I must be missing something here in my specification to the toolchain file but I don't see it. Any help would be much appreciated.

Environment Details

My prebuilt libraries are all built with CMake 3.14 using the Ninja generator using the following variable definitions:

ANDROID_ABI = "arm64-v8a" (not always, but this was the case I was testing against)
ANDROID_STL = "c++_shared"
ANDROID_TOOLCHAIN = "clang"
ANDROID_NATIVE_API_LEVEL = "16" (we have to support relatively old devices)
ANDROID_CPP_FEATURES = "exceptions" (figured this would force the unwind symbols to be global)
CMAKE_TOOLCHAIN_FILE = "${ANDROID_NDK}/build/cmake/android.toolchain.cmake" (where ANDROID_NDK points to a r18b version of the ndk)

I've been able to reproduce this with both NDK 16b and 18b. I know the prompt says to always try with the newest NDK but as one of my dependencies needs an older NDK I can't use that as my resolution.

If it matters my host os is usually Windows but I can reproduce this issue on Windows 10, macOS Mojave and Ubuntu 18.04 hosts.

Thanks for any help.

@DanAlbert
Copy link
Member

Do your exception types have key functions? RTTI across shared library boundaries is tricky. https://android.googlesource.com/platform/ndk/+/master/docs/user/common_problems.md#rtti_exceptions-not-working-across-library-boundaries

This seems erroneous to me as I would expect these symbols to be GLOBAL.

No, this is very much intentional. If libraries are exporting unwind symbols you can get into a lot of trouble when the unwinder changes (as it has before).

If I run the same command on my main app shared object all the symbols report being GLOBAL as I would expect

That could also be your problem, and is very much not expected with ndk-build. That alone is a bug. If you can tell us how to reproduce, please file that so I can fix it.

ANDROID_CPP_FEATURES = "exceptions" (figured this would force the unwind symbols to be global)

Exceptions and RTTI are actually on by default with CMake, btw, so you can remove this line. It doesn't control visibility of the unwinder (nor does anything else).

@alexander-jones
Copy link
Author

alexander-jones commented Oct 3, 2019

Do your exception types have key functions? RTTI across shared library boundaries is tricky. https://android.googlesource.com/platform/ndk/+/master/docs/user/common_problems.md#rtti_exceptions-not-working-across-library-boundaries

No, our exception types all inherit from std::runtime_error and don't have any virtual functions. The only real reason we derive at all is just to communicate the class of error by what type it is.

That could also be your problem, and is very much not expected with ndk-build. That alone is a bug. If you can tell us how to reproduce, please file that so I can fix it.

Ah, so my intuition was completely backwards! Just so I make sure we're not looking at a red herring, I want to make sure the way I detected the GLOBAL symbols for my main library (libfire.so) wasn't an invalid approach. After building my app I extracted the contents of <ANDROID_STUDIO_PROJECT>/app/build/outputs/apk/debug and ran readelf -sW on the binary in lib/arm64-v8a/ subfolder:

stage@capybara:~/android_arm64-v8ad$ unzip app-debug.apk.zip 
stage@capybara:~/android_arm64-v8ad$ cd lib/arm64-v8a
stage@capybara:~/android_arm64-v8ad/lib/arm64-v8a$ readelf -sW libfire.so | grep _Unwind
   108: 00000000001a7b40     8 FUNC    GLOBAL DEFAULT   10 _Unwind_GetTextRelBase
   275: 00000000001a7918    88 FUNC    GLOBAL DEFAULT   10 _Unwind_GetGR
   610: 00000000001a94c8    36 FUNC    GLOBAL DEFAULT   10 _Unwind_DeleteException
   693: 00000000001a93d0   248 FUNC    GLOBAL DEFAULT   10 _Unwind_Resume_or_Rethrow
  1346: 00000000001a94ec   208 FUNC    GLOBAL DEFAULT   10 _Unwind_Backtrace
  1629: 00000000001a7adc     8 FUNC    GLOBAL DEFAULT   10 _Unwind_GetIP
  1667: 00000000001a7b08     8 FUNC    GLOBAL DEFAULT   10 _Unwind_GetRegionStart
  1671: 00000000001a7a84    88 FUNC    GLOBAL DEFAULT   10 _Unwind_SetGR
  1952: 00000000001aac58   452 FUNC    GLOBAL DEFAULT   10 _Unwind_Find_FDE
  1959: 00000000001a7af8     8 FUNC    GLOBAL DEFAULT   10 _Unwind_SetIP
  1962: 00000000001a92d4   252 FUNC    GLOBAL DEFAULT   10 _Unwind_Resume
  2055: 00000000001a7b38     8 FUNC    GLOBAL DEFAULT   10 _Unwind_GetDataRelBase
  2134: 00000000001a7ae4    20 FUNC    GLOBAL DEFAULT   10 _Unwind_GetIPInfo
  2141: 00000000001a9068   376 FUNC    GLOBAL DEFAULT   10 _Unwind_RaiseException
  2396: 00000000001a7b00     8 FUNC    GLOBAL DEFAULT   10 _Unwind_GetLanguageSpecificData
  2409: 00000000001a91e0   244 FUNC    GLOBAL DEFAULT   10 _Unwind_ForcedUnwind
  2665: 00000000001a7b10    40 FUNC    GLOBAL DEFAULT   10 _Unwind_FindEnclosingFunction
  2690: 00000000001a7a7c     8 FUNC    GLOBAL DEFAULT   10 _Unwind_GetCFA

Oddly enough when I perform the readelf commands on any of the other shared object dependencies the libunwind symbols are no longer reported at all like they were before they were packaged into the app. I assume this is because of some symbol hiding / localization / stripping performed during packaging.

Ok, so assuming this all seems fine I'll describe my app's ndk-build process. My app's build.gradle file is relatively straight forward as executes a standard externalNativeBuild block with some configurable bits coming from the local.properties file:

apply plugin: 'com.android.application'

android {
    compileSdkVersion 28

    Properties properties = new Properties()
    properties.load(project.rootProject.file('local.properties').newDataInputStream())
    def ndkABIs = properties.getProperty('ndk.abis') ? properties.getProperty('ndk.abis').split(',') : ['armeabi-v7a', 'arm64-v8a', 'x86', 'x86_64']
    def ndkArgs = properties.getProperty('ndk.arguments') ? properties.getProperty('ndk.arguments') : 'DUMMY_VAR=1' // ndk-build does not let you submit 0 args so pass in a dummy one if none are specified

    println "ndkABIs: ${ndkABIs}"
    println "ndkArgs: ${ndkArgs}"

    defaultConfig {
        applicationId "com.techsoft3d.hps.fire"
        minSdkVersion 16
        targetSdkVersion 22
        ndk {
            abiFilters(*ndkABIs)
        }
        externalNativeBuild {
            ndkBuild {
                arguments ndkArgs
            }
        }
    }

    lintOptions {
        checkReleaseBuilds false
        abortOnError false
    }

    buildTypes {
        debug {
            minifyEnabled false
            debuggable true
            jniDebuggable true
        }
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.txt'
        }
    }

    externalNativeBuild {
        ndkBuild {
            path "src/main/jni/Android.mk"
        }
    }
}

My android.mk file:

TOP_PATH := $(call my-dir)
include $(CLEAR_VARS)

include $(TOP_PATH)/dependencies.mk
include $(TOP_PATH)/fire.mk

My Application.mk file:

APP_ABI := arm64-v8a ( I assume gradle overrides this where neccessary)
APP_STL := c++_shared
APP_PLATFORM := android-16
LOCAL_SHORT_COMMANDS := true
APP_SHORT_COMMANDS := true
APP_CFLAGS += -Wno-error=format-security
NDK_TOOLCHAIN_VERSION := clang

The ndk build script for my main library (fire.mk) looks like this:

LOCAL_PATH := $(call my-dir)
include $(call all-subdir-makefiles)
include $(CLEAR_VARS)

LOCAL_MODULE := fire
    
LOCAL_CFLAGS := -DOS_ANDROID \
                -DANDROID_NDK \
                -DUNIX_SYSTEM \
                -DHPS_CORE_BUILD \
                -DTARGET_OS_ANDROID=1 \
                -DLINUX_SYSTEM \
                -D_STLP_USE_SIMPLE_NODE_ALLOC \
                -fsigned-char \
                -std=c++11

LOCAL_CPP_FEATURES := exceptions

LOCAL_CPP_EXTENSION := .cpp

SOURCE_PATH = ../../../../../..
LIB_PATH = ../../../../../../../../../../../include
LAMBDA_PATH = ../../../../../../../../../../3rd_party/lambda-options

LOCAL_C_INCLUDES += $(LOCAL_PATH)/$(SOURCE_PATH)
LOCAL_C_INCLUDES += $(LOCAL_PATH)/$(LIB_PATH)
LOCAL_C_INCLUDES += $(LOCAL_PATH)/$(LAMBDA_PATH)

# --- MobileSurface Base & JNI files ---
LOCAL_SRC_FILES += fire_jni.cpp
LOCAL_SRC_FILES += fire_mobile_surface_jni.cpp
LOCAL_SRC_FILES += fire_mobile_application_jni.cpp

LOCAL_LDLIBS := -landroid -llog -lGLESv2 -lEGL

# Note: Link order below important
LOCAL_STATIC_LIBRARIES += fire_core
LOCAL_STATIC_LIBRARIES += fire_tester

LOCAL_SHARED_LIBRARIES += hps_test_interface
LOCAL_SHARED_LIBRARIES += hps_test_support
LOCAL_SHARED_LIBRARIES += hps_test_stream

ifeq ($(USING_EXCHANGE),1)
    LOCAL_SHARED_LIBRARIES += hps_test_docs
    LOCAL_SHARED_LIBRARIES += hps_test_exchange_publish
    LOCAL_SHARED_LIBRARIES += hps_test_support_exchange_publish
    LOCAL_SHARED_LIBRARIES += hps_sprk_exchange
    LOCAL_SHARED_LIBRARIES += A3DLIBS
endif

ifeq ($(USING_VULKAN),1)
    LOCAL_SHARED_LIBRARIES += hps_vulkan
    ifeq ($(NDK_DEBUG),1)
        LOCAL_SHARED_LIBRARIES += libVkLayer_threading
        LOCAL_SHARED_LIBRARIES += libVkLayer_parameter_validation
        LOCAL_SHARED_LIBRARIES += libVkLayer_object_tracker
        LOCAL_SHARED_LIBRARIES += libVkLayer_core_validation
        LOCAL_SHARED_LIBRARIES += libVkLayer_unique_objects
    endif
endif

LOCAL_SHARED_LIBRARIES += hps_sprk_ops
LOCAL_SHARED_LIBRARIES += hps_sprk
LOCAL_SHARED_LIBRARIES += hps_core

include $(BUILD_SHARED_LIBRARY)

And the ndk build script for including my dependencies looks like this:

LOCAL_PATH := $(call my-dir)

ifeq ($(USE_DEBUG_HPS_LIBS),1)
	MY_HPS_DEBUG_SUFFIX := d
endif

MY_HPS_LIB_PATH  := ../../../../../../../../../../../lib/android_$(TARGET_ARCH_ABI)$(MY_HPS_DEBUG_SUFFIX)
MY_HPS_BIN_PATH  := ../../../../../../../../../../../bin/android_$(TARGET_ARCH_ABI)$(MY_HPS_DEBUG_SUFFIX)

include $(CLEAR_VARS)
LOCAL_MODULE := hps_core
LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_core.so
include $(PREBUILT_SHARED_LIBRARY)

include $(CLEAR_VARS)
LOCAL_MODULE := hps_sprk
LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_sprk.so
include $(PREBUILT_SHARED_LIBRARY)

include $(CLEAR_VARS)
LOCAL_MODULE := hps_sprk_ops
LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_sprk_ops.so
include $(PREBUILT_SHARED_LIBRARY)

ifeq ($(USING_VULKAN),1)
    include $(CLEAR_VARS)
    LOCAL_MODULE := hps_vulkan
    LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_vulkan.so
    include $(PREBUILT_SHARED_LIBRARY)

    ifeq ($(NDK_DEBUG),1)
        include $(CLEAR_VARS)
        LOCAL_MODULE := libVkLayer_threading
        LOCAL_SRC_FILES := $(ANDROID_NDK)/sources/third_party/vulkan/src/build-android/jniLIbs/$(TARGET_ARCH_ABI)/libVkLayer_threading.so
        include $(PREBUILT_SHARED_LIBRARY)

        include $(CLEAR_VARS)
        LOCAL_MODULE := libVkLayer_parameter_validation
        LOCAL_SRC_FILES := $(ANDROID_NDK)/sources/third_party/vulkan/src/build-android/jniLIbs/$(TARGET_ARCH_ABI)/libVkLayer_parameter_validation.so
        include $(PREBUILT_SHARED_LIBRARY)

        include $(CLEAR_VARS)
        LOCAL_MODULE := libVkLayer_object_tracker
        LOCAL_SRC_FILES := $(ANDROID_NDK)/sources/third_party/vulkan/src/build-android/jniLIbs/$(TARGET_ARCH_ABI)/libVkLayer_object_tracker.so
        include $(PREBUILT_SHARED_LIBRARY)

        include $(CLEAR_VARS)
        LOCAL_MODULE := libVkLayer_core_validation
        LOCAL_SRC_FILES := $(ANDROID_NDK)/sources/third_party/vulkan/src/build-android/jniLIbs/$(TARGET_ARCH_ABI)/libVkLayer_core_validation.so
        include $(PREBUILT_SHARED_LIBRARY)

        include $(CLEAR_VARS)
        LOCAL_MODULE := libVkLayer_unique_objects
        LOCAL_SRC_FILES := $(ANDROID_NDK)/sources/third_party/vulkan/src/build-android/jniLIbs/$(TARGET_ARCH_ABI)/libVkLayer_unique_objects.so
        include $(PREBUILT_SHARED_LIBRARY)
    endif
endif

ifeq ($(USING_EXCHANGE),1)
    include $(CLEAR_VARS)
    LOCAL_MODULE := hps_sprk_exchange
    LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_sprk_exchange.so
    include $(PREBUILT_SHARED_LIBRARY)

    include $(CLEAR_VARS)
    LOCAL_MODULE := A3DLIBS
    HEXCHANGE_INSTALL_DIR := $(subst \,/,$(HEXCHANGE_INSTALL_DIR))
    HEXCHANGE_LIB_DIR := $(HEXCHANGE_INSTALL_DIR)/bin/android/$(TARGET_ARCH_ABI)
	LOCAL_SRC_FILES := $(HEXCHANGE_LIB_DIR)/libA3DLIBS.so
	include $(PREBUILT_SHARED_LIBRARY)
endif

include $(CLEAR_VARS)
LOCAL_MODULE := hps_test_interface
LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_test_interface.so
include $(PREBUILT_SHARED_LIBRARY)

include $(CLEAR_VARS)
LOCAL_MODULE := hps_test_support
LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_test_support.so
include $(PREBUILT_SHARED_LIBRARY)

include $(CLEAR_VARS)
LOCAL_MODULE := hps_test_stream
LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_test_stream.so
include $(PREBUILT_SHARED_LIBRARY)

ifeq ($(USING_EXCHANGE),1)
    include $(CLEAR_VARS)
    LOCAL_MODULE := hps_test_docs
    LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_test_docs.so
    include $(PREBUILT_SHARED_LIBRARY)

    include $(CLEAR_VARS)
    LOCAL_MODULE := hps_test_exchange_publish
    LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_test_exchange_publish.so
    include $(PREBUILT_SHARED_LIBRARY)

    include $(CLEAR_VARS)
    LOCAL_MODULE := hps_test_support_exchange_publish
    LOCAL_SRC_FILES := $(MY_HPS_BIN_PATH)/libhps_test_support_exchange_publish.so
    include $(PREBUILT_SHARED_LIBRARY)
endif

include $(CLEAR_VARS)
LOCAL_MODULE := fire_core
LOCAL_SRC_FILES := $(MY_HPS_LIB_PATH)/libfire_core.a
include $(PREBUILT_STATIC_LIBRARY)

include $(CLEAR_VARS)
LOCAL_MODULE := fire_tester
LOCAL_SRC_FILES := $(MY_HPS_LIB_PATH)/libfire_tester.a
include $(PREBUILT_STATIC_LIBRARY)

Sorry for the giant text dump! If this doesn't illuminate anything I can try and trim it down to a simple app without the dependencies.

@alexander-jones
Copy link
Author

I should also mention that I cannot reproduce this if I use the gnustl_shared standard library building against ndk 16b.

@DanAlbert
Copy link
Member

No, our exception types all inherit from std::runtime_error and don't have any virtual functions.

That's probably the issue then. RTTI relies on typeinfo objects being exactly equal (the same address). The is accomplished by either emitting a single strong definition of the typeinfo with the vtable in the same object as the type's key function, or by emitting weak typeinfos in every object that uses it and relying on them being merged at link/load time. Dynamic linking, especially when its not happening all at once (as is often the case when dlopen or System.loadLibrary are used) means that the latter is not always possible.

I should also mention that I cannot reproduce this if I use the gnustl_shared standard library building against ndk 16b.

This also hints at the above being correct. GNU's implementation of RTTI is not compliant with the ABI spec. It reduces these false negatives at the risk of also enabling false positives.

Just so I make sure we're not looking at a red herring, I want to make sure the way I detected the GLOBAL symbols for my main library (libfire.so) wasn't an invalid approach.

Yep, that's the correct readelf incantation.

If this doesn't illuminate anything I can try and trim it down to a simple app without the dependencies.

Pretty sure we've solved both issues. The main problem is the lack of key functions, the ndk-build issue is #1092 (but that's probably not causing your problem).

@alexander-jones
Copy link
Author

alexander-jones commented Oct 4, 2019

Thanks for the help, that gives me a lot to investigate!

Just to clarify if I'm not using dynamic_cast or type_id directly will passing -fno-rtti in any way interfere with exceptions or am I better off using rtti when exceptions are enabled?

If I am better off using rtti is the suggested resolution to implement virtual destructors for the exception subclasses?

@rprichard
Copy link
Collaborator

I think you want to use -frtti to ensure that a type_info variable is emitted in an object file containing a key function for a class type. With -fno-rtti, type_info variables will still be emitted in object files where they're needed, but then the multiple copies need to be deduplicated, which is harder to accomplish when symbols are duplicated across multiple shared libraries.

@DanAlbert
Copy link
Member

(closing since I'm pretty sure this is solved, if our guess is wrong lmk and we'll reopen)

@DanAlbert
Copy link
Member

is the suggested resolution to implement virtual destructors for the exception subclasses?

Yep.

@alexander-jones
Copy link
Author

K, just adding rtti to my LOCAL_CPP_FEATURES block seems to have resolved the issue (I didn't need to implement the virtual destructors). Thanks for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants