Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::dynamic_pointer_cast always returns a nullptr #533

Closed
memojedi opened this issue Sep 26, 2017 · 27 comments
Closed

std::dynamic_pointer_cast always returns a nullptr #533

memojedi opened this issue Sep 26, 2017 · 27 comments
Assignees
Milestone

Comments

@memojedi
Copy link

I'm using NDK 15.2.4203891 (it was downloaded through android studio SDK manager). My project consists of 4 shared libraries, all of them are built with these cpp flags

LOCAL_CPP_FEATURES += exceptions
LOCAL_CPP_FEATURES += rtti

I'm linking all my shared libraries to a single shared c++ runtime

APP_STL := c++_shared
NDK_TOOLCHAIN_VERSION=clang

I noticed that every time I try to do dynamic casting from parent class to derived class I get a nullptr back from std::dynamic_pointer_cast. The c++ standard allows such casting to take place.

I first noticed this problem back in NDK 10e, I was able to bypass the issue by always rebuilding the c++ runtime with this directive in Application.mk

LIBCXX_USE_GABIXX := true

This would rebuild the c++ runtime and would allow such casting to take place. Is this dynamic casting a common problem among all NDK versions?

Now that I'm trying to move to the latest and noticed the same rtti issue as before I tried to build the c++ runtime using this directive in my Application.mk

LIBCXX_FORCE_REBUILD := true

But I started getting this build error:

Android NDK: /Users/guillermorodriguez/Library/Android/sdk/ndk-bundle/sources/cxx-stl/llvm-libc++/Android.mk: Cannot find module with tag 'external/libcxxabi' in import path
Android NDK: Are you sure your NDK_MODULE_PATH variable is properly defined ?
Android NDK: The following directories were searched:
Android NDK:

Does anyone know if these are known issues? (dynamic casting and runtime build error), I don't want to change all my dynamic pointer casting lines into static casting.

  • NDK Version: 15.2.4203891
  • Build sytem: ndk-build
  • Host OS: Mac Sierra 10.12.4
  • Compiler: Clang
  • ABI: armeabi-v7a
  • STL: libc++_shared.so
  • NDK API level:
  • Device API level:
@DanAlbert
Copy link
Member

Do you have a test case?

@memojedi
Copy link
Author

Hi Dan

No, I don't have a test case regarding the dynamic casting, I'll try to build one

What about the c++ runtime build error?, is this something you know about?

thx
Memo

@DanAlbert
Copy link
Member

What about the c++ runtime build error?, is this something you know about?

Not supported.

@memojedi
Copy link
Author

Hi Dan
I think this is the issue with RTTI

This is one of the lines that the compiler throws at me while creating an object file

Notice that the compiler is enforcing the no rtti and no exceptions flags even though I added this line in the Application.mk

APP_CPPFLAGS += -std=c++11 -stdlib=libc++ -fuse-ld=bfd -nodefaultlibs -frtti -fexceptions

[armeabi-v7a] Compile++ thumb: medialibrarycore <= UpdateBestArtworkTokenChangeRequest.cpp
/Users/guillermorodriguez/Library/Android/sdk/ndk-bundle/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ -MMD -MP -MF /private/tmp/medialibrary-Android.buildproj/Debug/Objects/MediaLibrary.build/Debug/Android.build/DerivedSources/ndk_project/obj/local/armeabi-v7a/objs-debug/medialibrarycore/UpdateBestArtworkTokenChangeRequest.o.d -gcc-toolchain /Users/guillermorodriguez/Library/Android/sdk/ndk-bundle/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64 -fpic -ffunction-sections -funwind-tables -fstack-protector-strong -Wno-invalid-command-line-argument -Wno-unused-command-line-argument -no-canonical-prefixes -fno-integrated-as -g -target armv7-none-linux-androideabi16 -march=armv7-a -mfloat-abi=softfp -mfpu=vfpv3-d16 -fno-exceptions -fno-rtti -mthumb -O0 -UNDEBUG -fno-limit-debug-info -I/private/tmp/medialibrary-Android.buildproj/Debug/Objects/MediaLibrary.build/Debug/Android.build/DerivedSources/ndk_project/jni/includes -I/private/tmp/medialibrary-Android.buildproj/Debug/Objects/MediaLibrary.build/Debug/Android.build/DerivedSources/ndk_project/jni/includes/MediaLibraryCore -I/usr/local/android/include -I/usr/local/android/include/sqlite3 -I/Users/guillermorodriguez/Library/Android/sdk/ndk-bundle/sources/cxx-stl/llvm-libc++/include -I/Users/guillermorodriguez/Library/Android/sdk/ndk-bundle/sources/cxx-stl/llvm-libc++/../llvm-libc++abi/include -I/Users/guillermorodriguez/Library/Android/sdk/ndk-bundle/sources/android/support/include -I/private/tmp/medialibrary-Android.buildproj/Debug/Objects/MediaLibrary.build/Debug/Android.build/DerivedSources/ndk_project/jni -std=c++11 -std=c++11 -stdlib=libc++ -fuse-ld=bfd -nodefaultlibs -frtti -fexceptions -DANDROID -DTARGET_OS_ANDROID=1 -DTARGET_RT_LITTLE_ENDIAN=1 -D_COREFOUNDATION_LITE=1 -Wno-extern-c-compat -Wno-macro-redefined -Wno-builtin-macro-redefined -Wno-deprecated-register --sysroot=/usr/local/android/ndk/platforms/android-16/arch-arm -fvisibility=hidden -O0 -DDEBUG -D__ANDROID_API__=16 -Wa,--noexecstack -Wformat -Werror=format-security -frtti -fexceptions --sysroot /Users/guillermorodriguez/Library/Android/sdk/ndk-bundle/sysroot -isystem /Users/guillermorodriguez/Library/Android/sdk/ndk-bundle/sysroot/usr/include/arm-linux-androideabi -c /private/tmp/medialibrary-Android.buildproj/Debug/Objects/MediaLibrary.build/Debug/Android.build/DerivedSources/ndk_project/jni/UpdateBestArtworkTokenChangeRequest.cpp -o /private/tmp/medialibrary-Android.buildproj/Debug/Objects/MediaLibrary.build/Debug/Android.build/DerivedSources/ndk_project/obj/local/armeabi-v7a/objs-debug/medialibrarycore/UpdateBestArtworkTokenChangeRequest.o

Will the last two overwrite the first two? is this a problem with clang?

@DanAlbert
Copy link
Member

Yes, the latter rtti/exception flags override former ones.

is this a problem with clang?

Very unlikely given that this has persisted across multiple compiler releases, but not impossible. Can't say for sure without a repro case.

@memojedi
Copy link
Author

memojedi commented Sep 26, 2017

Hi Dan

I managed to build a sample app that triggers the dynamic_pointer_cast issue I mentioned earlier.
The sample app is creates three shared libraries:

  • libtest.so
    This library has two test methods
    Java_com_example_twolibs_TwoLibs_testDynamicCast
    Java_com_example_twolibs_TwoLibs_testStaticCast

Those two methods perform the tests and report a boolean if the test passed or failed

  • libfirst.so
    This library contains one pure virtual class "first" and it provides a class that provides the concrete implementation "firstImpl".
    So basically we'll do something like this:
    std::shared_ptr firstSRef = std::make_shared();

  • libsecond.so
    This library contains just one class "second" and it has a dependency on first. This class is the one that evaluates the casting.

And the casting will be evaluated like this:
auto firstImplSRef = std::dynamic_pointer_cast(_firstSRef);

    auto firstImplSRef = std::static_pointer_cast<firstImpl>(_firstSRef);

Noticed that I'm trying to cast down.

The sample app can be found here:
https://drive.google.com/open?id=0B4CLrzcsZTb-NEJvVFczZEJ1eGc

All shared libs are built separately but with the same compiler flags.

Let me know if you can reproduce the problem, when I run the tests the dynamic cast fails and the static cast works.
thx
Memo

@memojedi
Copy link
Author

memojedi commented Sep 27, 2017

Hi Dan

I kept doing more tests during the day and I noticed the next:

  1. Dynamic casting works within the code and classes belonging to the same shared library. What do I mean by this?:
    Lets assume that I create these classes and they are compiled into libtest.so
class Foo {
public:
    virtual ~Foo() {};
    virtual void init() = 0;
};

class Bar : public Foo{
public:
    void init() override;
};

void Bar::init() {
    __android_log_print(ANDROID_LOG_DEBUG, "test", "Bar::init()");
}

I also created this method inside libtest.so just to drive the test and do the object instantiations

JNIEXPORT jboolean JNICALL Java_com_example_twolibs_TwoLibs_testDynamicCast(JNIEnv* env, jobject pObj) {
    std::shared_ptr<Foo> foo = std::make_shared<Bar>();
    foo->init();
    std::shared_ptr<Bar> bar = std::dynamic_pointer_cast<Bar>(foo);

The result of the test is that downcast works. If the symbols defined side libtest.so and the objects are created inside libtest.so but not across shared library boundaries dynamic casting will work.

  1. I changed the c++ runtime from c++_shared to gnustl_shared. The result of the test was that dynamic casting started working across shared library boundaries, this means that there is a bug in the c++_shared. Are you aware of this bug?

@memojedi
Copy link
Author

hi @alexcohn

I just realized that my previous post was describing my experiments really poorly, I added a better description this time

thx
Memo

@DanAlbert
Copy link
Member

DanAlbert commented Sep 27, 2017

Markdown tip:

```c++
void block_of_code() {
    foo();
}
```
void block_of_code() {
    foo();
}

is a lot more readable (and a lot easier to write) than

`void escaping_every_line() {`
`    foo();`
`}`

void escaping_every_line() {
foo();
}

@DanAlbert DanAlbert self-assigned this Oct 3, 2017
@andreya108
Copy link

It seems very similar to #519

@DanAlbert
Copy link
Member

Removing

System.loadLibrary("first");
System.loadLibrary("second");

fixes this for me.

DanAlbert added a commit to DanAlbert/dynamic-cast-repro that referenced this issue Oct 3, 2017
Repro case for android/ndk#533 that is
isolated from gradle and the JVM.
@DanAlbert
Copy link
Member

Minimized and isolated from gradle and the JVM: https://github.com/DanAlbert/dynamic-cast-repro

DanAlbert added a commit to DanAlbert/dynamic-cast-repro that referenced this issue Oct 3, 2017
Repro case for android/ndk#533 that is
isolated from gradle and the JVM.
@DanAlbert
Copy link
Member

Reduced the test case a bit more. Removing the explicit load of libfirst.so or switching to gnustl will cause it to pass. Not sure why yet.

@DanAlbert
Copy link
Member

The dynamic_cast requires that the typeinfo from libfirst.so is available to libtest.so. The explicit dlopen of libfirst.so with RTLD_LOCAL (which is what System.loadLibrary uses) caches the load of that library, so when libtest.so is loaded it doesn't reopen libfirst.so, and as such doesn't have the typeinfo available. This behavior is the same on Ubuntu.

Did you ever test your gabi++ workaround on x86? As far as I can tell your fix only worked for ARM: https://android.googlesource.com/platform/ndk/+/master/sources/cxx-stl/gabi++/src/type_info.cc#50

Amazingly this architecture specific nonsense does appear to be "correct" in the eyes of the standard. The C++ ABI spec states that type_info comparisons are implemented with a pointer comparison:

It is intended that two type_info pointers point to equivalent type descriptions if and only if the pointers are equal. An implementation must satisfy this constraint, e.g. by using symbol preemption, COMDAT sections, or other mechanisms.

The ARM C++ ABI spec, on the other hand, states that this should be performed with a string comparison:

Specifically, we must drop the requirement that one definition should mean one address.
This appears to have no consequence for virtual tables (symbols matching _ZT{V,T}type), as nothing seems to depend on the address of a virtual table being unique, but it matters for RTTI (symbols matching _ZT{I,S}type). This runs contrary to §2.9.1 of [GC++ABI] which states:

  • It is intended that two type_info pointers point to equivalent type descriptions if and only if the pointers are equal. An implementation must satisfy this constraint, e.g. by using symbol preemption, COMDAT sections, or other mechanisms

Fortunately, we can ignore this requirement without violating the C++ standard provided that:

  • type_info::operator== and type_info::operator!= compare the strings returned by type_info::name(), not just the pointers to the RTTI objects and their names.

...

GCC's docs say this is unsupported: https://gcc.gnu.org/faq.html#dso

If you use dlopen to explicitly load code from a shared library, you must do several things. First, export global symbols from the executable by linking it with the "-E" flag (you will have to specify this as "-Wl,-E" if you are invoking the linker in the usual manner from the compiler driver, g++). You must also make the external symbols in the loaded library available for subsequent libraries by providing the RTLD_GLOBAL flag to dlopen. The symbol resolution can be immediate or lazy.

This is in contrast to their actual implementation though. https://gcc.gnu.org/ml/gcc-patches/2009-07/msg01239.html:

This patch solves this problem for typeinfo comparison by switching to using strcmp by default on all targets, since weak symbols don't quite do the trick.

Apparently they gave up on trying to do right by the C++ ABI and decided to take the pragmatic approach of always using strcmp to keeping RTTI working regardless of how the library was loaded. I'm inclined to agree.

There's a flag we can build libc++abi with that will put it in a more permissive more that uses strcmp as a fallback in case the address comparison fails. Currently it emits a warning in syslog when this case is encountered because the author assumed that this could only happen if libraries were improperly built with hidden typeinfo. I'll start a conversation with upstream to see if we can get that removed.

@DanAlbert
Copy link
Member

https://android-review.googlesource.com/#/c/platform/external/libcxxabi/+/503361

I think I'll cherry-pick this for r16.

@memojedi
Copy link
Author

memojedi commented Oct 6, 2017

hi Dan
Sorry, I've been out for a while, thx a lot for looking into this. Does this mean that the c++_shared provided in the ndk will be built with _LIBCXX_DYNAMIC_FALLBACK and the type_info comparison will rely on strcmp like gabi++?

If this is true, pls let me know once your changes make it to beta, I would like to test my entire project against it

thx
Memo

@DanAlbert
Copy link
Member

I cherry-picked it to the r16 branch. It's in build 4380891 (the latest at time of writing) if you want to try pulling a release from the build servers. Otherwise, we're hoping to ship beta 2 next week or maybe the week after (I keep jamming in more fixes; at some point I'll stop and actually ship the thing).

From #519, it looks like there is another bug with dynamic_cast, but r16 at least fixes the bug demonstrated by your test case.

@DanAlbert
Copy link
Member

Does this mean that the c++_shared provided in the ndk will be built with _LIBCXX_DYNAMIC_FALLBACK and the type_info comparison will rely on strcmp like gabi++?

Yep. Though unlike gabi++, it will do it for all architectures, not just ARM32.

@DanAlbert
Copy link
Member

Okay, after having a conversation with some of the LLVM developers while trying to upstream our patches, turns out there actually is a bug in the test case here. I'm reverting the changes we've made since they are not actually correct.

@memojedi: the bug in your code is here:

class Foo {
public:
    virtual ~Foo() {};
    virtual void init() = 0;
};

class Bar : public Foo{
public:
    void init() override;
};

Neither Foo nor Bar has a key function. You need to have a non-inline, non-pure virtual function defined. i.e.

foo.h

class Foo {
public:
    virtual ~Foo();
    virtual void init() = 0;
};

class Bar : public Foo{
public:
    virtual ~Bar();
    void init() override;
};

foo.cpp

#include "foo.h"

Foo::~Foo() {}
Bar::~Bar() {}

If you've done this, then you'll have a strong global symbol for the typeinfo in your library instead of weak symbols in every library, and then dynamic_cast works fine even across dlopen boundaries.

@DiligentGraphics
Copy link

@memojedi did any of these suggestions solve your problem? I am seeing exact same issue, but cannot fix it.
@DanAlbert I tried to add key functions to my classes (which is super odd since I had to add a non-pure virtual function to an abstract interface class), but it did not help. I also added __attribute__ ((visibility("default"))) to all class declarations with no effect (my dll is built with -fvisibility=hidden by default). I tried compiling with -D_LIBCXX_DYNAMIC_FALLBACK, with no luck either. I am not sure what else I can try.

@DanAlbert
Copy link
Member

I tried compiling with -D_LIBCXX_DYNAMIC_FALLBACK, with no luck either.

The C++ stdlib itself needs to be built with that. It's not something you can change on your end.

Do you have a test case you can share?

@DiligentGraphics
Copy link

@DanAlbert I did few more experiments and found out that may problem is little different: in my case I have definition of Foo and Bar in both the dynamic library and in the main module. This happens because in my case Foo is an abstract class that has only pure virtual methods, so I had to add artificial definition of one of the methods, and there is really no good place to put this definition into.
Bar is the implementation of Foo and all of its methods are final and inline. So I had to again artificially put definition of one of the methods into some location, which is not unique. Think of Foo and Bar as of IUnknown interface and its base implementation that are used everywhere in the system. What is the right location to put the definition of IUnknown::QueryInterface()?

@alexcohn
Copy link

I would say, you can follow the example of llvm-libc++ which keeps GLOBAL refs for typeinfo for hundreds of types, including void and std::type_info.

@DiligentGraphics
Copy link

@alexcohn is there a doc/link that I can follow?

@rprichard
Copy link
Collaborator

I would say, you can follow the example of llvm-libc++ which keeps GLOBAL refs for typeinfo for hundreds of types, including void and std::type_info.

FWIW: the typeinfo objects for both of those types are output in libc++abi (libc++abi.a and libc++_shared.so). void's typeinfo is non-weak because there's a single place where compiler magic generates it (~__fundamental_type_info() in private_typeinfo.cpp), and std::type_info has a key function (virtual ~type_info() in stdlib_typeinfo.cpp), so its typeinfo object is also non-weak.

@memojedi
Copy link
Author

@DanAlbert I'm sorry that I didn't reply to this. I was switched to another project since my last reply and just until today I was switched to this again. You were correct in your previous comment, if I add non-inline destructor for the pure-virtual class the dynamic casting will work. Thx for looking into this.

miodragdinic pushed a commit to MIPS/external-libcxxabi that referenced this issue Apr 17, 2018
Enabling _LIBCXX_DYNAMIC_FALLBACK allows type_info objects to be equal
if their strings are equal, not just their addresses. This is very
common on Android where libraries are loaded with dlopen via
System.loadLibrary.

This behavior matches libsupc++, which always does a string compare to
preserve compatibility for plugin architectures that behave similarly.

Gory details are in the bug.

Test: ndk/run_tests.py --filter dynamic_cast_dlopen
Bug: android/ndk#533
Change-Id: I26cbf8d260cb7fb924580db6b346b42f39d5c2ed
miodragdinic pushed a commit to MIPS/external-libcxxabi that referenced this issue Apr 17, 2018
The user's code was missing key functions on the types in question.

This reverts commit dc179ca.

Bug: android/ndk#533
Test: ndk/run_tests.py
Change-Id: I7125c6dcaf0f87e906682eb46783d2056c2be7db
miodragdinic pushed a commit to MIPS/ndk that referenced this issue Apr 17, 2018
Test: ./run_tests.py --filter dynamic_cast_dlopen
Bug: android/ndk#533
Change-Id: I345ae21639d7ab16ae3487b5ba633b66940b6bb6
miodragdinic pushed a commit to MIPS/ndk that referenced this issue Apr 17, 2018
There's actually a bug in the test case. Both `MyType` and
`MyTypeImpl` are missing their key functions.

This reverts commit 8c28ddd.

Bug: android/ndk#533
Test: ./run_tests.py
@mlfarrell
Copy link

This is still happening for me. I'm on r19-c. Please help!

		   System.loadLibrary("vrapi");
		   System.loadLibrary("assimp");
		   //System.loadLibrary("vglloader");
       System.loadLibrary("vglpp"); //<--- dynamic_casts for anything from this lib is broken
       System.loadLibrary("vnl");
       System.loadLibrary("vui");
            cmake {
                arguments "-DANDROID_STL=c++_shared"
                cppFlags "-std=c++14 -DANDROID=1"
            }
        }
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,-export-dynamic")

crazyzlj added a commit to lreis2415/SEIMS that referenced this issue May 22, 2023
One or more of the following type_info's has hidden visibility
 or is defined in more than one translation unit.
They should all have public visibility.
Refer: android/ndk#533 (comment)
github-actions bot pushed a commit to lreis2415/SEIMS that referenced this issue May 22, 2023
One or more of the following type_info's has hidden visibility
 or is defined in more than one translation unit.
They should all have public visibility.
Refer: android/ndk#533 (comment) 2a540b8
sin-ack added a commit to sin-ack/zig that referenced this issue Aug 20, 2024
Required to have dynamic_cast working across library boundaries without
introducing key functions.

Ref: android/ndk#533 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants
@DanAlbert @alexcohn @rprichard @andreya108 @mlfarrell @memojedi and others