
dynamic_cast from pointers is not working when linked with libc++_shared (ndk r15, r16b1) #519

Closed
andreya108 opened this issue Sep 13, 2017 · 44 comments


@andreya108

andreya108 commented Sep 13, 2017

Description

Please help. I've been trying to figure out the cause with no luck so far; maybe it is a bug.

Our project contains several .so libraries whose parts are built in different ways (for example, Boost and other third-party libs with standalone toolchains, the main part, which is bundled in an AAR, with ndk-build, and the app itself and its JNI part with Gradle/CMake).

When I switched to NDK r16-beta1 and libc++_shared, every dynamic_cast in our C++ code from base_class pointers stored in a std::list<base_class*> to derived_class started returning nullptr.

For example:

class A {}
class B : A {}
std::list < A * > aList;
aList.add( new B() );
A* aPtr = aList.begin().get();
B* bPtr = dynamic_cast<B*>(aPtr);
// => bPtr = nullptr
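For reference, here is a minimal compilable sketch of what the snippet above appears to intend (an editorial reconstruction, not the reporter's actual code): `A` must have at least one virtual function for `dynamic_cast<B*>` to be a valid downcast, `B` must derive publicly from `A`, and `std::list` uses `push_back()`/`front()` rather than `add()`/`get()`. The helper `first_as_b` is hypothetical.

```cpp
#include <cassert>
#include <list>

// A must be polymorphic (have at least one virtual function); otherwise
// dynamic_cast<B*>(A*) does not compile. A virtual destructor is the usual choice.
struct A {
  virtual ~A() = default;
};

// struct defaults to public inheritance, which dynamic_cast needs in order
// to find the A -> B path at run time.
struct B : A {};

// Hypothetical helper: downcasts the first element of the list.
// Returns nullptr if that element is not a B (or a subclass of B).
B* first_as_b(const std::list<A*>& items) {
  assert(!items.empty());
  return dynamic_cast<B*>(items.front());
}
```

Built into a single binary, this returns the `B` object as expected; as the rest of the thread shows, the failure only appeared in the reporter's multi-library production setup.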

This happens only when libc++_shared is used.

I've tested with libc++_static, gnustl_shared and gnustl_static, and the problem does not appear with any of them: bPtr is, as expected, a pointer to the B object added to the list.

Any ideas?

Environment Details

  • NDK Version: 16.0.4293906-beta1
  • Build system: ndk-build + cmake + standalone toolchain
  • Host OS: Ubuntu 16.04
  • Compiler: clang c++14
  • ABI: arm64-v8a
  • STL: libc++_shared
  • NDK API level: 21
  • Device API level: 26
@andreya108
Author

Just tested: the same behavior when compiled with r15c.

@andreya108 andreya108 changed the title dynamic_cast is not working when linked with libc++_shared (ndk16b1) dynamic_cast is not working when linked with libc++_shared (ndk r15, r16b1) Sep 13, 2017
@DanAlbert DanAlbert self-assigned this Sep 13, 2017
@andreya108 andreya108 changed the title dynamic_cast is not working when linked with libc++_shared (ndk r15, r16b1) dynamic_cast from pointers from std::list is not working when linked with libc++_shared (ndk r15, r16b1) Sep 13, 2017
@andreya108
Author

andreya108 commented Sep 13, 2017

Sorry, at first I looked at another part of the code and wrote about smart pointers, but it is std::list. The description is fixed now.

@DanAlbert
Member

Your test case is not valid C++. Could you upload a test case?

@DanAlbert DanAlbert removed their assignment Sep 13, 2017
@andreya108
Author

I'm trying to reproduce the problem outside our environment...

@andreya108
Author

andreya108 commented Sep 18, 2017

I still haven't managed to reproduce the issue outside our environment.
Everything behaves as if -fno-rtti were enabled.

Here is the clang invocation:

/usr/bin/ccache /home/andrey/android-ndk-r16-beta1/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ 
-MMD -MP -MF ./obj/local/arm64-v8a/objs-debug/src/file.o.d -gcc-toolchain /home/andrey/android-ndk-r16-beta1/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64 
-target aarch64-none-linux-android -ffunction-sections -funwind-tables -fstack-protector-strong 
-fpic -Wno-invalid-command-line-argument -Wno-unused-command-line-argument 
-no-canonical-prefixes  -g -fno-exceptions -fno-rtti -O0 -UNDEBUG -fno-limit-debug-info  
-I/home/andrey/android-ndk-r16-beta1/sources/cxx-stl/llvm-libc++/include 
-I/home/andrey/android-ndk-r16-beta1/sources/cxx-stl/llvm-libc++/../llvm-libc++abi/include 
-I/home/andrey/android-ndk-r16-beta1/sources/android/support/include -std=c++11 
-DUSE_ANDROID_UNIFIED_HEADERS -fcolor-diagnostics -frtti -fexceptions -femulated-tls 
-std=c++14 -fstack-protector-strong -DANDROID_NDK_VERSION=16 -Werror=return-type 
-ffunction-sections -fdata-sections -fvisibility=hidden -D_LIBCPP_HAS_NO_OFF_T_FUNCTIONS 
-fcolor-diagnostics -Wno-deprecated-register -Wno-inline-new-delete 
-Wno-invalid-source-encoding -Wno-unused-value -Wno-parentheses -Wno-deprecated  
-DANDROID  -D__ANDROID_API__=21 -Wa,--noexecstack -Wformat -Werror=format-security 
-O -g -DNDEBUG  --sysroot /home/andrey/android-ndk-r16-beta1/sysroot 
-isystem /home/andrey/android-ndk-r16-beta1/sysroot/usr/include/aarch64-linux-android 
-c  /mnt/ssd2/Development/file.cpp -o ./obj/local/arm64-v8a/objs-debug/file.o

Can this be configured without overriding flags? I mean that it inserts some default options like -fno-rtti and -std=c++11, and then they are overridden by APP_CPPFLAGS in Application.mk.

@DanAlbert
Member

For RTTI flags, the last one wins, so -frtti is in effect here. As long as you have -frtti in APP_CPPFLAGS it should be working.
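Because the last RTTI flag wins, one way to confirm that a particular translation unit really ended up with RTTI enabled is a compile-time guard. A minimal sketch, assuming clang (which supports `__has_feature(cxx_rtti)`) or GCC (which defines `__GXX_RTTI` when RTTI is on); the `Base`/`Derived` types and the `is_derived` helper are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

// Fail the build if -fno-rtti ended up in effect for this file
// (for example, because it appeared after -frtti on the command line).
#if defined(__clang__)
#  if !__has_feature(cxx_rtti)
#    error "RTTI is disabled for this translation unit"
#  endif
#elif defined(__GNUC__)
#  if !defined(__GXX_RTTI)
#    error "RTTI is disabled for this translation unit"
#  endif
#endif

struct Base {
  virtual ~Base() = default;
};
struct Derived : Base {};

// With RTTI on, typeid on a reference to a polymorphic type
// reports the object's dynamic (most-derived) type.
bool is_derived(const Base& b) {
  return typeid(b) == typeid(Derived);
}
```

Dropping such a guard into one source file of each library is a cheap way to catch a stray -fno-rtti that wins over APP_CPPFLAGS.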

If you are trying to cast something from a type in a library that's a prebuilt that was not built with RTTI, maybe that's the problem, but I don't think that's the case based on the command line above.

@andreya108
Author

andreya108 commented Sep 18, 2017

Yes, it is definitely compiled with -frtti.

Btw, can RTTI info be stripped somehow? We use -Wl,--gc-sections and -Wl,--version-script to strip all dead code and hide extra symbols.

But in my test case I also used those options, and the problem still cannot be reproduced.

But in the debugger, in production code (I can attach a screenshot), the code looks like this:

A* a = * iter; // iter is std::list<A*>::iterator
B* b = dynamic_cast<B*>( a );

The debugger shows that a points to an object of a class derived from B (B derives from A), but b still becomes NULL. I have no idea why yet...

@andreya108
Author

How can I make sure that a shared library really contains RTTI?

@DanAlbert
Member

We use -Wl,--gc-sections and -Wl,--version-script to strip all dead code and hide extra symbols.

I'm pretty sure that RTTI across libraries does require exposing the RTTI data (which is no different from any other symbol), so if you're not exposing that in your version script, that's probably your issue.

How can I make sure that a shared library really contains RTTI?

$ readelf -sW libfoo.so | c++filt | grep typeinfo
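To grep for one specific class instead of every typeinfo symbol, it helps to know that under the Itanium C++ ABI (used by clang and GCC on Android and Linux) the typeinfo object is named `_ZTI` followed by the mangled type, the typeinfo name string is `_ZTS` followed by the mangled type, and `std::type_info::name()` returns that mangled type. A small sketch that prints the exact symbols to look for; the helpers and the `NonBlockSocket` stand-in are hypothetical, and this does not apply to MSVC.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Under the Itanium C++ ABI, type_info::name() is the mangled type name
// (e.g. "14NonBlockSocket"), so the symbol of the typeinfo object is
// "_ZTI" + that name ("typeinfo for T" after c++filt).
template <typename T>
std::string typeinfo_symbol() {
  return std::string("_ZTI") + typeid(T).name();
}

// Same idea for the name string ("typeinfo name for T" after c++filt).
template <typename T>
std::string typeinfo_name_symbol() {
  return std::string("_ZTS") + typeid(T).name();
}

// Hypothetical example type, standing in for a class from libfoo.
struct NonBlockSocket {
  virtual ~NonBlockSocket() = default;
};
```

For example, `typeinfo_symbol<NonBlockSocket>()` yields `_ZTI14NonBlockSocket`, which can be searched for directly with `readelf -sW libfoo.so | grep _ZTI14NonBlockSocket`.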

@andreya108
Author

andreya108 commented Sep 28, 2017

It really was stripped.
But I've now removed all stripping and hiding completely (no more version script and no -gc-sections; all symbols are exported)... and nothing changed: dynamic_cast still doesn't work.
I can't understand why it works with gnustl_shared.

How can I compare the dynamic_cast implementations in gnustl and libc++? Where is the current source code for the NDK located?

Another idea:
Here is the linking scheme in our project:

class StreamSocket : public NonBlockSocket -> libfoo.a (static)
libfoo.a + ...x... -> lib1.so
libfoo.a + ...y... -> lib2.so (2 shared libs both linked with the same static library)

java: System.loadLibrary( "c++_shared")
java: System.loadLibrary( "jniproxy")
libjniproxy.so -> dlopen ( lib1.so ) then dlopen( lib2.so )

We know that dynamic_cast<StreamSocket*>() fails in lib2.so, which is loaded after lib1.so.
Could the RTTI of lib1.so somehow interfere with the same RTTI in lib2.so?

There are no dependencies between lib1.so and lib2.so, and no C++ objects of that type are passed between them. All communication goes through libjniproxy.so, which knows nothing about libfoo and its contents.

andrey:~/build/libs/arm64-v8a$ readelf -sW lib1.so | c++filt | grep typeinfo | grep Socket
  2414: 000000000038cc20    17 OBJECT  GLOBAL DEFAULT   11 typeinfo name for NonBlockSocket
  3698: 00000000004b59b0    24 OBJECT  GLOBAL DEFAULT   18 typeinfo for StreamSocket
  9515: 00000000004b58b8    16 OBJECT  GLOBAL DEFAULT   18 typeinfo for NonBlockSocket
 10173: 000000000038d324    15 OBJECT  GLOBAL DEFAULT   11 typeinfo name for StreamSocket

andrey:~/build/libs/arm64-v8a$ readelf -sW lib1.so | c++filt | grep vtable | grep Socket
   444: 00000000004b58c8   224 OBJECT  GLOBAL DEFAULT   18 vtable for StreamSocket
  6533: 00000000004b5818   160 OBJECT  GLOBAL DEFAULT   18 vtable for NonBlockSocket

andrey:~/build/libs/arm64-v8a$ readelf -sW lib2.so | c++filt | grep typeinfo | grep Socket
  5015: 0000000000d19d88    16 OBJECT  GLOBAL DEFAULT   18 typeinfo for NonBlockSocket
 17807: 00000000009f5580    17 OBJECT  GLOBAL DEFAULT   11 typeinfo name for NonBlockSocket
 19740: 00000000009f5c68    15 OBJECT  GLOBAL DEFAULT   11 typeinfo name for StreamSocket
 20106: 0000000000d19e80    24 OBJECT  GLOBAL DEFAULT   18 typeinfo for StreamSocket

andrey:~/build/libs/arm64-v8a$ readelf -sW lib2.so | c++filt | grep vtable | grep Socket
  3601: 0000000000d19d98   224 OBJECT  GLOBAL DEFAULT   18 vtable for StreamSocket
 13639: 0000000000d19ce8   160 OBJECT  GLOBAL DEFAULT   18 vtable for NonBlockSocket

@andreya108
Author

andreya108 commented Sep 28, 2017

I'm trying to reproduce this in a test case, and there the typeinfo symbols for foo* get the LOCAL binding instead of GLOBAL as in the main project:

with -fvisibility=hidden

   207: 0000000000016c70    24 OBJECT  LOCAL  DEFAULT   18 typeinfo for B
   214: 000000000000504c     3 OBJECT  LOCAL  DEFAULT   11 typeinfo name for A
   216: 00000000000050dc     3 OBJECT  LOCAL  DEFAULT   11 typeinfo name for C
   225: 0000000000016c00    32 OBJECT  LOCAL  DEFAULT   18 typeinfo for A*
   232: 00000000000050d0     4 OBJECT  LOCAL  DEFAULT   11 typeinfo name for B*
   240: 0000000000005050    57 OBJECT  LOCAL  DEFAULT   11 typeinfo name for std::__ndk1::__shared_ptr_emplace<C, std::__ndk1::allocator<C> >
   254: 0000000000016b10    16 OBJECT  LOCAL  DEFAULT   18 typeinfo for A
   255: 0000000000016cb0    24 OBJECT  LOCAL  DEFAULT   18 typeinfo for C
   265: 00000000000050d8     3 OBJECT  LOCAL  DEFAULT   11 typeinfo name for B
   278: 0000000000016c20    32 OBJECT  LOCAL  DEFAULT   18 typeinfo for B*
   285: 0000000000016b60    24 OBJECT  LOCAL  DEFAULT   18 typeinfo for std::__ndk1::__shared_ptr_emplace<C, std::__ndk1::allocator<C> >
   289: 00000000000050cc     4 OBJECT  LOCAL  DEFAULT   11 typeinfo name for A*

without -fvisibility=hidden

    26: 0000000000019a30    24 OBJECT  GLOBAL DEFAULT   18 typeinfo for B
    37: 0000000000007f0c     3 OBJECT  WEAK   DEFAULT   11 typeinfo name for A
    39: 0000000000007f9c     3 OBJECT  GLOBAL DEFAULT   11 typeinfo name for C
    53: 00000000000199c0    32 OBJECT  WEAK   DEFAULT   18 typeinfo for A*
    61: 0000000000007f90     4 OBJECT  WEAK   DEFAULT   11 typeinfo name for B*
    87: 00000000000198d0    16 OBJECT  WEAK   DEFAULT   18 typeinfo for A
    89: 0000000000019a70    24 OBJECT  GLOBAL DEFAULT   18 typeinfo for C
   104: 0000000000007f98     3 OBJECT  GLOBAL DEFAULT   11 typeinfo name for B
   122: 00000000000199e0    32 OBJECT  WEAK   DEFAULT   18 typeinfo for B*
   136: 0000000000007f8c     4 OBJECT  WEAK   DEFAULT   11 typeinfo name for A*
   220: 0000000000019a30    24 OBJECT  GLOBAL DEFAULT   18 typeinfo for B
   231: 0000000000007f0c     3 OBJECT  WEAK   DEFAULT   11 typeinfo name for A
   233: 0000000000007f9c     3 OBJECT  GLOBAL DEFAULT   11 typeinfo name for C
   247: 00000000000199c0    32 OBJECT  WEAK   DEFAULT   18 typeinfo for A*
   255: 0000000000007f90     4 OBJECT  WEAK   DEFAULT   11 typeinfo name for B*
   281: 00000000000198d0    16 OBJECT  WEAK   DEFAULT   18 typeinfo for A
   283: 0000000000019a70    24 OBJECT  GLOBAL DEFAULT   18 typeinfo for C
   298: 0000000000007f98     3 OBJECT  GLOBAL DEFAULT   11 typeinfo name for B
   316: 00000000000199e0    32 OBJECT  WEAK   DEFAULT   18 typeinfo for B*
   330: 0000000000007f8c     4 OBJECT  WEAK   DEFAULT   11 typeinfo name for A*

And there is no typeinfo for class pointers in the main project...

@DanAlbert
Member

Agreed that this looks like #533. Once I get the fix for that submitted, you should check your app against a canary build.

@andreya108
Author

I've tested with both:

NDK r17 Canary Build 4380476 2017 Oct 6 05:27:34
and
NDK r16 Canary Build 4380053 2017 Oct 6 01:32:41

Still no luck. Maybe your fix isn't in these builds yet? I'm looking forward to the next build.

Will it be available in r16, or should I check only the r17 canary?

@DanAlbert
Member

Not in r16 yet, but it was in build 4380016 of r17 from a couple hours before the one you tried. I guess you managed to find an unrelated dynamic_cast issue :(

Keep trying to get a test case. If you manage to get a repro case I can take a look.

@DanAlbert
Member

DanAlbert commented Oct 11, 2017

I've posted an update on the other bug. Now that I understand the problem better, I think you do have a bug here:

87: 00000000000198d0    16 OBJECT  WEAK   DEFAULT   18 typeinfo for A

A doesn't have a key function. You need to add a non-inline non-pure virtual function to A. If you do that, the typeinfo will be GLOBAL DEFAULT instead of WEAK DEFAULT, and then dynamic_cast should work.
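A sketch of what adding a key function looks like, using a hypothetical class `A` that would normally be split into a header and a source file (both are shown in one block here). Under the Itanium C++ ABI the key function is the first non-pure virtual function that is not inline at the point of the class definition; the compiler then emits the vtable and typeinfo only in the file that defines it.

```cpp
#include <cassert>

// --- a.h (hypothetical) ---
struct A {
  virtual ~A();                           // declared, not defined inline: A's key function
  virtual int kind() const { return 0; }  // inline, so it cannot be the key function
};

struct B : A {
  int kind() const override { return 1; }
};

// --- a.cpp (hypothetical) ---
// Defining the key function out of line makes this file the single home of
// A's vtable and typeinfo, so they are emitted as GLOBAL instead of WEAK.
A::~A() {}

// Downcast helper used to exercise RTTI.
bool is_b(A* p) {
  assert(p != nullptr);
  return dynamic_cast<B*>(p) != nullptr;
}
```

After rebuilding, `readelf -sW` on the library that contains a.cpp should show `typeinfo for A` as GLOBAL DEFAULT rather than WEAK DEFAULT.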

@andreya108
Author

andreya108 commented Oct 12, 2017

It looks like that is not the case.

For debugging purposes I've added a method that describes the type to every class in the hierarchy, like this:

my.h:

class Basic {
public:
  Basic();
  ~Basic();
  virtual const char* Type();
//...
};

class Derived : public Basic {
public:
  Derived();
  ~Derived();
  const char* Type() override;
//...
};

my.cpp:

Basic::Basic() {}
Basic::~Basic() {}
const char* Basic::Type() { return "Basic"; }

Derived::Derived() {}
Derived::~Derived() {}
const char* Derived::Type() { return "Derived"; }

And later in the code:

void myfunc(Basic *obj)
{
  Derived* derived = dynamic_cast<Derived*>(obj);
  if (!derived)
  {
    log("Cannot cast from %s to Derived", obj->Type());
  }
}

//...
Derived obj;
myfunc(&obj);

Results:

Cannot cast from Derived to Derived

And objects do not pass across the dlopen boundary. Every shared library is isolated (no internally defined types are exposed), and the libraries communicate with each other only through simple types and some std:: types (like string and list).

@andreya108
Author

andreya108 commented Oct 12, 2017

The above readelf listings with A/B/C classes are from my test case, and dynamic_cast works well there despite WEAK DEFAULT.

Unfortunately I still cannot reproduce the problem in a test environment; the issue occurs only in production code.
I will try again...

@andreya108 andreya108 changed the title dynamic_cast from pointers from std::list is not working when linked with libc++_shared (ndk r15, r16b1) dynamic_cast from pointers is not working when linked with libc++_shared (ndk r15, r16b1) Oct 12, 2017
@DanAlbert DanAlbert reopened this Oct 12, 2017
@DanAlbert
Member

And objects do not pass through dlopen boundary

System.loadLibrary counts. If you

System.loadLibrary("a");
System.loadLibrary("b");

and libb.so depends on liba.so, you won't be able to dynamic_cast in libb.so for any types also defined in liba.so unless the type_infos in liba.so are non-weak.

With the code above added to each of your classes, you shouldn't be getting WEAK DEFAULT symbols. They should be GLOBAL DEFAULT. That's what I see in a trivial test case locally.

@andreya108
Author

Both libraries depend only on system libraries and libc++_shared.so, and they do not depend on each other.

@DanAlbert
Member

Yeah, we'd already more or less shown that your bug was something different than the other one, but figured it was worth checking.

@andreya108
Author

I removed all the dlopen/dlclose calls and the problem has gone away...

When all libs are loaded once from Java, dynamic_cast works fine with libc++_shared.

@DanAlbert
Member

That's good to hear. I'd rather have a better understanding of the problem you were encountering, but given that you have a workaround and haven't managed to work out a shareable test case, I think we should just close this. Let us know if you get more information and we'll reopen.

@Cristo86

Cristo86 commented Dec 11, 2017

Is this solved in NDK update 16.1.4479499 (updated from SDKManager in Android Studio)?

Current setup: that NDK, clang and libc++_shared.

I'm still getting null back from a dynamic_pointer_cast that worked with NDK r12, clang and gnustl, e.g.:

_touchscreenVirtualPadDevice = std::dynamic_pointer_cast<TouchscreenVirtualPadDevice>(inputPlugin);
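std::dynamic_pointer_cast is built on dynamic_cast, so it fails for exactly the same RTTI reasons discussed above: it returns an empty shared_ptr wherever dynamic_cast on the raw pointer would return nullptr, and otherwise a shared_ptr that shares ownership with the original. A minimal sketch; the class names are hypothetical stand-ins for the plugin classes in this comment, and `as_touchscreen` is a hypothetical helper.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the plugin classes.
struct InputPlugin {
  virtual ~InputPlugin() = default;
};
struct TouchscreenVirtualPadDevice : InputPlugin {};
struct KeyboardDevice : InputPlugin {};

// Returns the device if `plugin` really is a TouchscreenVirtualPadDevice,
// or an empty shared_ptr otherwise (the same case in which dynamic_cast
// on plugin.get() would return nullptr).
std::shared_ptr<TouchscreenVirtualPadDevice>
as_touchscreen(const std::shared_ptr<InputPlugin>& plugin) {
  return std::dynamic_pointer_cast<TouchscreenVirtualPadDevice>(plugin);
}
```

So an empty result from dynamic_pointer_cast points at the same underlying problem as a nullptr from a plain dynamic_cast.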

@DanAlbert
Member

There's nothing we can do without a test case. If you have one, post it here and we'll reopen.

@Cristo86

I just wanted to know whether the fixes mentioned here made it into a stable r16 release. Either way, I'll definitely have to isolate the problem into a test case. Thanks.

@rprichard
Collaborator

I wrote a tool that might be useful for debugging issues with RTTI and multiple C++ shared objects. It's a single-header-file C++ library that prints the shared object where an std::type_info object is located.

e.g. in @Cristo86's case, it should be possible to write something like this:

#include "rtti_dump.h"
...
// Dumps (into logcat) the shared library containing the std::type_info for the
// type we're casting *from*.
rtti_dump::dump_type(&typeid(decltype(*inputPlugin.get())), "src");

// Dumps the std::type_info for the type we're trying to cast *to*.
rtti_dump::dump_type(&typeid(TouchscreenVirtualPadDevice), "dst");

// Dumps a hierarchy of std::type_info objects, starting with the most-derived
// class of the inputPlugin object. __dynamic_cast traverses this hierarchy at
// run-time and expects to find both src and dst.
rtti_dump::dump_class_hierarchy(rtti_dump::runtime_typeid(inputPlugin.get()));

Assuming a class hierarchy like so...

struct TouchscreenVirtualPadDevice {
  virtual ~TouchscreenVirtualPadDevice() {}
};

struct OtherBase {
  virtual ~OtherBase() {}
};

struct Derived : TouchscreenVirtualPadDevice, OtherBase {};

std::shared_ptr<OtherBase> inputPlugin;

... it would dump something like this into the log:

src: type 9OtherBase:
src:     type_info obj:  0x4040d0 (in ./a.out)
src:     type_info name: 0x4040b8 (in ./a.out)
dst: type 27TouchscreenVirtualPadDevice:
dst:     type_info obj:  0x404100 (in ./a.out)
dst:     type_info name: 0x4040e0 (in ./a.out)
dump_class_hierarchy: type 7Derived:
dump_class_hierarchy:     type_info obj:  0x404080 (in ./a.out)
dump_class_hierarchy:     type_info name: 0x404060 (in ./a.out)
dump_class_hierarchy:     base classes:
dump_class_hierarchy:         type 27TouchscreenVirtualPadDevice:
dump_class_hierarchy:             type_info obj:  0x404100 (in ./a.out)
dump_class_hierarchy:             type_info name: 0x4040e0 (in ./a.out)
dump_class_hierarchy:         type 9OtherBase:
dump_class_hierarchy:             type_info obj:  0x4040d0 (in ./a.out)
dump_class_hierarchy:             type_info name: 0x4040b8 (in ./a.out)

type_info obj shows the address of an std::type_info object, and type_info name shows the address of the string returned from std::type_info::name(). In this case, all the objects are in ./a.out. TouchscreenVirtualPadDevice's type_info object is always at 0x404100, and OtherBase's object is always at 0x4040d0.
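The cast in this example is a cross-cast: inputPlugin's static type is OtherBase, and the target TouchscreenVirtualPadDevice is a sibling base of the most-derived class Derived, so __dynamic_cast has to locate both types in Derived's hierarchy. When every type_info object is unique, as in the single-binary output above, the cast succeeds. A compilable sketch of the same hierarchy using raw pointers (the `cross_cast` helper is hypothetical):

```cpp
#include <cassert>

struct TouchscreenVirtualPadDevice {
  virtual ~TouchscreenVirtualPadDevice() {}
};

struct OtherBase {
  virtual ~OtherBase() {}
};

struct Derived : TouchscreenVirtualPadDevice, OtherBase {};

// Cross-cast from one base to a sibling base. This succeeds only if the
// object's most-derived type publicly inherits from both classes.
TouchscreenVirtualPadDevice* cross_cast(OtherBase* p) {
  return dynamic_cast<TouchscreenVirtualPadDevice*>(p);
}
```

Given `Derived d;`, `cross_cast(&d)` returns the TouchscreenVirtualPadDevice subobject of `d`, while a plain `OtherBase` object yields nullptr.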

The tool is documented here. Links:

  • rtti_dump.h -- the header file

  • solib_rtti_dump.tar.gz -- the entire solib_rtti_dump directory. Has a demo showing how things can go wrong when std::type_info objects are duplicated.

Let me know if this is helpful.

@Cristo86

Great tool, thanks @rprichard. I have this output, where it seems that InputPlugin (which here is a base class for TouchscreenVirtualPadDevice) is duplicated (being in different addresses) across libs, so that may be the problem? (I'm still figuring out why libgnustl would make it anyway).

rtti_dump: src: type 11InputPlugin:
rtti_dump: src:     type_info obj:  0x77e60e7350 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinterface.so)
rtti_dump: src:     type_info name: 0x77e5f63d38 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinterface.so)
rtti_dump: dst: type 27TouchscreenVirtualPadDevice:
rtti_dump: dst:     type_info obj:  0x77e68d0ab0 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
rtti_dump: dst:     type_info name: 0x77e68b8e00 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
rtti_dump: dump_class_hierarchy: type 27TouchscreenVirtualPadDevice:
rtti_dump: dump_class_hierarchy:     type_info obj:  0x77e68d0ab0 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
rtti_dump: dump_class_hierarchy:     type_info name: 0x77e68b8e00 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
rtti_dump: dump_class_hierarchy:     base classes:
rtti_dump: dump_class_hierarchy:         type 11InputPlugin:
rtti_dump: dump_class_hierarchy:             type_info obj:  0x77e68d0450 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
rtti_dump: dump_class_hierarchy:             type_info name: 0x77e68b85d4 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
rtti_dump: dump_class_hierarchy:             base classes:
rtti_dump: dump_class_hierarchy:                 type 6Plugin:
rtti_dump: dump_class_hierarchy:                     type_info obj:  0x77e83bbe70 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libplugins.so)
rtti_dump: dump_class_hierarchy:                     type_info name: 0x77e83a56b0 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libplugins.so)
rtti_dump: dump_class_hierarchy:                     base classes:
rtti_dump: dump_class_hierarchy:                         type 7QObject:
rtti_dump: dump_class_hierarchy:                             type_info obj:  0x77eab2a4a8 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libQt5Core.so)
rtti_dump: dump_class_hierarchy:                             type_info name: 0x77ea9aa4f4 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libQt5Core.so)

I'll double check if for any reason I'm mistakenly building one of the libs with a different stl.

@rprichard
Collaborator

(I'm still figuring out why libgnustl would make it anyway).

gnustl treats different type_info objects as equivalent if they have the same name. libc++abi more strictly follows the "Itanium" C++ ABI. From http://itanium-cxx-abi.github.io/cxx-abi/abi.html#rtti-general: "It is intended that two type_info pointers point to equivalent type descriptions if and only if the pointers are equal. An implementation must satisfy this constraint, e.g. by using symbol preemption, COMDAT sections, or other mechanisms."
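The difference can be sketched as two comparison strategies. This is a simplified illustration, not the actual gnustl or libc++abi source: a strict Itanium-style comparison treats type_info objects as equal only if they are the same object, while a name-based comparison also accepts duplicated copies with the same mangled name. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

// Strict style: two type_info objects describe the same type only if they
// are the same object. Duplicated copies in different .so files fail this.
bool same_type_by_address(const std::type_info& a, const std::type_info& b) {
  return &a == &b;
}

// Lenient style: type_info objects whose mangled names are equal are treated
// as the same type, so duplicated copies still compare equal.
bool same_type_by_name(const std::type_info& a, const std::type_info& b) {
  return &a == &b || std::strcmp(a.name(), b.name()) == 0;
}
```

Within a single binary both strategies agree; they only diverge when a type's type_info has been duplicated across shared objects, which is exactly the situation shown in the rtti_dump output above.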

I have this output, where it seems that InputPlugin (which here is a base class for TouchscreenVirtualPadDevice) is duplicated (being in different addresses) across libs, so that may be the problem?

Yes, that's the problem. libc++abi's __dynamic_cast needs to verify that there's a public inheritance path from InputPlugin to TouchscreenVirtualPadDevice, so it searches the class hierarchy looking for InputPlugin (0x77e60e7350 in libinterface.so). It sees InputPlugin (0x77e68d0450 in libinput-plugins.so), but the addresses are different so they're considered different types.

I expect that info.path_dst_ptr_to_static_ptr will be unknown here. If the NDK's libc++abi had been compiled with _LIBCXX_DYNAMIC_FALLBACK, then in your situation, __dynamic_cast would fall back to comparing types with strings, and then __dynamic_cast would return non-NULL. (_LIBCXX_DYNAMIC_FALLBACK isn't a general fix for dynamic_cast, though. e.g. If TouchscreenVirtualPadDevice were the duplicated type instead of InputPlugin, then the fallback mode wouldn't activate, and dynamic_cast would still return NULL.)

Suggestions:

  • If you can add a "key function" to InputPlugin, that should fix the problem. A key function is a non-inline, non-pure virtual function. The compiler will output a single std::type_info object in the C++ source file where the virtual function is defined. (It will also change the readelf -s type of the std::type_info symbol from WEAK to GLOBAL.)

  • If you're OK targeting Android M and up (unlikely?), and if you can ensure that all your shared libraries are loaded in a single dlopen / System.loadLibrary call, then the system linker should generally use a single std::type_info object for each type. I don't think this fix works prior to M.

@Cristo86

InputPlugin did not have a "key function", so adding a destructor as the non-inline, non-pure virtual function did the trick (I borrowed the idea from @DanAlbert's answer in #533, as I couldn't find a reason to invent a function that wasn't there).

Before

$ aarch64-linux-android-readelf  -sW libinterface.so | aarch64-linux-android-c++filt | grep typeinfo | grep InputPlugin
    20: 00000000007ff350    24 OBJECT  WEAK   DEFAULT   18 typeinfo for InputPlugin
 10597: 000000000067bd38    14 OBJECT  WEAK   DEFAULT   11 typeinfo name for InputPlugin

After

InputPlugin.h (addition)

class InputPlugin : public Plugin {
public:
	//...
	virtual ~InputPlugin();
};

InputPlugin.cpp (just to have that destructor)

#include "InputPlugin.h"

InputPlugin::~InputPlugin() {}

$ aarch64-linux-android-readelf  -sW libinterface.so | aarch64-linux-android-c++filt | grep typeinfo | grep InputPlugin
    20: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND typeinfo for InputPlugin

rtti_dump output:

src: type 11InputPlugin:
src:     type_info obj:  0x77e811ba20 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libplugins.so)
src:     type_info name: 0x77e8104fa8 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libplugins.so)
dst: type 27TouchscreenVirtualPadDevice:
dst:     type_info obj:  0x77e682aae0 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
dst:     type_info name: 0x77e6812830 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
dump_class_hierarchy: type 27TouchscreenVirtualPadDevice:
dump_class_hierarchy:     type_info obj:  0x77e682aae0 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
dump_class_hierarchy:     type_info name: 0x77e6812830 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libinput-plugins.so)
dump_class_hierarchy:     base classes:
dump_class_hierarchy:         type 11InputPlugin:
dump_class_hierarchy:             type_info obj:  0x77e811ba20 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libplugins.so)
dump_class_hierarchy:             type_info name: 0x77e8104fa8 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libplugins.so)
dump_class_hierarchy:             base classes:
dump_class_hierarchy:                 type 6Plugin:
dump_class_hierarchy:                     type_info obj:  0x77e811be50 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libplugins.so)
dump_class_hierarchy:                     type_info name: 0x77e8105620 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libplugins.so)
dump_class_hierarchy:                     base classes:
dump_class_hierarchy:                         type 7QObject:
dump_class_hierarchy:                             type_info obj:  0x77eaafd4a8 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libQt5Core.so)
dump_class_hierarchy:                             type_info name: 0x77ea97d4f4 (in /data/app/io.highfidelity.hifiinterface-1/lib/arm64/libQt5Core.so)

So that's it: the InputPlugin address matches and dynamic_pointer_cast works!

Thanks again for explaining how different ABIs treat type_info objects.

@mlfarrell

I've lost hours and hours and hours and hours today trying to do this on the latest NDK.
Is this still an issue? How the heck can I pull this off????

System.loadLibrary("vrapi");
System.loadLibrary("assimp");
//System.loadLibrary("vglloader");
System.loadLibrary("vglpp"); //<--- dynamic_casts for anything from this lib are broken
System.loadLibrary("vnl");
System.loadLibrary("vui");

        cmake {
            arguments "-DANDROID_STL=c++_shared"
            cppFlags "-std=c++14 -DANDROID=1"
        }
    }

set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,-export-dynamic")

@hcwiley

hcwiley commented Aug 20, 2020

Still an issue for me, and I'm on ndkVersion "21.3.6528147".

@DanAlbert
Member

Do you have a repro case? As I said up the thread, we don't have enough information to act on this. The most likely scenario is that your type is missing a key function. Otherwise, @rprichard posted some advice that might help, and we can look further if you can provide a test case.

@Cristo86

@hcwiley I managed to fix the error by adding the destructor and some extra code (following some tips by DanAlbert); see #519 (comment) for the final rtti_dump output. Does that match your case at all?

@humanwin

Please help, this issue happens to me: dynamic_cast always returns null in one .so.

Description:
Our project contains several .so libraries, all built the same way.
dynamic_cast works well in most of them but returns null in one.

build.gradle:

android {
    compileSdkVersion 29
    buildToolsVersion '29.0.3'
    ndkVersion '20.1.5948944'
    targetSdkVersion 29
    minSdkVersion 16
    ndk {
        abiFilters 'armeabi-v7a'
    }
    externalNativeBuild {
        cmake {
            cppFlags "-fpermissive -DUSE_NEW_LOG=1 -UNDEBUG -D_DEBUG -DANDROID=1"
            arguments "-DANDROID_ARM_MODE=arm", "-DANDROID_STL=c++_shared", "-DANDROID_CPP_FEATURES=rtti exceptions"
        }
    }
}

Device system: Android 9

Code where dynamic_cast returns null:

CBase* lower_ptr = GetSessionByCallID(iCallId, false);
CSingle *upper_ptr = new CSingle;
const std::type_info *lower_type = rtti_dump::runtime_typeid(lower_ptr);
const std::type_info *upper_type = rtti_dump::runtime_typeid(upper_ptr);
assert(lower_type);
assert(upper_type);
CSingle* pSingleSession = dynamic_cast<CSingle*>(lower_ptr);
RTTI_DUMP_LOG("dynamic_cast<CSingle*>(lower_ptr): %s",
              pSingleSession ? "non-NULL (PASS)" : "NULL (FAILURE)");
RTTI_DUMP_LOG("*runtime_typeid(lower_ptr) == *runtime_typeid(upper_ptr): %s",
              *lower_type == *upper_type ? "true (PASS)" : "false (FAILURE)");
rtti_dump::dump_class_hierarchy(lower_type, "lower_type");
rtti_dump::dump_class_hierarchy(upper_type, "upper_type");

Classes CBase and CSingle are only used within the same .so.

rtti_dump log:

I/rtti_dump: dynamic_cast<CSingle*>(lower_ptr): NULL (FAILURE)
I/rtti_dump: *runtime_typeid(lower_ptr) == *runtime_typeid(upper_ptr): true (PASS)
I/rtti_dump: lower_type: type 14CSingle:
I/rtti_dump: lower_type:     type_info obj:  0xcadee6cc (in /data/app/uc.android.neut-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libLogic.so)
I/rtti_dump: lower_type:     type_info name: 0xcade45b7 (in /data/app/uc.android.neut-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libLogic.so)
I/rtti_dump: lower_type:     base classes:
I/rtti_dump: lower_type:         type 12CBase:
I/rtti_dump: lower_type:             type_info obj:  0xcadee344 (in /data/app/uc.android.neut-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libLogic.so)
I/rtti_dump: lower_type:             type_info name: 0xcade4257 (in /data/app/uc.android.neut-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libLogic.so)
I/rtti_dump: lower_type:             base classes:
I/rtti_dump: lower_type:                 type 16chMessageHandler:
I/rtti_dump: lower_type:                     type_info obj:  0xd042e870 (in /data/app/uc.android.neut-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libdsk.so)
I/rtti_dump: lower_type:                     type_info name: 0xd042d364 (in /data/app/uc.android.neut-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libdsk.so)
I/rtti_dump: lower_type:                     base classes:
I/rtti_dump: lower_type:                         type 13chSlotHandler:
I/rtti_dump: lower_type:                             type_info obj:  0xd042e6ac (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libdsk.so)
I/rtti_dump: lower_type:                             type_info name: 0xd042d20f (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libdsk.so)
I/rtti_dump: upper_type: type 14CSingle:
I/rtti_dump: upper_type:     type_info obj:  0xcadee6cc (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libLogic.so)
I/rtti_dump: upper_type:     type_info name: 0xcade45b7 (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libLogic.so)
I/rtti_dump: upper_type:     base classes:
I/rtti_dump: upper_type:         type 12CBase:
I/rtti_dump: upper_type:             type_info obj:  0xcadee344 (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libLogic.so)
I/rtti_dump: upper_type:             type_info name: 0xcade4257 (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libLogic.so)
I/rtti_dump: upper_type:             base classes:
I/rtti_dump: upper_type:                 type 16chMessageHandler:
I/rtti_dump: upper_type:                     type_info obj:  0xd042e870 (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libdsk.so)
I/rtti_dump: upper_type:                     type_info name: 0xd042d364 (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libdsk.so)
I/rtti_dump: upper_type:                     base classes:
I/rtti_dump: upper_type:                         type 13chSlotHandler:
I/rtti_dump: upper_type:                             type_info obj:  0xd042e6ac (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libdsk.so)
I/rtti_dump: upper_type:                             type_info name: 0xd042d20f (in /data/app/uc.android-eBH_hbL0d6ksOJBzuFrsWg==/lib/arm/libdsk.so)

And the readelf output:

readelf  -sW libLogic.so |c++filt | grep typeinfo | grep CSingle
OBJECT  GLOBAL DEFAULT   16 typeinfo name for CSingle
OBJECT  GLOBAL DEFAULT   18 typeinfo for CSingle
OBJECT  GLOBAL DEFAULT   18 typeinfo for CSingle
OBJECT  GLOBAL DEFAULT   16 typeinfo name for CSingle

readelf  -sW libLogic.so |c++filt | grep typeinfo | grep CBase
OBJECT  GLOBAL DEFAULT   16 typeinfo name for CBase
OBJECT  GLOBAL DEFAULT   18 typeinfo for CBase
OBJECT  GLOBAL DEFAULT   18 typeinfo for CBase
OBJECT  GLOBAL DEFAULT   16 typeinfo name for CBase

readelf  -sW libdsk.so |c++filt | grep typeinfo | grep chMessageHandler
OBJECT  GLOBAL DEFAULT   19 typeinfo for chMessageHandler
OBJECT  GLOBAL DEFAULT   16 typeinfo name for chMessageHandler
OBJECT  GLOBAL DEFAULT   19 typeinfo for chMessageHandler
OBJECT  GLOBAL DEFAULT   16 typeinfo name for chMessageHandler
 
readelf  -sW libdsk.so |c++filt | grep typeinfo | grep chSlotHandler
OBJECT  GLOBAL DEFAULT   16 typeinfo name for chSlotHandler
OBJECT  GLOBAL DEFAULT   19 typeinfo for chSlotHandler
OBJECT  GLOBAL DEFAULT   19 typeinfo for chSlotHandler
OBJECT  GLOBAL DEFAULT   16 typeinfo name for chSlotHandler

The lower_type and upper_type look the same, but dynamic_cast returns null. Did I use rtti_dump.h the wrong way?
Unfortunately, I cannot reproduce the problem in a test environment; this issue occurs only in production code.
I will try again...

@humanwin

humanwin commented Apr 16, 2021

The shared library in which dynamic_cast always returns nullptr (call it library B) uses a third-party library A.
I used lldb to step through the ARM instructions executed by dynamic_cast and found that the symbol __cxxabiv1::__si_class_type_info resolved to third-party library A. In a library where dynamic_cast works, that symbol (__cxxabiv1::__si_class_type_info) resolves to libc++_shared.so. After I stripped third-party library A, dynamic_cast worked in library B.

Obviously, third-party library A should not implement that symbol, but why does the Android linker resolve the symbol to A rather than to libc++_shared.so?
Library B's NEEDED list includes both libc++_shared.so and A.
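The same check can be made from inside the app without stepping through dynamic_cast in lldb. Below is a minimal sketch, assuming the Itanium C++ ABI used on Android, where every `std::type_info` object starts with a vtable pointer into a class such as `__cxxabiv1::__si_class_type_info`. It asks `dladdr` which loaded object contains that vtable. `vtable_provider` and `report_vtable_provider` are hypothetical helper names, not NDK APIs.

```cpp
#include <cassert>
#include <cstdio>     // std::printf
#include <cstring>    // std::memcpy
#include <dlfcn.h>    // dladdr, Dl_info
#include <typeinfo>   // std::type_info

// Hypothetical helper: returns the path of the shared object that contains
// the vtable used by this type_info object, or nullptr if dladdr cannot map
// the address. Relies on the Itanium ABI layout: the first word of every
// type_info object is its vtable pointer.
const char* vtable_provider(const std::type_info& ti) {
  const void* vptr = nullptr;
  std::memcpy(&vptr, &ti, sizeof vptr);  // copy out the first word (the vptr)
  Dl_info info{};
  if (dladdr(vptr, &info) == 0) return nullptr;  // address not in any loaded object
  return info.dli_fname;
}

// Hypothetical helper: logs the mangled type name and the library that
// supplies its type_info vtable.
void report_vtable_provider(const std::type_info& ti) {
  const char* lib = vtable_provider(ti);
  std::printf("%s: type_info vtable from %s\n", ti.name(), lib ? lib : "(unknown)");
}
```

For example, calling `report_vtable_provider(typeid(CSingle))` from library B should, in a healthy libc++_shared setup, name libc++_shared.so; any other path would point at a duplicate definition like the one found in library A.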

@enh-google
Collaborator

Obviously,3rd party library A should not implement that symbol, but why Android Linker resolve symbol to A not to libc++_shared.so?

annoyingly, ELF files don't actually say where to get a symbol from. they just have a list of symbol names to resolve. so you and i know that "printf" is in libc and "cosf" is in libm, but the ELF file doesn't say that, and the dynamic linker has to search. (and if you have a "printf" in your executable, say, it's actually the correct behavior to choose that one rather than the libc one. but for symbols found in libraries it's complicated, and depends on what order things were loaded in. which is why you really want to avoid it ever being the case that there is more than one place to find any symbol.)
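One way to catch the "more than one place to find a symbol" situation is to list every loaded object that defines a given symbol. Below is a minimal sketch, assuming a loader that supports `dl_iterate_phdr`, `dlopen` with `RTLD_NOLOAD`, `dlsym` and `dladdr` (glibc and bionic both do). `libraries_defining` is a hypothetical name, and on Android the results may be incomplete if linker namespaces hide some libraries from the caller.

```cpp
#include <cassert>
#include <dlfcn.h>   // dlopen, dlsym, dladdr, dlclose, RTLD_NOLOAD
#include <link.h>    // dl_iterate_phdr, dl_phdr_info
#include <string>
#include <vector>

// Hypothetical helper: returns every loaded shared object that itself
// defines `symbol`. More than one entry means load order decides which
// definition the dynamic linker binds to.
std::vector<std::string> libraries_defining(const char* symbol) {
  // 1. Collect the names of all loaded objects first; dlopen is deliberately
  //    not called from inside the dl_iterate_phdr callback.
  std::vector<std::string> names;
  dl_iterate_phdr(
      [](dl_phdr_info* info, size_t, void* data) -> int {
        // Skip objects reported with an empty name (glibc uses "" for the executable).
        if (info->dlpi_name && info->dlpi_name[0] != '\0')
          static_cast<std::vector<std::string>*>(data)->push_back(info->dlpi_name);
        return 0;  // keep iterating
      },
      &names);

  std::vector<std::string> providers;
  for (const std::string& name : names) {
    // 2. Get a handle to the already-loaded object without loading anything new.
    void* handle = dlopen(name.c_str(), RTLD_NOW | RTLD_NOLOAD);
    if (!handle) continue;  // e.g. the vDSO, or an object we cannot see
    // 3. dlsym on a handle also searches that object's dependencies, so use
    //    dladdr to confirm the definition lives in this object itself.
    void* addr = dlsym(handle, symbol);
    Dl_info info{};
    if (addr && dladdr(addr, &info) != 0 && info.dli_fname && name == info.dli_fname)
      providers.push_back(name);
    dlclose(handle);  // balance the reference taken by dlopen
  }
  return providers;
}
```

Passing `"_ZTVN10__cxxabiv120__si_class_type_infoE"` (the mangled vtable for `__cxxabiv1::__si_class_type_info`) and getting back both libc++_shared.so and a third-party library would match the situation described above.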

@danoli3

danoli3 commented Dec 15, 2021

This is now occurring only on Android 5 (SDK 21/22) with -frtti enabled, on NDK 23.1.

It works on all other SDK versions.

@DanAlbert
Member

Did you follow the instructions above? There isn't anything more we can tell you without a test case.

@DanAlbert
Member

Also see https://developer.android.com/ndk/guides/common-problems#rttiexceptions_not_working_across_library_boundaries

@gopaladhith

@DanAlbert I have a case where I need to download two libraries from my local server (running inside a separate app) at runtime and load them using System.loadLibrary(library_path) / dlopen(library_path). I cannot make these libraries dependencies of my app (it's a special use case for testing, so they cannot be app dependencies).
In this case dynamic_cast returns null. But if I keep the libraries as app dependencies and load them using dlopen(libname), it works fine: dynamic_cast returns the derived class instance.
Is there something that should be done when loading shared libraries that are downloaded from a local server at runtime?
Any help would be appreciated, because this is really critical for my case.

Thanks in advance,
Adhith.

@DanAlbert
Member

As I understand it, it can't be done. The C++ ABI does not allow it: there is no way for the runtime to prove that those are the same type.

@gopaladhith

Thanks for the response @DanAlbert. But all the shared libraries are compiled with Android's NDK, so the ABI is generated for a specific Android architecture. Can't this be resolved at runtime, or is there some way to add the ABI information of these runtime-loaded libraries to the symbol tables? I'm not sure about the approach; I just wanted to know whether this could be possible.

@DanAlbert
Member

or someway were we can add this runtime loaded libraries abi information into symbol tables or something.

That is what happens (assuming you've followed the instructions above and have key functions). The problem is that each System.loadLibrary loads each library in isolation, so those type symbols are not visible to the others. They cannot be merged because they are isolated. If they cannot be merged RTTI cannot work between them. This is the behavior specified by the C++ ABI.
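The key-function point can be illustrated with a small sketch. Under the Itanium C++ ABI, a polymorphic class's vtable and typeinfo are emitted only in the translation unit that defines its key function (the first virtual member function that is neither pure nor inline). A class without one gets vague-linkage (weak) copies in every object file that needs them, which can leave one copy per shared library. The class names below are hypothetical stand-ins for types like chSlotHandler, not code from this thread.

```cpp
#include <cassert>

// Normally declared in a header shared by several libraries.
class Handler {
 public:
  virtual ~Handler();                     // key function: declared here, defined out of line
  virtual int kind() const { return 0; }  // inline, so it cannot be the key function
};

class SlotHandler : public Handler {
 public:
  ~SlotHandler() override;                // key function for SlotHandler
  int kind() const override { return 1; }
};

// These definitions would live in exactly one .cpp file of one library.
// Defining the key functions here makes this the only translation unit
// that emits the vtables and typeinfo for Handler and SlotHandler.
Handler::~Handler() = default;
SlotHandler::~SlotHandler() = default;

// Succeeds only when `h` really points at a SlotHandler.
bool is_slot_handler(Handler* h) {
  return dynamic_cast<SlotHandler*>(h) != nullptr;
}
```

Other libraries that include the header then refer to that single exported typeinfo instead of emitting their own copy.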

@gopaladhith

@DanAlbert thanks for the detailed explanation.
