SIGSEGV in basic_string copy constructor when compiling with clang toolchain #70

Closed
gpakosz opened this issue Apr 7, 2016 · 2 comments

@gpakosz

gpakosz commented Apr 7, 2016

Hello,

I'm hitting a crash that I suspect is caused by a codegen bug, with the following build settings:

  • NDK_TOOLCHAIN_VERSION=clang
  • APP_ABI=armeabi-v7a-hard
  • APP_OPTIM=release
  • APP_STL=gnustl_static

I reproduced the SIGSEGV with clang version 3.8.243773, shipped with ndk-r11c; ndk-r11 and ndk-r11b are affected as well.

Steps to reproduce:

  1. download the ndk-r11-clang-SIGSEGV repro case: ndk-r11-clang-SIGSEGV.zip
  2. unzip and cd to directory
  3. /opt/android-ndk-r11c/ndk-build NDK_APPLICATION_MK=./Application.mk NDK_PROJECT_PATH=.
  4. scp libs/armeabi-v7a/ndk-r11-clang-SIGSEGV device: where device is a target Android phone with a running SSH server, e.g. SSHDroid
  5. ssh into device and launch the program which crashes with SIGSEGV

Remote debugging gave me the following output:

(gdb) target remote 192.168.26.169:2345
Remote debugging using 192.168.26.169:2345
warning: Architecture rejected target-supplied description
Reading /data/data/berserker.android.apps.sshdroid/home/ndk-r11-clang-SIGSEGV from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /data/data/berserker.android.apps.sshdroid/home/ndk-r11-clang-SIGSEGV from remote target...
Reading symbols from target:/data/data/berserker.android.apps.sshdroid/home/ndk-r11-clang-SIGSEGV...done.
Reading /system/bin/linker from remote target...
Reading /system/bin/linker from remote target...
Reading symbols from target:/system/bin/linker...(no debugging symbols found)...done.
0x40003700 in _start () from target:/system/bin/linker
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0xbeffe6d0, __str=...)
    at /Volumes/Android/buildbot/out_dirs/aosp-ndk-r11-release/build/tmp/build-42939/build-gnustl/static-armeabi-v7a-hardthumb-4.9/include/bits/basic_string.tcc:173
173     /Volumes/Android/buildbot/out_dirs/aosp-ndk-r11-release/build/tmp/build-42939/build-gnustl/static-armeabi-v7a-hardthumb-4.9/include/bits/basic_string.tcc: No such file or directory.
(gdb) bt
#0  std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0xbeffe6d0, __str=...)
    at /Volumes/Android/buildbot/out_dirs/aosp-ndk-r11-release/build/tmp/build-42939/build-gnustl/static-armeabi-v7a-hardthumb-4.9/include/bits/basic_string.tcc:173
#1  0x000182e0 in testing::internal::CodeLocation::CodeLocation (this=0x0) at ./gtest/gtest.h:8239
#2  testing::internal::MakeAndRegisterTestInfo (test_case_name=<optimized out>, name=<optimized out>, type_param=0x0,
    value_param=0x8f2fc "DSLTest1/DSLTest", code_location=..., fixture_class_id=0x8f4fd, set_up_tc=0x8f2fc, tear_down_tc=0x0, factory=0xbeffe828)
    at ./gtest/gtest-all.cc:4003
#3  0x0000e33c in testing::internal::ParameterizedTestCaseInfo<(anonymous namespace)::DSLTest>::RegisterTests (this=<optimized out>)
    at ./gtest/gtest.h:11629
#4  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

basic_string.tcc:173 is:

168   template<typename _CharT, typename _Traits, typename _Alloc>
169     basic_string<_CharT, _Traits, _Alloc>::
170     basic_string(const basic_string& __str)
171     : _M_dataplus(__str._M_rep()->_M_grab(_Alloc(__str.get_allocator()),
172             __str.get_allocator()),
173       __str.get_allocator())
174     { }

Switching the STL to c++_static does not help: the program still crashes in the copy constructor of basic_string.

While stepping through in the debugger, I noticed that gtest stores code location information in its CodeLocation struct, which holds the file name by value in a std::string:

struct CodeLocation {
  CodeLocation(const string& a_file, int a_line) : file(a_file), line(a_line) {}

  string file;
  int line;
};

While debugging, I worked out that the broken std::string instance passed to the copy constructor comes from gtest.h:11634:

11629  MakeAndRegisterTestInfo(
11630      test_case_name.c_str(),
11631      test_name_stream.GetString().c_str(),
11632      NULL,  // No type parameter.
11633      PrintToString(*param_it).c_str(),
11634      code_location_,
11635      GetTestCaseTypeId(),
11636      TestCase::SetUpTestCase,
11637      TestCase::TearDownTestCase,
11638      test_info->test_meta_factory->CreateTestFactory(*param_it));

The code_location_ member variable is passed by value, and by the time execution reaches the MakeAndRegisterTestInfo() implementation at gtest-all.cc:4004, the code_location parameter's .file member contains garbage.

3992 TestInfo* MakeAndRegisterTestInfo(
3993     const char* test_case_name,
3994     const char* name,
3995     const char* type_param,
3996     const char* value_param,
3997     CodeLocation code_location,
3998     TypeId fixture_class_id,
3999     SetUpTestCaseFunc set_up_tc,
4000     TearDownTestCaseFunc tear_down_tc,
4001     TestFactoryBase* factory) {
4002   TestInfo* const test_info =
4003       new TestInfo(test_case_name, name, type_param, value_param,
4004                    code_location, fixture_class_id, factory);
4005   GetUnitTestImpl()->AddTestInfo(set_up_tc, tear_down_tc, test_info);
4006   return test_info;
4007 }
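To make the copy in question explicit, here is a minimal sketch of the pass-by-value pattern. TrackedLocation and RegisterByValue are hypothetical stand-ins (not gtest code), and the copy counter is added purely for illustration. With a correct compiler, passing an lvalue to the by-value parameter runs the copy constructor once and the file string arrives intact; this sketch is not claimed to reproduce the crash.

```cpp
#include <string>

// Hypothetical stand-in for gtest's CodeLocation. The static counter is an
// addition for illustration and does not exist in gtest.
struct TrackedLocation {
  static int copies;

  TrackedLocation(const std::string& a_file, int a_line)
      : file(a_file), line(a_line) {}

  // User-written copy constructor so that each copy can be counted.
  TrackedLocation(const TrackedLocation& other)
      : file(other.file), line(other.line) {
    ++copies;
  }

  std::string file;
  int line;
};

int TrackedLocation::copies = 0;

// Hypothetical callee that takes the struct by value, in the same way
// MakeAndRegisterTestInfo() takes its code_location parameter.
std::string RegisterByValue(TrackedLocation location) {
  // The std::string member was copied when the parameter was constructed.
  return location.file;
}
```

Calling RegisterByValue(loc) with an existing TrackedLocation loc is the point where the std::string copy constructor runs, which matches frames #0 and #1 of the backtrace above.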

Workarounds found so far:

  1. patch gtest to capture the file location as a string literal, so that it stops copying std::string instances around
  2. compile with -fno-omit-frame-pointer ¯\_(ツ)_/¯
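The first workaround could look roughly like the sketch below. LiteralCodeLocation and DescribeLocation are hypothetical names, and this is not the actual patch applied to gtest; it only shows the idea of keeping a pointer to a string literal (such as __FILE__) so that copying the struct never touches std::string.

```cpp
#include <sstream>
#include <string>

// Hypothetical variant of CodeLocation: the file name is kept as a pointer
// to a string literal, so copying the struct only copies a pointer and an int.
struct LiteralCodeLocation {
  LiteralCodeLocation(const char* a_file, int a_line)
      : file(a_file), line(a_line) {}

  const char* file;  // expected to point at a literal with static storage
  int line;
};

// Hypothetical consumer: a std::string is only built at the point of use.
std::string DescribeLocation(LiteralCodeLocation location) {
  std::ostringstream out;
  out << location.file << ":" << location.line;
  return out.str();
}
```

The trade-off is that the pointer must outlive every copy of the struct, which holds for string literals but not for temporary buffers.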

I also noticed it has to do with unions used in the code (I know the repro code looks strange; it's the result of spending hours reducing and debugging until I had a well-articulated bug repro I could share).

@DanAlbert
Member

The beta release has a clang that's quite a bit newer: https://github.com/android-ndk/ndk/wiki#current-beta-release

Let us know if it fixes the problem.

@gpakosz
Author

gpakosz commented Apr 13, 2016

That fixes it for me, but it's a pity not to understand what exactly went wrong in the clang versions from ndk-r11, ndk-r11b and ndk-r11c.

This issue was closed.