Description
Recently, I wrote a set of gtest unit tests to exercise the image processing library we developed. The unit tests are built as a standard aarch64 ELF64 executable, and the library as a standard shared library. I push both to the device and launch the tests through adb shell. I noticed large inconsistencies in performance and, after a lot of debugging, narrowed them down to memory management: calling libc's free() on large blocks takes anywhere between 10 and 80 milliseconds.
The issue was tested and reproduced on a Snapdragon 820 dev board and a Samsung Galaxy S7, both running a release build of the OS.
Here is a distilled case that demonstrates the issue.
- test.cpp
#include <iostream>
#include <chrono>
using namespace std;
// forward declare imported methods
namespace Test
{
void * Alloc( size_t size );
void Free( void * ptr );
void Foo( void * ptr, size_t size );
} // Test namespace
int main()
{
using hrclock = chrono::high_resolution_clock;
constexpr size_t kBlockSize = 65 << 20;
for( int i = 0; i < 100; ++i )
{
// allocate
auto start_time = hrclock::now();
void * ptr = Test::Alloc( kBlockSize );
cout << "alloc: " << chrono::duration_cast<chrono::microseconds>( hrclock::now() - start_time ).count() << endl;
// "process" block
Test::Foo( ptr, kBlockSize );
// free
start_time = hrclock::now();
Test::Free( ptr );
cout << "free: " << chrono::duration_cast<chrono::microseconds>( hrclock::now() - start_time ).count() << endl;
}
return 0;
}
- test_lib.cpp
#include <cstdlib>
#include <cstring> // std::memset
namespace Test
{
void * Alloc( std::size_t size ) { return std::malloc( size ); }
void Free( void * ptr ) { std::free( ptr ); }
void Foo( void * ptr, std::size_t size ) { std::memset( ptr, 0, size ); }
} // Test namespace
- CMakeLists.txt
cmake_minimum_required( VERSION 3.2 )
project( test C CXX )
include_directories( ${CMAKE_CURRENT_SOURCE_DIR} )
add_library( bug_test_lib SHARED test_lib.cpp )
add_executable( bug_test test.cpp )
add_dependencies( bug_test bug_test_lib )
target_link_libraries( bug_test bug_test_lib )
set( CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11" )
- Build setup
cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_SYSTEM_NAME=Android \
-DCMAKE_SYSTEM_VERSION=23 \
-DCMAKE_ANDROID_STL_TYPE=c++_shared \
-DCMAKE_ANDROID_ARCH_ABI=arm64-v8a \
-DCMAKE_ANDROID_NDK_TOOLCHAIN_VERSION=clang \
-DCMAKE_ANDROID_NDK=/home/david/android-ndk-r14 \
- Running the test
adb push bug_test /data/local/tmp
adb push libbug_test_lib.so /data/local/tmp
adb shell
cd /data/local/tmp
export LD_LIBRARY_PATH=.
./bug_test
alloc: 87
free: 75567
alloc: 17
free: 75522
alloc: 16
free: 75419
alloc: 15
free: 75915
alloc: 16
free: 75743
alloc: 16
free: 75975
alloc: 17
free: 76090
alloc: 16
free: 75960
alloc: 15
free: 76065
alloc: 16
free: 76004
...
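The numbers above are in microseconds, so each free() of the 65 MiB block costs roughly 75 ms, while malloc() takes well under 0.1 ms. A possible follow-up check, not part of the original test, is to time munmap() directly on an anonymous mapping of the same size after touching every page. This is only a sketch: `TimeMunmapMicros` is a hypothetical helper, and it assumes the allocator serves blocks this large from anonymous mappings and returns them to the kernel on free(), which is not verified here.

```cpp
#include <sys/mman.h>
#include <chrono>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of the original test case.
// Maps `size` bytes of anonymous memory, writes to every byte so the pages
// are actually committed (as Test::Foo does), then measures how long
// munmap() takes. Returns the elapsed time in microseconds, or -1 if
// mmap() or munmap() fails.
long long TimeMunmapMicros( std::size_t size )
{
    using hrclock = std::chrono::high_resolution_clock;

    void * ptr = mmap( nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
    if( ptr == MAP_FAILED )
        return -1;

    // touch the whole block so the kernel has real pages to release
    std::memset( ptr, 0, size );

    const auto start_time = hrclock::now();
    const int rc = munmap( ptr, size );
    const auto elapsed = hrclock::now() - start_time;
    if( rc != 0 )
        return -1;

    return std::chrono::duration_cast<std::chrono::microseconds>( elapsed ).count();
}
```

Calling `TimeMunmapMicros( 65 << 20 )` inside the loop in test.cpp, next to the `free:` measurement, would show whether the two costs are comparable. If they are, the time is likely spent in the kernel releasing the pages and not in the allocator's own bookkeeping.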
Environment Details
- NDK Version: tested in r14b2 and r13
- Build system: cmake 3.7.2
- Host OS: Ubuntu 16.04 x86-64
- Compiler: clang
- ABI: arm64-v8a
- STL: c++_shared
- NDK API level: 23
- Device API level: 23