[WIP] Solve mklml memory leakage #23314

Closed
wants to merge 5 commits into from

Contributor

luotao1 commented Mar 30, 2020

Fix the memory leak by removing the Intel libiomp5 library and keeping only the GNU libgomp library.
Fixes #22827

Contributor Author

luotao1 commented Mar 30, 2020

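@yinghu5 Following #22827 (comment), 9da98cd removes libiomp5 and uses libmklml_gnu.so in place of libmklml_intel.so. The memory leak still persists, however.

  • pmap shows that neither libmklml_intel.so nor libiomp5 is linked and that libmklml_gnu.so is linked correctly. The leak is still there, and its curve resembles the figure in the issue: memory jumps up in steps at intervals.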
$ pmap -x 4091
4091:   ./build/mul_demo --model_dir=mul_model --thread_num=3 --num=-1
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000     376     268       0 r-x--  mul_demo
000000000065e000       8       8       8 rw---  mul_demo
0000000000660000      64      16      16 rw---    [ anon ]
0000000000d71000    3684    3564    3564 rw---    [ anon ]
00007ff1f0bfe000  512008    3740    3740 rw---    [ anon ]
00007ff210000000    1224     336     336 rw---    [ anon ]
00007ff210132000   64312       0       0 -----    [ anon ]
00007ff214000000    1928    1640    1640 rw---    [ anon ]
00007ff2141e2000   63608       0       0 -----    [ anon ]
00007ff218000000    1352    1064    1064 rw---    [ anon ]
00007ff218152000   64184       0       0 -----    [ anon ]
00007ff21c3e4000       4       0       0 -----    [ anon ]
00007ff21c3e5000   10240      20      20 rw---    [ anon ]
00007ff21cde5000       4       0       0 -----    [ anon ]
00007ff21cde6000   10240      24      24 rw---    [ anon ]
00007ff21d7e6000       4       0       0 -----    [ anon ]
00007ff21d7e7000   10240      20      20 rw---    [ anon ]
00007ff21e1e7000      56      36       0 r-x--  libgomp.so.1.0.0
00007ff21e1f5000    2044       0       0 -----  libgomp.so.1.0.0
00007ff21e3f4000       4       4       4 rw---  libgomp.so.1.0.0
00007ff21e3f5000    1676     544       0 r-x--  libc-2.18.so
00007ff21e598000    2048       0       0 -----  libc-2.18.so
00007ff21e798000      16      16      16 r----  libc-2.18.so
00007ff21e79c000       8       8       8 rw---  libc-2.18.so
00007ff21e79e000      16      12      12 rw---    [ anon ]
00007ff21e7a2000      84      44       0 r-x--  libgcc_s.so.1
00007ff21e7b7000    2048       0       0 -----  libgcc_s.so.1
00007ff21e9b7000       4       4       4 rw---  libgcc_s.so.1
00007ff21e9b8000    1032      68       0 r-x--  libm-2.18.so
00007ff21eaba000    2044       0       0 -----  libm-2.18.so
00007ff21ecb9000       4       4       4 r----  libm-2.18.so
00007ff21ecba000       4       4       4 rw---  libm-2.18.so
00007ff21ecbb000     916     576       0 r-x--  libstdc++.so.6.0.19
00007ff21eda0000    2044       0       0 -----  libstdc++.so.6.0.19
00007ff21ef9f000      32      32      32 r----  libstdc++.so.6.0.19
00007ff21efa7000       8       8       8 rw---  libstdc++.so.6.0.19
00007ff21efa9000      84      16      16 rw---    [ anon ]
00007ff21efbe000     100      76       0 r-x--  libpthread-2.18.so
00007ff21efd7000    2044       0       0 -----  libpthread-2.18.so
00007ff21f1d6000       4       4       4 r----  libpthread-2.18.so
00007ff21f1d7000       4       4       4 rw---  libpthread-2.18.so
00007ff21f1d8000      16       4       4 rw---    [ anon ]
00007ff21f1dc000      12       8       0 r-x--  libdl-2.18.so
00007ff21f1df000    2044       0       0 -----  libdl-2.18.so
00007ff21f3de000       4       4       4 r----  libdl-2.18.so
00007ff21f3df000       4       4       4 rw---  libdl-2.18.so
00007ff21f3e0000      28      16       0 r-x--  librt-2.18.so
00007ff21f3e7000    2044       0       0 -----  librt-2.18.so
00007ff21f5e6000       4       4       4 r----  librt-2.18.so
00007ff21f5e7000       4       4       4 rw---  librt-2.18.so
00007ff21f5e8000      96      16       0 r-x--  libz.so.1.2.8
00007ff21f600000    2044       0       0 -----  libz.so.1.2.8
00007ff21f7ff000       4       4       4 rw---  libz.so.1.2.8
00007ff21f800000   23596    3196       0 r-x--  libdnnl.so.1
00007ff220f0b000    2048       0       0 -----  libdnnl.so.1
00007ff22110b000     336     336     336 r----  libdnnl.so.1
00007ff22115f000       4       4       4 rw---  libdnnl.so.1
00007ff221160000     196     192     192 rw---    [ anon ]
00007ff221191000  126264    2460       0 r-x--  libmklml_gnu.so
00007ff228cdf000    2044       0       0 -----  libmklml_gnu.so
00007ff228ede000     148     148     148 r----  libmklml_gnu.so
00007ff228f03000     684     336     336 rw---  libmklml_gnu.so
00007ff228fae000     144      96      96 rw---    [ anon ]
00007ff228fd2000   43484   24796       0 r-x--  libpaddle_fluid.so
00007ff22ba49000    2044       0       0 -----  libpaddle_fluid.so
00007ff22bc48000     676     676     676 rw---  libpaddle_fluid.so
00007ff22bcf1000     116      68      68 rw---    [ anon ]
00007ff22bd0e000     128     112       0 r-x--  ld-2.18.so
00007ff22be22000    1072    1068    1068 rw---    [ anon ]
00007ff22bf2e000       4       4       4 r----  ld-2.18.so
00007ff22bf2f000       4       4       4 rw---  ld-2.18.so
00007ff22bf30000       4       4       4 rw---    [ anon ]
00007ffea3327000     132      16      16 rw---    [ stack ]
00007ffea334b000       8       4       0 r-x--    [ anon ]
ffffffffff600000       4       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB          969248   45744   13524
  • The command from the issue also leaks when run with a single thread. The single-thread command is:
./build/mul_demo --model_dir=mul_model --thread_num=1 --num=-1

Note: in the CMakeLists.txt from mul.zip, libiomp5 must be removed and libmklml_gnu.so used in place of libmklml_intel.so:

set(MATH_LIB ${MATH_LIB_PATH}/lib/libmklml_gnu${CMAKE_SHARED_LIBRARY_SUFFIX})
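The pmap check above can be automated when watching for the step-like growth. The following is a minimal sketch, assuming the `pmap -x` column layout shown in this comment (Address, Kbytes, RSS, Dirty, Mode, Mapping); `RssByMapping` is a hypothetical helper name, not something from Paddle.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Sums the RSS column (in kB) of `pmap -x` output per mapping name.
// Assumes the layout: Address Kbytes RSS Dirty Mode Mapping.
std::map<std::string, long> RssByMapping(const std::string& pmap_output) {
  std::map<std::string, long> rss_kb;
  std::istringstream lines(pmap_output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string address, mode, mapping;
    long kbytes = 0, rss = 0, dirty = 0;
    // The command line, header, separator, and total rows fail this parse
    // (their second field is not a number) and are skipped.
    if (!(fields >> address >> kbytes >> rss >> dirty >> mode)) continue;
    std::getline(fields >> std::ws, mapping);  // rest of line, e.g. "[ anon ]"
    rss_kb[mapping] += rss;
  }
  return rss_kb;
}
```

Calling this on two snapshots taken some minutes apart and comparing the `[ anon ]` entry is one way to see whether anonymous memory is growing. A missing `libiomp5.so` key confirms the library is not mapped.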

Contributor Author

luotao1 commented Mar 30, 2020

Using the pprof tool, similar to https://github.com/PaddlePaddle/Paddle/pull/18372/files#issuecomment-506920962

$ /home/luotao/gperftools/bin/pprof --text ./build/mul_demo /tmp/mul_demo.17265.test_foo-end.heap
Using local file ./build/mul_demo.
Using local file /tmp/mul_demo.17265.test_foo-end.heap.
Total: 0.3 MB
     0.3 100.0% 100.0%      0.3 100.0% mm_account_ptr_by_tid..0
     0.0   0.0% 100.0%      0.3 100.0% SGEMM
     0.0   0.0% 100.0%      0.3 100.0% __clone
     0.0   0.0% 100.0%      0.3 100.0% cblas_sgemm
     0.0   0.0% 100.0%      0.3 100.0% execute_native_thread_routine
     0.0   0.0% 100.0%      0.2  74.6% mkl_blas_avx_sgemm_get_bufs
     0.0   0.0% 100.0%      0.2  74.6% mkl_blas_avx_xsgemm
     0.0   0.0% 100.0%      0.2  74.6% mkl_blas_avx_xsgemm_par
     0.0   0.0% 100.0%      0.3 100.0% mkl_blas_sgemm
     0.0   0.0% 100.0%      0.2  74.6% mkl_blas_sgemm_omp_driver_v1
     0.0   0.0% 100.0%      0.2  74.6% mkl_serv_allocate
     0.0   0.0% 100.0%      0.1  25.4% mkl_serv_get_num_stripes
     0.0   0.0% 100.0%      0.1  25.4% mkl_serv_malloc
     0.0   0.0% 100.0%      0.3 100.0% paddle::AnalysisPredictor::ZeroCopyRun
     0.0   0.0% 100.0%      0.3 100.0% paddle::framework::NaiveExecutor::Run
     0.0   0.0% 100.0%      0.3 100.0% paddle::framework::OperatorBase::Run
     0.0   0.0% 100.0%      0.3 100.0% paddle::framework::OperatorWithKernel::RunImpl@215c8a0
     0.0   0.0% 100.0%      0.3 100.0% paddle::framework::OperatorWithKernel::RunImpl@215d4e0
     0.0   0.0% 100.0%      0.3 100.0% paddle::operators::MulKernel::Compute
     0.0   0.0% 100.0%      0.3 100.0% paddle::operators::math::Blas::MatMul
     0.0   0.0% 100.0%      0.3 100.0% paddle::work_thread
     0.0   0.0% 100.0%      0.3 100.0% start_thread
     0.0   0.0% 100.0%      0.3 100.0% std::_Bind_simple::_M_invoke
     0.0   0.0% 100.0%      0.3 100.0% std::_Bind_simple::operator
     0.0   0.0% 100.0%      0.3 100.0% std::_Function_handler::_M_invoke
     0.0   0.0% 100.0%      0.3 100.0% std::thread::_Impl::_M_run

Contributor Author

luotao1 commented Apr 1, 2020

PR_CI_Inference fails with undefined reference to symbol 'GOMP_loop_ull_static_next@@GOMP_2.0':

[2020-03-30 23:19:20]  [ 50%] Building CXX object CMakeFiles/simple_on_word2vec.dir/simple_on_word2vec.cc.o
[2020-03-30 23:19:21]  [100%] Linking CXX executable simple_on_word2vec
[2020-03-30 23:19:22]  /usr/local/bin/ld: /workspace/Paddle/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.a(prior_box_op.cc.o): undefined reference to symbol 'GOMP_loop_ull_static_next@@GOMP_2.0'
[2020-03-30 23:19:22]  //usr/lib/x86_64-linux-gnu/libgomp.so.1: error adding symbols: DSO missing from command line
[2020-03-30 23:19:22]  collect2: error: ld returned 1 exit status
[2020-03-30 23:19:22]  CMakeFiles/simple_on_word2vec.dir/build.make:87: recipe for target 'simple_on_word2vec' failed

Contributor Author

luotao1 commented Apr 1, 2020

valgrind tool

Logs

  • mkl log
    ==22770== Warning: set address range perms: large range [0x12807080, 0x31c07080) (undefined)
    ==22770==
    ==22770== HEAP SUMMARY:
    ==22770==     in use at exit: 526,301,226 bytes in 41,846 blocks
    ==22770==   total heap usage: 117,353 allocs, 75,507 frees, 542,454,597 bytes allocated
    ==22770==
    ==22770== LEAK SUMMARY:
    ==22770==    definitely lost: 224,536 bytes in 1,292 blocks
    ==22770==    indirectly lost: 1,756,318 bytes in 39,762 blocks
    ==22770==      possibly lost: 0 bytes in 0 blocks
    ==22770==    still reachable: 524,320,372 bytes in 792 blocks
    ==22770==                       of which reachable via heuristic:
    ==22770==                         stdstring          : 3,251 bytes in 58 blocks
    ==22770==         suppressed: 0 bytes in 0 blocks
    ==22770== Rerun with --leak-check=full to see details of leaked memory
    ==22770==
    ==22770== Use --track-origins=yes to see where uninitialised values come from
    ==22770== For lists of detected and suppressed errors, rerun with: -s
    ==22770== ERROR SUMMARY: 104 errors from 6 contexts (suppressed: 0 from 0)
  • openblas log
    ==5432== Warning: set address range perms: large range [0x2574b080, 0x44b4b080) (undefined)
    ==5432==
    ==5432== HEAP SUMMARY:
    ==5432==     in use at exit: 526,154,528 bytes in 38,240 blocks
    ==5432==   total heap usage: 109,017 allocs, 70,777 frees, 536,933,062 bytes allocated
    ==5432==
    ==5432== LEAK SUMMARY:
    ==5432==    definitely lost: 217,624 bytes in 1,270 blocks
    ==5432==    indirectly lost: 1,616,668 bytes in 36,180 blocks
    ==5432==      possibly lost: 0 bytes in 0 blocks
    ==5432==    still reachable: 524,320,236 bytes in 790 blocks
    ==5432==                       of which reachable via heuristic:
    ==5432==                         stdstring          : 3,251 bytes in 58 blocks
    ==5432==         suppressed: 0 bytes in 0 blocks
    ==5432== Rerun with --leak-check=full to see details of leaked memory
    ==5432==
    ==5432== Use --track-origins=yes to see where uninitialised values come from
    ==5432== For lists of detected and suppressed errors, rerun with: -s
    ==5432== ERROR SUMMARY: 32 errors from 6 contexts (suppressed: 0 from 0)

Both leak, but openblas leaks less.
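To put numbers on that comparison, the two LEAK SUMMARY sections can be read programmatically. This is a rough sketch, assuming the second log is the openblas run; `LeakBytes` is a hypothetical helper that only understands the `label: N,NNN bytes` shape seen in these logs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the byte count that follows `label` in a valgrind LEAK SUMMARY,
// e.g. label "definitely lost:" in
//   "==22770==    definitely lost: 224,536 bytes in 1,292 blocks".
// Thousands separators are skipped. Returns -1 if the label or number is absent.
long LeakBytes(const std::string& log, const std::string& label) {
  std::size_t pos = log.find(label);
  if (pos == std::string::npos) return -1;
  pos += label.size();
  long bytes = 0;
  bool seen_digit = false;
  for (; pos < log.size(); ++pos) {
    const char c = log[pos];
    if (c >= '0' && c <= '9') {
      bytes = bytes * 10 + (c - '0');
      seen_digit = true;
    } else if (c == ',' && seen_digit) {
      continue;  // thousands separator inside the number
    } else if (c == ' ' && !seen_digit) {
      continue;  // padding between the label and the number
    } else {
      break;     // end of the number
    }
  }
  return seen_digit ? bytes : -1;
}
```

Adding "definitely lost" and "indirectly lost" gives 1,980,854 bytes for the mkl log and 1,834,292 bytes for the other log, a gap of 146,562 bytes. In both runs almost all of the roughly 526 MB in use at exit is counted as still reachable rather than lost.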

Contributor Author

luotao1 commented Apr 1, 2020

Following the log in the previous comment, I ran valgrind -s ./build/mul_demo and fixed the two uninitialized variables with the change below:

    diff --git a/paddle/fluid/inference/api/analysis_config.cc b/paddle/fluid/inference/api/analysis_config.cc
    index 5ee2a54f43..af78051c00 100644
    --- a/paddle/fluid/inference/api/analysis_config.cc
    +++ b/paddle/fluid/inference/api/analysis_config.cc
    @@ -381,8 +381,8 @@ std::string AnalysisConfig::SerializeInfoCache() {
       ss << memory_pool_init_size_mb_;
       ss << use_tensorrt_;
    -  ss << tensorrt_workspace_size_;
    -  ss << tensorrt_max_batchsize_;
    +  // ss << tensorrt_workspace_size_;
    +  // ss << tensorrt_max_batchsize_;
       ss << tensorrt_min_subgraph_size_;
       ss << enable_memory_optim_;

With that change in place, it is clear that 9da98cd does fix the iomp5 leak. The log before the fix:

    ==2799== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
    ==2799==
    ==2799== 1 errors in context 1 of 1:
    ==2799== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
    ==2799==    at 0x1092DFC9: syscall (in /home/opt/gcc-4.8.2.bpkg-r4/gcc-4.8.2.bpkg-r4/lib64/libc-2.18.so)
    ==2799==    by 0x10CE6E97: __kmp_affinity_determine_capable (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/third_party/install/mklml/lib/libiomp5.so)
    ==2799==    by 0x10CC2867: __kmp_env_initialize(char const*) (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/third_party/install/mklml/lib/libiomp5.so)
    ==2799==    by 0x10CABA34: _INTERNAL_25_______src_kmp_runtime_cpp_d89aedeb::__kmp_do_serial_initialize() (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/third_party/install/mklml/lib/libiomp5.so)
    ==2799==    by 0x10C9F27F: __kmp_get_global_thread_id_reg (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/third_party/install/mklml/lib/libiomp5.so)
    ==2799==    by 0x10C8C369: omp_set_num_threads@@VERSION (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/third_party/install/mklml/lib/libiomp5.so)
    ==2799==    by 0x5A2A12E: paddle::platform::SetNumThreads(int) (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so)
    ==2799==    by 0x59F1FF8: paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&) (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so)
    ==2799==    by 0x59F2515: std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&) (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so)
    ==2799==    by 0x59F3130: std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(paddle::AnalysisConfig const&) (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so)
    ==2799==    by 0x42289C: main (in /home/luotao/test/mul_weibo/build/mul_demo)
    ==2799==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
    ==2799==
    ==2799== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Contributor Author

luotao1 commented Apr 1, 2020

Printing the log with valgrind --leak-check=full -s ./build/mul_demo shows the same errors as #15339 and #18382: all of them come from OpInfoFiller.

==3298== 10,212 (32 direct, 10,180 indirect) bytes in 1 blocks are definitely lost in loss record 36,409 of 36,410
==3298==    at 0x4C28583: operator new(unsigned long) (vg_replace_malloc.c:344)
==3298==    by 0x63095DE: paddle::framework::details::OpInfoFiller<paddle::operators::reader::CreateDoubleBufferReaderOpMaker, (paddle::framework::details::OpInfoFillType)1>::operator()(char const*, paddle::framework::OpInfo*) const (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so)
==3298==    by 0x630A273: paddle::framework::OperatorRegistrar<paddle::operators::reader::CreateDoubleBufferReaderOp, paddle::operators::reader::CreateDoubleBufferReaderOpMaker, paddle::operators::reader::DecoratedReaderInferShape, paddle::framework::EmptyGradOpMaker<paddle::framework::OpDesc>, paddle::framework::EmptyGradOpMaker<paddle::imperative::OpBase>, paddle::operators::reader::DecoratedReaderInferVarType>::OperatorRegistrar(char const*) (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so)
==3298==    by 0x70CAF35: ??? (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so)
==3298==    by 0x594B03E: ??? (in /home/luotao/Paddle/cpu_build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so)
==3298==    by 0x1FFEFFFD6F: ???
==3298==    by 0x400E7CC: call_init.part.0 (in /home/opt/gcc-4.8.2.bpkg-r4/gcc-4.8.2.bpkg-r4/lib64/ld-2.18.so)
==3298==    by 0x400E8F2: _dl_init (in /home/opt/gcc-4.8.2.bpkg-r4/gcc-4.8.2.bpkg-r4/lib64/ld-2.18.so)
==3298==    by 0x40011C9: ??? (in /home/opt/gcc-4.8.2.bpkg-r4/gcc-4.8.2.bpkg-r4/lib64/ld-2.18.so)
==3298==    by 0x1: ???
==3298==    by 0x1FFF000016: ???
==3298==    by 0x1FFF000027: ???

Contributor Author

luotao1 commented Apr 2, 2020

@yinghu5 Latest finding: on the develop branch, calling mkl_free_buffers() directly in mul_demo.cc to release the buffers resolves the memory leak.
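The thread does not show where in mul_demo.cc the call was placed. One possible arrangement, sketched under that assumption, is a small scope guard that runs a release hook when a worker finishes its batches. `ScopedRelease` and `RunBatches` are hypothetical names, and the hook is passed in as a callable so the sketch compiles without MKL. With MKL available, the hook would be `mkl_free_buffers`, declared in MKL's service header.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper, not part of Paddle or MKL: runs a cleanup hook when
// it goes out of scope, so the release also happens on early return.
class ScopedRelease {
 public:
  explicit ScopedRelease(std::function<void()> release)
      : release_(std::move(release)) {}
  ~ScopedRelease() {
    if (release_) release_();
  }
  ScopedRelease(const ScopedRelease&) = delete;
  ScopedRelease& operator=(const ScopedRelease&) = delete;

 private:
  std::function<void()> release_;
};

// Stand-in for the per-thread inference loop in mul_demo.cc; `run_once`
// represents one predictor run. With MKL, pass mkl_free_buffers as `release`.
template <typename RunOnce>
void RunBatches(int num_batches, RunOnce run_once,
                std::function<void()> release) {
  ScopedRelease guard(std::move(release));
  for (int i = 0; i < num_batches; ++i) run_once();
}
```

Releasing once per worker rather than after every run keeps MKL's internal buffers available for reuse between runs. For a demo that loops indefinitely, a periodic call inside the loop may be needed instead.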

luotao1 mentioned this pull request Apr 7, 2020
luotao1 closed this Apr 10, 2020
luotao1 deleted the iomp branch May 28, 2021 02:33
Development

Successfully merging this pull request may close these issues.

[dev] Memory leak with mkl under multithreading
2 participants