solve mklml memory leak #23557

Merged
merged 4 commits into from
Apr 10, 2020

luotao1
Contributor

@luotao1 luotao1 commented Apr 7, 2020

fix #22827

@luotao1
Contributor Author

luotao1 commented Apr 9, 2020

On Windows, the test_mkldnn_conv_relu_fuse_pass unit test fails with "MKL_Free_Buffers not found.", but the same test runs and passes on the Linux CI.

[00:25:41]	[Step 5/8] 694/728 Test #705: test_mkldnn_conv_relu_fuse_pass ......................................***Failed    2.39 sec
[00:25:41]	[Step 5/8] I0409 08:25:40.036875 46868 analysis_predictor.cc:136] Profiler is deactivated, and no profiling report will be generated.
[00:25:41]	[Step 5/8] I0409 08:25:40.037909 46868 analysis_predictor.cc:865] MODEL VERSION: 0.0.0
[00:25:41]	[Step 5/8] I0409 08:25:40.037909 46868 analysis_predictor.cc:867] PREDICTOR VERSION: 0.0.0
[00:25:41]	[Step 5/8] W0409 08:25:40.038911 46868 analysis_predictor.cc:880]  - Version incompatible (2) conv2d
[00:25:41]	[Step 5/8] W0409 08:25:40.038911 46868 analysis_predictor.cc:880]  - Version incompatible (1) feed
[00:25:41]	[Step 5/8] W0409 08:25:40.038911 46868 analysis_predictor.cc:880]  - Version incompatible (1) fetch
[00:25:41]	[Step 5/8] W0409 08:25:40.038911 46868 analysis_predictor.cc:880]  - Version incompatible (1) relu
[00:25:41]	[Step 5/8] W0409 08:25:40.038938 46868 analysis_predictor.cc:880]  - Version incompatible (1) scale
[00:25:41]	[Step 5/8] W0409 08:25:40.038946 46868 analysis_predictor.cc:192] WARNING: Results may be DIFF! Please use the corresponding version of the model and prediction library, and do not use the develop branch.
[00:25:41]	[Step 5/8] --- Running analysis [ir_graph_build_pass]
[00:25:41]	[Step 5/8] --- Running analysis [ir_graph_clean_pass]
[00:25:41]	[Step 5/8] --- Running analysis [ir_analysis_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [simplify_with_basic_ops_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [attention_lstm_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [seqconv_eltadd_relu_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [seqpool_cvm_concat_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [fc_lstm_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [mul_lstm_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [fc_gru_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [mul_gru_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [seq_concat_fc_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [fc_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [repeated_fc_relu_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [squared_mat_sub_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [conv_bn_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [conv_transpose_bn_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [is_test_pass]
[00:25:41]	[Step 5/8] --- Running IR pass [runtime_context_cache_pass]
[00:25:41]	[Step 5/8] --- Running analysis [ir_params_sync_among_devices_pass]
[00:25:41]	[Step 5/8] --- Running analysis [adjust_cudnn_workspace_size_pass]
[00:25:41]	[Step 5/8] --- Running analysis [inference_op_replace_pass]
[00:25:41]	[Step 5/8] --- Running analysis [ir_graph_to_program_pass]
[00:25:41]	[Step 5/8] test_mkldnn_conv_relu_fuse_pass failed
[00:25:41]	[Step 5/8]  E
[00:25:41]	[Step 5/8] ======================================================================
[00:25:41]	[Step 5/8] ERROR: test_check_output (test_mkldnn_conv_relu_fuse_pass.ConvBnFusePassMKLDNNTest)
[00:25:41]	[Step 5/8] ----------------------------------------------------------------------
[00:25:41]	[Step 5/8] Traceback (most recent call last):
[00:25:41]	[Step 5/8]   File "C:\home\BuildAgent\work\45396e78faf70386\build\python\paddle\fluid\tests\unittests\ir\inference\test_mkldnn_conv_relu_fuse_pass.py", line 41, in test_check_output
[00:25:41]	[Step 5/8]     self.check_output_with_option(use_gpu)
[00:25:41]	[Step 5/8]   File "C:\home\BuildAgent\work\45396e78faf70386\build\python\paddle\fluid\tests\unittests\ir\inference\inference_pass_test.py", line 146, in check_output_with_option
[00:25:41]	[Step 5/8]     self._get_analysis_config(use_gpu=use_gpu))
[00:25:41]	[Step 5/8]   File "C:\home\BuildAgent\work\45396e78faf70386\build\python\paddle\fluid\tests\unittests\ir\inference\inference_pass_test.py", line 83, in _get_analysis_outputs
[00:25:41]	[Step 5/8]     predictor.zero_copy_run()
[00:25:41]	[Step 5/8] RuntimeError: MKL_Free_Buffers not found.
[00:25:41]	[Step 5/8] 
[00:25:41]	[Step 5/8] ----------------------------------------------------------------------

I downloaded the Windows package https://paddlepaddledeps.bj.bcebos.com/mklml_win_2019.0.1.20181227.zip and ran nm on mklml.lib. It contains only MKL_free; MKL_Free_Buffers is not there.

$ nm mklml.lib | grep MKL
00000000 T MKL_Comatcopy
00000000 T __imp_MKL_Comatcopy
00000000 T MKL_Domain_Get_Max_Threads
00000000 T __imp_MKL_Domain_Get_Max_Threads
00000000 T MKL_Domain_Set_Num_Threads
00000000 T __imp_MKL_Domain_Set_Num_Threads
00000000 T MKL_Domatcopy
00000000 T __imp_MKL_Domatcopy
00000000 T MKL_Get_Dynamic
00000000 T __imp_MKL_Get_Dynamic
00000000 T MKL_Get_Max_Threads
00000000 T __imp_MKL_Get_Max_Threads
00000000 T MKL_Get_Version
00000000 T __imp_MKL_Get_Version
00000000 T MKL_Get_Version_String
00000000 T __imp_MKL_Get_Version_String
00000000 T MKL_Set_Dynamic
00000000 T __imp_MKL_Set_Dynamic
00000000 T MKL_Set_Num_Threads
00000000 T __imp_MKL_Set_Num_Threads
00000000 T MKL_Set_Num_Threads_Local
00000000 T __imp_MKL_Set_Num_Threads_Local
00000000 T MKL_Somatcopy
00000000 T __imp_MKL_Somatcopy
00000000 T MKL_Zomatcopy
00000000 T __imp_MKL_Zomatcopy
00000000 T MKL_free
00000000 T __imp_MKL_free
00000000 T MKL_malloc
00000000 T __imp_MKL_malloc

@yinghu5 Should MKL_free be used instead on Windows?

@luotao1
Contributor Author

luotao1 commented Apr 9, 2020

Confirmed with @yinghu5: the Windows mklml.lib currently integrated in the repo does not have the MKL_Free_Buffers function packaged into it.

For now, this PR will not fix the memory leak on Windows; that will be addressed later by upgrading the mklml version.
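
One general way to tolerate a library build that lacks a cleanup symbol is to call the function only when the loaded library exports it. The sketch below illustrates that pattern and is not the change made in this PR: `call_if_exported` is a hypothetical helper, and the two classes are stand-ins for loaded libraries so the example runs without MKL installed. With `ctypes`, looking up a missing symbol on a loaded library raises `AttributeError`, so `getattr` with a default behaves the same way on a real `ctypes.CDLL` object.

```python
def call_if_exported(lib, name):
    """Call the zero-argument function `name` on `lib` if it exists.

    Returns True if the function was called, False if the symbol is missing.
    """
    func = getattr(lib, name, None)  # missing symbol -> None instead of an error
    if func is None:
        return False
    func()
    return True


class LibWithFreeBuffers:
    """Stand-in for a build that exports MKL_Free_Buffers (as on Linux here)."""

    def __init__(self):
        self.freed = False

    def MKL_Free_Buffers(self):
        self.freed = True


class LibWithoutFreeBuffers:
    """Stand-in for a build that lacks the symbol (as in the Windows mklml.lib)."""


linux_like = LibWithFreeBuffers()
windows_like = LibWithoutFreeBuffers()
print(call_if_exported(linux_like, "MKL_Free_Buffers"))    # True
print(call_if_exported(windows_like, "MKL_Free_Buffers"))  # False

# With a real library (the file name is an assumption):
#   import ctypes
#   mklml = ctypes.CDLL("libmklml_intel.so")
#   call_if_exported(mklml, "MKL_Free_Buffers")
```

Skipping the call avoids the "not found" error but leaves the buffers unreleased, which matches the decision above to leave the Windows leak for a later mklml upgrade.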

@jiweibo
Contributor

jiweibo commented Apr 9, 2020

Tested on Linux with a user's model: with this PR merged, the memory leak caused by MKL is fixed.

@luotao1 luotao1 merged commit e4f1b1c into PaddlePaddle:develop Apr 10, 2020
@luotao1 luotao1 deleted the mkl_buffer branch April 10, 2020 02:59
@linzhec

linzhec commented Oct 28, 2023

Has the memory leak on Windows been fixed? In my testing it still occurs in the current version.

Successfully merging this pull request may close these issues.

[dev] Memory leak with MKL under multithreading
4 participants