C++ inference: memory leak with MKLDNN enabled by default, and re-initialization fails #701

Closed
zhouyongxyz opened this issue Sep 10, 2020 · 2 comments

Comments

@zhouyongxyz

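With MKLDNN enabled, memory usage keeps growing. Resetting the objects as the suggested workaround describes fails with the following error: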
Compile Traceback (most recent call last):
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))

File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)

File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 1405, in conv2d
"data_format": data_format,

File "/paddle/PaddlePaddle/PaddleOCR/ppocr/modeling/backbones/det_resnet_vd.py", line 138, in conv_bn_layer
bias_attr=False)

File "/paddle/PaddlePaddle/PaddleOCR/ppocr/modeling/backbones/det_resnet_vd.py", line 208, in bottleneck_block
name=name + "_branch2a")

File "/paddle/PaddlePaddle/PaddleOCR/ppocr/modeling/backbones/det_resnet_vd.py", line 106, in __call__
name=conv_name)

File "/paddle/PaddlePaddle/PaddleOCR/ppocr/modeling/architectures/det_model.py", line 111, in __call__
conv_feas = self.backbone(image)

File "/paddle/PaddlePaddle/PaddleOCR/tools/program.py", line 193, in build_export
image, outputs = model(mode='export')

File "tools/export_model.py", line 67, in main
config, eval_program, startup_prog)

File "tools/export_model.py", line 93, in <module>
main()

C++ Traceback (most recent call last):

0 paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> > const&, std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> >*, int)
1 paddle::framework::NaiveExecutor::Run()
2 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
3 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
4 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
5 paddle::framework::OperatorWithKernel::ChooseKernel(paddle::framework::RuntimeContext const&, paddle::framework::Scope const&, paddle::platform::Place const&) const
6 paddle::operators::ConvOp::GetExpectedKernelType(paddle::framework::ExecutionContext const&) const
7 paddle::framework::OperatorWithKernel::IndicateVarDataType(paddle::framework::ExecutionContext const&, std::string const&) const
8 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
9 paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

Error: The Input Variable(Input) of conv2d Op used to determine kernel data type is empty or not LoDTensor or SelectedRows or LoDTensorArray.
[Hint: Expected data_type != dafault_data_type, but received data_type:-1 == dafault_data_type:-1.] (at /home/zhou/ai/ocr/Paddle/paddle/fluid/framework/operator.cc:1385)
[operator < conv2d > error]

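@zhouyongxyz zhouyongxyz changed the title from "C++ inference: memory leak with MKLDNN enabled by default" to "C++ inference: memory leak with MKLDNN enabled by default, and re-initialization fails" Sep 10, 2020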
@zhouyongxyz
Author

det = new DBDetector(
    config.det_model_dir, config.use_gpu, config.gpu_id, config.gpu_mem,
    config.cpu_math_library_num_threads, config.use_mkldnn,
    config.use_zero_copy_run, config.max_side_len, config.det_db_thresh,
    config.det_db_box_thresh, config.det_db_unclip_ratio, config.visualize);

rec = new CRNNRecognizer(config.rec_model_dir, config.use_gpu, config.gpu_id,
                         config.gpu_mem, config.cpu_math_library_num_threads,
                         config.use_mkldnn, config.use_zero_copy_run,
                         config.char_list_file);

renint detector
--- fused 0 scale with matmul
--- Fused 0 ReshapeTransposeMatmulMkldnn patterns
--- Fused 0 ReshapeTransposeMatmulMkldnn patterns with transpose's xshape
--- Fused 0 ReshapeTransposeMatmulMkldnn patterns with reshape's xshape
--- Fused 0 ReshapeTransposeMatmulMkldnn patterns with reshape's xshape with transpose's xshape
--- Fused 0 MatmulTransposeReshape patterns
--- fused 0 scale with matmul
--- Fused 0 ReshapeTransposeMatmulMkldnn patterns
--- Fused 0 ReshapeTransposeMatmulMkldnn patterns with transpose's xshape
--- Fused 0 ReshapeTransposeMatmulMkldnn patterns with reshape's xshape
--- Fused 0 ReshapeTransposeMatmulMkldnn patterns with reshape's xshape with transpose's xshape
--- Fused 0 MatmulTransposeReshape patterns
ocr start ...
Segmentation fault (core dumped)

It frequently hits a segmentation fault, and the process crashes immediately.

@zhouyongxyz
Author

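The problem is now resolved. The fix is to add the setting config.SetMkldnnCacheCapacity(10); which sets the cache capacity. The default is 0, meaning unlimited, so memory keeps growing.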
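
To illustrate why a capacity limit stops the growth, here is a minimal toy sketch, not Paddle's actual implementation. It assumes the cache holds one entry per distinct input shape and that a capacity of 0 behaves as "unlimited", as described in the comment above. The names ShapeCache and EntriesAfterDistinctShapes are hypothetical, and the least-recently-used eviction is an assumption made only for this example.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Toy model of a per-shape cache. NOT Paddle's implementation.
// Assumption: capacity == 0 means "no limit", as described in this thread.
class ShapeCache {
 public:
  explicit ShapeCache(std::size_t capacity) : capacity_(capacity) {}

  // Record one run with the given shape key. When a non-zero capacity is
  // exceeded, evict the least recently used entry (assumed policy).
  void Touch(const std::string& shape_key) {
    auto it = index_.find(shape_key);
    if (it != index_.end()) {
      // Already cached: move the entry to the front (most recently used).
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    order_.push_front(shape_key);
    index_[shape_key] = order_.begin();
    if (capacity_ != 0 && order_.size() > capacity_) {
      index_.erase(order_.back());
      order_.pop_back();
    }
  }

  std::size_t Size() const { return order_.size(); }

 private:
  std::size_t capacity_;
  std::list<std::string> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};

// Simulate n runs, each with a different input shape (as happens when OCR
// images vary in size), and return the number of entries left in the cache.
std::size_t EntriesAfterDistinctShapes(std::size_t capacity, std::size_t n) {
  ShapeCache cache(capacity);
  for (std::size_t i = 0; i < n; ++i) {
    cache.Touch("1x3x" + std::to_string(32 + i) + "x320");
  }
  return cache.Size();
}
```

In this toy model, EntriesAfterDistinctShapes(0, 1000) leaves 1000 entries, while EntriesAfterDistinctShapes(10, 1000) leaves 10. That matches the behavior reported here: growth without a limit, and bounded memory once a capacity is set.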
