Segmentation fault when creating multiple instances for MKLDNN inference across multiple threads #731

Closed
zhouyongxyz opened this issue Sep 16, 2020 · 5 comments

@zhouyongxyz

Following #701, I set the mkldnn cache capacity to fix the memory-leak problem. I have now hit a new issue: when multiple instances are created and run inference from multiple threads, a segmentation fault occurs.
Thread 2 (Thread 0x7fffc81ec700 (LWP 7311)):
#0 0x00007ffff5b9266c in dnnl::primitive_desc_base::query_md(dnnl::query, int) const () from /home/haifan/haifan/zhouyong/ocr/Paddle_1.8.4/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so
#1 0x00007ffff64f3d2c in paddle::operators::ConvMKLDNNOpKernel<float, float>::ComputeFP32(paddle::framework::ExecutionContext const&) const ()
from /home/haifan/haifan/zhouyong/ocr/Paddle_1.8.4/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so
#2 0x00007ffff64f4eee in paddle::operators::ConvMKLDNNOpKernel<float, float>::Compute(paddle::framework::ExecutionContext const&) const ()
from /home/haifan/haifan/zhouyong/ocr/Paddle_1.8.4/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so
#3 0x00007ffff64f513f in std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::ConvMKLDNNOpKernel<float, float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) () from /home/haifan/haifan/zhouyong/ocr/Paddle_1.8.4/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so
#4 0x00007ffff7345bd0 in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const ()
from /home/haifan/haifan/zhouyong/ocr/Paddle_1.8.4/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so
#5 0x00007ffff73468ee in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const ()
from /home/haifan/haifan/zhouyong/ocr/Paddle_1.8.4/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so
#6 0x00007ffff733d4d6 in paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&) ()
from /home/haifan/haifan/zhouyong/ocr/Paddle_1.8.4/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so
#7 0x00007ffff5917c41 in paddle::framework::NaiveExecutor::Run() () from /home/haifan/haifan/zhouyong/ocr/Paddle_1.8.4/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so
#8 0x00007ffff56bcf4c in paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> > const&, std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> >*, int) () from /home/haifan/haifan/zhouyong/ocr/Paddle_1.8.4/build/fluid_inference_install_dir/paddle/lib/libpaddle_fluid.so
#9 0x00005555555e1859 in PaddleOCR::CRNNRecognizer::Run (this=0x555556f13680, boxes=std::vector of length 11, capacity 11 = {...}, img=...)

My initial guess is that this is related to MKLDNN's own multi-threading behavior: something goes wrong when memory is reclaimed. With multiple processes instead of threads the problem does not occur.
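
For context, the setup being described (a bounded MKLDNN cache per #701, one predictor per worker thread) might look roughly like the sketch below. This is a minimal sketch assuming the Paddle 1.8.x C++ inference API shipped in fluid_inference_install_dir; the exact header and method names should be checked against your paddle_inference_api.h.

```cpp
#include <memory>
#include <string>
#include "paddle_inference_api.h"  // from fluid_inference_install_dir/paddle/include

// Sketch only: build one MKLDNN predictor per worker thread.
std::unique_ptr<paddle::PaddlePredictor> CreateMkldnnPredictor(
    const std::string& model_dir) {
  paddle::AnalysisConfig config;
  config.SetModel(model_dir);             // directory containing the model and params
  config.DisableGpu();
  config.EnableMKLDNN();
  config.SetMkldnnCacheCapacity(10);      // bound the oneDNN primitive cache (#701)
  config.SetCpuMathLibraryNumThreads(1);
  // Each thread builds and uses its own predictor instance; the crash reported
  // above occurs when several such instances run inference concurrently.
  return paddle::CreatePaddlePredictor(config);
}
```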

@WenmuZhou
Collaborator

@littletomatodonkey could you take a look at this?

@littletomatodonkey
Collaborator

Could you try the multi-process approach?
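
A minimal sketch of that multi-process route on Linux, assuming plain fork(): each child gets its own address space (and therefore its own MKLDNN state), so nothing is shared between workers. RunOcrOnShard is a hypothetical placeholder for per-worker model loading and inference, not a PaddleOCR API.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

// Placeholder: in practice each child would create its own predictor
// (as in the sketch above) and run inference on its share of the images.
void RunOcrOnShard(int shard_id) {
  std::printf("worker %d: create predictor and run inference here\n", shard_id);
}

int main() {
  const int kNumWorkers = 4;
  std::vector<pid_t> children;
  for (int i = 0; i < kNumWorkers; ++i) {
    pid_t pid = fork();
    if (pid == 0) {            // child process: isolated address space,
      RunOcrOnShard(i);        // so MKLDNN caches are never shared with siblings
      _exit(0);
    }
    children.push_back(pid);
  }
  for (pid_t pid : children) {
    waitpid(pid, nullptr, 0);  // parent waits for all workers to finish
  }
  return 0;
}
```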

@littletomatodonkey
Collaborator

The mkldnn issue will be fixed in the official Paddle 2.0 release. For now, you can use the 2.0rc inference library to work around part of the problem.

@marsbzp

marsbzp commented Jun 27, 2022

(Quotes the original report and stack trace above in full.)

This problem still exists. How can it be solved?

1 similar comment
