
C++ multi-threaded inference: under load, input goes out of bounds, core dump and warnings. #18337

Closed
kangzhangqi opened this issue Jun 26, 2019 · 3 comments
Labels
预测 (formerly named Inference; includes C-API inference issues, etc.)

Comments


kangzhangqi commented Jun 26, 2019

  • Title: C++ multi-threaded inference: after applying load, the input goes out of bounds, the process core dumps, and warnings are reported.
    The crash is unrelated to the input content; it only happens under stress testing.
  • Version / environment info:
       1) PaddlePaddle version: custom build, paddle_prebuilt_cpu-1-0-0-2_PD_BL
       2) CPU model and math library (MKL/OpenBLAS/MKLDNN) used for CPU inference:
    Intel Xeon E312xx (Sandy Bridge)
    Libraries used: libmkldnn.so.0, libpaddle_fluid.so, libmklml_intel.so, libiomp5.so
       3) GPU model, CUDA and cuDNN versions (if inference uses GPU): not used

       4) System environment (OS type and version, Python version):
    CentOS release 4.3 (Final)

- Inference info
   1) C++ inference: inference library version info (version.txt):
GIT COMMIT ID: 3df4cbf
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: OFF
   2) Full CMake command including paths:
Not available
   3) API info (if applicable):
Each thread clones its own predictor (see the sketch after this list):
_predictor = std::move(_s_main_predictor->Clone());

   4) Source of the inference library (official download / special environment such as a BCLOUD build):
Built with BCLOUD

  • Reproduction info: environment and steps to reproduce the error
    No core dump with a single thread; core dumps occur under multi-threaded stress testing.
  • Problem description: the error messages, logs, and key code snippets are below.
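
For reference, the per-thread setup from item 3 boils down to the following. This is a minimal sketch assuming the Paddle 1.x C++ inference API (paddle_inference_api.h); the Worker class and member names are illustrative, not the actual service code:

```cpp
// Minimal sketch: each worker thread holds a private clone of the shared
// main predictor and only ever calls Run() on its own clone.
#include <memory>
#include <vector>

#include "paddle_inference_api.h"

class Worker {
public:
    // main_predictor is the process-wide predictor created once at startup.
    explicit Worker(paddle::PaddlePredictor* main_predictor)
        : _predictor(main_predictor->Clone()) {}

    bool infer(const std::vector<paddle::PaddleTensor>& inputs,
               std::vector<paddle::PaddleTensor>* outputs) {
        // Run() is only called on this thread's private clone.
        return _predictor->Run(inputs, outputs);
    }

private:
    std::unique_ptr<paddle::PaddlePredictor> _predictor;
};
```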

Core dump details:

Program terminated with signal SIGABRT, Aborted.
#0 0x00007f24be8ac3f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
(gdb) bt
#0 0x00007f24be8ac3f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#1 0x00007f24be8ad7d8 in abort () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#2 0x00007f24be8ea554 in __libc_message () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#3 0x00007f24be8efdbe in malloc_printerr () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#4 0x00007f24be8f21b9 in _int_malloc () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#5 0x00007f24be8f3400 in malloc () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#6 0x00007f24bf19b4dd in operator new (sz=8) at ../../../../libstdc++-v3/libsupc++/new_op.cc:51
#7 0x00007f24c01bd524 in std::vector<int, std::allocator<int> >::vector(std::initializer_list<int>, std::allocator<int> const&) [clone .constprop.1484] () from /home/map/qu/bin/../lib/libpaddle_fluid.so
#8 0x00007f24c01bf97b in paddle::operators::math::ContextProjectFunctor<paddle::platform::CPUDeviceContext, float>::operator()(paddle::platform::CPUDeviceContext const&, paddle::framework::LoDTensor const&, paddle::framework::Tensor const*, bool, int, int, int, int, int, paddle::framework::Tensor*) () from /home/map/qu/bin/../lib/libpaddle_fluid.so
#9 0x00007f24c01c25d2 in paddle::operators::SequenceConvKernel<paddle::platform::CPUDeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const () from /home/map/qu/bin/../lib/libpaddle_fluid.so
#10 0x00007f24c01c29b3 in std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::SequenceConvKernel<paddle::platform::CPUDeviceContext, float>, paddle::operators::SequenceConvKernel<paddle::platform::CPUDeviceContext, double> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) () from /home/map/qu/bin/../lib/libpaddle_fluid.so
#11 0x00007f24c0a378a6 in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const () from /home/map/qu/bin/../lib/libpaddle_fluid.so
#12 0x00007f24c0a383a4 in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const () from /home/map/qu/bin/../lib/libpaddle_fluid.so
#13 0x00007f24c0a3657b in paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) () from /home/map/qu/bin/../lib/libpaddle_fluid.so
#14 0x00007f24bfd931fe in paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) () from /home/map/qu/bin/../lib/libpaddle_fluid.so
#15 0x00007f24bfc2d14a in paddle::NativePaddlePredictor::Run(std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> > const&, std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> >*, int) ()
from /home/map/qu/bin/../lib/libpaddle_fluid.so
#16 0x00000000004d1a98 in map_search_qu::QueryCategoryModel::cnn_query (this=0x7f228965c8d0, query=...,
sout=0x7f22887f37e0, result=...)
at baidu/mapsearch/qu/code/rd/src/resource/resource_implement/query_category_model.cpp:281
#17 0x00000000004981d0 in map_search_qu::SlotsFillingProcessor::process (this=0x33990780, request=0x7f22d4086a00,
response=0x7f22d453be20, thread_data=0x7f2289661d40)
at baidu/mapsearch/qu/code/rd/src/processors/slots_filling_processor.cpp:70
#18 0x000000000047b854 in map_search_qu::QuServiceImpl::get_response (this=0x7fff503c6d70, request=0x7f22d4086a00,
response=0x7f22d453be20, cnt=0x7f22d42d1080) at baidu/mapsearch/qu/code/rd/src/framework/qu_service_impl.cpp:196
#19 0x000000000047aeba in map_search_qu::QuServiceImpl::query (this=0x7fff503c6d70, controller=0x7f22d42d1080,
request=0x7f22d4086a00, response=0x7f22d453be20, done=0x7f22d41ee920)
at baidu/mapsearch/qu/code/rd/src/framework/qu_service_impl.cpp:94
#20 0x00000000005269c0 in map_search_qu::QuService::CallMethod (this=0x7fff503c6d70, method=0x375f8f70,
controller=0x7f22d42d1080, request=0x7f22d4086a00, response=0x7f22d453be20, done=0x7f22d41ee920)
at bc_out/baidu/mapsearch/maplib/mapproto/qu_service.pb.cc:6225
#21 0x00000000006d8f2c in baidu::rpc::policy::ProcessRpcRequest (msg_base=0x7f22fc02da20)
at baidu/base/baidu-rpc/src/baidu/rpc/policy/baidu_rpc_protocol.cpp:522
#22 0x000000000074e5ea in baidu::rpc::ProcessInputMessage (void_arg=)
at baidu/base/baidu-rpc/src/baidu/rpc/input_messenger.cpp:134
#23 0x000000000080ccda in bthread::TaskGroup::task_runner (skip_remained=)
at baidu/base/bthread/bthread/task_group.cpp:293
#24 0x0000000000804181 in bthread_make_fcontext ()
Backtrace stopped: Cannot access memory at address 0x7f218d91b000

Warning output: both of the following warnings appear.
Warning 1:
An error occurred while querying, log_id=149241266, the exception is Invoke operator sequence_conv error.
Python Callstacks:
File "/home/slurm/job/tmp/job-42764/data/tools/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1654, in append_op
attrs=kwargs.get("attrs", None))
File "/home/slurm/job/tmp/job-42764/data/tools/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/home/slurm/job/tmp/job-42764/data/tools/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/layers/nn.py", line 1773, in sequence_conv
'contextLength': filter_size
File "/home/slurm/job/tmp/job-42764/data/tools/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/nets.py", line 299, in sequence_conv_pool
act=act)
File "/home/slurm/job/tmp/job-42764/models/map_emb/nets.py", line 61, in cnn_net
pool_type="max")
File "/home/slurm/job/tmp/job-42764/run/../models/map_emb/train.py", line 36, in train
cost, acc, prediction = network(data, label, len(word_dict))
File "/home/slurm/job/tmp/job-42764/run/../models/map_emb/train.py", line 119, in train_net
pass_num=30)
File "/home/slurm/job/tmp/job-42764/run/../models/map_emb/train.py", line 125, in
train_net()
C++ Callstacks:
Enforce failed. Expected end_idx <= dims_[0], but received end_idx:6 > dims_[0]:5.
The end row index is out of bound. at [/paddle/paddle/fluid/framework/tensor.cc:77]
PaddlePaddle Call Stacks:
0 0x7f24bfc28dfap void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 1658
1 0x7f24bfc2a5dap paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 90
2 0x7f24bfd62533p paddle::framework::Tensor::Slice(long, long) const + 4035
3 0x7f24c01bfafep padd

Warning 2:
An error occurred while querying, log_id=3593950142, the exception is Invoke operator feed error.
Python Callstacks:
File "/home/slurm/job/tmp/job-42764/data/tools/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1725, in prepend_op
attrs=kwargs.get("attrs", None))
File "/home/slurm/job/tmp/job-42764/data/tools/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/io.py", line 845, in prepend_feed_ops
attrs={'col': i})
File "/home/slurm/job/tmp/job-42764/data/tools/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/io.py", line 1000, in save_inference_model
prepend_feed_ops(main_program, feeded_var_names)
File "/home/slurm/job/tmp/job-42764/run/../models/map_emb/train.py", line 94, in train
fluid.io.save_inference_model(epoch_model, ["words"], [prediction], exe, model_filename='model', params_filename='params')
File "/home/slurm/job/tmp/job-42764/run/../models/map_emb/train.py", line 119, in train_net
pass_num=30)
File "/home/slurm/job/tmp/job-42764/run/../models/map_emb/train.py", line 125, in
train_net()
C++ Callstacks:
Variable must hold some thing at [/paddle/paddle/fluid/framework/variable.h:33]
PaddlePaddle Call Stacks:
0 0x7f24bfc24c93p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 563
1 0x7f24bfc253f9p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2 0x7f24bfdabac6p paddle::operators::FeedOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::d


NHZlX commented Jun 26, 2019

Please paste your inference code.

NHZlX added the 预测 (Inference; includes C-API inference issues, etc.) label on Jun 26, 2019

kangzhangqi commented Jul 4, 2019

The cause looks like an incompatibility between Paddle and brpc. I cloned a Paddle predictor into each brpc bthread's thread-specific storage, set via bthread_getspecific and bthread_setspecific; I am not familiar with how bthread scheduling works, so this may itself be thread-unsafe. For now I work around the problem by pairing each predictor with its own pthread_mutex_t and locking around every call. Is there a known example of using baidu-rpc's bthread together with Paddle cleanly?
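
For clarity, the workaround is roughly the following. This is a minimal sketch assuming the Paddle 1.x C++ inference API; PredictorSlot and locked_run are illustrative names, not the actual service code:

```cpp
// Rough sketch of the workaround: each cloned predictor is paired with its own
// pthread_mutex_t, so calls into any one predictor are serialized.
#include <pthread.h>

#include <memory>
#include <vector>

#include "paddle_inference_api.h"

struct PredictorSlot {
    std::unique_ptr<paddle::PaddlePredictor> predictor;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
};

bool locked_run(PredictorSlot& slot,
                const std::vector<paddle::PaddleTensor>& inputs,
                std::vector<paddle::PaddleTensor>* outputs) {
    pthread_mutex_lock(&slot.mutex);   // only one caller inside this predictor at a time
    bool ok = slot.predictor->Run(inputs, outputs);
    pthread_mutex_unlock(&slot.mutex);
    return ok;
}
```

Note that a pthread mutex held across Run() blocks the whole worker pthread, not just the calling bthread, so this trades throughput for safety.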


wopeizl commented Jul 4, 2019

My understanding is that bthread is a user-level task scheduling mechanism; it borrows some concepts from threads, but the two are not equivalent. Paddle only requires thread isolation, so using bthread should not in itself cause problems. But one predictor per thread is enough: a bthread does not need to clone its own predictor, it can simply use the predictor that already belongs to the thread it runs on. (A sketch of this arrangement follows.)
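
For illustration, a minimal sketch of that arrangement, assuming the Paddle 1.x C++ inference API; g_main_predictor and run_on_this_thread are illustrative names, and the main predictor is assumed to be created once at startup (e.g. via paddle::CreatePaddlePredictor):

```cpp
// Minimal sketch: exactly one predictor clone per worker (pthread) thread,
// created lazily and reused by whatever bthreads run on that thread.
#include <memory>
#include <vector>

#include "paddle_inference_api.h"

// Assumed to be created once at startup; the name is illustrative.
extern std::unique_ptr<paddle::PaddlePredictor> g_main_predictor;

bool run_on_this_thread(const std::vector<paddle::PaddleTensor>& inputs,
                        std::vector<paddle::PaddleTensor>* outputs) {
    // One clone per OS thread, initialized on first use on that thread.
    thread_local std::unique_ptr<paddle::PaddlePredictor> predictor =
        g_main_predictor->Clone();
    return predictor->Run(inputs, outputs);
}
```

This relies on Run() being a plain blocking call that never yields into the bthread scheduler, so the calling bthread stays on the same worker pthread while the predictor is in use.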
