Exception in Fluid distributed pserver on PaddleCloud v0.15 #13796

Closed
tal2009 opened this issue Oct 10, 2018 · 5 comments
Labels: User 用于标记用户问题 (used to mark user questions)

tal2009 commented Oct 10, 2018

I am using the internal PaddleCloud platform, platform version v0.15.
The pserver reports the following error:

Tue Oct  9 17:56:15 2018[1,2]<stdout>:E1009 17:56:15.758672 39389 listen_and_serv_op.cc:69] run sub program error holder_ should not be null
Tue Oct  9 17:56:15 2018[1,2]<stdout>:Tensor not initialized yet when Tensor::type() is called. at [/paddle/paddle/fluid/framework/tensor.h:139]
Tue Oct  9 17:56:15 2018[1,2]<stdout>:PaddlePaddle Call Stacks: 
Tue Oct  9 17:56:15 2018[1,2]<stdout>:0       0x7fc7ba070da6p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
Tue Oct  9 17:56:15 2018[1,2]<stdout>:1       0x7fc7ba073006p paddle::framework::Tensor::type() const + 150
Tue Oct  9 17:56:15 2018[1,2]<stdout>:2       0x7fc7baa1c6a5p paddle::framework::OperatorWithKernel::IndicateDataType(paddle::framework::ExecutionContext const&) const + 149
Tue Oct  9 17:56:15 2018[1,2]<stdout>:3       0x7fc7baa1ca7fp paddle::framework::OperatorWithKernel::GetExpectedKernelType(paddle::framework::ExecutionContext const&) const + 47
Tue Oct  9 17:56:15 2018[1,2]<stdout>:4       0x7fc7baa1cf67p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 199
Tue Oct  9 17:56:15 2018[1,2]<stdout>:5       0x7fc7baa194ffp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 255
Tue Oct  9 17:56:15 2018[1,2]<stdout>:6       0x7fc7ba12eaa9p paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 393
Tue Oct  9 17:56:15 2018[1,2]<stdout>:7       0x7fc7ba877b62p
Tue Oct  9 17:56:15 2018[1,2]<stdout>:8       0x7fc7ba7955cap std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::EnforceNotMet, std::default_delete<paddle::platform::EnforceNotMet> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::EnforceNotMet, std::default_delete<paddle::platform::EnforceNotMet> > > >::_M_invoke(std::_Any_data const&) + 42
Tue Oct  9 17:56:15 2018[1,2]<stdout>:9       0x7fc7ba1a2577p std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&) + 39
Tue Oct  9 17:56:15 2018[1,2]<stdout>:10      0x7fc8630c8973p pthread_once + 83
Tue Oct  9 17:56:15 2018[1,2]<stdout>:11      0x7fc7ba876fb2p
Tue Oct  9 17:56:15 2018[1,2]<stdout>:12      0x7fc7baa2ff18p paddle::framework::ThreadPool::TaskLoop() + 920
Tue Oct  9 17:56:15 2018[1,2]<stdout>:13      0x7fc7c5afc8a0p
Tue Oct  9 17:56:15 2018[1,2]<stdout>:14      0x7fc8630c31c3p
Tue Oct  9 17:56:15 2018[1,2]<stdout>:15      0x7fc8626eb12dp clone + 109
Tue Oct  9 17:56:15 2018[1,2]<stdout>:E1009 17:56:15.807693 39335 listen_and_serv_op.cc:69] run sub program error holder_ should not be null
Tue Oct  9 17:56:15 2018[1,2]<stdout>:Tensor not initialized yet when Tensor::type() is called. at [/paddle/paddle/fluid/framework/tensor.h:139]
Tue Oct  9 17:56:15 2018[1,2]<stdout>:PaddlePaddle Call Stacks:
Tue Oct  9 17:56:15 2018[1,2]<stdout>:0       0x7fc7ba070da6p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
Tue Oct  9 17:56:15 2018[1,2]<stdout>:1       0x7fc7ba073006p paddle::framework::Tensor::type() const + 150
Tue Oct  9 17:56:15 2018[1,2]<stdout>:2       0x7fc7baa1c6a5p paddle::framework::OperatorWithKernel::IndicateDataType(paddle::framework::ExecutionContext const&) const + 149
Tue Oct  9 17:56:15 2018[1,2]<stdout>:3       0x7fc7baa1ca7fp paddle::framework::OperatorWithKernel::GetExpectedKernelType(paddle::framework::ExecutionContext const&) const + 47
Tue Oct  9 17:56:15 2018[1,2]<stdout>:4       0x7fc7baa1cf67p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 199
Tue Oct  9 17:56:15 2018[1,2]<stdout>:5       0x7fc7baa194ffp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 255
Tue Oct  9 17:56:15 2018[1,2]<stdout>:6       0x7fc7ba12eaa9p paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 393
Tue Oct  9 17:56:15 2018[1,2]<stdout>:7       0x7fc7ba877b62p
Tue Oct  9 17:56:15 2018[1,2]<stdout>:8       0x7fc7ba7955cap std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::EnforceNotMet, std::default_delete<paddle::platform::EnforceNotMet> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::EnforceNotMet, std::default_delete<paddle::platform::EnforceNotMet> > > >::_M_invoke(std::_Any_data const&) + 42
Tue Oct  9 17:56:15 2018[1,2]<stdout>:9       0x7fc7ba1a2577p std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&) + 39
Tue Oct  9 17:56:15 2018[1,2]<stdout>:10      0x7fc8630c8973p pthread_once + 83
Tue Oct  9 17:56:15 2018[1,2]<stdout>:11      0x7fc7ba876fb2p
Tue Oct  9 17:56:15 2018[1,2]<stdout>:12      0x7fc7baa2ff18p paddle::framework::ThreadPool::TaskLoop() + 920
Tue Oct  9 17:56:15 2018[1,2]<stdout>:13      0x7fc7c5afc8a0p
Tue Oct  9 17:56:15 2018[1,2]<stdout>:14      0x7fc8630c31c3p
Tue Oct  9 17:56:15 2018[1,2]<stdout>:15      0x7fc8626eb12dp clone + 109
Tue Oct  9 17:56:15 2018[1,2]<stdout>:E1009 17:56:15.807780 39336 listen_and_serv_op.cc:69] run sub program error holder_ should not be null
Tue Oct  9 17:56:15 2018[1,2]<stdout>:Tensor not initialized yet when Tensor::type() is called. at [/paddle/paddle/fluid/framework/tensor.h:139]
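
Because the same stack repeats for several threads, it can help to reduce a log like the one above to its distinct error headers. The following is a minimal sketch, assuming the launcher prefix (`...[1,2]<stdout>:`) and glog-style `E<mmdd> <time> <thread id> <file>:<line>] <message>` headers exactly as they appear in this log; the helper name `summarize_errors` is hypothetical and not part of PaddlePaddle.

```python
import re

# Launcher prefix as seen in the log above, e.g.
# "Tue Oct  9 17:56:15 2018[1,2]<stdout>:"
PREFIX = re.compile(r"^.*?\[\d+,\d+\]<std(?:out|err)>:")

# glog error header: E<mmdd> <time> <thread id> <file>:<line>] <message>
GLOG_ERROR = re.compile(r"^E\d{4} [\d:.]+\s+(\d+) ([\w./]+):(\d+)\] (.*)$")


def summarize_errors(lines):
    """Return (thread_id, source_file, source_line, message) per error header."""
    errors = []
    for line in lines:
        # Drop the launcher prefix, then test for a glog error header.
        body = PREFIX.sub("", line.rstrip("\n"), count=1)
        match = GLOG_ERROR.match(body)
        if match:
            thread_id, src, lineno, msg = match.groups()
            errors.append((int(thread_id), src, int(lineno), msg))
    return errors
```

Applied to the log in this issue, every header points at `listen_and_serv_op.cc:69` with the same message, reported from different thread ids.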

NHZlX (Contributor) commented Oct 10, 2018

Judging from the log, a tensor has not been initialized. Check whether your training code forgot to call exe.run(fluid.default_startup_program()).

==========
I also came across issue #9487.
Was that issue ever resolved? @Yancey1989
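
For readers checking their own scripts against the suggestion above, here is a minimal sketch of where the startup programs are usually run in a Fluid parameter-server job. It assumes the `fluid.DistributeTranspiler` API of the Fluid 0.15 era, and it is not the reporter's code: the toy regression network, the helper names `build_endpoints` and `run_node`, and all addresses are hypothetical.

```python
def build_endpoints(ips, port):
    """Join pserver IPs and one port into the 'ip:port,ip:port' string."""
    return ",".join("{}:{}".format(ip, port) for ip in ips)


def run_node(role, trainer_id, pserver_endpoints, current_endpoint, trainers):
    # Imported lazily: this needs a Fluid-era PaddlePaddle installation.
    import paddle.fluid as fluid

    # Toy network (hypothetical), only so there is a program to transpile.
    x = fluid.layers.data(name="x", shape=[13], dtype="float32")
    y = fluid.layers.data(name="y", shape=[1], dtype="float32")
    pred = fluid.layers.fc(input=x, size=1)
    loss = fluid.layers.mean(
        fluid.layers.square_error_cost(input=pred, label=y))
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

    t = fluid.DistributeTranspiler()
    t.transpile(trainer_id, pservers=pserver_endpoints, trainers=trainers)

    exe = fluid.Executor(fluid.CPUPlace())
    if role == "PSERVER":
        pserver_prog = t.get_pserver_program(current_endpoint)
        pserver_startup = t.get_startup_program(current_endpoint, pserver_prog)
        # Initialize pserver-side variables before serving.
        exe.run(pserver_startup)
        # Blocks while the pserver handles trainer requests.
        exe.run(pserver_prog)
        return None
    # Trainer: initialize local variables, then hand back the trainer program.
    exe.run(fluid.default_startup_program())
    return exe, t.get_trainer_program(), loss
```

A pserver node might then call `run_node("PSERVER", 0, build_endpoints(["10.0.0.1", "10.0.0.2"], 6174), "10.0.0.1:6174", trainers=2)`. Whether a missing startup run is the cause in this issue is not established by the thread.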

tal2009 (Author) commented Oct 10, 2018

I do call that function. With a small dataset on 3 nodes the model trains successfully, but it fails when more nodes are used.

Yancey1989 (Contributor) commented

Could you post your model configuration?

NHZlX added the User 用于标记用户问题 label Oct 10, 2018

333caowei commented

I ran into the same problem.

lucywsq commented Dec 20, 2018

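Hello, this issue has had no updates in the past month, so we will close it today. If you still need to follow up after it is closed, you can reopen the issue and we will reply within 24 hours. We apologize for any inconvenience the closure causes and appreciate your understanding. Thank you for supporting PaddlePaddle!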

lucywsq closed this as completed Dec 20, 2018