loss.backward() fails when training PaddleOCR on my own data #4170
Comments
Which model are you training? Apart from swapping in your own data, did you modify any of the code?
I run `python tools/train.py -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml`, and `paddle.utils.run_check()` also completes and prints successfully.
Check whether you are running out of GPU memory; try halving the batch size and see if training runs.
After reducing it to 4 it runs, but only about 3 GB of GPU memory is used, and my P40 has 22 GB. As soon as I increase it to 8 it fails again. What could be the reason?
Did you ever find a solution? Also, how large a batch size can a P40 handle? Isn't the P40 supposed to have 24 GB?
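For later readers: the batch size under discussion lives in the recognition config file. A minimal sketch of the relevant section, assuming the usual PaddleOCR config layout (the exact key path may differ between versions):

```yaml
# Hypothetical excerpt of configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml.
# Only the train loader section is shown; key names follow common
# PaddleOCR configs and should be checked against your local file.
Train:
  loader:
    shuffle: true
    batch_size_per_card: 4   # reduced from the default to work around the crash
    drop_last: true
    num_workers: 4
```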
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
The loss is computed successfully, but when this line (loss.backward()) runs, the print statement after it never executes, the error below is reported, and some of the GPU memory is not released.
python: ../nptl/pthread_mutex_lock.c:79: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
C++ Traceback (most recent call last):
0 paddle::imperative::BasicEngine::Execute()
1 paddle::imperative::PreparedOp::Run(paddle::imperative::NameVariableWrapperMap const&, paddle::imperative::NameVariableWrapperMap const&, paddle::framework::AttributeMap const&)
2 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::MatMulGradKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::MatMulGradKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::MatMulGradKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
3 paddle::operators::MatMulGradKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
4 paddle::operators::MatMulGradKernel<paddle::platform::CUDADeviceContext, float>::CalcInputGrad(paddle::framework::ExecutionContext const&, paddle::framework::Tensor const&, bool, bool, paddle::framework::Tensor const&, bool, bool, paddle::framework::Tensor*) const
5 paddle::operators::MatMulGradKernel<paddle::platform::CUDADeviceContext, float>::MatMul(paddle::framework::ExecutionContext const&, paddle::framework::Tensor const&, bool, paddle::framework::Tensor const&, bool, paddle::framework::Tensor*) const
6 void paddle::operators::math::Blas&lt;paddle::platform::CUDADeviceContext&gt;::MatMul(paddle::framework::Tensor const&, paddle::operators::math::MatDescriptor const&, paddle::framework::Tensor const&, paddle::operators::math::MatDescriptor const&, float, paddle::framework::Tensor*, float) const
7 void paddle::operators::math::Blas&lt;paddle::platform::CUDADeviceContext&gt;::GEMM(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, float, float const*, float const*, float, float*) const
8 cublasSgemm_v2
9 paddle::framework::SignalHandle(char const*, int)
10 paddle::platform::GetCurrentTraceBackString[abi:cxx11]
Error Message Summary:
FatalError: `Process abort signal` is detected by the operating system.
[TimeInfo: *** Aborted at 1632470105 (unix time) try "date -d @1632470105" if you are using GNU date ***]
[SignalInfo: *** SIGABRT (@0x3e7000045b7) received by PID 17847 (TID 0x7fa8e38b50c0) from PID 17847 ***]
Aborted (core dumped)
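For context on what the crashing frames compute: MatMulGradKernel backpropagates through C = X·Y, producing dX = dC·Yᵀ and dY = Xᵀ·dC, and hands those multiplications to cuBLAS (cublasSgemm_v2 in frame 8). A minimal NumPy sketch of that math, independent of Paddle, just to show what the kernel is doing when it aborts:

```python
import numpy as np

# Forward pass: C = X @ Y
X = np.random.rand(3, 4).astype(np.float32)
Y = np.random.rand(4, 5).astype(np.float32)
C = X @ Y

# Upstream gradient dL/dC; all ones, as if the loss were L = C.sum()
dC = np.ones_like(C)

# Matmul backward — the two GEMMs MatMulGradKernel issues on the GPU:
dX = dC @ Y.T   # gradient w.r.t. X, same shape as X
dY = X.T @ dC   # gradient w.r.t. Y, same shape as Y

assert dX.shape == X.shape and dY.shape == Y.shape
```

The crash itself happens inside the cuBLAS call, not in this math, which is why a smaller batch (smaller GEMM dimensions) can make it disappear.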