
batch_norm raises an exception paddle::memory::allocation::BadAlloc #622

Closed
Lie-huo opened this issue Nov 25, 2020 · 2 comments

Lie-huo commented Nov 25, 2020

My GPU has 16 GB of memory, but I got the following error:
W1125 20:59:06.174731 21088 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.1, Runtime API Version: 10.0
W1125 20:59:06.179869 21088 device_context.cc:260] device: 0, cuDNN Version: 7.6.
Sync BatchNorm strategy will not be effective if GPU device count <= 1
There are 275/275 varaibles in pretrained_model/deeplabv3p_mobilenetv3_large_cityscapes are loaded.
Use multi-thread reader
W1125 20:59:23.948345 21299 operator.cc:187] batch_norm raises an exception paddle::memory::allocation::BadAlloc,


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2 paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
3 paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
4 paddle::memory::allocation::Allocator::Allocate(unsigned long)
5 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
6 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
7 paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
8 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
9 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
10 paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
11 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
12 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
13 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
14 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
15 paddle::framework::details::ComputationOpHandle::RunImpl()
16 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
17 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
18 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
19 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
20 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const


Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 196.957275MB memory on GPU 0, available memory is only 85.875000MB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)
F1125 20:59:23.948691 21299 exception_holder.h:37] std::exception caught,


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2 paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
3 paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
4 paddle::memory::allocation::Allocator::Allocate(unsigned long)
5 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
6 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
7 paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
8 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
9 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
10 paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
11 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
12 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
13 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
14 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
15 paddle::framework::details::ComputationOpHandle::RunImpl()
16 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
17 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
18 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
19 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
20 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const


Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 196.957275MB memory on GPU 0, available memory is only 85.875000MB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)
*** Check failure stack trace: ***
@ 0x7f29992f585d google::LogMessage::Fail()
@ 0x7f29992f930c google::LogMessage::SendToLog()
@ 0x7f29992f5383 google::LogMessage::Flush()
@ 0x7f29992fa81e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f299c4e08c8 paddle::framework::details::ExceptionHolder::Catch()
@ 0x7f299c57dc6e paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7f299c57b60f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7f299c57b8d4 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7f2999353563 std::_Function_handler<>::_M_invoke()
@ 0x7f299914a087 std::__future_base::_State_base::_M_do_set()
@ 0x7f2aae9d71cb __pthread_once_slow
@ 0x7f299c577aa2 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7f299914c4e4 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ 0x7f29e3d96c5c execute_native_thread_routine_compat
@ 0x7f2aae9d8e65 start_thread
@ 0x7f2aaddf088d __clone
@ (nil) (unknown)
Aborted
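
The error summary asks two things: whether another process is already holding memory on GPU 0, and otherwise whether the batch size can be reduced. For the first check, a minimal sketch using the pynvml bindings (assuming the pynvml package is installed; the code is illustrative only and not part of the training script):

    # Minimal sketch (assumption: pynvml is installed, e.g. `pip install pynvml`).
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0, as reported in the log

    # How much memory is actually free right now?
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print("free: %.1f MB / total: %.1f MB" % (mem.free / 1024**2, mem.total / 1024**2))

    # Which processes are currently holding memory on this GPU?
    for p in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        used_mb = p.usedGpuMemory / 1024**2 if p.usedGpuMemory else float("nan")
        print("pid %d uses %.1f MB" % (p.pid, used_mb))

    pynvml.nvmlShutdown()

If no other process shows up, the 16 GB card is being filled by the training job itself, which points to the second suggestion: a smaller batch size.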

nepeplwu (Collaborator) commented:

@Lie-huo This problem is caused by running out of GPU memory. Please try reducing the batch size.
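
As a rough illustration of that knob (a sketch only, assuming the fluid-style paddle.batch reader API; the dummy reader below merely stands in for the real Cityscapes reader), shrinking batch_size directly shrinks the per-step activation memory that the batch_norm kernel has to allocate:

    # Hypothetical sketch: the dummy reader and tensor sizes are placeholders,
    # not the actual PaddleSeg data pipeline.
    import numpy as np
    import paddle

    def dummy_reader():
        # Stand-in for the dataset reader: a few fake Cityscapes-sized samples.
        for _ in range(4):
            yield np.zeros((3, 1024, 2048), dtype="float32"), np.int64(0)

    # Reducing batch_size reduces activation memory roughly linearly, which is
    # what the OOM message above is asking for.
    batched = paddle.batch(dummy_reader, batch_size=1)  # e.g. reduced from 4

    for batch in batched():
        print("samples in this batch:", len(batch))

(In the repo's yaml configs the equivalent knob is the batch-size field, whatever it is named in the version in use.)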

nepeplwu self-assigned this Nov 26, 2020

github-actions bot commented Dec 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

github-actions bot added the stale (Long time without interaction) label Dec 6, 2022