
batch_norm raises an exception paddle::memory::allocation::BadAlloc #622

Closed
Lie-huo opened this issue Nov 25, 2020 · 2 comments

Lie-huo commented Nov 25, 2020

My GPU has 16 GB of memory, but I got the following error:
W1125 20:59:06.174731 21088 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.1, Runtime API Version: 10.0
W1125 20:59:06.179869 21088 device_context.cc:260] device: 0, cuDNN Version: 7.6.
Sync BatchNorm strategy will not be effective if GPU device count <= 1
There are 275/275 varaibles in pretrained_model/deeplabv3p_mobilenetv3_large_cityscapes are loaded.
Use multi-thread reader
W1125 20:59:23.948345 21299 operator.cc:187] batch_norm raises an exception paddle::memory::allocation::BadAlloc,


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2 paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
3 paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
4 paddle::memory::allocation::Allocator::Allocate(unsigned long)
5 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
6 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
7 paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
8 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
9 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
10 paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
11 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
12 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
13 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
14 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
15 paddle::framework::details::ComputationOpHandle::RunImpl()
16 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
17 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
18 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
19 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
20 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const


Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 196.957275MB memory on GPU 0, available memory is only 85.875000MB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)
F1125 20:59:23.948691 21299 exception_holder.h:37] std::exception caught,


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2 paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
3 paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
4 paddle::memory::allocation::Allocator::Allocate(unsigned long)
5 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
6 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
7 paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
8 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
9 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
10 paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
11 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::BatchNormKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
12 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
13 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
14 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
15 paddle::framework::details::ComputationOpHandle::RunImpl()
16 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
17 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
18 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
19 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
20 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const


Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 196.957275MB memory on GPU 0, available memory is only 85.875000MB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)
*** Check failure stack trace: ***
@ 0x7f29992f585d google::LogMessage::Fail()
@ 0x7f29992f930c google::LogMessage::SendToLog()
@ 0x7f29992f5383 google::LogMessage::Flush()
@ 0x7f29992fa81e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f299c4e08c8 paddle::framework::details::ExceptionHolder::Catch()
@ 0x7f299c57dc6e paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7f299c57b60f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7f299c57b8d4 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7f2999353563 std::_Function_handler<>::_M_invoke()
@ 0x7f299914a087 std::__future_base::_State_base::_M_do_set()
@ 0x7f2aae9d71cb __pthread_once_slow
@ 0x7f299c577aa2 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7f299914c4e4 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ 0x7f29e3d96c5c execute_native_thread_routine_compat
@ 0x7f2aae9d8e65 start_thread
@ 0x7f2aaddf088d __clone
@ (nil) (unknown)
Aborted
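
The error summary asks two things: whether another process is already holding memory on GPU 0, and otherwise whether the batch size can be reduced. For the first check, a minimal sketch using the pynvml bindings (assuming the pynvml package is installed; the code is illustrative only and not part of the training script):

    # Minimal sketch (assumption: pynvml is installed, e.g. `pip install pynvml`).
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0, as reported in the log

    # How much memory is actually free right now?
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print("free: %.1f MB / total: %.1f MB" % (mem.free / 1024**2, mem.total / 1024**2))

    # Which processes are currently holding memory on this GPU?
    for p in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        used_mb = p.usedGpuMemory / 1024**2 if p.usedGpuMemory else float("nan")
        print("pid %d uses %.1f MB" % (p.pid, used_mb))

    pynvml.nvmlShutdown()

If no other process shows up, the 16 GB card is being filled by the training job itself, which points to the second suggestion: a smaller batch size.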

nepeplwu (Collaborator) commented:

@Lie-huo This problem is caused by running out of GPU memory. Please try reducing the batch size.
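
As a rough illustration of that knob (a sketch only, assuming the fluid-style paddle.batch reader API; the dummy reader below merely stands in for the real Cityscapes reader), shrinking batch_size directly shrinks the per-step activation memory that the batch_norm kernel has to allocate:

    # Hypothetical sketch: the dummy reader and tensor sizes are placeholders,
    # not the actual PaddleSeg data pipeline.
    import numpy as np
    import paddle

    def dummy_reader():
        # Stand-in for the dataset reader: a few fake Cityscapes-sized samples.
        for _ in range(4):
            yield np.zeros((3, 1024, 2048), dtype="float32"), np.int64(0)

    # Reducing batch_size reduces activation memory roughly linearly, which is
    # what the OOM message above is asking for.
    batched = paddle.batch(dummy_reader, batch_size=1)  # e.g. reduced from 4

    for batch in batched():
        print("samples in this batch:", len(batch))

(In the repo's yaml configs the equivalent knob is the batch-size field, whatever it is named in the version in use.)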

nepeplwu self-assigned this Nov 26, 2020

github-actions bot commented Dec 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

github-actions bot added the stale (Long time without interaction) label Dec 6, 2022