Error when training on GPU with the fluid version #9230

Closed
ssdyue opened this issue Mar 20, 2018 · 3 comments
Labels
User (used to mark user questions)

Comments

ssdyue commented Mar 20, 2018

The fluid version trains fine on CPU, but fails with the following error on GPU:

**

Traceback (most recent call last):
File "/home/ssd/paddlepaddle/models/fluid/image_classification/mobilenet.py", line 225, in <module>
train(learning_rate=0.005, batch_size=40, num_passes=300)
File "/home/ssd/paddlepaddle/models/fluid/image_classification/mobilenet.py", line 187, in train
exe.run(fluid.default_startup_program())
File "/home/ssd/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 349, in run
self.executor.run(program_cache.desc, scope, 0, True, True)
paddle.fluid.core.EnforceNotMet: enforce allocating <= available failed, 1829703254 > 1604058880
at [/paddle/paddle/fluid/platform/gpu_info.cc:118]
PaddlePaddle Call Stacks:
0 0x7f5015099aa8p paddle::platform::GpuMaxChunkSize() + 5080
1 0x7f5014376fd9p paddle::memory::GetGPUBuddyAllocator(int) + 249
2 0x7f50143771abp void* paddle::memory::Alloc<paddle::platform::CUDAPlace>(paddle::platform::CUDAPlace, unsigned long) + 43
3 0x7f50142d15a2p paddle::framework::Tensor::mutable_data(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::type_index) + 674
4 0x7f5014535d99p paddle::operators::FillConstantOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 1177
5 0x7f501438005cp paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool) + 1836
6 0x7f5014381598p paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool) + 104
7 0x7f50142e84d3p void pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, paddle::framework::Executor, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::Executor::)(paddle::framework::ProgramDesc const&, paddle::framework::Scope, int, bool, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool)#1}, void, paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<void, paddle::framework::Executor, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::Executor::)(paddle::framework::ProgramDesc const&, paddle::framework::Scope, int, bool, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool)#1}&&, void ()(paddle::framework::Executor, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) + 579
8 0x7f50142e6164p pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 1236
9 0x7f5039eeeeecp PyEval_EvalFrameEx + 33468
10 0x7f5039ef04e9p PyEval_EvalCodeEx + 2025
11 0x7f5039eed482p PyEval_EvalFrameEx + 26706
12 0x7f5039ef04e9p PyEval_EvalCodeEx + 2025
13 0x7f5039eed482p PyEval_EvalFrameEx + 26706
14 0x7f5039ef04e9p PyEval_EvalCodeEx + 2025
15 0x7f5039ef070ap PyEval_EvalCode + 26
16 0x7f5039f0993dp
17 0x7f5039f0aab8p PyRun_FileExFlags + 120
18 0x7f5039f0bcd8p PyRun_SimpleFileExFlags + 232
19 0x7f5039f1dd3cp Py_Main + 2988
20 0x7f5039139f45p __libc_start_main + 245
21 0x55f388aa287fp

**

luotao1 added the User (used to mark user questions) label Mar 20, 2018
Contributor

luotao1 commented Mar 20, 2018

paddle.fluid.core.EnforceNotMet: enforce allocating <= available failed, 1829703254 > 1604058880
at [/paddle/paddle/fluid/platform/gpu_info.cc:118]

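This line indicates that GPU memory could not be allocated (fluid requests a fairly large block of GPU memory up front). You can prefix your command with FLAGS_fraction_of_gpu_memory_to_use=0.16 to control how much GPU memory is allocated.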
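
As a rough illustration of the suggestion above, the following sketch pulls the two byte counts out of the enforce message and builds an environment with FLAGS_fraction_of_gpu_memory_to_use set for a child process. The helper names (parse_enforce_failure, env_with_gpu_fraction) are hypothetical and not part of PaddlePaddle, and the launch command in the final comment is only an assumption about how the script would be started. With the numbers from this issue, the request exceeds the available memory by roughly 215 MiB.

```python
import os
import re

# The failing line copied from the traceback in this issue.
ERROR_LINE = "enforce allocating <= available failed, 1829703254 > 1604058880"


def parse_enforce_failure(message):
    """Return (requested_bytes, available_bytes) parsed from the enforce message."""
    match = re.search(r"failed,\s*(\d+)\s*>\s*(\d+)", message)
    if match is None:
        raise ValueError("unrecognized enforce message: %r" % message)
    return int(match.group(1)), int(match.group(2))


def env_with_gpu_fraction(fraction, base_env=None):
    """Return a copy of the environment with the GPU memory fraction flag set.

    Hypothetical helper: it only prepares environment variables and does not
    call into PaddlePaddle.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    env = dict(os.environ if base_env is None else base_env)
    env["FLAGS_fraction_of_gpu_memory_to_use"] = str(fraction)
    return env


requested, available = parse_enforce_failure(ERROR_LINE)
shortfall_mib = (requested - available) / (1024.0 * 1024.0)
print("requested %d bytes, available %d bytes, short by %.1f MiB"
      % (requested, available, shortfall_mib))

env = env_with_gpu_fraction(0.16)
# Assumed launch, equivalent to prefixing the shell command with the flag:
# subprocess.call(["python", "mobilenet.py"], env=env)
```

The value 0.16 is simply the one suggested in this comment; a suitable fraction depends on how much memory the GPU has and what else is using it.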

Author

ssdyue commented Mar 20, 2018

@luotao1 My GPU memory was too small; training works fine on a GPU with more memory.

Contributor

luotao1 commented Mar 20, 2018

OK, then I will close this issue.
