fluid resnet on the flowers dataset with batch_size=128, device=CPU sometimes fails with the error below after running for a while #54

Open
ccmeteorljh opened this issue Jan 15, 2018 · 4 comments

Comments

@ccmeteorljh

Traceback (most recent call last):
File "resnet.py", line 225, in <module>
run_benchmark(model_map[args.model], args)
File "resnet.py", line 190, in run_benchmark
fetch_list=[avg_cost] + accuracy.metrics)
File "/usr/local/lib/python2.7/dist-packages/paddle/v2/fluid/executor.py", line 144, in run
self.executor.run(program.desc, scope, 0, True)
paddle.v2.fluid.core.EnforceNotMet: enforce posix_memalign(&p, 4096ul, size) == 0 failed, 0 != 0
at [/paddle/paddle/memory/detail/system_allocator.cc:49]
PaddlePaddle Call Stacks:
0 0x7f4d66376b26p paddle::platform::EnforceNotMet::EnforceNotMet(std::exception_ptr::exception_ptr, char const*, int) + 486
1 0x7f4d66402e9dp paddle::memory::detail::CPUAllocator::Alloc(unsigned long&, unsigned long) + 365
2 0x7f4d66404c16p paddle::memory::detail::BuddyAllocator::RefillPool() + 86
3 0x7f4d6640543cp paddle::memory::detail::BuddyAllocator::Alloc(unsigned long) + 716
4 0x7f4d66402195p void* paddle::memory::Alloc<paddle::platform::CPUPlace>(paddle::platform::CPUPlace, unsigned long) + 181
5 0x7f4d6637a3f8p paddle::framework::Tensor::PlaceholderImpl<paddle::platform::CPUPlace>::PlaceholderImpl(paddle::platform::CPUPlace, unsigned long, std::type_index) + 56
6 0x7f4d6637a8b8p paddle::framework::Tensor::mutable_data(boost::variant<paddle::platform::GPUPlace, paddle::platform::CPUPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::type_index) + 312
7 0x7f4d6641f7c9p float* paddle::framework::Tensor::mutable_data<float>(boost::variant<paddle::platform::GPUPlace, paddle::platform::CPUPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>) + 73
8 0x7f4d668add80p paddle::operators::GemmConvGradKernel<paddle::platform::CPUPlace, float>::Compute(paddle::framework::ExecutionContext const&) const + 2208
9 0x7f4d66a76734p paddle::framework::OperatorWithKernel::Run(paddle::framework::Scope const&, paddle::platform::DeviceContext const&) const + 404
10 0x7f4d66407abdp paddle::framework::Executor::Run(paddle::framework::ProgramDescBind const&, paddle::framework::Scope*, int, bool) + 1101
11 0x7f4d663891a7p void pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, paddle::framework::Executor, paddle::framework::ProgramDescBind const&, paddle::framework::Scope*, int, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::Executor::*)(paddle::framework::ProgramDescBind const&, paddle::framework::Scope*, int, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::Executor*, paddle::framework::ProgramDescBind const&, paddle::framework::Scope*, int, bool)#1}, void, paddle::framework::Executor*, paddle::framework::ProgramDescBind const&, paddle::framework::Scope*, int, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<void, paddle::framework::Executor, paddle::framework::ProgramDescBind const&, paddle::framework::Scope*, int, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::Executor::*)(paddle::framework::ProgramDescBind const&, paddle::framework::Scope*, int, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::Executor*, paddle::framework::ProgramDescBind const&, paddle::framework::Scope*, int, bool)#1}&&, void (*)(paddle::framework::Executor*, paddle::framework::ProgramDescBind const&, paddle::framework::Scope*, int, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) + 471
12 0x7f4d66384294p pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 2596
13 0x4cad00p PyEval_EvalFrameEx + 28048
14 0x4c2705p PyEval_EvalCodeEx + 597
15 0x4ca088p PyEval_EvalFrameEx + 24856
16 0x4c2705p PyEval_EvalCodeEx + 597
17 0x4ca7dfp PyEval_EvalFrameEx + 26735
18 0x4c2705p PyEval_EvalCodeEx + 597
19 0x4c24a9p PyEval_EvalCode + 25
20 0x4f19efp
21 0x4ec372p PyRun_FileExFlags + 130
22 0x4eaaf1p PyRun_SimpleFileExFlags + 401
23 0x49e208p Py_Main + 1736
24 0x7f4d7f537830p __libc_start_main + 240
25 0x49da59p _start + 41
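
For readers unfamiliar with the failing check: `posix_memalign` reports failure through its return value rather than through `errno`, returning `ENOMEM` when memory is exhausted and `EINVAL` when the alignment is invalid. The following is a minimal sketch of that behaviour using `ctypes` on a POSIX system. It is not Paddle's allocator code, and the helper name `try_aligned_alloc` is hypothetical.

```python
import ctypes
import errno

# Load the C library already linked into the process (assumes Linux or macOS).
libc = ctypes.CDLL(None)
libc.posix_memalign.argtypes = [
    ctypes.POINTER(ctypes.c_void_p),  # void **memptr
    ctypes.c_size_t,                  # alignment
    ctypes.c_size_t,                  # size
]
libc.posix_memalign.restype = ctypes.c_int
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def try_aligned_alloc(size, alignment=4096):
    """Hypothetical helper: attempt an aligned allocation and report the result.

    Returns (return_code, symbolic_name). A return code of 0 means success.
    """
    p = ctypes.c_void_p()
    rc = libc.posix_memalign(ctypes.byref(p), alignment, size)
    if rc != 0:
        # The error is the return value itself, not errno.
        return rc, errno.errorcode.get(rc, "UNKNOWN")
    libc.free(p)
    return 0, "OK"


if __name__ == "__main__":
    print(try_aligned_alloc(1 << 20))             # a 1 MiB request
    print(try_aligned_alloc(1024, alignment=3))   # invalid alignment
```

A nonzero return code from a call like this in the allocator is what trips the `enforce` shown in the trace above.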

@chengduoZH
Collaborator

After how many passes did it fail? It looks like the error is raised while allocating memory.

@leanna62
Contributor

I also hit this problem on the flowers dataset. It shows up after 15 iterations of the first pass, as shown below: @chengduoZH @dzhwinter
Parameters: vgg16, flowers, cpu, batch_size = 64
vgg16Pass = 0, Iters = 14, Loss = 4.524734, Accuracy = 0.031250
Traceback (most recent call last):
File "vgg16_modify.py", line 186, in <module>

@chengduoZH
Collaborator

I don't follow. Are you running resnet50.py or vgg16.py?

@ccmeteorljh
Author

I was running fluid/resnet on the flowers dataset, and the error appeared in the 5th pass, at the 71st iteration.
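
Since the failure appears only after many iterations, one way to narrow it down is to log the process's peak memory alongside the pass and iteration counters and see whether it grows steadily. The following is a minimal sketch using only the Python standard library. The helper names `peak_rss_mb` and `log_memory` are hypothetical and not part of the benchmark scripts, and the unit conversion assumes Linux, where `ru_maxrss` is reported in KiB.

```python
import resource


def peak_rss_mb():
    """Peak resident set size of this process in MiB (assumes Linux: ru_maxrss in KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def log_memory(pass_id, iter_id, every=10):
    """Hypothetical helper: print peak memory every `every` iterations.

    Returns the printed message, or None when nothing was logged.
    """
    if iter_id % every != 0:
        return None
    msg = "Pass = %d, Iters = %d, peak RSS = %.1f MiB" % (
        pass_id, iter_id, peak_rss_mb())
    print(msg)
    return msg


if __name__ == "__main__":
    # Stand-in for the training loop; the real loop would call exe.run(...) here.
    for it in range(30):
        log_memory(0, it)
```

Calling `log_memory(pass_id, iter_id)` once per iteration inside the training loop would show whether peak memory keeps climbing before `posix_memalign` fails.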
