Multi-card mode does not work #8153

Closed
BigFishMaster opened this issue Feb 5, 2018 · 1 comment

@BigFishMaster (Contributor)

I used multi-card mode as follows in my experiment:

    if parallel:
        places = fluid.layers.get_places()
        pd = fluid.layers.ParallelDo(places)
        with pd.do():
            img_ = pd.read_input(image)
            label_ = pd.read_input(label)
            prediction, avg_cost, accuracy = net_conf(img_, label_, class_dim)
            for o in [avg_cost, accuracy]:
                pd.write_output(o)

        avg_cost, accuracy = pd()
        # get the mean loss and accuracy across all devices.
        avg_cost = fluid.layers.mean(x=avg_cost)
        accuracy = fluid.layers.mean(x=accuracy)
    else:
        prediction, avg_cost, accuracy = net_conf(image, label, class_dim)
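
For reference, the surrounding training loop follows the usual fluid pattern below. This is only a minimal sketch: the Momentum optimizer and train_reader are placeholders I am assuming, not the exact code from resnet50_parallel.py.

    # Minimal sketch of the surrounding training loop; the Momentum optimizer
    # and train_reader are placeholders, not the exact code from my script.
    import paddle.v2.fluid as fluid

    optimizer = fluid.optimizer.Momentum(learning_rate=0.1, momentum=0.9)
    optimizer.minimize(avg_cost)

    place = fluid.CUDAPlace(0)
    exe = fluid.Executor(place)
    feeder = fluid.DataFeeder(feed_list=[image, label], place=place)

    exe.run(fluid.default_startup_program())
    for pass_id in range(num_passes):
        for data in train_reader():
            # this exe.run call is where the error below is raised
            exe.run(feed=feeder.feed(data))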

I got the errors below. Can anyone help fix this bug?

Traceback (most recent call last):
  File "resnet50_parallel.py", line 192, in <module>
    train(learning_rate=0.1, batch_size=24, num_passes=30, init_model=None)
  File "resnet50_parallel.py", line 158, in train
    exe.run(feed=feeder.feed(data))
  File "/home/ssd5/code/code_paddle/compile/code/resnet50/python-gcc482-paddle/lib/python2.7/site-packages/paddle/v2/fluid/executor.py", line 273, in run
    self.executor.run(program.desc, scope, 0, True, True)
paddle.v2.fluid.core.EnforceNotMet: holder_ should not be null
Tensor not initialized yet when Tensor::type() is called. at [/paddle/paddle/framework/tensor.h:119]
PaddlePaddle Call Stacks: 
0       0x7fd6d2d09846p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1       0x7fd6d2d0ade6p paddle::framework::Tensor::type() const + 150
2       0x7fd6d36ca8b9p paddle::framework::LoDTensor::MergeLoDTensor(std::vector<paddle::framework::LoDTensor const*, std::allocator<paddle::framework::LoDTensor const*> > const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>) + 153
3       0x7fd6d35172f8p paddle::operators::ParallelDoOp::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 2936
4       0x7fd6d2db583ap paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool) + 1194
5       0x7fd6d2d227fbp void pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, paddle::framework::Executor, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::Executor::*)(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool)#1}, void, paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<void, paddle::framework::Executor, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::Executor::*)(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool)#1}&&, void (*)(paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) + 555
6       0x7fd6d2d1c034p pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 2596
7       0x7fd7275f255fp PyEval_EvalFrameEx + 29855
8       0x7fd7275f486dp PyEval_EvalCodeEx + 2061
9       0x7fd7275f19fcp PyEval_EvalFrameEx + 26940
10      0x7fd7275f486dp PyEval_EvalCodeEx + 2061
11      0x7fd7275f19fcp PyEval_EvalFrameEx + 26940
12      0x7fd7275f486dp PyEval_EvalCodeEx + 2061
13      0x7fd7275f49a2p PyEval_EvalCode + 50
14      0x7fd72761d782p PyRun_FileExFlags + 146
15      0x7fd72761eaf9p PyRun_SimpleFileExFlags + 217
16      0x7fd72763482dp Py_Main + 3149
17      0x7fd726831bd5p __libc_start_main + 245
18            0x4007a1p
BigFishMaster changed the title from "Multi-card mode is not work" to "Multi-card mode does not work" on Feb 5, 2018
@kuke (Contributor) commented Feb 5, 2018

I met the same error in GPU mode, but it seems that ParallelDo can work in CPU mode.
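
A quick way to check this observation is to force CPU places when building the ParallelDo block and run with a CPU executor. This is only a sketch: it assumes get_places accepts device_count/device_type keyword arguments, and it reuses net_conf and train_reader from the script above.

    # Sketch: force ParallelDo onto CPU places to verify the CPU-mode
    # observation. Assumes get_places() accepts device_count/device_type
    # keyword arguments; net_conf and train_reader come from the script above.
    import paddle.v2.fluid as fluid

    places = fluid.layers.get_places(device_count=2, device_type='CPU')
    pd = fluid.layers.ParallelDo(places)
    # ... same pd.do() block as in the snippet above ...

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    feeder = fluid.DataFeeder(feed_list=[image, label], place=place)
    exe.run(fluid.default_startup_program())
    for data in train_reader():
        exe.run(feed=feeder.feed(data))  # no "holder_ should not be null" here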
