paddle_pserver2 err #8892

Closed
adrianhust opened this issue Mar 9, 2018 · 4 comments
Labels
User (used to mark user questions)

Comments


adrianhust commented Mar 9, 2018

Hi,
log:

Fri Mar  9 08:34:38 2018[1,4]<stderr>:./start_server.sh: line 33: 42473 Aborted                 GLOG_logtostderr=0 GLOG_log_dir="./log" ./paddle_pserver2 --num_gradient_servers=${OMPI_COMM_WORLD_SIZE} --nics=${nics} ${server_arg} --rdma_tcp=${rdma_tcp} --comment=$comment
Fri Mar  9 08:34:38 2018[1,8]<stderr>:F0309 08:34:38.208967 43912 ParameterServer2.cpp:158] Check failed: !configMap_.count(config.para_id()) Duplicated parameter name: ___fc_layer_1__.wbias
Fri Mar  9 08:34:38 2018[1,4]<stderr>:+ check_return 'paddle_pserver2 failed'
Fri Mar  9 08:34:38 2018[1,4]<stderr>:+ '[' 134 -ne 0 ']'
Fri Mar  9 08:34:38 2018[1,4]<stderr>:+ echo '[./start_server.sh : 34] [main]'
Fri Mar  9 08:34:38 2018[1,7]<stderr>:F0309 08:34:38.206758 20904 ParameterServer2.cpp:158] Check failed: !configMap_.count(config.para_id()) Duplicated parameter name: ___fc_layer_1__.wbias
Fri Mar  9 08:34:38 2018[1,4]<stderr>:[./start_server.sh : 34] [main]
Fri Mar  9 08:34:38 2018[1,4]<stderr>:+ echo '[FATAL]: paddle_pserver2 failed'
Fri Mar  9 08:34:38 2018[1,4]<stderr>:[FATAL]: paddle_pserver2 failed
Fri Mar  9 08:34:38 2018[1,4]<stderr>:+ get_stack
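
The fatal line in the log comes from a check at ParameterServer2.cpp:158, which fails when `configMap_` already holds the incoming `config.para_id()`. The thread does not establish why the parameter was registered twice. As a rough illustration only, the Python sketch below mimics that kind of guard. `ParameterRegistry`, `DuplicateParameterError`, and `find_duplicate_names` are hypothetical names and are not part of PaddlePaddle.

```python
class DuplicateParameterError(RuntimeError):
    """Raised when a parameter ID is registered twice (hypothetical name)."""


class ParameterRegistry:
    """Toy stand-in for a server-side config map; not Paddle's actual class."""

    def __init__(self):
        self._config_map = {}

    def register(self, para_id, name):
        # Mirrors the spirit of: CHECK(!configMap_.count(config.para_id()))
        if para_id in self._config_map:
            raise DuplicateParameterError(f"Duplicated parameter name: {name}")
        self._config_map[para_id] = name


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    seen = set()
    dupes = []
    for name in names:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes
```

Running something like `find_duplicate_names` over the parameter names in a model config could be one way to spot a repeated name such as `___fc_layer_1__.wbias` before submitting a job.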

[Screenshot: mem2018-03-09 09-56-16]
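
The trace also shows `'[' 134 -ne 0 ']'` next to the shell's `Aborted` message. By shell convention, an exit status above 128 means the process was terminated by signal number `status - 128`, so 134 corresponds to signal 6, which is SIGABRT on Linux. That is consistent with a failed glog `CHECK` aborting the process. A small sketch for decoding such statuses follows; `describe_exit_status` is a hypothetical helper, and the signal numbering is assumed to be the host platform's.

```python
import signal


def describe_exit_status(status):
    """Describe a shell-style exit status (hypothetical helper)."""
    if status > 128:
        signum = status - 128
        try:
            # Look up the signal name on the current platform.
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"
```

For example, `describe_exit_status(134)` can be used to interpret the value tested by `check_return` in the trace.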

peterzhang2029 added the User (used to mark user questions) label Mar 9, 2018
Yancey1989 (Contributor) commented

Could you submit the job through the receiver service? It is a stable version of PaddlePaddle for the internal MPI cluster, and it works well for most users.

wangkuiyi (Collaborator) commented Mar 12, 2018

@adrianhust and @Yancey1989: is this question about the job submitter we use inside Baidu? I ask because I cannot find the start_server.sh script in either the Paddle repo or the cloud repo.

Yancey1989 (Contributor) commented Mar 13, 2018

@wangkuiyi
Yes, this job was running on the internal MPI cluster. I think the problem is resolved by submitting through receiver, a Python service that makes it easy for users to submit jobs to the MPI cluster.

@adrianhust, I will close this issue; feel free to reopen it.

pkuyym (Contributor) commented Mar 26, 2018

Closing due to inactivity.

pkuyym closed this as completed Mar 26, 2018