Fault tolerant job init params error #340
Comments
This is a bug in parsing the optimization configs. Were any core files generated? Can you find the full call stack using
This is a PaddleCloud job. How should I get the core file from PaddleCloud?
One way is to download the core file and then use the core file locally in a docker container.
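As a rough sketch of the local inspection step, once the core file has been downloaded, gdb can print the backtrace of every thread in batch mode. The helper name `gdb_backtrace_command` and both paths below are hypothetical placeholders, not taken from this job; it assumes gdb is installed in the container and that the binary is the same build that produced the core.

```python
import shlex


def gdb_backtrace_command(binary_path, core_path):
    """Build a gdb invocation that prints all thread backtraces and exits.

    Both paths are placeholders supplied by the caller.
    """
    return [
        "gdb",
        "--batch",                      # run the commands below, then exit
        "-ex", "thread apply all bt",   # backtrace for every thread
        binary_path,
        core_path,
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = gdb_backtrace_command("/path/to/binary", "/path/to/core")
    print(shlex.join(cmd))
```

Running the printed command inside a container built from the same image as the job keeps the shared libraries consistent with the ones loaded when the process crashed, which usually makes the symbols in the trace resolvable.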
The core file is located under
The stack trace in the core file looks like this:
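Also, this problem reproduces reliably only when using the GPU; running on CPU works fine. The core dump happened in cgo code, so this looks like a fairly troublesome problem. Could the cgo runtime be conflicting with CUDA?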
Submitting a PaddleCloud fault tolerant job produces the following error:
submit.sh:
After updating to the latest Paddle: