Running mpirun --allow-run-as-root -np 3 python -m jittor.test.test_resnet inside a Docker container never responds #508

Open
Leg-end opened this issue Apr 23, 2024 · 3 comments

Leg-end commented Apr 23, 2024

Experiment environment

V100-16G x 3
Jittor version 1.3.8.5
nvcc version 11.8

Describe the bug

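Following the Jittor MPI multi-GPU distributed training tutorial, I successfully installed OpenMPI inside the Docker container.
Jittor also detects OpenMPI.
Running python3.7 -m jittor.test.test_resnet works normally.
When running mpirun -np 4 python3.7 -m jittor.test.test_resnet, Docker defaults to the root user, so I changed the command to
mpirun --allow-run-as-root -np 4 python3.7 -m jittor.test.test_resnet
However, the GPU memory usage reported by nvidia-smi and the CPU usage of the corresponding processes in top show that the command is not actually doing any work.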
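For anyone trying to reproduce this, the nvidia-smi check described above can be scripted instead of read off by eye. The following is only a minimal sketch: the helper names (`parse_gpu_csv`, `idle_gpus`, `query_gpus`) and the idle thresholds are made up for illustration, and it assumes nvidia-smi reports plain integers for every queried field.

```python
import subprocess

# Machine-readable nvidia-smi query: one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse 'index, memory.used, utilization.gpu' rows into a list of dicts.

    Assumes every field is an integer; a value such as '[N/A]' would raise.
    """
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        index, mem_mib, util = [field.strip() for field in line.split(",")]
        rows.append(
            {"index": int(index), "mem_mib": int(mem_mib), "util_pct": int(util)}
        )
    return rows


def idle_gpus(rows, mem_threshold_mib=100, util_threshold_pct=5):
    """Return indices of GPUs below both thresholds (thresholds are arbitrary)."""
    return [
        r["index"]
        for r in rows
        if r["mem_mib"] < mem_threshold_mib and r["util_pct"] < util_threshold_pct
    ]


def query_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Calling `idle_gpus(query_gpus())` in a second shell while the mpirun command is running lists the GPUs that look unused at that moment.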

Full Log

The command never responds, so there is no log or error report of any kind.
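Since the hang leaves nothing to attach, one option is to run the command under a time limit and keep whatever it printed before being stopped. This is a rough sketch using only the Python standard library; `run_with_timeout` is a hypothetical helper, not part of Jittor or OpenMPI.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, output).

    status is 'finished' or 'timed_out'; output is the combined
    stdout/stderr captured up to that point (may be empty).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            timeout=timeout_s,
        )
        return "finished", proc.stdout
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the direct child on timeout; processes that
        # the child itself spawned may need to be cleaned up separately.
        out = exc.output or ""
        if isinstance(out, bytes):  # partial output can arrive as bytes
            out = out.decode(errors="replace")
        return "timed_out", out
```

For the command in this report, the call would look like `run_with_timeout(["mpirun", "--allow-run-as-root", "-np", "3", "python3.7", "-m", "jittor.test.test_resnet"], 600)`, where the 600-second limit is an arbitrary choice.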

@Leg-end Leg-end changed the title from "Running mpirun -np 4 python3.7 -m jittor.test.test_resnet inside a Docker container never responds" to "Running mpirun --allow-run-as-root -np 3 python -m jittor.test.test_resnet inside a Docker container never responds" Apr 23, 2024
@qizhuang-qz

Have you solved this? I've run into the same problem.

twangnh commented Jul 24, 2024

@cjld Same problem here. Could you please give some advice?

Leg-end commented Jul 25, 2024

> Have you solved this? I've run into the same problem.

No, I haven't. I've given up.
