Did you manage to solve this? I have run into the same problem.
@cjld Same problem here. Could you please give some advice?
No, I have not. I have given up.
Experiment environment
V100-16G x 3
![image](https://private-user-images.githubusercontent.com/31924601/324908430-325ded83-37ab-4833-af1b-56f8854cf4aa.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMxMDQ0NDksIm5iZiI6MTcyMzEwNDE0OSwicGF0aCI6Ii8zMTkyNDYwMS8zMjQ5MDg0MzAtMzI1ZGVkODMtMzdhYi00ODMzLWFmMWItNTZmODg1NGNmNGFhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA4VDA4MDIyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJiNDBkMDNmYzZkNTdmNGRlYTlkY2U2ZDI0YmM3Yjc5ODkxNGY1YTg4YjliMTYwN2VhZTM5NWU3MmVmNWY0MzMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.jSfQhUimi4TQ2Sbmo48vvBkoQ99jPSr2Shwgo_0-D6w)
jittor version 1.3.8.5
nvcc version 11.8
Describe the bug
Following the Jittor MPI multi-GPU distributed training tutorial, I successfully installed OpenMPI inside a Docker container.
![image](https://private-user-images.githubusercontent.com/31924601/324902491-61b94ac5-e24f-46ed-9c76-89f170dbf743.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMxMDQ0NDksIm5iZiI6MTcyMzEwNDE0OSwicGF0aCI6Ii8zMTkyNDYwMS8zMjQ5MDI0OTEtNjFiOTRhYzUtZTI0Zi00NmVkLTljNzYtODlmMTcwZGJmNzQzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA4VDA4MDIyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ1MDRkZTE3ZDFiNGI5OGQwMDljMGE0ZTM3ODMwYTkyM2UxZjM2MjdhMzQyYmI0NDU5ZDdmNGUwODdhMGUyOGImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Ryt3KG4nlm_CFKB5thprQZN2KL8pQc8Xf8zlta0soys)
![image](https://private-user-images.githubusercontent.com/31924601/324902882-1d211602-d8b1-481a-b208-934545112877.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMxMDQ0NDksIm5iZiI6MTcyMzEwNDE0OSwicGF0aCI6Ii8zMTkyNDYwMS8zMjQ5MDI4ODItMWQyMTE2MDItZDhiMS00ODFhLWIyMDgtOTM0NTQ1MTEyODc3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA4VDA4MDIyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTdkYjFhNTBlZjQ2MmY2Mjg5OGI3NjJjNTUyZWZmMDg0Mzk3NmFiOGUxZjRmNmNlOGM1N2JhNDMyMTAxZGNjNjkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.1TGUQve6MjRms0qjIPp2maTyo4pmbzqZeZxiu6jKRpA)
![image](https://private-user-images.githubusercontent.com/31924601/324906833-83702f85-73cb-422b-8d10-7f6a210fcc2b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMxMDQ0NDksIm5iZiI6MTcyMzEwNDE0OSwicGF0aCI6Ii8zMTkyNDYwMS8zMjQ5MDY4MzMtODM3MDJmODUtNzNjYi00MjJiLThkMTAtN2Y2YTIxMGZjYzJiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA4VDA4MDIyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI5ZmU5MDJhOGUyYWYzZGRlNmZlNzhhNDZjNWQ0OWNhNTZlNzZkZjExMWE2NmZlM2FmNWZlODU0MWEyOTA4MjMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.7DcdPId-F2V8vvAoVSbS_sGJEARxfU2n6THHynvc47M)
![image](https://private-user-images.githubusercontent.com/31924601/324907011-7fafe6da-2ed6-4855-a402-2ea301cf188e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMxMDQ0NDksIm5iZiI6MTcyMzEwNDE0OSwicGF0aCI6Ii8zMTkyNDYwMS8zMjQ5MDcwMTEtN2ZhZmU2ZGEtMmVkNi00ODU1LWE0MDItMmVhMzAxY2YxODhlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA4VDA4MDIyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY4NjVmN2RmOWM0ZDA3MzZmNWYyOTA4NDlhNmRhYTBiOGNjOWExNWU5Y2ViMDk2MzM0MjA0MTJiNGNhZDYxMmImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.548M7iB-BVKdxamopyS0R1vAZ1H7N9MMVbC8GYkVanQ)
Jittor also detected OpenMPI.
Running python3.7 -m jittor.test.test_resnet works normally.
When testing mpirun -np 4 python3.7 -m jittor.test.test_resnet, Docker runs as the root user by default, so I changed the command to
mpirun --allow-run-as-root -np 4 python3.7 -m jittor.test.test_resnet
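However, the GPU memory usage shown by nvidia-smi and the CPU usage of the corresponding processes in top indicate that the command is not actually doing any work.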
Full Log
The command hangs and never responds, so there are no logs or error reports.
![image](https://private-user-images.githubusercontent.com/31924601/324907325-257f9a7f-b32f-4d37-a46e-84a797066a3f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMxMDQ0NDksIm5iZiI6MTcyMzEwNDE0OSwicGF0aCI6Ii8zMTkyNDYwMS8zMjQ5MDczMjUtMjU3ZjlhN2YtYjMyZi00ZDM3LWE0NmUtODRhNzk3MDY2YTNmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA4VDA4MDIyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThjYWU5NjU0ZDcyYTkwMGQ0ZTAyNzc4ZGJmY2NkYmRkMGVhNjFhZWNjNWZiN2VhMDlmNjQ3ZTVlY2Y5MDI2OTEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.0iciaoJfgnAibAWIurNO8eA0js7g7eQ0VC_nqk7h2eA)
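Since the hang produces no output on its own, one way to collect something reportable is to run the command under a time limit and save whatever it printed before being stopped. The following is only a minimal sketch of that idea and not part of Jittor or OpenMPI: the helper name `run_with_timeout` and the timeout value are hypothetical, and it uses only the Python standard library. Output goes to a temporary file rather than a pipe, so reading it back does not block if leftover child processes still hold the output stream open.

```python
import subprocess
import sys
import tempfile


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and return (status, output).

    status is "exit <code>" if the command finished in time,
    or "timeout" if it was still running after timeout_s seconds.
    """
    # Binary temporary file; stdout and stderr are both written into it.
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            proc.wait(timeout=timeout_s)
            status = "exit %d" % proc.returncode
        except subprocess.TimeoutExpired:
            # Only the launched process itself is killed here.
            proc.kill()
            proc.wait()
            status = "timeout"
        log.seek(0)
        output = log.read().decode(errors="replace")
    return status, output
```

For the command in this report, the call might look like `run_with_timeout(["mpirun", "--allow-run-as-root", "-np", "4", "python3.7", "-m", "jittor.test.test_resnet"], 600)`, where 600 seconds is an arbitrary assumption. A "timeout" status with empty output would confirm that the processes stall before printing anything. Killing mpirun this way may leave its worker processes behind, so it is worth checking nvidia-smi and top afterwards.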