Deadlock when training on 8 GPUs #17

Open
buaali opened this issue Nov 17, 2022 · 2 comments

Comments


buaali commented Nov 17, 2022

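Faster R-CNN with the basic ResNet backbone deadlocks. All I did was replace Base_RCNN_FPN.yaml with Base_RCNN_C4.yaml from detectron2.
When I train with the example code from the README, it hangs at the first training batch. GPU utilization sits at 100%, but only 2400 MB of GPU memory is in use. After 14 hours overnight it was still stuck at the same point, with no output and no error. Training on a single GPU works normally. Could you offer some help?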
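When a multi-GPU run hangs silently like this, a useful first step is to make each rank report where it is stuck. The sketch below is a hypothetical helper (`enable_hang_diagnostics` is not part of this project or of detectron2). It turns on NCCL's own log (`NCCL_DEBUG=INFO`), PyTorch's cross-rank collective checks (`TORCH_DISTRIBUTED_DEBUG=DETAIL`, available in recent PyTorch versions), and periodic Python stack dumps via the standard-library `faulthandler` module. It assumes it is called at the very top of the training script, before `torch.distributed` is initialized. It only gathers information; it does not fix the deadlock.

```python
import faulthandler
import os


def enable_hang_diagnostics(timeout_s: float = 600.0) -> dict:
    """Hypothetical debugging helper for a suspected multi-GPU hang.

    Assumption: called before torch.distributed / NCCL is initialized,
    so the environment variables below are picked up by every rank.
    """
    # Ask NCCL to log its initialization and communication setup on each rank.
    os.environ.setdefault("NCCL_DEBUG", "INFO")
    # Ask PyTorch to check that all ranks issue matching collectives.
    os.environ.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")
    # Every timeout_s seconds, write the Python traceback of all threads
    # in this process to stderr, so a stuck rank shows where it is blocked.
    faulthandler.dump_traceback_later(timeout_s, repeat=True)
    return {k: os.environ[k] for k in ("NCCL_DEBUG", "TORCH_DISTRIBUTED_DEBUG")}
```

Comparing the stack dumps from different ranks is often more informative than the NCCL log alone: if most ranks are waiting in the same call (for example `backward()`) while one rank is somewhere else, the ranks have likely diverged.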


Twxwx commented Mar 24, 2023

Has this been resolved?


luoh226 commented Mar 26, 2023

It deadlocks with 2 GPUs as well...
